Compare commits

..

9 Commits

Author SHA1 Message Date
Max Kotliar
679646a3b3 docs: upd vmestimator docs 2026-06-26 19:01:40 +03:00
Max Kotliar
bb3c038e2f docs: follow-up on 29af0809; fix vmestimator image oversize 2026-06-26 18:48:46 +03:00
Max Kotliar
8f32b6648f docs: follow-up on 29af0809
Attempt to fix image oversize in vmestimator doc
2026-06-26 18:36:12 +03:00
Max Kotliar
df5f11623f docs: publish vmestimator docs
I tried to adopt publish-docs CI job. But it's not possible to add a
content to docs/victoriametrics. CI job from vmestimator overwrite the
whole content.

That's why I'll be copy pasting doc updates from vmestimator repo to VM
manually.
2026-06-26 18:23:29 +03:00
Roman Khavronenko
6c8a41f5ed docs: add missing mentions of new SA output sum_samples_total (#11178)
Follow-up after
16422b2d14

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-06-26 16:49:43 +02:00
Roman Khavronenko
e749a6ce8d docs: update security recommendations (#11136)
- move SBOM page down as general security recommendations must be
mentioned first
- separate httplib security-related flags to a separate section, so it
can be cross-referenced by products that use this lib
- mention that /metrics might require to be protected

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-06-26 16:49:14 +02:00
Hui Wang
615176ad55 doc: correct sharding configuration guidance for stream aggregation (#11167)
Rewrite the guidance for `-remoteWrite.shardByURL.labels` and
`-remoteWrite.shardByURL.ignoreLabels` from the perspective of the
aggregation config, rather than the other way around, since users
typically define aggregation rules first and then configure sharding.

And the previous statements are wrong.
For example, I have 3 raw series
```
{env=1, pod=1, cluster=1}
{env=1, pod=2, cluster=1}
{env=2, pod=1, cluster=2}
```

If I have `by: [env]`, then `-remoteWrite.shardByURL.labels` cannot be
set to `env,pod`, because that could route `{env=1, pod=1, cluster=1}`
and `{env=1, pod=2, cluster=1}` to different aggregators, causing each
aggregator to produce its own partial result for `env=1`.
In this case, `-remoteWrite.shardByURL.labels` can only use labels that
are a subset of `by` as `env`.

If I have `without: [env, pod]`, then
`-remoteWrite.shardByURL.ignoreLabels` cannot be set to just `env`,
because that would still allow `{env=1, pod=1, cluster=1}` and `{env=1,
pod=2, cluster=1}` to be routed to different aggregators, causing each
aggregator to produce its own partial result for `cluster=1`.
In this case, `-remoteWrite.shardByURL.ignoreLabels` must include all
labels listed in `without`as `env,pod`.
2026-06-26 16:48:43 +02:00
Cuong Le
3aec167f00 docs/victoriametrics: add an AI policy section to the contributing guide (#11170)
This PR adds a short `AI policy` section to the contributing guide.

We've received quite a few raw AI-generated PRs on VictoriaLogs
recently, so we'd like to clarify our view on AI usage. In short:

1. You can use AI.
2. You are responsible for what you submit, AI or not.
3. Raw AI slop will be closed without review.
2026-06-26 16:48:31 +02:00
Roman Khavronenko
6f633e5654 docs: update vmagent's multitenancy doc based on user's feedback (#11173)
* mention actual api path for opentlemetry in HowToPush docs for
consistency with other protocols;
* update vmagent=>vminsert remoteWrite.url path to prometheus-rw, as it
is the only one that is supported in vmagent/vminsert communication;
* add links to diagrams, so users could easier get to the corresponding
docs;
* add some visual notes to increase awarness of missconfigs

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
2026-06-26 16:48:09 +02:00
8 changed files with 667 additions and 93 deletions

View File

@@ -54,7 +54,7 @@ jobs:
restore-keys: go-artifacts-${{ runner.os }}-codeql-analyze-${{ steps.go.outputs.go-version }}-
- name: Initialize CodeQL
uses: github/codeql-action/init@8aad20d150bbac5944a9f9d289da16a4b0d87c1e # v4.36.2
uses: github/codeql-action/init@e46ed2cbd01164d986452f91f178727624ae40d7 # v4.35.3
with:
languages: go

View File

@@ -85,6 +85,21 @@ Pull requests requirements:
See a good example of a [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6487).
## AI policy
You are free to use any AI tools when working on a contribution, on code,
documentation, issues, or anything else. You do not need to disclose whether or
how you used them.
With or without the help of AI, you are responsible for the changes you submit.
Take the effort to understand the code base and every change in your pull request,
and clean up any AI slop before sending it. Do not use AI to automate your
responses to maintainers.
We review contributions on their quality, regardless of how they were produced. A
pull request or issue that looks like unreviewed AI output, with low-quality or
broken changes, may be closed without a detailed review or triage.
## Merging Pull Request
The person who merges the Pull Request is responsible for satisfying the requirements below:

View File

@@ -1712,64 +1712,63 @@ The following versions of VictoriaMetrics receive regular security fixes:
| [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/) | ✅ |
| other releases | ❌ |
### Software Bill of Materials (SBOM)
Every VictoriaMetrics container{{% available_from "v1.137.0" %}} image published to
[Docker Hub](https://hub.docker.com/u/victoriametrics) and [Quay.io](https://quay.io/organization/victoriametrics) include an [SPDX](https://spdx.dev/) SBOM attestation generated automatically by BuildKit during `docker buildx build`.
To inspect the SBOM for an image:
```sh
docker buildx imagetools inspect \
docker.io/victoriametrics/victoria-metrics:latest \
--format "{{ json .SBOM }}"
```
To scan an image using its SBOM attestation with [Trivy](https://github.com/aquasecurity/trivy):
```sh
trivy image --sbom-sources oci \
docker.io/victoriametrics/victoria-metrics:latest
```
### Reporting a Vulnerability
Please report any security issues to <security@victoriametrics.com>
### CVE handling policy
**Source code:** Go dependencies are scanned by [govulncheck](https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck) in CI.
All vulnerabilities must be fixed before the next scheduled release and backported to [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/).
**Docker images:** CVE findings in the [Alpine](https://security.alpinelinux.org/) base image pose minimal risk since VictoriaMetrics binaries are statically compiled with no OS dependencies.
When detected, only the Alpine base tag is updated.
Releases proceed as planned even if upstream fixes are not yet available.
For maximum security, hardened [scratch](https://hub.docker.com/_/scratch)-based images are also provided.
All images are continuously scanned by Docker Hub and verified before release using [grype](https://github.com/anchore/grype).
### General security recommendations:
* All the VictoriaMetrics components must run in protected private networks without direct access from untrusted networks such as Internet.
* All VictoriaMetrics components must run in protected private networks without direct access from untrusted networks such as the Internet.
The exception is [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/) and [vmgateway](https://docs.victoriametrics.com/victoriametrics/vmgateway/),
which are intended for serving public requests and performing authorization with [TLS termination](https://en.wikipedia.org/wiki/TLS_termination_proxy).
* All the requests from untrusted networks to VictoriaMetrics components must go through auth proxy such as [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/)
* All the requests from untrusted networks to VictoriaMetrics components must go through an auth proxy, such as [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/)
or [vmgateway](https://docs.victoriametrics.com/victoriametrics/vmgateway/). The proxy must be set up with proper authentication and authorization.
* Prefer using lists of allowed API endpoints, while disallowing access to other endpoints when configuring [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/)
in front of VictoriaMetrics components.
* Set reasonable [`Strict-Transport-Security`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security) header value to all the components to mitigate [MitM attacks](https://en.wikipedia.org/wiki/Man-in-the-middle_attack), for example: `max-age=31536000; includeSubDomains`. See `-http.header.hsts` flag.
* Set a reasonable [`Strict-Transport-Security`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security) header value on all the components to mitigate [MitM attacks](https://en.wikipedia.org/wiki/Man-in-the-middle_attack), for example: `max-age=31536000; includeSubDomains`. See `-http.header.hsts` flag.
* Set reasonable [`Content-Security-Policy`](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) header value to mitigate [XSS attacks](https://en.wikipedia.org/wiki/Cross-site_scripting). See `-http.header.csp` flag.
* Set reasonable [`X-Frame-Options`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options) header value to mitigate [clickjacking attacks](https://en.wikipedia.org/wiki/Clickjacking), for example `DENY`. See `-http.header.frameOptions` flag.
VictoriaMetrics provides the following security-related command-line flags:
The following security-related command-line flags are available for all components with HTTP API:
* `-tls`, `-tlsCertFile` and `-tlsKeyFile` for switching from HTTP to HTTPS at `-httpListenAddr` (TCP port 8428 is listened by default).
* `-tls`, `-tlsCertFile` and `-tlsKeyFile` for switching from HTTP to HTTPS at `-httpListenAddr`.
[Enterprise version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/enterprise/) supports automatic issuing of TLS certificates.
See [these docs](#automatic-issuing-of-tls-certificates).
* `-mtls` and `-mtlsCAFile` for enabling [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication) for requests to `-httpListenAddr`. See [these docs](#mtls-protection).
* `-httpAuth.username` and `-httpAuth.password` for protecting all the HTTP endpoints
with [HTTP Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication).
* `-http.header.hsts`, `-http.header.csp`, and `-http.header.frameOptions` for serving `Strict-Transport-Security`, `Content-Security-Policy`
and `X-Frame-Options` HTTP response headers.
### Protecting service endpoints
All VictoriaMetrics components expose internal metrics in Prometheus exposition format at the `/metrics` page for [#Monitoring](https://docs.victoriametrics.com/victoriametrics/#monitoring).
Consider limiting access to the `/metrics` page to trusted networks only.
The following service endpoints may require protection:
* `-deleteAuthKey` for protecting the `/api/v1/admin/tsdb/delete_series` endpoint. See [how to delete time series](#how-to-delete-time-series).
* `-snapshotAuthKey` for protecting the `/snapshot*` endpoints. See [how to work with snapshots](#how-to-work-with-snapshots).
* `-forceFlushAuthKey` for protecting the `/internal/force_flush` endpoint. See [force flush docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#forced-flush).
* `-forceMergeAuthKey` for protecting the `/internal/force_merge` endpoint. See [force merge docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#forced-merge).
* `-search.resetCacheAuthKey` for protecting the `/internal/resetRollupResultCache` endpoint. See [backfilling](#backfilling) for more details.
* `-reloadAuthKey` for protecting the `/-/reload` endpoint, which is used for force reloading of [`-promscrape.config`](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* `-reloadAuthKey` for protecting the `/-/reload` endpoint, which is used to force reload the [`-promscrape.config`](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* `-configAuthKey` for protecting the `/config` endpoint, since it may contain sensitive information such as passwords.
* `-flagsAuthKey` for protecting the `/flags` endpoint.
* `-pprofAuthKey` for protecting the `/debug/pprof/*` endpoints, which can be used for [profiling](#profiling).
* `-metricNamesStatsResetAuthKey` for protecting the `/api/v1/admin/status/metric_names_stats/reset` endpoint, used for [Metric Names Tracker](#track-ingested-metrics-usage).
* `-denyQueryTracing` for disallowing [query tracing](#query-tracing).
* `-http.header.hsts`, `-http.header.csp`, and `-http.header.frameOptions` for serving `Strict-Transport-Security`, `Content-Security-Policy`
and `X-Frame-Options` HTTP response headers.
Explicitly set internal network interface for TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats.
For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<internal_iface_ip>:2003`. This protects from unexpected requests from untrusted network interfaces.
@@ -1777,17 +1776,6 @@ For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<i
See also [security recommendation for VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#security)
and [the general security page at VictoriaMetrics website](https://victoriametrics.com/security/).
### CVE handling policy
**Source code:** Go dependencies are scanned by [govulncheck](https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck) in CI.
All vulnerabilities must be fixed before next scheduled release and backported to [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/).
**Docker images:** CVE findings in [Alpine](https://security.alpinelinux.org/) base image pose minimal risk since VictoriaMetrics binaries are statically compiled with no OS dependencies.
When detected, only the Alpine base tag is updated.
Releases proceed as planned even if upstream fixes are not yet available.
For maximum security, hardened [scratch](https://hub.docker.com/_/scratch)-based images are also provided.
All images are continuously scanned by Docker Hub and verified before release using [grype](https://github.com/anchore/grype).
### mTLS protection
By default `VictoriaMetrics` accepts http requests at `8428` port (this port can be changed via `-httpListenAddr` command-line flags).
@@ -1817,19 +1805,39 @@ This functionality can be evaluated for free according to [these docs](https://d
See also [security recommendations](#security).
### Software Bill of Materials (SBOM)
Every VictoriaMetrics container{{% available_from "v1.137.0" %}} image published to
[Docker Hub](https://hub.docker.com/u/victoriametrics) and [Quay.io](https://quay.io/organization/victoriametrics) include an [SPDX](https://spdx.dev/) SBOM attestation generated automatically by BuildKit during `docker buildx build`.
To inspect the SBOM for an image:
```sh
docker buildx imagetools inspect \
docker.io/victoriametrics/victoria-metrics:latest \
--format "{{ json .SBOM }}"
```
To scan an image using its SBOM attestation with [Trivy](https://github.com/aquasecurity/trivy):
```sh
trivy image --sbom-sources oci \
docker.io/victoriametrics/victoria-metrics:latest
```
## Tuning
* No need in tuning for VictoriaMetrics - it uses reasonable defaults for command-line flags,
* No need to tune for VictoriaMetrics - it uses reasonable defaults for command-line flags,
which are automatically adjusted for the available CPU and RAM resources.
* No need in tuning for Operating System - VictoriaMetrics is optimized for default OS settings.
* No need to tune for Operating System - VictoriaMetrics is optimized for default OS settings.
The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a).
The recommendation is not specific for VictoriaMetrics only but also for any service which handles many HTTP connections and stores data on disk.
* VictoriaMetrics is a write-heavy application and its performance depends on disk performance. So be careful with other
The recommendation is not specific to VictoriaMetrics only, but also for any service that handles many HTTP connections and stores data on disk.
* VictoriaMetrics is a write-heavy application, and its performance depends on disk performance. So be careful with other
applications or utilities (like [fstrim](https://manpages.ubuntu.com/manpages/lunar/en/man8/fstrim.8.html))
which could [exhaust disk resources](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1521).
* The recommended filesystem is `ext4`, the recommended persistent storage is [persistent HDD-based disk on GCP](https://cloud.google.com/compute/docs/disks/#pdspecs),
since it is protected from hardware failures via internal replication and it can be [resized on the fly](https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
If you plan to store more than 1TB of data on `ext4` partition, then the following options are recommended to pass to `mkfs.ext4`:
since it is protected from hardware failures via internal replication, and it can be [resized on the fly](https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
If you plan to store more than 1TB of data on an `ext4` partition, then the following options are recommended to pass to `mkfs.ext4`:
```sh
mkfs.ext4 ... -O 64bit,huge_file,extent -T huge

View File

@@ -38,8 +38,9 @@ Released at 2026-06-22
* FEATURE: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): add the `last` value to graph legend statistics. See [#10759](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10759).
* FEATURE: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): expose `vm_streamaggr_dedup_dropped_samples_total` to allow tracking dropped old samples during [deduplication](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication).
* FEATURE: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): use the aggregation rule interval as the default [staleness_interval](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#staleness) instead of `2*interval`, to reduce spikes when there are gaps between received samples. See [#11102](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11102).
* FEATURE: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): add new aggregation output [sum_samples_total](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#sum_samples_total) for summing input delta values into a cumulative counter. See issues [#11002](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11002) and [#4843](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4843).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add a new flag `-remoteWrite.inmemoryQueues` to prioritize recently ingested data over historical data stored at file-based [persistent queue](https://docs.victoriametrics.com/victoriametrics/vmagent/#on-disk-persistence-and-data-processing-order). See [#8833](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8833)
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add `-promscrape.cluster.shardByLabels` command-line flag for selecting target labels used for sharding scrape targets among `vmagent` instances in cluster mode. See [#11044](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11044).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add `-promscrape.cluster.shardByLabels` command-line flag for selecting target labels used for sharding scrape targets among `vmagent` instances in cluster mode. See [#11044](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11044).
* FEATURE: [vmctl](https://docs.victoriametrics.com/victoriametrics/vmctl/): add `-vm-headers` and `-vm-bearer-token` flags for authenticating requests to the VictoriaMetrics import destination. The flags are available in `opentsdb`, `influx`, `remote-read`, `prometheus`, `mimir`, and `thanos` vmctl sub-commands. See [#8897](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8897).
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): log calls to [/api/v1/admin/tsdb/delete_series](https://docs.victoriametrics.com/victoriametrics/url-examples/#apiv1admintsdbdelete_series) API handler. This should help to identify events of metrics deletion from the database. See [#11104](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11104).
* FEATURE: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): add the `last` value to graph legend statistics. See [#10759](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10759).

View File

@@ -624,13 +624,12 @@ command line flags. See how to [shard data across remote write destinations](htt
The following requirements must be met for sharded aggregation to work correctly:
- All sharding vmagents should have the same deterministic sharding configuration.
- The sharding configuration must align with the `by` and `without` lists:
- Labels listed in `by` setting should be a subset of shard's routing key `-remoteWrite.shardByURL.labels`.
With `-remoteWrite.shardByURL.labels=env,job` aggregator's `by` should include `by: env`, `by: job` or both: `by: [env, job]`.
This makes sure that all the samples for the same `env` and `job` are aggregated together and produce the complete output.
- Labels listed in `without` setting should be a superset of shard's routing key `--remoteWrite.shardByURL.ignoreLabels`.
With `-remoteWrite.shardByURL.ignoreLabels=env,job` aggegator's `without` should include at least both labels `without: [env,job]`.
This makes sure that `requests_total{env=test, job=foo}` and `requests_total{env=prod, job=foo}` are routed to the same aggregator
and are aggregated together. See also [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5938#issuecomment-2018470324).
- Labels configured in `-remoteWrite.shardByURL.labels` must be a subset of the labels listed in `by`.
For example, if the aggregation config specifies `by: [env, job]`, then `-remoteWrite.shardByURL.labels` may include `env`, `job`, or both.
This ensures that all samples contributing to the same aggregation result are routed to the same aggregator instance and aggregated together to produce a complete output.
- Labels configured in `-remoteWrite.shardByURL.ignoreLabels` must be a superset of the labels listed in `without`.
For example, if the aggregation config specifies `without: [env, pod]`, then `-remoteWrite.shardByURL.ignoreLabels` must include at least `env` and `pod`.
This ensures that labels removed during aggregation are not used for shard routing.
- Aggregating vmagents should not produce collisions: the aggregation output should be unique across all the sharded agents.
For example, `requests_total:5m_without_env_pod_total` produced by both `vmagent-aggr-1` and `vmagent-aggr-2` will collide
unless they have labels uniquely identifying them. These labels should be either preserved during sharding and aggregation config,

View File

@@ -222,6 +222,7 @@ Below are aggregation functions that can be put in the `outputs` list at [stream
* [stddev](#stddev)
* [stdvar](#stdvar)
* [sum_samples](#sum_samples)
* [sum_samples_total](#sum_samples_total)
* [total](#total)
* [total_prometheus](#total_prometheus)
* [unique_samples](#unique_samples)
@@ -505,10 +506,11 @@ See also:
- [count_samples](#count_samples)
- [count_series](#count_series)
- [sum_samples_total](#sum_samples_total)
### `sum_samples_total`
`sum_samples_total` sums input delta values into a cumulative [counter](https://docs.victoriametrics.com/victoriametrics/keyconcepts/index.html#counter) and outputs the result at the given `interval`.
`sum_samples_total` {{% available_from "v1.146.0" %}}. sums input delta values into a cumulative [counter](https://docs.victoriametrics.com/victoriametrics/keyconcepts/index.html#counter) and outputs the result at the given `interval`.
`sum_samples_total` makes sense only for aggregating delta values from clients such as [StatsD counter](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#counting).
The results of `sum_samples_total` is roughly equal to the following [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/) query:
@@ -559,6 +561,7 @@ See also:
- [total_prometheus](#total_prometheus)
- [increase](#increase)
- [increase_prometheus](#increase_prometheus)
- [sum_samples_total](#sum_samples_total)
- [rate_sum](#rate_sum)
- [rate_avg](#rate_avg)
@@ -588,6 +591,7 @@ See also:
- [total](#total)
- [increase](#increase)
- [increase_prometheus](#increase_prometheus)
- [sum_samples_total](#sum_samples_total)
- [rate_sum](#rate_sum)
- [rate_avg](#rate_avg)

View File

@@ -309,11 +309,11 @@ Scraping has additional settings that can be applied before samples are pushed t
`vmagent` supports [the same set of push-based data ingestion protocols as VictoriaMetrics does](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-import-time-series-data)
in addition to the pull-based Prometheus-compatible targets' scraping:
* DataDog "submit metrics" API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/datadog/).
* Datadog "submit metrics" API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/datadog/).
* InfluxDB line protocol via `http://<vmagent>:8429/write`. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/).
* Graphite plaintext protocol if the `-graphiteListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/graphite/#ingesting).
* OpenTelemetry HTTP API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/).
* NewRelic API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/newrelic/#sending-data-from-agent).
* OpenTelemetry HTTP API via `http://<vmagent>:8429/opentelemetry/v1/metrics`. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/).
* New Relic API. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/newrelic/#sending-data-from-agent).
* OpenTSDB telnet and http protocols if `-opentsdbListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/opentsdb/).
* Zabbix Connector streaming protocol. See [these docs](https://docs.victoriametrics.com/victoriametrics/integrations/zabbixconnector/#send-data-from-zabbix-connector).
* Prometheus remote write protocol via `http://<vmagent>:8429/api/v1/write`.
@@ -481,29 +481,38 @@ by specifying `-remoteWrite.forcePromProto` command-line flag for the correspond
## Multitenancy
By default, `vmagent` collects the data without [tenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) identifiers
and routes it to the remote storage specified via `-remoteWrite.url` command-line flag. The `-remoteWrite.url` can point to `/insert/<tenant_id>/prometheus/api/v1/write` path
at `vminsert` according to [these docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format).
and routes it to the remote storage specified via `-remoteWrite.url` command-line flag. Point `-remoteWrite.url` to vminsert's `/insert/<tenant_id>/prometheus/api/v1/write` path
according to [these docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format).
> Note: the single-node version of VictoriaMetrics doesn't support multitenancy.
```mermaid
flowchart LR
A["requests_total{instance=foo}"] --> |/api/v1/write| V[vmagent]
B["requests_total{instance=bar}"] <--> |scrape| V
V --> |"/insert/#60;tenant_id#62;/#60;suffix#62;"| C[vminsert]
A["requests_total{instance=foo}"] --> |<a href="https://docs.victoriametrics.com/victoriametrics/vmagent/#how-to-push-data-to-vmagent">push</a>| V[vmagent]
B["requests_total{instance=bar}"] <--> |<a href="https://docs.victoriametrics.com/victoriametrics/vmagent/#how-to-collect-metrics-in-prometheus-format">pull</a>| V
V --> |"/insert/#60;tenant_id#62;/prometheus/api/v1/write"| C[vminsert]
```
In this case, all the metrics written to `/insert/tenant_id/prometheus/api/v1/write` will belong to the specified `<tenant_id>` tenant.
In this case, all the metrics written to `/insert/<tenant_id>/prometheus/api/v1/write` will belong to the specified `<tenant_id>` tenant.
### Multitenancy via labels
vmagent can write data to multiple distinct tenants if `-remoteWrite.url` points to [multitenant URL at VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels)
and tenant is specified via [multitenancy labels](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels):
vmagent can write data to **multiple distinct tenants** if `-remoteWrite.url` points to the [multitenant URL in the VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels)
and the tenant is specified via [multitenancy labels](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels):
```mermaid
flowchart LR
A["requests_total{instance=foo, vm_account_id=0}"] --> |/api/v1/write| V[vmagent]
B["requests_total{instance=bar, vm_account_id=1}"] <--> |scrape| V
V --> |"/insert/multitenant/#60;suffix#62;"| C[vminsert]
A["requests_total{instance=foo, vm_account_id=0}"] --> |<a href="https://docs.victoriametrics.com/victoriametrics/vmagent/#how-to-push-data-to-vmagent">push</a>| V[vmagent]
B["requests_total{instance=bar, vm_account_id=1}"] <--> |<a href="https://docs.victoriametrics.com/victoriametrics/vmagent/#how-to-collect-metrics-in-prometheus-format">pull</a>| V
V --> |"/insert/multitenant/prometheus/api/v1/write"| C[vminsert]
```
`<tenant_id>` is extracted from the `vm_account_id` and `vm_project_id` labels.
> A single payload pulled from or pushed to vmagent may contain time series belonging to multiple tenants.
When vminsert receives the data on the `/insert/multitenant` path, it extracts `<tenant_id>` from the `vm_account_id` and `vm_project_id` labels for
each distinct time series.
> If `vm_account_id` or `vm_project_id` labels are missing or invalid, then the corresponding accountID and projectID are set to 0.
The `vm_account_id` and `vm_project_id` labels can be specified via [relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/) before sending the metrics to `-remoteWrite.url`.
For example, the following relabeling rule instructs sending metrics to `<account_id>:0` tenant defined in the `prometheus.io/account_id` annotation of Kubernetes pod deployment:
@@ -516,11 +525,12 @@ scrape_configs:
target_label: vm_account_id
```
vmagent can get tenant identifier from `__tenant_id__` label at target discovery phase.
It implicitly converts `__tenant_id__` label into `vm_account_id` and `vm_project_id` labels and attaches
it to the scraped metrics and metrics metadata.
vmagent can get the tenant identifier from the `__tenant_id__` label during the target discovery phase.
It implicitly converts the `__tenant_id__` label into `vm_account_id` and `vm_project_id` labels and attaches
them to the scraped metrics and metrics metadata.
For example, the following relabeling rule instructs sending metrics to the `10:5` tenant defined in the `prometheus.io/tenant_id: 10:5` annotation of the Kubernetes pod deployment:
For example, the following relabeling rule instructs sending metrics to the `10:5` tenant defined in the `prometheus.io/tenant_id: 10:5`
annotation of the Kubernetes pod deployment:
```yaml
scrape_configs:
@@ -531,47 +541,54 @@ scrape_configs:
target_label: __tenant_id__
```
vmagent can [enforce adding labels](https://docs.victoriametrics.com/victoriametrics/vmagent/#adding-labels-to-metrics) to all scraped
or forwarded metrics.
vmagent can also [enforce adding labels](https://docs.victoriametrics.com/victoriametrics/vmagent/#adding-labels-to-metrics)
on all scraped or forwarded metrics.
### Multitenancy via path
vmagent can write data to multiple distinct tenants if `-remoteWrite.url` points to [multitenant URL at VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels),
tenant is specified in the [write path](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format), and `-enableMultitenantHandlers` command-line flag is set:
vmagent can write data to multiple distinct tenants if:
* its `-remoteWrite.url` points to the [VictoriaMetrics cluster multitenant URL](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels)
* its `-enableMultitenantHandlers` command-line flag is set
* clients ingest data into vmagent with the tenant specified in the [write path](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format)
```mermaid
flowchart LR
A["requests_total{instance=foo}"] --> |/insert/0/#60;suffix#62;| V[vmagent]
B["requests_total{instance=bar}"] --> |/insert/1/#60;suffix#62;| V
V --> |"/insert/multitenant/#60;suffix#62;"| C[vminsert]
A["requests_total{instance=foo}"] --> |<a href="https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format">/insert/0/#60;suffix#62;</a>| V[vmagent]
B["requests_total{instance=bar}"] --> |<a href="https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format">/insert/1/#60;suffix#62;</a>| V
V --> |"/insert/multitenant/prometheus/api/v1/write"| C[vminsert]
```
In this configuration, vmagent accepts writes via the same multitenant endpoints (`/insert/<accountID>/<suffix>`) [as vminsert does](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format).
For all received data, vmagent will automatically convert tenant identifiers from the URL to `vm_account_id` and `vm_project_id` labels and set tenant info in metadata.
For all the received data, vmagent will automatically convert tenant identifiers in the URL path to `vm_account_id` and `vm_project_id` labels, and set tenant information in metadata.
These tenant labels are added before applying [relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/) specified via `-remoteWrite.relabelConfig`
and `-remoteWrite.urlRelabelConfig` command-line flags.
These tenant labels are added before applying [relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/)
specified via `-remoteWrite.relabelConfig` and `-remoteWrite.urlRelabelConfig` command-line flags.
### Multitenancy via headers
vmagent can write data to multiple distinct tenants if `-remoteWrite.url` points to [multitenant URL at VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels),
tenant is specified [via headers](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-headers) {{% available_from "v1.143.0" %}}, both `-enableMultitenantHandlers` and `-enableMultitenancyViaHeaders` command-line flags are set:
vmagent can write data to multiple distinct tenants if:
* its `-remoteWrite.url` points to the [VictoriaMetrics cluster multitenant URL](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-labels)
* its `-enableMultitenantHandlers` and `-enableMultitenancyViaHeaders` command-line flags are both set
* clients ingest data into vmagent with the tenants specified [via headers](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-headers) {{% available_from "v1.143.0" %}}
```mermaid
flowchart LR
A["requests_total{instance=foo}"] --> |/insert/#60;suffix#62; <br>--header AccountID: 0| V[vmagent]
B["requests_total{instance=bar}"] --> |/insert/#60;suffix#62; <br>--header AccountID: 1| V
V --> |"/insert/multitenant/#60;suffix#62;"| C[vminsert]
A["requests_total{instance=foo}"] --> |<a href="https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-headers">/insert/#60;suffix#62;</a> <br>--header AccountID: 0| V[vmagent]
B["requests_total{instance=bar}"] --> |<a href="https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy-via-headers">/insert/#60;suffix#62;</a> <br>--header AccountID: 1| V
V --> |"/insert/multitenant/prometheus/api/v1/write"| C[vminsert]
```
In this configuration, vmagent accepts writes via the same simplified multitenant endpoints (`/insert/<suffix>`) [as vminsert does](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#url-format).
The tenant information is extracted from the `AccountID` and `ProjectID` HTTP headers, which are expected to be included in all incoming requests. If headers are missing, then the tenant is set to `0:0` as the default.
The tenant information is extracted from the `AccountID` and `ProjectID` HTTP headers, which are expected to be included in all incoming requests.
For all received data, vmagent will automatically convert tenant identifiers from the headers to `vm_account_id` and `vm_project_id` labels and set tenant info in metadata.
These tenant labels are added before applying [relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/) specified via `-remoteWrite.relabelConfig`
> If headers are missing, then the tenant is set to `0:0` by default.
For all the received data, vmagent will automatically convert tenant identifiers in the headers to `vm_account_id` and `vm_project_id` labels, and set tenant info in metadata.
These tenant labels are added before applying [relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/) specified via the `-remoteWrite.relabelConfig`
and `-remoteWrite.urlRelabelConfig` command-line flags.
vmauth can [enforce adding headers](https://docs.victoriametrics.com/victoriametrics/vmauth/#modifying-http-headers) to all
forwarded requests via `headers` param in the config file.
forwarded requests via the `headers` parameter in the config file.
## Adding labels to metrics

View File

@@ -0,0 +1,530 @@
---
weight: 13
menu:
docs:
parent: victoriametrics
weight: 13
title: vmestimator
tags:
- metrics
- cardinality
aliases:
- /vmestimator.html
- /vmestimator/index.html
- /vmestimator/
---
`vmestimator` measures metrics cardinality across arbitrary label dimensions and exposes the results as metrics.
## Why measure ?
Consider a setup where metrics are scraped from dozens of Prometheus targets.
One day, a team deploys a new version of their service with a `trace_id` or `user_id` label.
Overnight, that job's cardinality explodes from 500 to 500,000 time series.
Suddenly, VictoriaMetrics consumes 100x more memory and disk.
Ingestion slows down, storage struggles to keep up, and in the worst case becomes unavailable.
By the time someone gets paged, the damage is already done: indexes are bloated, caches are oversized, and observability across the entire system is affected.
`vmestimator` continuously tracks cardinality and exposes the estimation results as [metrics](https://github.com/VictoriaMetrics/vmestimator/blob/main/README.md#cardinality-metrics).
This allows alerting on cardinality spikes within minutes and identifying the offending job directly from the alert.
Instead of discovering the problem after it impacts the infrastructure, it becomes possible to react before it turns into an outage.
Per-job cardinality tracking is the most actionable use case, but its not the only one (see [use cases](https://github.com/VictoriaMetrics/vmestimator/#use-cases)).
`vmestimator` can measure cardinality across arbitrary label dimensions,
enabling use cases such as per-tenant usage analysis, long-term trend tracking, and capacity planning.
## Design
We recommend deploying `vmestimator` close to the metrics source, ideally alongside `vmagent` instances that scrape targets.
Each `vmagent` mirrors all ingested metrics into the estimator.
To reduce overhead, persistent queueing and metadata ingestion can be disabled for the estimator remote write path.
It is safe to send metrics from multiple independent `vmagent` instances into a single `vmestimator`.
Run vmestimator (see [configuration](https://github.com/VictoriaMetrics/vmestimator#configuration)):
```bash
/path/to/vmestimator -config=streams.yaml # -httpListenAddr=:8490
```
Run vmagent:
```bash
/path/to/vmagent \
-remoteWrite.url=http://127.0.0.1:8428/api/v1/write \
-remoteWrite.url=http://127.0.0.1:8490/cardinality/api/v1/write \
-remoteWrite.disableOnDiskQueue=false,true \
-remoteWrite.disableMetadata=false,true
```
The next step is to expose cardinality estimates as metrics.
For this, `vmagent` should scrape the estimator `/metrics` endpoint and forward those metrics to a `vmsingle` instance (or another VictoriaMetrics storage).
<img style="min-width:0;width: 100%" src="https://github.com/user-attachments/assets/e52d9210-b6f9-457b-8d8f-1d6ff6ba1416" />
This setup is straightforward and introduces minimal overhead.
The main drawback is that cardinality data shares the same storage with production metrics.
If that storage becomes unavailable, the visibility into cardinality is lost precisely when it may be most needed.
To mitigate this, we recommend running a separate `vmsingle` instance dedicated to scraping and storing VictoriaMetrics-related monitoring signals only.
This pattern is commonly referred to as a monitoring-of-monitoring (MoM) setup.
In this architecture, `vmestimator` metrics are isolated from production observability storage,
ensuring cardinality visibility remains available even during incidents affecting the primary monitoring system.
The resulting topology looks like this:
<img style="min-width:0;width: 100%" src="https://github.com/user-attachments/assets/e2ca4a69-e931-47a1-9d91-99749382d4a9" />
## Install
To quickly try VictoriaMetrics, just download the VictoriaMetrics docker image from [Docker Hub](https://hub.docker.com/r/victoriametrics/vmestimator) or [Quay](https://quay.io/repository/victoriametrics/vmestimator) and start it with the desired [command-line flags](https://github.com/VictoriaMetrics/vmestimator#command-line-flags).
## Configuration
To run vmestimator a `streams.yaml` config has to be provided:
```bash
/path/to/vmestimator -config=streams.yaml # -httpListenAddr=:8490
```
Config reference:
```yaml
streams:
-
# The measurement window: how long unique series are retained before the HLL sketch resets.
# Increases are always reflected immediately. Interval only controls how fast the estimate
# drops after previously seen series disappear.
#
# Running two streams with different intervals (e.g. 5m and 1h) lets you derive churn rate
# by comparing their estimates. See Use Cases -> Churn Rate
#
# default: 5m
interval: '5m'
# Label names used to split the cardinality estimate into per-combination groups.
# Each distinct combination of values for these labels gets its own estimate metric.
# Omit entirely for a single global estimate across all series.
# Examples:
# - ["job"]
# - ["__name__"]
# - ["vm_account_id","vm_project_id"]
#
# default: none (single global estimate)
group_by: ['job']
# Maximum number of distinct groups (HLL sketches) to track.
# Once the limit is reached, excess groups are counted in a single shared "rejected" sketch
# rather than getting their own entry. Acts as a memory cap and a safeguard against OOM
# when the group_by label values grow unboundedly.
# Memory upper bound per stream:
# group_limit * 2^hll_precision bytes.
#
# default: 10000
group_limit: 10000
# Number of shards used to reduce lock contention during parallel ingestion.
# Slightly increases memory for global streams (no group_by); negligible otherwise.
# Leave at the default unless you have profiled lock contention or have a specific reason to change it.
#
# default: min(64, 2*availableCPUs)
buckets: 64
# HyperLogLog precision p, in range [4..18].
# Determines the number of registers m = 2^p and the relative error 1.04 / sqrt(m):
# p=14 → m=16 384, error ~0.81%, memory ~16 KB per sketch (default, suits most cases)
# p=18 → m=262 144, error ~0.20%, memory ~256 KB per sketch (billing-grade accuracy)
# p=10 → m=1 024, error ~3.25%, memory ~1 KB per sketch (thousands of groups, memory-tight)
# See more in https://research.google.com/pubs/archive/40671.pdf
#
# default: 14
hll_precision: 14
# Whether to use the sparse HyperLogLog representation for low-cardinality groups.
# Sparse mode uses far less memory until a group's cardinality reaches ~2^(p-1),
# at which point it automatically promotes to the dense representation.
# See more in # See more in https://research.google.com/pubs/archive/40671.pdf
#
# default: true
hll_sparse: true
# Static labels attached to every output metric produced by this stream entry.
# Useful when multiple vmestimator instances feed the same storage and you need
# to distinguish their estimates in dashboards and alerts.
labels:
env: 'production'
region: 'eu-central-1'
```
## Cardinality Metrics
Cardinality estimates are exposed as the `cardinality_estimate` metric.
All metrics include `interval`, `group_by_keys`, `group_by_values`, and any static labels defined in the stream config.
For global estimates (no `group_by` configured), `group_by_keys` is `__global__` and `group_by_values` is omitted:
```
cardinality_estimate{interval="1h0m0s",group_by_keys="__global__"} 142300
```
For grouped estimates, one summary line shows the total number of distinct groups `group_by_keys="__group__"`, followed by one line per distinct label value combination.
Each per-group line also includes individual `by_{key}="{val}"` labels:
```
cardinality_estimate{interval="5m0s",group_by_keys="__group__",group_by_values="instance,job"} 2
cardinality_estimate{interval="5m0s",group_by_keys="instance,job",group_by_values="host1:9090,prometheus",by_instance="host1:9090",by_job="prometheus"} 312
cardinality_estimate{interval="5m0s",group_by_keys="instance,job",group_by_values="host2:9100,node",by_instance="host2:9100",by_job="node"} 87
```
Note: the total distinct group count in the summary line may exceed the number of per-group lines when `group_limit` is reached
and excess groups are counted in a single shared "rejected" sketch rather than getting their own entry.
By default, cardinality estimates are merged with the estimator's operational metrics and exposed at `/metrics`.
This is controlled by the `-cardinalityMetrics.exposeAt` flag:
- `-cardinalityMetrics.exposeAt=/metrics` (default): cardinality metrics merged with operational metrics at `/metrics`
- `-cardinalityMetrics.exposeAt=/cardinality/metrics`: cardinality metrics exposed at separate path
- `-cardinalityMetrics.exposeAt=`: cardinality metrics not exposed via HTTP
Computing cardinality estimates is expensive, so results are cached.
Cache duration is controlled by `-cardinalityMetrics.cacheTTL` (default: `30s`).
Set to `0` to disable caching entirely.
## Use Cases
### Basic
Global cardinality:
```yaml
# streams.yaml
- interval: '5m'
```
Per metric name cardinality:
```yaml
# streams.yaml
- interval: '5m'
group_by: ['__name__']
```
Per job label cardinality:
```yaml
# streams.yaml
- interval: '5m'
group_by: ['job']
```
Per tenant cardinality:
```yaml
# streams.yaml
- interval: '5m'
group_by: ['vm_account_id', 'vm_project_id']
```
### Churn rate calculation
[Churn rate](https://valyala.medium.com/prometheus-storage-technical-terms-for-humans-4ab4de6c3d48#churn-rate) measures how quickly time series are created and disappear.
[High churn](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate) means many series appear briefly and are replaced by new ones.
This puts pressure on storage, because each new series must be indexed regardless of how short its lifetime is.
To measure churn, configure two streams with the same `group_by` but different intervals. A short one (`5m`) and a long one (`1h`):
```yaml
# streams.yaml
- interval: '5m'
group_by: ['job']
- interval: '1h'
group_by: ['job']
```
The short interval (`5m`) captures the currently active series.
The long interval (`1h`) retains all series seen over the past hour.
When churn is low, both estimates are roughly equal.
When churn is high, the `1h` estimate grows significantly larger than the `5m` estimate, because the long window accumulates series that have already disappeared.
The following query computes the churn ratio per job:
```
max(cardinality_estimate{group_by_keys="job",interval="1h0m0s"}) without (job)
/
(max(cardinality_estimate{group_by_keys="job",interval="5m0s"}) without (job) * 12)
```
A result near `0` means the series set is stable. The same series were active throughout the entire hour.
A result near `1` means complete churn. Entirely different series appeared each 5-minute window.
Values in between indicate the fraction of maximum possible churn that is occurring.
This helps identify jobs that create the most indexing pressure on storage, even when their current active cardinality appears moderate.
## Alternative solutions
### PromQL
Cardinality can be estimated with PromQL.
Global cardinality:
```
count({__name__=~".*"})
```
Top ten metric names by cardinality:
```
topk(10, count({__name__=~".*"}) by (__name__))
```
Top ten jobs by cardinality:
```
topk(10, count({__name__=~".*"}) by (job))
```
This approach works for small setups but does not scale well, because these queries scan the entire time series set.
Most critically, if the storage is overloaded or unavailable, these queries could not be executed.
### Cardinality explorer
VictoriaMetrics includes a built-in [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer).
It provides per-metric detail beyond raw series counts: query frequency, last access time, day-over-day change, and share of total cardinality.
It is well suited for in-depth, ad-hoc investigation.
For example, finding metrics that are high-cardinality but rarely queried,
so they can be [dropped via relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/#how-to-drop-metrics-during-scrape) or reduce cardinality with [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/).
Both tools serve different purposes and work well together.
Use `vmestimator` for continuous monitoring, alerting, and cross-cluster cardinality tracking.
Use the cardinality explorer when you need to drill into a specific metric or label and understand what is driving its cardinality.
## Cluster
`vmestimator` supports a clustered deployment for high availability or when CPU on a single instance becomes a limiting factor.
Instances are split into two roles: **storage nodes** accept Prometheus remote write and maintain local HyperLogLog sketches; **selector nodes** query all storage nodes, merge their sketches, and expose a unified cardinality estimate. Cardinality estimate results should be scraped from selector nodes.
<img style="min-width:0;width: 100%" src="https://github.com/user-attachments/assets/846e5f77-378a-44dc-a4c8-2a1c64eca9d8" />
**Storage nodes:**
```
vmestimator -config=streams.yaml -httpListenAddr=:8491 -cardinalityMetrics.exposeAt=/cardinality/metrics
vmestimator -config=streams.yaml -httpListenAddr=:8492 -cardinalityMetrics.exposeAt=/cardinality/metrics
vmestimator -config=streams.yaml -httpListenAddr=:8493 -cardinalityMetrics.exposeAt=/cardinality/metrics
```
**Selector nodes:**
```
vmestimator -storageNode=http://vmestimator-storage-1:8491 \
-storageNode=http://vmestimator-storage-2:8492 \
-storageNode=http://vmestimator-storage-3:8493 \
-httpListenAddr=:8490
```
Setting `-cardinalityMetrics.exposeAt=/cardinality/metrics` on storage nodes keeps per-node estimates off the default `/metrics` path. The `/metrics` endpoint then returns only operational metrics, while `/cardinality/metrics` exposes the node's local estimate — useful for inspecting or debugging a specific node.
A selector with `-storageNode` flags and no `-config` runs without local estimators and only merges remote data.
When multiple selector nodes are scraped, each returns a fully merged estimate.
Deduplicate at query time to avoid overcounting:
```
max(cardinality_estimate) without (job)
```
## Operational metrics
When grouping is enabled, vmestimator exposes per-bucket operational metrics at `/metrics`:
- `vmestimator_estimator_group_size{group_by_keys, bucket}` — number of active groups in this bucket after the last rotation
- `vmestimator_estimator_group_rejected_size{group_by_keys}` — estimated number of distinct group values rejected since the last rotation because `group_limit` was reached
- `vmestimator_estimator_group_limit{group_by_keys, bucket}` — configured `group_limit` for this bucket
## Dashboards
Two Grafana dashboards are available in the [dashboards](https://github.com/VictoriaMetrics/vmestimator/tree/main/dashboards) directory:
- `vmestimator.json` — application health: CPU, memory, ingestion rates, concurrent inserts, and group key saturation.
- `cardinality-explorer.json` — cardinality analysis: global estimates, per-group-key series counts, and top-10 highest-cardinality label value combinations.
<img style="min-width:0;width: 100%" src="https://github.com/user-attachments/assets/2bd6a930-1eb5-40ef-8006-8196c1c12397" />
## How to build from sources
It is recommended to use the [docker images](https://hub.docker.com/r/victoriametrics/vmestimator).
### Development build
1. [Install Go](https://golang.org/doc/install).
1. Run `make vmestimator` from the root folder of [the repository](https://github.com/VictoriaMetrics/vmestimator).
It builds `vmestimator` binary and places it into the `bin` folder.
### Production build
1. [Install docker](https://docs.docker.com/install/).
1. Run `make vmestimator-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/vmestimator).
It builds `vmestimator-prod` binary and puts it into the `bin` folder.
### Building docker images
Run `make package-vmestimator`. It builds `victoriametrics/vmestimator:<PKG_TAG>` docker image locally.
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmestimator`.
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image by setting it via `<ROOT_IMAGE>` environment variable.
For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
```sh
ROOT_IMAGE=scratch make package-vmrestore
```
You can build and publish to your own registry and namespace:
```
DOCKER_REGISTRIES=ghcr.io DOCKER_NAMESPACE=foo make publish-vmagent
```
## Command-line flags
Run `vmestimate -help` in order to see all the available options:
```
Usage of ./bin/vmestimator:
-cardinalityMetrics.cacheTTL duration
Duration for caching cardinality metrics response (default 30s)
-cardinalityMetrics.exposeAt string
HTTP path for exposing cardinality metrics. If set to the default /metrics, cardinality metrics are merged with regular metrics and exposed together. If set to a different path, only cardinality metrics are exposed at that endpoint. If set to an empty value, cardinality metrics are not exposed via HTTP at all. (default "/metrics")
-config string
Path to YAML configuration file
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP are used
-envflag.enable
Whether to enable reading flags from environment variables in addition to the command line. Command line flag values have priority over values from environment vars. Flags are read only from the command line if this flag isn't set. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-filestream.disableFadvise
Whether to disable fadvise() syscall when reading large data files. The fadvise() syscall prevents from eviction of recently accessed data from OS page cache during background merges and backups. In some rare cases it is better to disable the syscall if it uses too much CPU
-flagsAuthKey value
Auth key for /flags endpoint. It must be passed via authKey query arg. It overrides -httpAuth.*
Flag value can be read from the given file when using -flagsAuthKey=file:///abs/path/to/file or -flagsAuthKey=file://./relative/path/to/file.
Flag value can be read from the given http/https url when using -flagsAuthKey=http://host/path or -flagsAuthKey=https://host/path
-fs.maxConcurrency int
The maximum number of concurrent goroutines to work with files; smaller values may help reducing Go scheduling latency on systems with small number of CPU cores; higher values may help reducing data ingestion latency on systems with high-latency storage such as NFS or Ceph (default 160)
-http.connTimeout duration
Incoming connections to -httpListenAddr are closed after the configured timeout. This may help evenly spreading load among a cluster of services behind TCP-level load balancer. Zero value disables closing of incoming connections (default 2m0s)
-http.disableCORS
Disable CORS for all origins (*)
-http.disableKeepAlive
Whether to disable HTTP keep-alive for incoming connections at -httpListenAddr
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header, recommended: "default-src 'self'"
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header, recommended: 'max-age=31536000; includeSubDomains'
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password value
Password for HTTP server's Basic Auth. The authentication is disabled if -httpAuth.username is empty
Flag value can be read from the given file when using -httpAuth.password=file:///abs/path/to/file or -httpAuth.password=file://./relative/path/to/file.
Flag value can be read from the given http/https url when using -httpAuth.password=http://host/path or -httpAuth.password=https://host/path
-httpAuth.username string
Username for HTTP server's Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr array
TCP address to listen for incoming HTTP requests
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-insert.maxQueueDuration duration
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringCacheExpireDuration duration
The expiry duration for caches for interned strings. See https://en.wikipedia.org/wiki/String_interning . See also -internStringMaxLen and -internStringDisableCache (default 6m0s)
-internStringDisableCache
Whether to disable caches for interned strings. This may reduce memory usage at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringCacheExpireDuration and -internStringMaxLen
-internStringMaxLen int
The maximum length for strings to intern. A lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringDisableCache and -internStringCacheExpireDuration (default 500)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerMaxArgLen int
The maximum length of a single logged argument. Longer arguments are replaced with 'arg_start..arg_end', where 'arg_start' and 'arg_end' is prefix and suffix of the arg with the length not exceeding -loggerMaxArgLen / 2 (default 5000)
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent insert requests. Set higher value when clients send data over slow networks. Default value depends on the number of available CPU cores. It should work fine in most cases since it minimizes resource usage. See also -insert.maxQueueDuration (default 20)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 33554432)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from the OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from the OS page cache which will result in higher disk IO usage (default 60)
-metrics.exposeMetadata
Whether to expose TYPE and HELP metadata at the /metrics page, which is exposed at -httpListenAddr . The metadata may be needed when the /metrics page is consumed by systems, which require this information. For example, Managed Prometheus in Google Cloud - https://cloud.google.com/stackdriver/docs/managed-prometheus/troubleshooting#missing-metric-type
-metricsAuthKey value
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides -httpAuth.*
Flag value can be read from the given file when using -metricsAuthKey=file:///abs/path/to/file or -metricsAuthKey=file://./relative/path/to/file.
Flag value can be read from the given http/https url when using -metricsAuthKey=http://host/path or -metricsAuthKey=https://host/path
-pprofAuthKey value
Auth key for /debug/pprof/* endpoints. It must be passed via authKey query arg. It overrides -httpAuth.*
Flag value can be read from the given file when using -pprofAuthKey=file:///abs/path/to/file or -pprofAuthKey=file://./relative/path/to/file.
Flag value can be read from the given http/https url when using -pprofAuthKey=http://host/path or -pprofAuthKey=https://host/path
-pushmetrics.disableCompression
Whether to disable request body compression when pushing metrics to every -pushmetrics.url
-pushmetrics.extraLabel array
Optional labels to add to metrics pushed to every -pushmetrics.url . For example, -pushmetrics.extraLabel='instance="foo"' adds instance="foo" label to all the metrics pushed to every -pushmetrics.url
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-pushmetrics.header array
Optional HTTP request header to send to every -pushmetrics.url . For example, -pushmetrics.header='Authorization: Basic foobar' adds 'Authorization: Basic foobar' header to every request to every -pushmetrics.url
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-pushmetrics.interval duration
Interval for pushing metrics to every -pushmetrics.url (default 10s)
-pushmetrics.url array
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-secret.flags array
Comma-separated list of flag names with secret values. Values for these flags are hidden in logs and on /metrics page
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-storageNode array
HTTP URLs of remote vmestimator nodes to query for cardinality snapshots, e.g. http://vmestimator-2:8490
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-tls array
Whether to enable TLS for incoming HTTP requests at the given -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set. See also -mtls
Supports array of values separated by comma or specified via multiple flags.
Empty values are set to false.
-tlsCertFile array
Path to file with TLS certificate for the corresponding -httpListenAddr if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower. The provided certificate file is automatically re-read every second, so it can be dynamically updated. See also -tlsAutocertHosts
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-tlsCipherSuites array
Optional list of TLS cipher suites for incoming requests over HTTPS if -tls is set. See the list of supported cipher suites at https://pkg.go.dev/crypto/tls#pkg-constants
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-tlsKeyFile array
Path to file with TLS key for the corresponding -httpListenAddr if -tls is set. The provided key file is automatically re-read every second, so it can be dynamically updated. See also -tlsAutocertHosts
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-tlsMinVersion array
Optional minimum TLS version to use for the corresponding -httpListenAddr if -tls is set. Supported values: TLS10, TLS11, TLS12, TLS13
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-version
Show VictoriaMetrics version
```