Mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git, synced 2026-05-17 08:36:55 +03:00
docs: update stream aggregation docs (#10871)

* add visual mermaid diagram to demonstrate aggregation concept;
* update Recording-rules-alternative:
  * recommend using rate_sum instead of total for better reliability
  * demonstrate how to calculate sliding window, typically for recording rules

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
(cherry picked from commit 569197d038)
Committed by hagen1778 · parent a4612edf56 · commit 8907caf176
@@ -9,13 +9,29 @@ sitemap:
[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/)
can aggregate incoming [samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples) in streaming mode **by time** and **by labels** before data is written to remote storage
(or local storage for single-node VictoriaMetrics).
The aggregation is applied to all the metrics received via any [supported data ingestion protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-import-time-series-data)
and/or scraped from [Prometheus-compatible targets](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-scrape-prometheus-exporters-such-as-node-exporter),
and allows building [flexible processing pipelines](#routing).

```mermaid
flowchart LR
    A["requests_total{instance=foo}"] --> V[vmagent]
    B["requests_total{instance=bar}"] --> V
    C["requests_total{instance=baz}"] --> V
    V --> D[requests_total:rate5m]
```

> By default, stream aggregation ignores timestamps associated with the input [samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples). It expects that the ingested samples have timestamps close to the current time. See [how to ignore old samples](#ignoring-old-samples).

# Features

> If `-streamAggr.dedupInterval` is enabled, out-of-order samples (older than already received) within the configured interval are treated as duplicates and ignored. See [de-duplication](#deduplication).

Stream aggregation has the following features:

- It can calculate [aggregates](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs) on ingested samples before they're sent to remote destination;
- It is applied to all the metric samples received via any [supported data ingestion protocol](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-import-time-series-data)
  and/or scraped from [Prometheus-compatible targets](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-scrape-prometheus-exporters-such-as-node-exporter);
- It can filter out raw samples matched by aggregation rules, so raw data will never reach the remote destination. See `-streamAggr.keepInput` and `-streamAggr.dropInput` in [aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/);
- It allows building [flexible processing pipelines](#routing);

# Limitations

- Stream aggregation **ignores timestamps associated with the input [samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples)**.
  It expects that the ingested samples have timestamps close to the current time. See [how to ignore old samples](#ignoring-old-samples).
- Aggregation state is held in the process memory and will be lost on process restart.

# Use cases

@@ -41,36 +57,42 @@ and not available for [Statsd metrics format](https://github.com/statsd/statsd/b

## Recording rules alternative

Sometimes [alerting queries](https://docs.victoriametrics.com/victoriametrics/vmalert/#alerting-rules) may require non-trivial amounts of CPU, RAM,
disk IO and network bandwidth at metrics storage side. For example, if `http_request_duration_seconds` histogram is generated by thousands
of application instances, then the alerting query `histogram_quantile(0.99, sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)) > 0.5`
can become slow, since it needs to scan too big number of unique [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series)
with `http_request_duration_seconds_bucket` name. This alerting query can be accelerated by pre-calculating
the `sum(increase(http_request_duration_seconds_bucket[5m])) without (instance)` via [recording rule](https://docs.victoriametrics.com/victoriametrics/vmalert/#recording-rules).
But this recording rule may take too much time to execute too. In this case the slow recording rule can be substituted
with the following [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config):
Sometimes [rules](https://docs.victoriametrics.com/victoriametrics/vmalert/#rules) may require non-trivial amounts of CPU, RAM,
disk IO and network bandwidth for processing on the metrics storage side.

For example, if the `http_request_duration_seconds` histogram is generated by thousands of application instances,
then the alerting query `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[2m])) without (instance)) > 0.5`
can become slow, since it needs to scan a large number of unique [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series).

This alerting query can be accelerated by pre-calculating
the `sum(rate(http_request_duration_seconds_bucket[5m])) without (instance)` via [recording rule](https://docs.victoriametrics.com/victoriametrics/vmalert/#recording-rules).
But it only shifts slowness from the alerting rule to the recording rule, since calculation still has to happen somewhere.
It is better to substitute the slow recording rule with the following [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config):

```yaml
- match: 'http_request_duration_seconds_bucket'
  interval: 5m
  interval: 1m
  without: [instance]
  outputs: [total]
  outputs: [rate_sum]
```

This stream aggregation generates `http_request_duration_seconds_bucket:5m_without_instance_total` output series according to [output metric naming](#output-metric-names).
> Field `interval` should be set to a value at least several times higher than the matched metrics collection interval.

This stream aggregation generates `http_request_duration_seconds_bucket:1m_without_instance_rate_sum` output series according to [output metric naming](#output-metric-names).
Then these series can be used in [alerting rules](https://docs.victoriametrics.com/victoriametrics/vmalert/#alerting-rules):

```metricsql
histogram_quantile(0.99, last_over_time(http_request_duration_seconds_bucket:5m_without_instance_total[5m])) > 0.5
histogram_quantile(0.99, avg_over_time(http_request_duration_seconds_bucket:1m_without_instance_rate_sum[5m])) > 0.5
```

This query is executed much faster than the original query, because it needs to scan much lower number of time series.
This query executes much faster than the original one because it needs to scan fewer time series.

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
> `avg_over_time(<aggregate:1m>[5m])` is similar to recording rules calculating rate over a sliding window of `5m` with `1m` interval.
If the sliding window isn't important, then simply omit the `avg_over_time` aggregation in the expression.
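In that case the alerting expression above reduces to querying the aggregated series directly. A sketch of the simplified expression, built from the example above:

```metricsql
histogram_quantile(0.99, http_request_duration_seconds_bucket:1m_without_instance_rate_sum) > 0.5
```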

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [aggregating by labels](#aggregating-by-labels).

Field `interval` is recommended to be set to a value at least several times higher than your metrics collect interval.

## Reducing the number of stored samples

If per-[series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) samples are ingested at high frequency,
@@ -89,7 +111,7 @@ to one sample per 5 minutes per each input time series (this operation is also k
  interval: 5m
  outputs: [total]

# Downsample other metrics with `count_samples`, `sum_samples`, `min` and `max` outputs
# Downsample other metrics with `count_samples`, `sum_samples`, `min`, and `max` outputs
# See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs
- match: '{__name__!~".+_total"}'
  interval: 5m
@@ -109,14 +131,14 @@ some_metric:5m_min
some_metric:5m_max
```

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [aggregating histograms](#aggregating-histograms) and [aggregating by labels](#aggregating-by-labels).

## Reducing the number of stored series

Sometimes applications may generate too many [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series).
For example, the `http_requests_total` metric may have `path` or `user` label with too big number of unique values.
In this case the following stream aggregation can be used for reducing the number metrics stored in VictoriaMetrics:
For example, the `http_requests_total` metric may have `path` or `user` label with too many unique values.
In this case, the following stream aggregation can be used for reducing the number of metrics stored in VictoriaMetrics:

```yaml
- match: 'http_requests_total'
@@ -134,17 +156,17 @@ The aggregated output metric has the following name according to [output metric
http_requests_total:30s_without_path_user_total
```

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [aggregating histograms](#aggregating-histograms).

## Counting input samples

If the monitored application generates event-based metrics, then it may be useful to count the number of such metrics
at stream aggregation level.
at the stream aggregation level.

For example, if an advertising server generates `hits{some="labels"} 1` and `clicks{some="labels"} 1` metrics
per each incoming hit and click, then the following [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config)
can be used for counting these metrics per every 30 second interval:
can be used for counting these metrics for every 30-second interval:

```yaml
- match: '{__name__=~"hits|clicks"}'
@@ -160,7 +182,7 @@ hits:30s_count_samples count1
clicks:30s_count_samples count2
```

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [aggregating by labels](#aggregating-by-labels).

## Summing input metrics

@@ -186,12 +208,12 @@ hits:1m_sum_samples sum1
clicks:1m_sum_samples sum2
```

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [aggregating by labels](#aggregating-by-labels).

## Quantiles over input metrics

If the monitored application generates measurement metrics per each request, then it may be useful to calculate
If the monitored application generates measurement metrics for each request, then it may be useful to calculate
the pre-defined set of [percentiles](https://en.wikipedia.org/wiki/Percentile) over these measurements.

For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
@@ -216,12 +238,12 @@ response_size_bytes:30s_quantiles{quantile="0.50"} value1
response_size_bytes:30s_quantiles{quantile="0.99"} value2
```

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [histograms over input metrics](#histograms-over-input-metrics) and [aggregating by labels](#aggregating-by-labels).

## Histograms over input metrics

If the monitored application generates measurement metrics per each request, then it may be useful to calculate
If the monitored application generates measurement metrics for each request, then it may be useful to calculate
a [histogram](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#histogram) over these metrics.

For example, if the monitored application generates `request_duration_seconds N` and `response_size_bytes M` metrics
@@ -267,7 +289,7 @@ The resulting histogram buckets can be queried with [MetricsQL](https://docs.vic
histogram_stddev(sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```

This query uses [histogram_stddev](https://docs.victoriametrics.com/victoriametrics/metricsql/#histogram_stddev) function.
This query uses the [histogram_stddev](https://docs.victoriametrics.com/victoriametrics/metricsql/#histogram_stddev) function.

1. An estimated share of requests with the duration smaller than `0.5s` over the last hour:
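With the bucket series above, such a share query could look like the following sketch (the exact expression in the full document may differ):

```metricsql
histogram_share(0.5, sum(increase(request_duration_seconds:60s_histogram_bucket[1h])) by (vmrange))
```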

@@ -277,7 +299,7 @@ The resulting histogram buckets can be queried with [MetricsQL](https://docs.vic

This query uses [histogram_share](https://docs.victoriametrics.com/victoriametrics/metricsql/#histogram_share) function.

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [quantiles over input metrics](#quantiles-over-input-metrics) and [aggregating by labels](#aggregating-by-labels).

## Aggregating histograms
@@ -319,7 +341,7 @@ have no such requirement.

It's recommended to use [aggregation windows](#aggregation-windows) when aggregating histograms if you observe [accuracy issues](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4580).

See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at `output` field.
See [the list of aggregate output](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs), which can be specified at the `output` field.
See also [histograms over input metrics](#histograms-over-input-metrics) and [quantiles over input metrics](#quantiles-over-input-metrics).

@@ -327,9 +349,9 @@ See also [histograms over input metrics](#histograms-over-input-metrics) and [qu

[Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) supports relabeling,
deduplication and stream aggregation for all the received data, scraped or pushed.
The processed data is then stored in local storage and **can't be forwarded further**.
The processed data is then stored in local storage, and **can't be forwarded further**.

[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) supports relabeling, deduplication and stream aggregation for all
[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) supports relabeling, deduplication, and stream aggregation for all
the received data, scraped or pushed. See the [processing order for vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/#life-of-a-sample).

Typical scenarios for data routing with `vmagent`:
@@ -342,42 +364,44 @@ Typical scenarios for data routing with `vmagent`:

# Deduplication

[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) supports online [de-duplication](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication) of samples
before sending them to the configured `-remoteWrite.url`. The de-duplication can be enabled via the following options:
If `-streamAggr.dedupInterval` is enabled, out-of-order samples (older than already received) within the configured interval are treated as duplicates and ignored. See [deduplication](#deduplication).

- By specifying the desired de-duplication interval via `-streamAggr.dedupInterval` command-line flag for all received data
[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) supports [deduplication](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication) of samples
before sending them to the configured `-remoteWrite.url`. The deduplication can be enabled via the following options:

- By specifying the desired deduplication interval via `-streamAggr.dedupInterval` command-line flag for all received data
  or via `-remoteWrite.streamAggr.dedupInterval` command-line flag for the particular `-remoteWrite.url` destination.
  For example, `./vmagent -remoteWrite.url=http://remote-storage/api/v1/write -remoteWrite.streamAggr.dedupInterval=30s` instructs `vmagent` to leave
  only the last sample per each seen [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) per every 30 seconds.
  only the last sample for each seen [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) every 30 seconds.
  The de-duplication is performed after applying [relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/) and
  before performing the aggregation.

- By specifying `dedup_interval` option individually per each [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config)
- By specifying the `dedup_interval` option individually per each [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config)
  in `-remoteWrite.streamAggr.config` or `-streamAggr.config` configs.

[Single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) supports two types of de-duplication:
- After storing the duplicate samples to local storage. See [`-dedup.minScrapeInterval`](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication) command-line option.
- Before storing the duplicate samples to local storage. This type of de-duplication can be enabled via the following options:
  - By specifying the desired de-duplication interval via `-streamAggr.dedupInterval` command-line flag.
    For example, `./victoria-metrics -streamAggr.dedupInterval=30s` instructs VictoriaMetrics to leave only the last sample per each
- After storing the duplicate samples in local storage. See [`-dedup.minScrapeInterval`](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication) command-line option.
- Before storing the duplicate samples in local storage. This type of deduplication can be enabled via the following options:
  - By specifying the desired deduplication interval via the `-streamAggr.dedupInterval` command-line flag.
    For example, `./victoria-metrics -streamAggr.dedupInterval=30s` instructs VictoriaMetrics to leave only the last sample for each
    seen [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) per every 30 seconds.
    The de-duplication is performed after applying `-relabelConfig` [relabeling](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#relabeling).
    The deduplication is performed after applying `-relabelConfig` [relabeling](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#relabeling).

  - By specifying `dedup_interval` option individually per each [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config) at `-streamAggr.config`.

It is possible to drop the given labels before applying the de-duplication. See [these docs](#dropping-unneeded-labels).
It is possible to drop the given labels before applying the deduplication. See [these docs](#dropping-unneeded-labels).

The online de-duplication uses the same logic as [`-dedup.minScrapeInterval` command-line flag](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication) at VictoriaMetrics.
The online deduplication uses the same logic as [`-dedup.minScrapeInterval` command-line flag](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#deduplication) at VictoriaMetrics.

De-duplication is applied before stream aggregation rules and can drop samples before they get matched for aggregation.
Deduplication is applied before stream aggregation rules and can drop samples before they get matched for aggregation.

# Relabeling

It is possible to apply [arbitrary relabeling](https://docs.victoriametrics.com/victoriametrics/relabeling/) to input and output metrics
during stream aggregation via `input_relabel_configs` and `output_relabel_configs` options in [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config).

Relabeling rules inside `input_relabel_configs` are applied to samples matching the `match` filters before optional [deduplication](#deduplication).
Relabeling rules inside `output_relabel_configs` are applied to aggregated samples before sending them to the remote storage.
Relabeling rules inside `input_relabel_configs` are applied to samples matching the `match` filters before optional [deduplication](#deduplication).
Relabeling rules in `output_relabel_configs` are applied to aggregated samples before they are sent to the remote storage.

For example, the following config removes the `:1m_sum_samples` suffix added [to the output metric name](#output-metric-names):
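A minimal sketch of such a config (the `foo` metric name and the exact regex are illustrative; the relabeling rule relies on the default `replace` action and `$1` replacement):

```yaml
- match: 'foo'
  interval: 1m
  outputs: [sum_samples]
  output_relabel_configs:
    # strip the ":1m_sum_samples" suffix from the aggregated metric name
    - source_labels: [__name__]
      regex: '(.+):1m_sum_samples'
      target_label: __name__
```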

@@ -444,12 +468,12 @@ For example:

- if `interval: 1m` is set, then the aggregated data is flushed to the storage at the end of every minute
- if `interval: 1h` is set, then the aggregated data is flushed to the storage at the end of every hour

If you do not need such an alignment, then set `no_align_flush_to_interval: true` option in the [aggregate config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config).
In this case aggregated data flushes will be aligned to the `vmagent` start time or to [config reload](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#configuration-update) time.
If you do not need such an alignment, then set the `no_align_flush_to_interval: true` option in the [aggregate config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config).
In this case, aggregated data flushes will be aligned to the `vmagent` start time or to [config reload](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#configuration-update) time.

The aggregated data on the first and the last interval is dropped during `vmagent` start, restart or [config reload](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#configuration-update),
since the first and the last aggregation intervals are incomplete, so they usually contain incomplete confusing data.
If you need preserving the aggregated data on these intervals, then set `flush_on_shutdown: true` option in the [aggregate config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config).
The aggregated data on the first and the last interval is dropped during `vmagent` start, restart, or [config reload](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#configuration-update),
since the first and last aggregation intervals are incomplete and usually contain confusing data.
If you need to preserve the aggregated data on these intervals, then set the `flush_on_shutdown: true` option in the [aggregate config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config).
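Both flush options can be combined in a single aggregation rule, as in the following sketch (`foo_total` is illustrative):

```yaml
- match: 'foo_total'
  interval: 1m
  no_align_flush_to_interval: true  # align flushes to start/reload time instead of wall-clock intervals
  flush_on_shutdown: true           # keep data from the incomplete first and last intervals
  outputs: [total]
```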

See also:
@@ -480,7 +504,7 @@ The `keep_metric_names` option can be used if only a single output is set in [`o

## Aggregating by labels

All the labels for the input metrics are preserved by default in the output metrics. For example,
By default, all labels from the input metrics are preserved in the output metrics. For example,
the input metric `foo{app="bar",instance="host1"}` results in the output metric `foo:1m_sum_samples{app="bar",instance="host1"}`
when the following [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config) is used:

@@ -489,7 +513,7 @@ when the following [stream aggregation config](https://docs.victoriametrics.com/
  outputs: [sum_samples]
```

The input labels can be removed via `without` list specified in the config. For example, the following config
The input labels can be removed via a `without` list specified in the config. For example, the following config
removes the `instance` label from output metrics by summing input samples across all the instances:

```yaml
@@ -501,7 +525,7 @@ removes the `instance` label from output metrics by summing input samples across

In this case the `foo{app="bar",instance="..."}` input metrics are transformed into `foo:1m_without_instance_sum_samples{app="bar"}`
output metric according to [output metric naming](#output-metric-names).

It is possible specifying the exact list of labels in the output metrics via `by` list.
It is possible to specify the exact list of labels in the output metrics via the `by` list.
For example, the following config sums input samples by the `app` label:

```yaml
@@ -513,7 +537,7 @@ For example, the following config sums input samples by the `app` label:

In this case the `foo{app="bar",instance="..."}` input metrics are transformed into `foo:1m_by_app_sum_samples{app="bar"}`
output metric according to [output metric naming](#output-metric-names).

The labels used in `by` and `without` lists can be modified via `input_relabel_configs` section - see [these docs](#relabeling).
The labels used in `by` and `without` lists can be modified via the `input_relabel_configs` section - see [these docs](#relabeling).

See also [aggregation outputs](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-outputs).
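Putting the two variants side by side, a sketch consistent with the examples above:

```yaml
# drop the `instance` label, keep all other labels
- match: 'foo'
  interval: 1m
  without: [instance]
  outputs: [sum_samples]

# keep only the `app` label, drop all others
- match: 'foo'
  interval: 1m
  by: [app]
  outputs: [sum_samples]
```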
@@ -579,32 +603,32 @@ Below is an example of an `aggr.yaml` configuration that drops the `replica` and
|
||||
|
||||
## Aggregation windows
By default, stream aggregation and deduplication store a single state for each aggregation output result.
The data for each aggregator is flushed independently once per aggregation interval. But there's no guarantee that
incoming samples with timestamps close to the aggregation interval's end will get into it. For example, when aggregating
with `interval: 1m`, a data sample with timestamp 1739473078 (18:57:59) can fall into the aggregation round `18:58:00` or `18:59:00`.
It depends on network lag, load, clock synchronization, etc. In most scenarios, it doesn't impact aggregation or
deduplication results, which are consistent within the margin of error. But for metrics represented as a collection of series,
like [histograms](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#histogram), such inaccuracy leads to invalid aggregation results.

For this case, streaming aggregation and deduplication support a mode with aggregation windows for the current and previous state.
With this mode, the flush doesn't happen immediately but is shifted by a calculated sample lag, which improves correctness for delayed data. {{% available_from "v1.112.0" %}}
Enabling this mode increases resource usage: memory usage is expected to double, as aggregation will store two states
instead of one. However, this significantly improves the accuracy of calculations. Aggregation windows can be enabled via
the following settings:

- `-streamAggr.enableWindows` at [single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/)
  and [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/). At [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/),
  the `-remoteWrite.streamAggr.enableWindows` flag can be specified individually for each `-remoteWrite.url`.
  If one of these flags is set, all aggregators will use fixed windows. In conjunction with `-remoteWrite.streamAggr.dedupInterval` or
  `-streamAggr.dedupInterval`, fixed aggregation windows are enabled on the deduplicator as well.
- `enable_windows` option in [aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#stream-aggregation-config).
  It allows enabling aggregation windows for a specific aggregator.
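As a minimal sketch, enabling windows for a single aggregator could look like this (the metric filter and interval are illustrative, not taken from the original docs):

```yaml
# Hypothetical aggregation config entry: enable two-state (current/previous)
# aggregation windows for this aggregator only.
- match: 'http_requests_total'   # illustrative series filter
  interval: 1m
  outputs: [total]
  enable_windows: true           # shift flushes by the calculated sample lag
```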
## Counter resets

If counter-specific outputs, such as `total*`, `rate*`, and `increase*`, produce values that are significantly higher than anticipated, then check the `vm_streamaggr_counter_resets_total` metric. This metric increments each time a [counter reset event](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#counter) happens and could be caused by duplication or collision of raw samples. If you observe duplication or collision, try solving this problem by either fixing the source of these metrics or by [deduplicating](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication) these samples before aggregation.
## Staleness
The following outputs track the last seen per-series values in order to properly calculate output values:

- [total](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#total)
- [total_prometheus](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#total_prometheus)

The last seen per-series value is dropped if no new samples are received for the given time series during two consecutive aggregation
intervals specified in [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config) via the `interval` option.
If a new sample for an existing time series is received after that, then it is treated as the first sample for a new time series.
This may lead to the following issues:
These issues can be fixed in the following ways:

- By increasing the `interval` option at [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config), so it covers the expected
  delays in data ingestion pipelines.
- By specifying the `staleness_interval` option at [stream aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config), so it covers the expected
  delays in data ingestion pipelines. By default, the `staleness_interval` is equal to `2 x interval`.
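As a sketch, the second fix could look like this (a hypothetical config entry; the match expression and durations are illustrative):

```yaml
# Hypothetical: tolerate up to 10 minutes of ingestion delay before
# the last seen per-series value is dropped.
- match: '{__name__=~"process_.+"}'  # illustrative series filter
  interval: 1m
  staleness_interval: 10m            # the default would be 2 x interval = 2m
  outputs: [total]
```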
## High resource usage

The following solutions can help reduce memory usage and CPU usage during streaming aggregation:
- To use more specific `match` filters at [streaming aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config), so only the really needed
  [raw samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples) are aggregated.
- To increase the aggregation interval by specifying a bigger duration for the `interval` option at [streaming aggregation config](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config).
- To generate a lower number of output time series by using a less specific [`by` list](#aggregating-by-labels) or a more specific [`without` list](#aggregating-by-labels).
- To drop unneeded long labels in input samples via [input_relabel_configs](#relabeling).
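A minimal sketch combining these recommendations (all metric and label names here are illustrative):

```yaml
# Hypothetical: narrow match filter, longer interval, fewer output series.
- match: '{__name__="http_requests_total", env="prod"}'  # aggregate only what is needed
  interval: 5m                 # longer interval => fewer flushes and states
  without: [instance, pod]     # drop high-cardinality labels from the output
  outputs: [rate_sum]
```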
## Cluster mode
For example, if more than one `vmagent` instance calculates `increase`
with the `by: [path]` option, then all the `vmagent` instances will aggregate samples to the same set of time series with different `path` labels.
The proper fix would be [adding a unique label](https://docs.victoriametrics.com/victoriametrics/vmagent/#adding-labels-to-metrics) for all the output samples
produced by each `vmagent`, so they are aggregated into distinct sets of [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series).
These time series can then be aggregated later as needed during querying.
If `vmagent` instances run in Docker or Kubernetes, then you can refer to `POD_NAME` or `HOSTNAME` environment variables
as a unique label value for each `vmagent` via the `-remoteWrite.label=vmagent=%{HOSTNAME}` command-line flag.
See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#environment-variables) on how to refer to environment variables in VictoriaMetrics components.
## Common mistakes
### Put aggregator behind load balancer
When configuring the aggregation rule, ensure that `vmagent` receives all required data to satisfy the `match` rule.
If traffic to vmagent goes through a load balancer, vmagent may receive only a fraction of the data
and produce incomplete aggregations.
To keep aggregation results consistent, ensure that vmagent receives all required data for aggregation. If you need to
split the load across multiple vmagents, try sharding the traffic among them via metric names or labels.
For example, see how vmagent could consistently [shard data across remote write destinations](https://docs.victoriametrics.com/victoriametrics/vmagent/#sharding-among-remote-storages)
via the `-remoteWrite.shardByURL.labels` or `-remoteWrite.shardByURL.ignoreLabels` command-line flags.
### Create aggregator per each recording rule
Stream aggregation can be used as an alternative to [recording rules](#recording-rules-alternative).
But creating an aggregation rule for each recording rule can lead to elevated resource usage on the vmagent,
because the ingestion stream must be matched against every configured aggregation rule.

To optimize this, we recommend merging together aggregations that differ only in their match expressions.
For example, consider the following list of recording rules:
```yaml
# (recording rules elided in this diff hunk)
```

These rules can be effectively converted into a single aggregation rule:

```yaml
# (aggregation rule elided in this diff hunk; it ends with the following relabel line)
      replacement: "instance:$1:rate:sum"
```
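A sketch of what such a merged rule could look like follows. The metric names, interval, grouping, and relabeling regex are assumptions for illustration; only the `replacement` value comes from the original example:

```yaml
# Hypothetical merged aggregator: one rule covers several metrics that
# previously needed one recording rule each.
- match:
    - http_requests_total      # illustrative metric names
    - grpc_requests_total
  interval: 5m
  by: [instance]
  outputs: [rate_sum]
  output_relabel_configs:
    # rename the "<metric>:..."-style output back to a recording-rule-like name
    - source_labels: [__name__]
      regex: '([^:]+):.*'
      target_label: __name__
      replacement: 'instance:$1:rate:sum'
```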
**Note**: having a separate aggregator for a certain `match` expression can only be justified when the aggregator cannot keep up with all
the data pushed to it within an aggregation interval.
### Use identical --remoteWrite.streamAggr.config for all remote writes
### Use aggregated metrics like original ones
Stream aggregation allows keeping original metric names after aggregation by using the `keep_metric_names` setting.
But the "meaning" of aggregated metrics is usually different from that of the original metrics.
Make sure that you update queries in your alerting rules and dashboards accordingly if you use the `keep_metric_names` setting.
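For example (a hypothetical config entry; the metric name and output are illustrative):

```yaml
# Hypothetical: the output keeps the name http_requests_total, but its values
# are now per-interval rate sums rather than raw counter samples, so existing
# queries over this name may need to be rewritten.
- match: 'http_requests_total'
  interval: 1m
  outputs: [rate_sum]
  keep_metric_names: true
```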
### Use different deduplication intervals on storage and vmagent
To avoid this, set `-streamAggr.dedupInterval` or `-remoteWrite.streamAggr.dedupInterval` on vmagent to match the deduplication interval configured on the storage side.
---
The section below contains backward-compatible anchors for links that were moved or renamed.
###### Configuration