# Reader
VictoriaMetrics Anomaly Detection (vmanomaly) primarily uses VmReader to ingest data. This reader focuses on fetching time-series data directly from VictoriaMetrics with the help of powerful MetricsQL expressions for aggregating, filtering and grouping your data, ensuring seamless integration and efficient data handling.
Future updates will introduce additional readers, expanding the range of data sources vmanomaly can work with.
## VM reader

The `queries` arg of VmReader has a backward-compatible format change{{% available_from "v1.13.0" anomaly %}}. The new format allows specifying per-query parameters, such as `step`, to reduce the amount of data read from VictoriaMetrics TSDB and to make the config more flexible. See the per-query parameters section for details.

A config in the old format, such as
```yaml
# other config sections ...
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'  # source victoriametrics/prometheus
  sampling_period: "10s"  # set it <= min(infer_every) in schedulers section
  queries:
    # old format {query_alias: query_expr}, prior to 1.13, will be converted to a new format automatically
    vmb: 'avg(vm_blocks)'
```
will be converted to a new one with a warning raised in logs:
```yaml
# other config sections ...
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'  # source victoriametrics/prometheus
  sampling_period: '10s'
  queries:
    # old format {query_alias: query_expr}, prior to 1.13, will be converted to a new format automatically
    vmb:
      expr: 'avg(vm_blocks)'  # initial MetricsQL expression
      step: '10s'  # individual step for this query, will be filled with `sampling_period` from the root level
      data_range: ['-inf', 'inf']  # by default, no constraints applied on data range
      tz: 'UTC'  # by default, tz-free data is used throughout the model lifecycle
      # new query-level arguments will be added in backward-compatible way in future releases
```
### Per-query parameters

The `queries` arg format has changed{{% available_from "v1.13.0" anomaly %}}. Each query alias now supports the following (sub)fields, which override reader-level parameters if set:
- `expr` (string): MetricsQL/PromQL expression that defines an input for VmReader, as accepted by `/query_range?query=%s`, e.g. `avg(vm_blocks)`.

- `step` (string): query-level frequency of the points returned, e.g. `30s`. Will be converted to the `/query_range?step=%s` param (in seconds). Useful to optimize the total amount of data read from VictoriaMetrics, where different queries may have different frequencies for different machine learning models to run on.

  If not set explicitly (or if the older config style prior to v1.13.0 is used), it is set to the reader-level `sampling_period` arg.

  Having different individual `step` args for queries (e.g. `30s` for `q1` and `2m` for `q2`) is not yet supported for multivariate models if you want to run them on several queries simultaneously (e.g. by setting the `queries` arg of a model to [`q1`, `q2`]).

- `data_range`{{% available_from "v1.15.1" anomaly %}} (list[float | string]): allows defining valid data ranges for input per individual query in `queries`, resulting in:
  - High anomaly scores (>1) when the data falls outside the expected range, indicating a data range constraint violation (e.g. improperly configured MetricsQL query, sensor malfunction, overflows in underlying metrics, etc.). Anomaly scores can be set to a specific value, like `5`, to indicate a strong violation, using the `anomaly_score_outside_data_range` arg of the respective model this query is used in.
  - Lowest anomaly scores (=0) when the model's predictions (`yhat`) fall outside the expected range, meaning uncertain predictions that do not really align with the data.

  Works together with the `anomaly_score_outside_data_range` arg of a model to determine the anomaly score for such cases, as well as with the `clip_predictions` arg of a model to clip the predictions to the expected range.

  If not set explicitly (or if the older config style prior to v1.13.0 is used), it is set to the reader-level `data_range` arg{{% available_from "v1.18.1" anomaly %}}.

- `max_points_per_query`{{% available_from "v1.17.0" anomaly %}} (int): optional arg, overrides how the `search.maxPointsPerTimeseries` flag{{% available_from "v1.14.1" anomaly %}} impacts `vmanomaly` on splitting long `fit_window` queries into smaller sub-intervals. This helps users avoid hitting the `search.maxQueryDuration` limit for individual queries by distributing the initial query across multiple subquery requests with minimal overhead. Set it lower than `search.maxPointsPerTimeseries` if hitting `maxQueryDuration` limits. If set on the query level, it overrides the global `max_points_per_query` (reader-level).

- `tz`{{% available_from "v1.18.0" anomaly %}} (string): this optional argument enables timezone specification per query, overriding the reader's default `tz`. This setting helps to account for local timezone shifts, such as DST, in models that are sensitive to seasonal variations (e.g., `ProphetModel` or `OnlineQuantileModel`).

- `tenant_id`{{% available_from "v1.19.0" anomaly %}} (string): this optional argument enables tenant-level separation for queries (e.g. `query1` gets the data from tenant "0:0", `query2` from tenant "1:0"). It works as follows:
  - if not set, inherits reader-level `tenant_id`
  - if set, overrides reader-level `tenant_id`
  - raises a config validation error if reader-level is not set and query-level is found (mixing of VictoriaMetrics single-node and cluster is prohibited in a single config)
  - raises a config validation warning if `writer.tenant_id` is not explicitly set to `multitenant` when the reader uses tenants, meaning VictoriaMetrics cluster will be used for data querying
  - also raises a config validation error if a set of `reader.queries` for multivariate models has different `tenant_id`s (meaning tenant data is mixed, and special labels like `vm_project_id`, `vm_account_id` will have ambiguous values)

  The recommended approach for using per-query `tenant_id`s is to set both `reader.tenant_id` and `writer.tenant_id` to `multitenant`. See this section for more details. Configurations where `reader.tenant_id` equals `writer.tenant_id` and is not `multitenant` are also considered safe, provided there is a single, DISTINCT `tenant_id` defined in the reader (either at the reader level or the query level, if set).
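The single-tenant case described above can be sketched as a config fragment that keeps one distinct tenant on both the read and the write side. This is a minimal illustration, not an official example: the query alias, the tenant value `1:0`, and both URLs are placeholders, and the writer is assumed to accept `class`, `datasource_url`, and `tenant_id` the same way the reader does.

```yaml
# Sketch only: alias, URLs, and tenant value are placeholders
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8481'  # placeholder address of a cluster read endpoint
  tenant_id: '1:0'  # the single distinct tenant used by every query
  sampling_period: '1m'
  queries:
    ingestion_rate:
      expr: 'sum(rate(vm_rows_inserted_total[5m])) by (type) > 0'
      # no query-level tenant_id here, so '1:0' is inherited from the reader level

writer:
  class: 'vm'
  datasource_url: 'http://localhost:8480'  # placeholder address of a cluster write endpoint
  tenant_id: '1:0'  # equals reader.tenant_id, so tenant data is not mixed
```

Going by the rules listed above, a config like this may still log a validation warning, because `writer.tenant_id` is not set to `multitenant`.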
### Per-query config example

```yaml
reader:
  class: 'vm'
  sampling_period: '1m'
  max_points_per_query: 10000
  data_range: [0, 'inf']
  tenant_id: 'multitenant'
  # other reader params ...
  queries:
    ingestion_rate_t1:
      expr: 'sum(rate(vm_rows_inserted_total[5m])) by (type) > 0'
      step: '2m'  # overrides global `sampling_period` of 1m
      data_range: [10, 'inf']  # meaning only positive values > 10 are expected, i.e. a value `y` < 10 will trigger anomaly score > 1
      max_points_per_query: 5000  # overrides reader-level value of 10000 for `ingestion_rate_t1` query
      tz: 'America/New_York'  # to override reader-wise `tz`
      tenant_id: '1:0'  # overriding tenant_id to isolate data
    ingestion_rate_t2:
      expr: 'sum(rate(vm_rows_inserted_total[5m])) by (type) > 0'
      step: '2m'  # overrides global `sampling_period` of 1m
      data_range: [10, 'inf']  # meaning only positive values > 10 are expected, i.e. a value `y` < 10 will trigger anomaly score > 1
      max_points_per_query: 5000  # overrides reader-level value of 10000 for `ingestion_rate_t2` query
      tz: 'America/New_York'  # to override reader-wise `tz`
      tenant_id: '2:0'  # overriding tenant_id to isolate data
```
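The reader-side `data_range` is described as working together with two model arguments, `anomaly_score_outside_data_range` and `clip_predictions`. A minimal sketch of the model side follows. The model alias, the choice of a `zscore` model, and its `z_threshold` value are assumptions made for illustration; only the two argument names come from this page, and whether a given model type honors `clip_predictions` should be checked against the models documentation.

```yaml
# Sketch only: the model alias, class, and threshold are illustrative assumptions
models:
  zscore_ingestion:  # hypothetical alias
    class: 'zscore'
    z_threshold: 3.0  # illustrative value
    queries: ['ingestion_rate_t1', 'ingestion_rate_t2']  # reader query aliases this model runs on
    anomaly_score_outside_data_range: 5.0  # score assigned when input data violates the query's data_range
    clip_predictions: True  # clip predictions to the query's data_range
```

With the reader settings for these queries, an input value below 10 would then be reported with an anomaly score of 5.0 rather than a value just above 1.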
### Config parameters

| Parameter | Example | Description |
|---|---|---|
| `class` | `vm` | Name of the class needed to enable reading from VictoriaMetrics or Prometheus. VmReader is the default option, if not specified. |
| `queries` | See per-query config example above | See per-query config section above |
| `datasource_url` | `http://localhost:8428` | Datasource URL address |
| `tenant_id` | `0:0`, `multitenant` | For VictoriaMetrics Cluster version only, tenants are identified by `accountID` or `accountID:projectID`. Starting from v1.16.2, the `multitenant` endpoint is supported, to execute queries over multiple tenants. See VictoriaMetrics Cluster multitenancy docs. |
| `sampling_period` | `1h` | Frequency of the points returned. Will be converted to `/query_range?step=%s` param (in seconds). Required since v1.9.0. |
| `query_range_path` | | Performs PromQL/MetricsQL range query |
| `health_path` | | Absolute or relative URL address where to check availability of the datasource. |
| `user` | | BasicAuth username |
| `password` | | BasicAuth password |
| `timeout` | | Timeout for the requests, passed as a string |
| `verify_tls` | `path/to/ca.crt` | Verify TLS certificate. If `False`, the TLS certificate is not verified. If `True`, the certificate is verified using the system's CA store. If a path to a CA bundle file (like `ca.crt`), the certificate is verified using the provided CA bundle. |
| `tls_cert_file` | `path/to/client.crt` | Path to a file with the client certificate, e.g. `client.crt`{{% available_from "v1.16.3" anomaly %}}. |
| `tls_key_file` | `path/to/client.key` | Path to a file with the client certificate key, e.g. `client.key`{{% available_from "v1.16.3" anomaly %}}. |
| `bearer_token` | | Token is passed in the standard format with header: `Authorization: bearer {token}` |
| `bearer_token_file` | | Path to a file that contains the token, which is passed in the standard format with header: `Authorization: bearer {token}`{{% available_from "v1.15.9" anomaly %}}. |
| `extra_filters` | | List of strings with series selector. See: Prometheus querying API enhancements |
| `query_from_last_seen_timestamp` | `False` | If `True`, the query is performed from the last seen timestamp for a given series. If `False`, the query is performed from the start timestamp, based on a schedule period. Defaults to `False`. Useful for infer stages in case there were skipped infer calls prior to the given one. |
| `latency_offset` | `1ms` | Allows overriding the default `-search.latencyOffset`{{% available_from "v1.15.1" anomaly %}} flag of VictoriaMetrics (30s). The default value is set to 1ms, which should help in cases where `sampling_period` is short (10-60s) and equals `infer_every` in the PeriodicScheduler. This prevents users from receiving `service - WARNING - [Scheduler [scheduler_alias]] No data available for inference.` warnings in logs and allows for consecutive infer calls without gaps. To restore the old behavior, set it equal to your `-search.latencyOffset` flag value. |
| `max_points_per_query` | `10000` | Optional arg{{% available_from "v1.17.0" anomaly %}}, overrides how the `search.maxPointsPerTimeseries` flag{{% available_from "v1.14.1" anomaly %}} impacts `vmanomaly` on splitting long `fit_window` queries into smaller sub-intervals. This helps users avoid hitting the `search.maxQueryDuration` limit for individual queries by distributing the initial query across multiple subquery requests with minimal overhead. Set it lower than `search.maxPointsPerTimeseries` if hitting `maxQueryDuration` limits. It can also be set on a per-query basis to override this global value. |
| `tz` | `UTC` | Optional argument{{% available_from "v1.18.0" anomaly %}}, specifies the IANA timezone to account for local shifts, like DST, in models sensitive to seasonal patterns (e.g., `ProphetModel` or `OnlineQuantileModel`). Defaults to `UTC` if not set and can be overridden on a per-query basis. |
| `data_range` | `["-inf", "inf"]` | Optional argument{{% available_from "v1.18.1" anomaly %}}, allows defining valid data ranges for input of all the queries in `queries`. Defaults to `["-inf", "inf"]` if not set and can be overridden on a per-query basis. |
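To make the `latency_offset` case from the table concrete, the fragment below pairs a short `sampling_period` with an equal `infer_every`. It is a sketch under assumptions: the scheduler alias and the `fit_every` and `fit_window` values are illustrative and not taken from this page.

```yaml
# Sketch only: scheduler alias and fit_* values are illustrative
reader:
  class: 'vm'
  datasource_url: 'http://localhost:8428'
  sampling_period: '30s'
  latency_offset: '1ms'  # the default; set it to your -search.latencyOffset value to restore the old behavior
  queries:
    vmb:
      expr: 'avg(vm_blocks)'

schedulers:
  periodic_30s:  # hypothetical alias
    class: 'periodic'
    infer_every: '30s'  # equals reader.sampling_period, the case the small offset is meant for
    fit_every: '1h'  # illustrative value
    fit_window: '1d'  # illustrative value
```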
Config file example:
```yaml
reader:
  class: "vm"  # or "reader.vm.VmReader" until v1.13.0
  datasource_url: "https://play.victoriametrics.com/"
  tenant_id: '0:0'
  tz: 'America/New_York'
  data_range: [1, 'inf']  # reader-level
  queries:
    ingestion_rate:
      expr: 'sum(rate(vm_rows_inserted_total[5m])) by (type) > 0'
      step: '1m'  # can override reader-level `sampling_period` on per-query level
      data_range: [0, 'inf']  # if set, overrides reader-level data_range
      tz: 'Australia/Sydney'  # if set, overrides reader-level tz
      # tenant_id: '1:0'  # if set, overrides reader-level tenant_id
  sampling_period: '1m'
  query_from_last_seen_timestamp: True  # false by default
  latency_offset: '1ms'
```
### mTLS protection
vmanomaly supports mutual TLS (mTLS){{% available_from "v1.16.3" anomaly %}} for secure communication across its components, including VmReader, VmWriter, and Monitoring/Push. This allows for mutual authentication between the client and server when querying or writing data to VictoriaMetrics Enterprise, configured for mTLS.
mTLS ensures that both the client and server verify each other's identity using certificates, which enhances security by preventing unauthorized access.
To configure mTLS, the following parameters can be set in the config:
- `verify_tls`: If set to a string, it functions like the `-mtlsCAFile` command-line argument of VictoriaMetrics, specifying the CA bundle to use. Set to `True` to use the system's default certificate store.
- `tls_cert_file`: Specifies the path to the client certificate, analogous to the `-tlsCertFile` argument of VictoriaMetrics.
- `tls_key_file`: Specifies the path to the client certificate key, similar to the `-tlsKeyFile` argument of VictoriaMetrics.
These options allow you to securely interact with mTLS-enabled VictoriaMetrics endpoints.
Example configuration to enable mTLS with custom certificates:
```yaml
reader:
  class: "vm"
  datasource_url: "https://your-victoriametrics-instance-with-mtls"
  # tenant_id: "0:0" uncomment and set for cluster version
  queries:
    vm_blocks_example:
      expr: 'avg(rate(vm_blocks[5m]))'
      step: 30s
  sampling_period: 30s
  verify_tls: "path/to/ca.crt"  # path to CA bundle for TLS verification
  tls_cert_file: "path/to/client.crt"  # path to the client certificate
  tls_key_file: "path/to/client.key"  # path to the client certificate key
  # additional reader parameters ...

# other config sections, like models, schedulers, writer, ...
```
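mTLS is also listed above as supported for VmWriter. A minimal sketch of the writer side follows, assuming the writer takes the same `verify_tls`, `tls_cert_file`, and `tls_key_file` parameters as the reader. The URL and file paths are placeholders.

```yaml
# Sketch only: assumes the writer accepts the same TLS parameters as the reader
writer:
  class: "vm"
  datasource_url: "https://your-victoriametrics-instance-with-mtls"  # placeholder URL
  verify_tls: "path/to/ca.crt"  # CA bundle used to verify the server certificate
  tls_cert_file: "path/to/client.crt"  # client certificate presented to the server
  tls_key_file: "path/to/client.key"  # private key for the client certificate
```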
### Healthcheck metrics

VmReader exposes several healthcheck metrics.