Compare commits

...

97 Commits

Author SHA1 Message Date
Vadim Rutkovsky
f6f2b8b025 deployment/docker/rules: add VMSelectConcurrentQueriesExceedMemoryLimit alert
Warn users when cluster is misconfigured to allow too many concurrent selects
2026-07-01 12:53:03 +02:00
Max Kotliar
8e5ca7e9b0 docs/changelog: fix bugfixs order 2026-07-01 12:32:05 +03:00
Max Kotliar
119ba0fb5b app/vmauth: slow down unauthorized responses to mitigate brute-force attacks (#11195)
Without a delay, an attacker can attempt thousands of authentication requests per second at minimal cost. The only indication of such an attack is the `vmauth_http_request_errors_total{reason="invalid_auth_token"}` metric, which may go unnoticed if it isn't being monitored.

Introduce a random delay of 2–3 seconds before returning 401 Unauthorized responses. This significantly increases the cost of sustaining a brute-force attack while having a negligible impact on legitimate clients, which are not expected to generate unauthorized
requests frequently. 

OWASP Top10 also recommends adding delays on failed auth: https://owasp.org/Top10/2025/A07_2025-Authentication_Failures.

The added delay could itself be abused as part of a DoS attack by tying up request-processing resources. However, such an attack is easier to detect because it generates sustained unauthorized traffic from identifiable source IPs, allowing operators to block or rate-limit the offending clients using specialized tools.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11180

---------

Signed-off-by: Nikolay <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2026-07-01 12:31:06 +03:00
Benjamin Nichols-Farquhar
c82127b6d4 app/vmagent: Optimize remotewrite hot paths
## Description

We have a simple vmagent setup using `remotewrite.shardByURL` to
horizontally scale stream aggregations, as suggested
[here](https://docs.victoriametrics.com/victoriametrics/vmagent/#sharding-among-remote-storages)
with minimal additional configuration and noticed there was some
optimization opportunities.

## Investigation 

We took a
[pprof](https://github.com/user-attachments/files/28968177/cpu.pprof.zip)
and found that our CPU time was in a few primary places.

1. payload protobuf marshaling 
2. zstd compresssion 
3. Map access in sharding
4. Hashing in sharding
5. Memory copies for timeseries 

## Changes


Zstd compression is pretty intractable(besides using
`remotewrite.vmProtoCompressLevel`, which we already do), but there were
some relatively easy gains for the other paths.

1. inline sov() when l < 128 on marshal. Skips calling a bunch of byte
math in a common case
2. Use slices instead of maps for shard keys, in the common case where
there are less than ~10 keys a slice iteration will be faster
3. use `xxhash.Digest.WriteString` rather than buffering values, avoids
some extra allocations and GC work
4. In the simple case where `insertRows` doesn't modify label names or
values construct the remotewrite request from the passed values instead
of copying them. Avoids a bunch of memory copy and GC work.

## Results

Modest but effective, we saw a ~10% reduction in total cpu cores used by
vmagent with these changes.
<img width="2358" height="709" alt="Screenshot 2026-06-15 at 2 24 43 PM"
src="https://github.com/user-attachments/assets/d2931242-c7ed-447b-9cfd-73b29cd0b893"
/>

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11113
2026-06-30 18:52:03 +02:00
JAYICE
06bc808ddc app/vmagent: support MDX(monitor data exchange) for remote write
This commit adds MDX(Monitoring Data eXchange) feature into vmagent. 
It allows to detect VictoriaMetrics related components and forward metrics only for those instances.

In default state, mdx detects timeseries based on `vm_app_version` metric name and tracks instance based on "job:instance" key. It holds tracked instances in-memory for 1 hour, which must be enough for the most scrape intervals.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10600
2026-06-30 18:31:17 +02:00
Nikolay
a6927c46be follow-up for e3121ad473 (#11162)
Fixes data-race at emwa.go package.
Write and read requests of the shared variable must be guarded by the
mutex lock.
2026-06-30 18:19:59 +02:00
xhe
15a4c31e87 app/vmselect: optimize repeated binary op rollup aggregates
This commit improves performance for sub query requests by properly cache binary operation results.

 For context:

When using binary operator in query, vmselect will executes the left and right expressions in parallel and independently. When the left and right expressions contain the same sub-expression, e.g.

```
count( count(vm_requests_total) by (action,addr,cluster,endpoint) ) by (action,addr,cluster) 
/
count( count(vm_requests_total) by (action,addr,cluster,endpoint) )
```

```count( count(vm_requests_total) by (action,addr,cluster,endpoint) is a sub expression in both left and right side.```

The sub expression will be executed twice (once in each of the left and right expressions). This introduces additional RPC and resource overhead.

 Currently, this feature is hidden behind query request param - `optimize_repeated_binary_op_subexprs`.
Later it could be transited into default behavior.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10575
2026-06-30 17:47:19 +02:00
Andrii Chubatiuk
54f9cd6edd chore(app/vmalert): update TestGroupValidate_Success and TestGroupValidate_Failure tests
This changes is needed to simplify tests logic and reduce diff between OSS and enterprise branches

Related https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11157
2026-06-30 17:44:01 +02:00
Zakhar Bessarab
2e16874e95 app/vmauth: add new default_vm_access_claim field
This commit relaxes JWT token format. Previously  "vm_access" claim was required.
Which may not be a case for some users. For example, if there is no need at request templating.
  The new `jwt` section field `default_vm_access_claim` was added for this purpose.
It's used as fall-back for templating data.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11054

---------

Signed-off-by: Nikolay <nik@victoriametrics.com>
Co-authored-by: f41gh7 <nik@victoriametrics.com>
2026-06-30 17:37:20 +02:00
Phuong Le
81d330f297 lib/httpserver: cancel in-flight requests when graceful shutdown times out
`http.Server.Shutdown()` waits for in-flight requests but never cancels
them, so long-lived ones (e.g. VictoriaLogs live tailing) make graceful
shutdown timeout, and the resulting `logger.Fatalf` -> `os.Exit` skips
the storage flush and loses data. So adding a cancelable `BaseContext`
that is canceled once `-http.maxGracefulShutdownDuration` elapses.

Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/1502

Also notes that VictoriaMetrics query execution is deadline-driven and
ignores ctx, so it's a no-op there.
2026-06-30 17:30:30 +02:00
Raymond Chen
3278ddd170 docs/victoriametrics: fix broken Prometheus WAL remote write blog link
The Grafana blog post about Prometheus 2.8 WAL-based remote write has
moved. The old URL
(`https://grafana.com/blog/2019/03/25/whats-new-in-prometheus-2.8-wal-based-remote-write/`)
now 404s; this updates both occurrences in the docs to the current
location
(`https://grafana.com/blog/whats-new-in-prometheus-2-8-wal-based-remote-write/`),
which returns HTTP 200.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11184

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-30 17:29:41 +02:00
Max Kotliar
cc790c2ea1 deployment: bump Alpine version to v3.24.1 for security updates
See https://www.alpinelinux.org/posts/Alpine-3.24.1-released.html
2026-06-30 16:02:14 +03:00
Max Kotliar
950f38fd6a docs/vmestimator: sync docs 2026-06-30 12:11:27 +03:00
Max Kotliar
ab9db9152f docs/changelog: remove duplicated lines 2026-06-30 11:57:54 +03:00
Max Kotliar
5b89f52c72 docs/vmestimator: sync docs 2026-06-30 11:19:18 +03:00
Max Kotliar
3608ab5b4c docs/vmestimator: fix link 2026-06-29 19:18:08 +03:00
Max Kotliar
5ee1fa70c1 docs/vmestimator: sync docs 2026-06-29 19:12:09 +03:00
Max Kotliar
1b0e843e8f docs/vmestimator: update docs 2026-06-28 11:00:52 +03:00
Max Kotliar
679646a3b3 docs: upd vmestimator docs 2026-06-26 19:01:40 +03:00
Max Kotliar
bb3c038e2f docs: follow-up on 29af0809; fix vmestimator image oversize 2026-06-26 18:48:46 +03:00
Max Kotliar
8f32b6648f docs: follow-up on 29af0809
Attempt to fix image oversize in vmestimator doc
2026-06-26 18:36:12 +03:00
Max Kotliar
df5f11623f docs: publish vmestimator docs
I tried to adopt publish-docs CI job. But it's not possible to add a
content to docs/victoriametrics. CI job from vmestimator overwrite the
whole content.

That's why I'll be copy pasting doc updates from vmestimator repo to VM
manually.
2026-06-26 18:23:29 +03:00
Roman Khavronenko
6c8a41f5ed docs: add missing mentions of new SA output sum_samples_total (#11178)
Follow-up after
16422b2d14

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-06-26 16:49:43 +02:00
Roman Khavronenko
e749a6ce8d docs: update security recommendations (#11136)
- move SBOM page down as general security recommendations must be
mentioned first
- separate httplib security-related flags to a separate section, so it
can be cross-referenced by products that use this lib
- mention that /metrics might require to be protected

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2026-06-26 16:49:14 +02:00
Hui Wang
615176ad55 doc: correct sharding configuration guidance for stream aggregation (#11167)
Rewrite the guidance for `-remoteWrite.shardByURL.labels` and
`-remoteWrite.shardByURL.ignoreLabels` from the perspective of the
aggregation config, rather than the other way around, since users
typically define aggregation rules first and then configure sharding.

And the previous statements are wrong.
For example, I have 3 raw series
```
{env=1, pod=1, cluster=1}
{env=1, pod=2, cluster=1}
{env=2, pod=1, cluster=2}
```

If I have `by: [env]`, then `-remoteWrite.shardByURL.labels` cannot be
set to `env,pod`, because that could route `{env=1, pod=1, cluster=1}`
and `{env=1, pod=2, cluster=1}` to different aggregators, causing each
aggregator to produce its own partial result for `env=1`.
In this case, `-remoteWrite.shardByURL.labels` can only use labels that
are a subset of `by` as `env`.

If I have `without: [env, pod]`, then
`-remoteWrite.shardByURL.ignoreLabels` cannot be set to just `env`,
because that would still allow `{env=1, pod=1, cluster=1}` and `{env=1,
pod=2, cluster=1}` to be routed to different aggregators, causing each
aggregator to produce its own partial result for `cluster=1`.
In this case, `-remoteWrite.shardByURL.ignoreLabels` must include all
labels listed in `without`as `env,pod`.
2026-06-26 16:48:43 +02:00
Cuong Le
3aec167f00 docs/victoriametrics: add an AI policy section to the contributing guide (#11170)
This PR adds a short `AI policy` section to the contributing guide.

We've received quite a few raw AI-generated PRs on VictoriaLogs
recently, so we'd like to clarify our view on AI usage. In short:

1. You can use AI.
2. You are responsible for what you submit, AI or not.
3. Raw AI slop will be closed without review.
2026-06-26 16:48:31 +02:00
Roman Khavronenko
6f633e5654 docs: update vmagent's multitenancy doc based on user's feedback (#11173)
* mention actual api path for opentlemetry in HowToPush docs for
consistency with other protocols;
* update vmagent=>vminsert remoteWrite.url path to prometheus-rw, as it
is the only one that is supported in vmagent/vminsert communication;
* add links to diagrams, so users could easier get to the corresponding
docs;
* add some visual notes to increase awarness of missconfigs

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Pablo Fernandez <46322567+TomFern@users.noreply.github.com>
2026-06-26 16:48:09 +02:00
Fred Navruzov
50a827256a docs/vmanomaly: fill in missing args and links (post v1.29.7 update) (#11165)
Addition of missing links/args and slight refactor of changelog notes
for clarity (post v1.29.7 update)

Follow-up on e30e8be1f4
2026-06-25 10:27:18 +03:00
Fred Navruzov
e30e8be1f4 docs/vmanomaly: update to v1.29.7 (#11163)
Update anomaly detection docs to align with `v1.29.7` release

---------

Signed-off-by: Fred Navruzov <fred-navruzov@users.noreply.github.com>
2026-06-24 21:22:58 +03:00
f41gh7
24ac567a9f fixes tests after dce8193c16 2026-06-24 17:13:04 +02:00
Aliaksandr Valialkin
dce8193c16 vendor: update github.com/VictoriaMetrics/VictoriaLogs from v1.121.1-0.20260616132739-c901a1e31cb3 to v1.51.1-0.20260624061259-dc94972a8708 2026-06-24 09:40:23 +02:00
Immanuel Tikhonov
e196479fb2 apptest: fix invalid go test command in README (#1484)
apptest/README.md points to `go test ./app/apptest`, but that dir is not
in this repo. Fresh clone, instant fail, kinda rough.

This switches it to `go test ./apptest/...` and adds a short note that
`go test ./...` needs the test binaries built first.

How to repro:
1. Fresh clone the repo.
2. Run `go test ./app/apptest`.
3. See `stat .../app/apptest: directory not found`.

---------

Signed-off-by: immanuwell <pchpr.00@list.ru>
Signed-off-by: Immanuel Tikhonov <122638311+immanuwell@users.noreply.github.com>
Co-authored-by: Phuong Le <39565248+func25@users.noreply.github.com>
2026-06-24 08:09:16 +02:00
Hui Wang
1c774564a2 stream aggregation: improve rate_xx results with out of order samples (#11140)
Previously, there could be an unexpected increase in `rate` if an
out-of-order sample was ingested after the previous flush.
Before:
<img width="1503" height="835" alt="image"
src="https://github.com/user-attachments/assets/3fb05f3e-51af-4c5c-989a-ec3da089fc23"
/>
After:
<img width="2546" height="924" alt="image"
src="https://github.com/user-attachments/assets/db8d8b0e-bbc0-4927-947a-713fc1fb4c5f"
/>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2026-06-23 12:46:50 +02:00
Max Kotliar
e1c554d4a6 docs/changelog: follow-up on 3419328f
Follow-up on
3419328f1c
2026-06-23 11:48:23 +03:00
Zhu Jiekun
3419328f1c app/vmselect: propagate cache reset operation to selectNode (#11118)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11112
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11118

Propagate cache reset operation to `-selectNode` when `/internal/resetRollupResultCache` is called. Previously, the propagation only happened when `delete_series` API was called.

To avoid an infinite loop, the propagation happens only when `propagate=1` query arg is set.

Note that if `-search.resetCacheAuthKey` is configured. It will be used as `authKey` query arg while propagating requests to other select instances.

---------

Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2026-06-23 11:17:02 +03:00
Max Kotliar
e841e45877 dashboards: fix typo "Sey major ..." -> "See major ..."
The credit goes to @marco-m who proposed the change in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11126.
2026-06-23 11:01:20 +03:00
Max Kotliar
d53d8849e7 lib/promrelabel: hide relabel debug cross-link when no target id is present (#11143)
The cross-link between `/metric-relabel-debug` and `/target-relabel-debug` pages has always been rendered, regardless of whether a target `id` query param was present or not. The target ID is responsible for preloading either metrics or the target relabeling config.

The commit hides the cross-link when no target ID is present. Without the ID, pages are essentially the same, so there is no need to link them. It should reduce user confusion.
2026-06-23 10:53:44 +03:00
Max Kotliar
d3641394d9 app/vmctl: fix edit link in vmctl's README 2026-06-22 18:01:23 +03:00
Pablo (Tomas) Fernandez
53a8f4bd47 docs: add docs-check-links target to docs Makefile (#11121)
We have a `check-links` target in the vmdocs repository. This is a good
start but it only detects broken links after changes in docs are merged
into master in this repository.

It would be nice to have a way to check the documentation in this
repository (before creating a PR for instance). This PR introduces a
`docs-check-links` target to the docs/Makefile. This lets us see if we
have any new broken links in the current branch/PR before merging into
main.

Pros:
- We can preview changes in docs before merging to master
- No new tools/big changes required, only add a new target to the
Makefile
- Once we fix all links, we could add a GH Actions check

Cons:
- You need access to the vmdocs repo for this target to work
- Running the check in GH Actions could be potentially complex because
it needs to clone vmdocs, build the container, etc

If this seems like a good idea, I could add the same check to the other
repos (VL, VT, etc).

PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11121
2026-06-22 16:10:30 +03:00
Max Kotliar
2b256952c9 go.mod: update metricsql pacakge
v0.87.2 reverts range() function support (PR:
https://github.com/VictoriaMetrics/metricsql/pull/76)

Reason given in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11028#issuecomment-4728207220
2026-06-22 16:06:51 +03:00
Aliaksandr Valialkin
12086e75de docs/victoriametrics/goals.md: typo fix: replace the dot with a comma 2026-06-22 13:54:40 +02:00
f41gh7
d426575622 docs: update flags with actual v1.146.0 binaries
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-22 13:12:38 +02:00
f41gh7
a76b1ce0e3 docs: bump version to v1.146.0
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-22 13:10:24 +02:00
f41gh7
5f49fb7f31 deplyoment/docker: bump version to v1.146.0
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-22 13:09:55 +02:00
f41gh7
80d1104fca update changelog
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-22 13:08:23 +02:00
f41gh7
ae59c2624c docs: forward port LTS v1.122.25 changelog to upstream
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-22 13:06:02 +02:00
f41gh7
4661f69d9f docs: forward port LTS v1.136.12 changelog to upstream
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-22 13:05:43 +02:00
f41gh7
4d9901fbf4 docs/changelog: cut release v1.146.0
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-19 14:19:09 +02:00
f41gh7
9356c2111a docs: update version to v1.146.0
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-19 14:18:31 +02:00
f41gh7
45f0b87150 app/vmselect: run make vmui-update
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-19 14:16:01 +02:00
Nikolay
8480f6b43e app/vmagent: prioritise recently ingested data to the queue
This commit adds a new flag remoteWrite.inmemoryQueues, which starts a dedicated set of workers for processing only in-memory part of vmagent persistentqueue. It should help to mitigate an
issue when stale data at file-based queue prevents from ingestion recent data.

 There is a downside - in case of remote storage is not reachable,
it's possible to get only a part of data ingested and other part queued
into file-based queue. But it should be acceptable, because there is no
strong guarantees for the data ingestion order.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8833
2026-06-19 13:47:49 +02:00
f41gh7
61668f0672 changelog: sort entries
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2026-06-19 13:23:01 +02:00
f41gh7
d1ebbf573c vendor: update metrics to v1.44.0
fixes https://github.com/VictoriaMetrics/metrics/issues/127
2026-06-19 13:16:31 +02:00
Hui Wang
16422b2d14 lib/streamagrr: add sum_sample_total output function
Add `sum_sample_total` aggregation function to sums input delta values
into a cumulative
[counter](https://docs.victoriametrics.com/victoriametrics/keyconcepts/index.html#counter)
and outputs the result at the given `interval`.
`sum_samples_total` makes sense only for aggregating delta values from
clients such as [StatsD
counter](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#counting).

>Note: The aggregator will forget the cumulative counter if it has not
seen input samples for `staleness_interval`(set to `interval` by
default) per output result, so the output counter will start from `0`
the next time it sees the input again. Increase the `staleness_interval`
option if you want to extend the window to tolerate bigger gaps.

Fixes:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11002
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4843.
2026-06-19 13:11:36 +02:00
Roman Khavronenko
0f1ca87611 app/vmselect: skip caching empty responses for tenants discovery
Before, empty tenant response was cached for 5min. Making all subsequent
read queries to return empty response, even if data for the tenants was
properly ingested.

With this change empty response won't be cached.
This change is important for integration tests, where reads and writes
are performed one after another.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10982/
2026-06-19 10:04:58 +02:00
Alexander Frolov
0dd2b2cee6 app/vmselect: fix tenant filtering with long tenant regexp
When tenant filter is a long regexp, its content can be replaced with
`...`, causing tenants to be matched incorrectly.

`applyFiltersToTenants` converts tag filters using `tagFiltersToString`

c497c8c2e9/app/vmselect/netstorage/tenant_filters.go (L106-L112)

Which uses human-readable representation of `TagFilter`

c497c8c2e9/lib/storage/search.go (L390-L397)

This way I can see results from unexpected tenants. See the test, which
fails with
```
--- FAIL: TestApplyFiltersToTenants (0.00s)
    tenant_filters_test.go:18: unexpected tenants result; got [{100 0} {116 0} {1239 0}]; want [{100 0} {108 0} {116 0}]
```

---------

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11096
2026-06-19 09:55:04 +02:00
Hui Wang
7caec5fcb4 lib/streamaggr: expose vm_streamaggr_dedup_dropped_samples_total to track deduplicated samples
Currently, there is no way to determine how many samples are dropped
during
https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication,
and deduplication can also drop out-of-order samples if configured,
without being captured by
vm_streamaggr_ignored_samples_total{reason="too_old"}.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11069
2026-06-19 09:49:09 +02:00
Alexander Frolov
612f8ac8d6 app/vmselect: escape metadata MetricFamilyName
Follow-up for:
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11115
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11120
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11128

Under normal circumstances this is not necessary to escape
`MetricFamilyName`.
I have noticed a few cases when vmselect produced malformed JSON
response due to corrupted `MetricFamilyName` value.

On the other hand, there are pros in keeping it as is — there is a
better chance that the issue would be noticed and reported.
2026-06-19 09:37:49 +02:00
Immanuel Tikhonov
6aa31a09d7 app/vmctl: fix typo in missing-field influx error
`vmctl influx` returns `response doesn't contain filed "value"` when the
response misses the requested field column.

Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10892
2026-06-19 09:36:25 +02:00
Hui Wang
b6e6a50e29 lib/streamaggr: reduce the default staleness interval
Reduce the default staleness_interval from `2*rule_interval` to
`1*rule_interval`, so the lookbehind range in stream aggregation is more
consistent with metricsQL query. Also add a stale sample check during
sample push in case flush hasn't cleaned it in time.

Fixes fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11102
2026-06-19 09:35:22 +02:00
Nikolay
a6d48b6af3 lib/storage: fixes typo at vm_retention_filters_partitions_scheduled_rows
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11138
2026-06-19 09:09:38 +02:00
Max Kotliar
dc4cf5631b follow-up on b3054bba - remove .codex
Follow-up on
b3054bbadd
2026-06-18 23:36:03 +03:00
Max Kotliar
005f133146 follow-up on 710c920d - remove leftovers
Follow-up on
710c920d60
2026-06-18 18:08:29 +03:00
Max Kotliar
35fc595e6f docs/changelog: chore changelog messages 2026-06-18 13:39:34 +03:00
Max Kotliar
710c920d60 app/vmrestore: disallow restoring parts outside the configured -storageDataPath directory (#1051)
A specifically crafted backup source (compromised S3/GCS/Azure bucket) could use `..`
components in object names to write files outside the `-storageDataPath`
directory.

Fix by validating all source parts against the destination directory
before restore begins (restore.go), and adding a defense-in-depth panic
guard in `NewDirectWriteCloser` (fslocal.go).

PR https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/1051

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2026-06-18 13:21:12 +03:00
JAYICE
0ceeb14076 app/vmagent: support sharding targets by label in vmagent cluster mode (#11114)
Previously, targets were sharded among `vmagent` instances by all target labels after relabeling. The commit adds `-promscrape.cluster.shardByLabels` optional flag to shard targets by specified labels.

For example, with `-promscrape.cluster.shardByLabels=service`, the targets with the same `service` label value will be scraped by the same `vmagent` instance, 
which is useful when performing stream aggregation (drop pod label) that requires all metrics with the same `service` label value to be processed on the same `vmagent` instance. 

If none of the specified labels are present in the target labels, then all target labels will be used for sharding.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11044
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11114
2026-06-18 13:13:06 +03:00
Zasda Yusuf Mikail
adc29732f9 app/vmctl: push metrics before exiting on error (#11081)
Call `pushmetrics.StopAndPush()` before `vmctl` exits on `app.Run` errors

PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11081

Signed-off-by: Zasda Mikail <zasdaym@gmail.com>
2026-06-18 12:55:25 +03:00
Aliaksandr Valialkin
41ffe23b18 docs/victoriametrics/Articles.md: add https://xata.io/blog/how-we-rebuilt-postgresql-branch-metrics-on-victoriametrics-per-cell 2026-06-18 00:19:21 +02:00
Fred Navruzov
6229a8fe7d docs/vmanomaly: release v1.29.6 (#11132)
Update vmanomaly docs (/anomaly-detection) for patch release v1.29.6

PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11132
2026-06-17 20:09:17 +03:00
Max Kotliar
b58c73ac90 app/vmauth: refactor oidc discovery pool creation (#11123)
Initializing `oidcDP` outside or before `parseJWTUsers` would simplify its reuse with SSO implementation https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11122

PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11123
2026-06-17 13:52:39 +03:00
Nikolay
77efbb2e36 lib/fs: retry file deletion on NFS error
This commit adds additional file removal retries.
This is follow-up for
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9842.

According to the stack trace, NFS also could return error for
`deleteFilePath`.
This retry is safe, since part already removed from `parts.json` and
performs a skips empty directories on restart.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11060
2026-06-17 09:16:42 +02:00
Alexander Frolov
e388e41430 app/vmagent: properly copy metadata unit value
Previously, Unit value at metrics metadata was incorrectly referenced instead of memory copy:

ed795a8443/app/vmagent/remotewrite/pendingseries.go (L209-L212)

 This commit properly copies Unit into dedicated buffer and prevents memory corruption.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11120
2026-06-17 09:16:04 +02:00
Max Kotliar
ed795a8443 lib/httpserver: add comments and links to tlsErrorSkipLogger
Follow-up on
64e43e59a7

See discussion in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10538#issuecomment-4708196302
2026-06-16 18:39:21 +03:00
f41gh7
94e5955b1f make vendor-update 2026-06-16 17:01:18 +02:00
Phuong Le
5b31a047a5 lib/mergeset: make the flushCallback interval configurable
Add a configurable `flushCallbackInterval` to `mergeset.MustOpenTable`
instead of the hardcoded 10s flushCallback ticker. This lets
VictoriaLogs lower it to 1s so filter-cache invalidation keeps up with
live tailing,

See https://github.com/VictoriaMetrics/VictoriaLogs/issues/1477.
2026-06-16 15:04:39 +02:00
f41gh7
17c3fb5656 app/{vmalert,vmauth}: follow-up for 8b27a36fb5
Add additional graceful shutdown check to prevent possible race between
 configCheckInterval ticker and globalStopChannel close actions.
2026-06-16 14:56:44 +02:00
f41gh7
30133ec182 docs/changelog: update changelog entry 2026-06-16 14:37:15 +02:00
Nicholas Feinberg
8b27a36fb5 lib/promscrape: fix shutdown race
In vminsert and vmagent, on shutdown, the `globalStopCh` channel is
closed. This is intended to cause the `runScraper` loop to exit.

However, nothing gave the `globalStopCh` priority over the input
channels, `sighupCh` or `tickerCh`. This causes unpredictable behavior.
For example, if `configCheckInterval` is set to 5s but `loadConfig`
takes 6s, then there will be a new message on `tickerCh `for every
iteration of the `runScraper` loop. Go will choose one channel to pull
from at random, meaning that shutdown can take arbitrarily long.

To address this, check for `globalStopCh` *first* at the start of each
loop, ensuring that it will be reached at most one loop iteration after
the shutdown starts.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11107/
2026-06-16 14:35:00 +02:00
Alexander Frolov
f33cd8a937 app/vmselect: properly unmarshal metadata response
Previusly `readMetadataRows` incorrectly re-used dataBuf for `metricsmetadata.Row.Unmarshal` call. Which is not safe, since unmarshalled row takes ownership of dataBuf and it cannot be mutated.

 This commit removes dataBuf and allocates a new buffer each time for row.Unmarshal.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11115
2026-06-16 14:06:43 +02:00
Yury Moladau
615e49c983 app/vmui: add last value to graph legend stats (#11110)
Add `last` value to VMUI graph legend statistics.

`last` is intentionally shown only in the legend, not in the tooltip.
The tooltip has limited space and already shows the value for the
hovered timestamp, so the latest value can still be checked by hovering
over the last point.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10759
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11110

<img width="371" height="60" alt="image"
src="https://github.com/user-attachments/assets/b57ed7e3-a6a1-44ac-ab60-89911f71eb6b"
/>


<!-- This is an auto-generated description by cubic. -->
<a
href="https://cubic.dev/pr/VictoriaMetrics/VictoriaMetrics/pull/11110?utm_source=github"
target="_blank" rel="noopener noreferrer"
data-no-image-dialog="true"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source
media="(prefers-color-scheme: light)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img
alt="Review in cubic"
src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>
<!-- End of auto-generated description by cubic. -->

Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2026-06-15 21:45:33 +03:00
Max Kotliar
eb1b4c6df4 fix golangci-lint errorcheck issues
Follow up on
1df805e23b
2026-06-15 20:40:26 +03:00
Rudransh Shrivastava
ca71127158 .github: add explicit persist-credentials for checkout action (#11085)
Add explicit `persist-credentials` as it's `true` by default.

cc: @arkid15r

Signed-off-by: Rudransh Shrivastava <rudransh@victoriametrics.com>
2026-06-15 20:05:17 +03:00
Phuong Le
1df805e23b .github: enable golangci-lint errorlint and fix the remaining error wrapping (#11086)
Enables `errorlint` in golangci-lint and fixes all the findings. 

Related to https://github.com/VictoriaMetrics/VictoriaLogs/pull/1490.
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11086
2026-06-15 19:30:53 +03:00
Zhu Jiekun
dfc459eb38 dics: add testing section to contributing guide (#11082)
Fixes https://github.com/VictoriaMetrics/VictoriaLogs/issues/1485
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11082

---------

Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
2026-06-15 17:23:48 +03:00
Max Kotliar
83ef694e9c docs/changelog: add links to bugfixes 2026-06-15 15:53:44 +03:00
Max Kotliar
f6830298dc docs/changelog: add link to PR into feature 2026-06-15 15:53:44 +03:00
Max Kotliar
f16bcb1355 app/vmctl: add headers and bearer token flags for vm import destination (#11089)
Commit adds `--vm-headers` and `--vm-bearer-token` flags. The flags are
added to vmctl sub-commands: opentsdb, influx, remote-read, prometheus, mimir, thanos. vm-native sub-command already supports similar flags. The flags are useful when vmctl imports data to a VictoriaMetrics instance
protected by authentication.

Previously, to work around this limitation, a vmagent with remote write auth configured had to be spun up, and vmctl would write via it. This change allows vmctl to write directly.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8897
PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11089
2026-06-15 15:45:13 +03:00
Aliaksandr Valialkin
22802101e0 lib/timeutil: allow parsing time values below 1970 year at ParseTimeAt() and ParseTimeMsec()
Negative timestamps are supported by VictoriaLogs and VictoriaTraces.
2026-06-15 13:20:17 +02:00
Aliaksandr Valialkin
00420e16f9 lib/vmalertproxy: extract the common code for proxying requests to vmalert
This code is going to be used by single-node VictoriaMetrics, by vmselect,
by VictoriaLogs and by VictoriaTraces.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1739
2026-06-15 10:59:49 +02:00
Andrii Chubatiuk
6c3c548ddb lib/backup/fsremote: don't fail while listing absent directory
while starting vmbackupmanager with `fs://` destination it's required to create target directories manually.
Ignoring error if target directory is absent to align behavior with remote destinations.
2026-06-15 09:41:10 +02:00
Roman Khavronenko
d52de359d5 app/vmselect: log calls to /api/v1/admin/tsdb/delete_series
The log message will display when deletion API was called, how many
series it deleted and what params were used. This should help
identifying events of metrics deletion.

Example:
```
2026-06-12T13:02:28.006Z        info    VictoriaMetrics/app/vmselect/prometheus/prometheus.go:529       /api/v1/admin/tsdb/delete_series has been called for "[{__name__=\"vm_http_request_errors_total\"}]". Deleted 0 series.
```
2026-06-15 09:07:39 +02:00
Zasda Yusuf Mikail
892f4aced2 lib/httpserver: allow disabling server hostname header
When responding to an HTTP request, VictoriaMetrics components include the X-Server-Hostname. 
While this may be useful for debugging, it also leaks the hostname.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11067
2026-06-15 09:05:36 +02:00
Andrii Chubatiuk
05903c8acd chore(lib/streamaggr): move increase and increase_prometheus outputs to a separate struct (#11093)
move increase and increase_prometheus outputs to a separate struct
to simplify the code. 

Before, increase, increase_prometheus, total_prometheus 
were implemented within one aggregator making it harder to use or update it.
2026-06-12 11:47:55 +02:00
Pablo (Tomas) Fernandez
a9fae230ae docs: Update "Multi Retention Setup within VictoriaMetrics Cluster" Guide (#11055)
This PR updates the [Multi Retention Setup within VictoriaMetrics
Cluster](https://docs.victoriametrics.com/guides/guide-vmcluster-multiple-retention-setup/)
guide.

Changes:
- Rewrote the introduction and added an overview
- Added step-by-step example setup for Kubernetes
- Expanded on alternative ways to route traffic

I've tested the configuration on a K3s cluster and it works, but I don't
know if the way I routed traffic into the retention tiers makes sense,
so any feedback is very much appreciated.

---------

Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2026-06-12 11:46:26 +02:00
Fred Navruzov
19fac13418 docs/vmanomaly: release v1.29.5 (#11097)
### Description

- Documentation updates to reflect v1.29.5 vmanomaly release changes
- Documentation/internal logic alignment in argument descriptions (such
as `decay` arg for models)
2026-06-11 22:49:30 +03:00
Artem Fetishev
3a6054f8a2 vmsingle: refactor vmstorage type (#11076)
Move vmstorage_single_node.go code into vmstorage.go. This way the diff
between master and cluster branches is easier to view.

This change is for single node only. The change must not be
cherry-picked to cluster branches.

Follow-up for 89db66573b (#11046)

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-06-11 16:19:24 +02:00
Artem Fetishev
6653f6a5e7 apptest: rename client fields to generic cli (#11088)
Initially, the fields were given a different name because I mistakenly
believed that the fields with the same name will clash when the
corresponding types are embedded in some other type (for example,
vmselectClient and vminsertClient are embedded into vmsingle).

But it appears not to be the case. Thus, changing the field name to
something more generic.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-06-10 17:01:44 +02:00
900 changed files with 55397 additions and 21803 deletions

0
.codex
View File

View File

@@ -66,6 +66,8 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Setup Go
id: go

View File

@@ -17,6 +17,7 @@ jobs:
with:
# needed for proper diff
fetch-depth: 0
persist-credentials: false
- name: 'Validate that changelog changes are under ## tip'
run: |

View File

@@ -15,6 +15,7 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0 # we need full history for commit verification
persist-credentials: false
- name: Check commit signatures
run: |

View File

@@ -18,6 +18,8 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Setup Go
id: go

View File

@@ -32,6 +32,8 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Set up Go
id: go

View File

@@ -21,6 +21,7 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
path: __vm
persist-credentials: false
- name: Checkout private code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -28,6 +29,7 @@ jobs:
repository: VictoriaMetrics/vmdocs
token: ${{ secrets.VM_BOT_GH_TOKEN }}
path: __vm-docs
persist-credentials: true
- name: Import GPG key
uses: crazy-max/ghaction-import-gpg@2dc316deee8e90f13e1a351ab510b4d5bc0c82cd # v7.0.0

View File

@@ -35,6 +35,8 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Setup Go
id: go
@@ -78,6 +80,8 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Setup Go
id: go
@@ -103,6 +107,8 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Setup Go
id: go

View File

@@ -33,6 +33,8 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Cache node_modules
id: cache

View File

@@ -1,9 +1,18 @@
version: "2"
linters:
enable:
- errorlint
settings:
errcheck:
exclude-functions:
- (net/http.ResponseWriter).Write
errorlint:
errorf: true
# Do not enable `comparison` and `asserts`: they produce false positives,
# since many call sites intentionally compare sentinel errors directly (e.g. err == io.EOF)
# when the producer is documented to return them unwrapped. See https://github.com/VictoriaMetrics/VictoriaLogs/pull/1490
comparison: false
asserts: false
exclusions:
generated: lax
presets:

View File

@@ -35,6 +35,9 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
}
func insertRows(at *auth.Token, timeseries []prompb.TimeSeries, mms []prompb.MetricMetadata, extraLabels []prompb.Label) error {
if len(extraLabels) == 0 && !prommetadata.IsEnabled() && at == nil {
return insertRowsFast(at, timeseries)
}
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -102,3 +105,17 @@ func insertRows(at *auth.Token, timeseries []prompb.TimeSeries, mms []prompb.Met
rowsPerInsert.Update(float64(rowsTotal))
return nil
}
func insertRowsFast(at *auth.Token, timeseries []prompb.TimeSeries) error {
rowsTotal := 0
for i := range timeseries {
rowsTotal += len(timeseries[i].Samples)
}
wr := &prompb.WriteRequest{Timeseries: timeseries}
if !remotewrite.TryPush(at, wr) {
return remotewrite.ErrQueueFullHTTPRetry
}
rowsInserted.Add(rowsTotal)
rowsPerInsert.Update(float64(rowsTotal))
return nil
}

View File

@@ -187,7 +187,7 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
return c
}
func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
func (c *client) init(argIdx int, sanitizedURL string) {
limitReached := metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_rate_limit_reached_total{url=%q}`, c.sanitizedURL))
if bytesPerSec := rateLimit.GetOptionalArg(argIdx); bytesPerSec > 0 {
logger.Infof("applying %d bytes per second rate limit for -remoteWrite.url=%q", bytesPerSec, sanitizedURL)
@@ -204,11 +204,20 @@ func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
c.packetsDropped = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_packets_dropped_total{url=%q}`, c.sanitizedURL))
c.retriesCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_retries_count_total{url=%q}`, c.sanitizedURL))
c.sendDuration = metrics.GetOrCreateFloatCounter(fmt.Sprintf(`vmagent_remotewrite_send_duration_seconds_total{url=%q}`, c.sanitizedURL))
metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_queues{url=%q}`, c.sanitizedURL), func() float64 {
return float64(concurrency)
})
for range concurrency {
c.wg.Go(c.runWorker)
workers := queues.GetOptionalArg(argIdx)
if workers <= 0 {
workers = 1
}
inmemoryWorkers := inmemoryQueues.GetOptionalArg(argIdx)
for range inmemoryWorkers {
c.wg.Go(func() {
c.runWorker(c.fq.MustReadInMemoryBlockBlocking)
})
}
for range workers {
c.wg.Go(func() {
c.runWorker(c.fq.MustReadBlock)
})
}
logger.Infof("initialized client for -remoteWrite.url=%q", c.sanitizedURL)
}
@@ -302,12 +311,12 @@ func getAWSAPIConfig(argIdx int) (*awsapi.Config, error) {
return cfg, nil
}
func (c *client) runWorker() {
func (c *client) runWorker(readBlock func(dst []byte) ([]byte, bool)) {
var ok bool
var block []byte
ch := make(chan bool, 1)
for {
block, ok = c.fq.MustReadBlock(block[:0])
block, ok = readBlock(block[:0])
if !ok {
return
}

View File

@@ -209,13 +209,12 @@ func (wr *writeRequest) tryPushMetadata(mms []prompb.MetricMetadata) bool {
func (wr *writeRequest) copyMetadata(dst, src *prompb.MetricMetadata) {
// Direct copy for non-string fields, which are safe by value.
dst.Type = src.Type
dst.Unit = src.Unit
dst.AccountID = src.AccountID
dst.ProjectID = src.ProjectID
// Pre-allocate memory for all string fields.
neededBufLen := len(src.MetricFamilyName) + len(src.Help)
neededBufLen := len(src.MetricFamilyName) + len(src.Help) + len(src.Unit)
bufLen := len(wr.metadatabuf)
wr.metadatabuf = slicesutil.SetLength(wr.metadatabuf, bufLen+neededBufLen)
buf := wr.metadatabuf[:bufLen]
@@ -230,6 +229,11 @@ func (wr *writeRequest) copyMetadata(dst, src *prompb.MetricMetadata) {
buf = append(buf, src.Help...)
dst.Help = bytesutil.ToUnsafeString(buf[bufLen:])
// Copy Unit
bufLen = len(buf)
buf = append(buf, src.Unit...)
dst.Unit = bytesutil.ToUnsafeString(buf[bufLen:])
wr.metadatabuf = buf
}

View File

@@ -12,19 +12,18 @@ import (
"sync/atomic"
"time"
"github.com/cespare/xxhash/v2"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bloomfilter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/consistenthash"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/mdx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
@@ -66,6 +65,9 @@ var (
queues = flagutil.NewArrayInt("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. "+
"Default value depends on the number of available CPU cores. It should work fine in most cases since it minimizes resource usage")
inmemoryQueues = flagutil.NewArrayInt("remoteWrite.inmemoryQueues", 0, "The number of additional workers per each -remoteWrite.url, which send only recently ingested data from the in-memory queue, "+
"while the file-based queue at -remoteWrite.tmpDataPath is drained by workers configured via -remoteWrite.queues. "+
"This reduces delivery lag for fresh samples when the file-based queue contains a backlog accumulated during remote storage outages.")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
maxPendingBytesPerURL = flagutil.NewArrayBytes("remoteWrite.maxDiskUsagePerURL", 0, "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
@@ -103,6 +105,9 @@ var (
"cannot be pushed into the configured -remoteWrite.url systems in a timely manner. See https://docs.victoriametrics.com/victoriametrics/vmagent/#disabling-on-disk-persistence")
disableMetadataPerURL = flagutil.NewArrayBool("remoteWrite.disableMetadata", "Whether to disable sending metadata to the corresponding -remoteWrite.url. "+
"By default, metadata sending is controlled by the global -enableMetadata flag")
enableMdx = flagutil.NewArrayBool("remoteWrite.mdx.enable", "Whether to only retain metrics from VictoriaMetrics services before sending them to the corresponding -remoteWrite.url. "+
"Please see https://docs.victoriametrics.com/victoriametrics/vmagent/#monitoring-data-exchange")
)
var (
@@ -159,8 +164,8 @@ func InitSecretFlags() {
}
var (
shardByURLLabelsMap map[string]struct{}
shardByURLIgnoreLabelsMap map[string]struct{}
shardByURLLabelsFilter []string
shardByURLIgnoreLabelsFilter []string
)
// Init initializes remotewrite.
@@ -207,8 +212,8 @@ func Init() {
logger.Fatalf("-remoteWrite.shardByURL.labels and -remoteWrite.shardByURL.ignoreLabels cannot be set simultaneously; " +
"see https://docs.victoriametrics.com/victoriametrics/vmagent/#sharding-among-remote-storages")
}
shardByURLLabelsMap = newMapFromStrings(*shardByURLLabels)
shardByURLIgnoreLabelsMap = newMapFromStrings(*shardByURLIgnoreLabels)
shardByURLLabelsFilter = slices.Clone(*shardByURLLabels)
shardByURLIgnoreLabelsFilter = slices.Clone(*shardByURLIgnoreLabels)
initLabelsGlobal()
@@ -304,6 +309,10 @@ func initRemoteWriteCtxs(urls []string) {
}
fs.RegisterPathFsMetrics(*tmpDataPath)
if slices.Contains(*enableMdx, true) && *shardByURL {
logger.Fatalf("-remoteWrite.mdx.enable and -remoteWrite.shardByURL cannot be set to true simultaneously.")
}
if *shardByURL {
consistentHashNodes := make([]string, 0, len(urls))
for i, url := range urls {
@@ -695,18 +704,18 @@ func shardAmountRemoteWriteCtx(tssBlock []prompb.TimeSeries, shards [][]prompb.T
for _, ts := range tssBlock {
hashLabels := ts.Labels
if len(shardByURLLabelsMap) > 0 {
if len(shardByURLLabelsFilter) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLLabelsMap[label.Name]; ok {
if slices.Contains(shardByURLLabelsFilter, label.Name) {
hashLabels = append(hashLabels, label)
}
}
tmpLabels.Labels = hashLabels
} else if len(shardByURLIgnoreLabelsMap) > 0 {
} else if len(shardByURLIgnoreLabelsFilter) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLIgnoreLabelsMap[label.Name]; !ok {
if !slices.Contains(shardByURLIgnoreLabelsFilter, label.Name) {
hashLabels = append(hashLabels, label)
}
}
@@ -807,34 +816,26 @@ var (
// it omits the '=' separator between label name and value for backward compatibility.
// Changing it would re-shard all series across remoteWrite targets.
func getLabelsHashForShard(labels []prompb.Label) uint64 {
bb := labelsHashBufPool.Get()
b := bb.B[:0]
var d xxhash.Digest
d.Reset()
for _, label := range labels {
b = append(b, label.Name...)
b = append(b, label.Value...)
_, _ = d.WriteString(label.Name)
_, _ = d.WriteString(label.Value)
}
h := xxhash.Sum64(b)
bb.B = b
labelsHashBufPool.Put(bb)
return h
return d.Sum64()
}
func getLabelsHash(labels []prompb.Label) uint64 {
bb := labelsHashBufPool.Get()
b := bb.B[:0]
var d xxhash.Digest
d.Reset()
for _, label := range labels {
b = append(b, label.Name...)
b = append(b, '=')
b = append(b, label.Value...)
_, _ = d.WriteString(label.Name)
_, _ = d.WriteString("=")
_, _ = d.WriteString(label.Value)
}
h := xxhash.Sum64(b)
bb.B = b
labelsHashBufPool.Put(bb)
return h
return d.Sum64()
}
var labelsHashBufPool bytesutil.ByteBufferPool
func logSkippedSeries(labels []prompb.Label, flagName string, flagValue int) {
select {
case <-logSkippedSeriesTicker.C:
@@ -859,6 +860,7 @@ type remoteWriteCtx struct {
sas atomic.Pointer[streamaggr.Aggregators]
deduplicator *streamaggr.Deduplicator
mdxFilter *mdx.Filter
streamAggrKeepInput bool
streamAggrDropInput bool
@@ -873,6 +875,7 @@ type remoteWriteCtx struct {
rowsPushedAfterRelabel *metrics.Counter
rowsDroppedByRelabel *metrics.Counter
mdxRowsPreserved *metrics.Counter
pushFailures *metrics.Counter
metadataDroppedOnPushFailure *metrics.Counter
@@ -906,7 +909,8 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, sanitizedURL string)
}
isPQDisabled := disableOnDiskQueue.GetOptionalArg(argIdx)
queuesSize := queues.GetOptionalArg(argIdx)
inmemoryQueueSize := inmemoryQueues.GetOptionalArg(argIdx)
queuesSize := queues.GetOptionalArg(argIdx) + inmemoryQueueSize
if queuesSize > maxQueues {
queuesSize = maxQueues
} else if queuesSize <= 0 {
@@ -923,7 +927,13 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, sanitizedURL string)
if maxInmemoryBlocks < 2 {
maxInmemoryBlocks = 2
}
fq := persistentqueue.MustOpenFastQueue(queuePath, sanitizedURL, maxInmemoryBlocks, maxPendingBytes, isPQDisabled)
fqOpts := persistentqueue.OpenFastQueueOpts{
MaxInmemoryBlocks: maxInmemoryBlocks,
MaxPendingBytes: maxPendingBytes,
IsPQDisabled: isPQDisabled,
PrioritizeInmemoryData: inmemoryQueueSize > 0,
}
fq := persistentqueue.MustOpenFastQueueWithOpts(queuePath, sanitizedURL, fqOpts)
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_pending_data_bytes{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetPendingBytes())
})
@@ -936,6 +946,9 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, sanitizedURL string)
}
return 0
})
metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_queues{url=%q}`, sanitizedURL), func() float64 {
return float64(queuesSize)
})
var c *client
switch remoteWriteURL.Scheme {
@@ -944,7 +957,7 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, sanitizedURL string)
default:
logger.Fatalf("unsupported scheme: %s for remoteWriteURL: %s, want `http`, `https`", remoteWriteURL.Scheme, sanitizedURL)
}
c.init(argIdx, queuesSize, sanitizedURL)
c.init(argIdx, sanitizedURL)
// Initialize pss
sf := significantFigures.GetOptionalArg(argIdx)
@@ -959,7 +972,6 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, sanitizedURL string)
for i := range pss {
pss[i] = newPendingSeries(fq, &c.useVMProto, sf, rd)
}
rwctx := &remoteWriteCtx{
idx: argIdx,
fq: fq,
@@ -976,6 +988,16 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL *url.URL, sanitizedURL string)
}
rwctx.initStreamAggrConfig()
if enableMdx.GetOptionalArg(argIdx) {
mdxFilter := mdx.NewFilter()
rwctx.mdxFilter = mdxFilter
rwctx.mdxRowsPreserved = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_mdx_rows_preserved_total{path=%q,url=%q}`, queuePath, sanitizedURL))
_ = metrics.NewGauge(fmt.Sprintf(`vmagent_remotewrite_mdx_tracked_instances{path=%q,url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(mdxFilter.VMInstancesCount())
})
}
return rwctx
}
@@ -989,6 +1011,11 @@ func (rwctx *remoteWriteCtx) MustStop() {
rwctx.deduplicator.MustStop()
rwctx.deduplicator = nil
}
if rwctx.mdxFilter != nil {
rwctx.mdxFilter.MustStop()
rwctx.mdxFilter = nil
rwctx.mdxRowsPreserved = nil
}
for _, ps := range rwctx.pss {
ps.MustStop()
@@ -1004,6 +1031,7 @@ func (rwctx *remoteWriteCtx) MustStop() {
rwctx.rowsPushedAfterRelabel = nil
rwctx.rowsDroppedByRelabel = nil
}
// TryPushTimeSeries sends tss series to the configured remote write endpoint
@@ -1011,16 +1039,41 @@ func (rwctx *remoteWriteCtx) MustStop() {
// TryPushTimeSeries doesn't modify tss, so tss can be passed concurrently to TryPush across distinct rwctx instances.
func (rwctx *remoteWriteCtx) TryPushTimeSeries(tss []prompb.TimeSeries, forceDropSamplesOnFailure bool) bool {
var rctx *relabelCtx
var mctx *mdx.Ctx
var v *[]prompb.TimeSeries
defer func() {
if rctx == nil {
return
if v != nil {
*v = prompb.ResetTimeSeries(tss)
tssPool.Put(v)
}
if rctx != nil {
putRelabelCtx(rctx)
}
if mctx != nil {
mdx.PutContext(mctx)
}
*v = prompb.ResetTimeSeries(tss)
tssPool.Put(v)
putRelabelCtx(rctx)
}()
copyTimeSeriesIfNeeded := func() {
if v == nil {
v := tssPool.Get().(*[]prompb.TimeSeries)
tss = append(*v, tss...)
}
}
if rwctx.mdxFilter != nil {
mctx = mdx.GetContext()
// Make a copy of tss before applying relabeling in order to prevent
// from affecting time series for other remoteWrite.mdx configs.
copyTimeSeriesIfNeeded()
tss = rwctx.mdxFilter.Filter(mctx, tss)
if len(tss) == 0 {
return true
}
rowsCount := getRowsCount(tss)
rwctx.mdxRowsPreserved.Add(rowsCount)
}
// Apply relabeling
rcs := allRelabelConfigs.Load()
pcs := rcs.perURL[rwctx.idx]
@@ -1030,8 +1083,7 @@ func (rwctx *remoteWriteCtx) TryPushTimeSeries(tss []prompb.TimeSeries, forceDro
// from affecting time series for other remoteWrite.url configs.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/467
// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/599
v = tssPool.Get().(*[]prompb.TimeSeries)
tss = append(*v, tss...)
copyTimeSeriesIfNeeded()
rowsCountBeforeRelabel := getRowsCount(tss)
tss = rctx.applyRelabeling(tss, pcs)
rowsCountAfterRelabel := getRowsCount(tss)
@@ -1049,8 +1101,7 @@ func (rwctx *remoteWriteCtx) TryPushTimeSeries(tss []prompb.TimeSeries, forceDro
if rctx == nil {
rctx = getRelabelCtx()
// Make a copy of tss before dropping aggregated series
v = tssPool.Get().(*[]prompb.TimeSeries)
tss = append(*v, tss...)
copyTimeSeriesIfNeeded()
}
tss = dropAggregatedSeries(tss, matchIdxs.B, rwctx.streamAggrDropInput)
} else if rwctx.streamAggrDropInput {
@@ -1058,8 +1109,7 @@ func (rwctx *remoteWriteCtx) TryPushTimeSeries(tss []prompb.TimeSeries, forceDro
if rctx == nil {
rctx = getRelabelCtx()
// Make a copy of tss before dropping aggregated series
v = tssPool.Get().(*[]prompb.TimeSeries)
tss = append(*v, tss...)
copyTimeSeriesIfNeeded()
}
tss = dropUnaggregatedSeries(tss, matchIdxs.B)
}
@@ -1178,15 +1228,6 @@ func getRowsCount(tss []prompb.TimeSeries) int {
}
return rowsCount
}
func newMapFromStrings(a []string) map[string]struct{} {
m := make(map[string]struct{}, len(a))
for _, s := range a {
m[s] = struct{}{}
}
return m
}
func getMaxHourlySeries() int {
limit := *maxHourlySeries
if limit == -1 || limit > math.MaxInt32 {

View File

@@ -145,10 +145,10 @@ func TestRuleValidate(t *testing.T) {
}
func TestGroupValidate_Failure(t *testing.T) {
f := func(group *Group, validateExpressions bool, errStrExpected string) {
f := func(data []byte, validateExpressions bool, errStrExpected string) {
t.Helper()
err := group.Validate(nil, validateExpressions)
_, err := parse(map[string][]byte{"test.yaml": data}, nil, validateExpressions)
if err == nil {
t.Fatalf("expecting non-nil error")
}
@@ -158,275 +158,238 @@ func TestGroupValidate_Failure(t *testing.T) {
}
}
f(&Group{}, false, "group name must be set")
f([]byte(`
groups:
- name: ""
`), false, "group name must be set")
f(&Group{
Name: "both record and alert are not set",
Rules: []Rule{
{
Expr: "sum(up == 0 ) by (host)",
For: promutil.NewDuration(10 * time.Millisecond),
},
{
Expr: "sumSeries(time('foo.bar',10))",
},
},
}, false, "invalid rule")
f([]byte(`
groups:
- name: both record and alert are not set
rules:
- expr: "sum(up == 0 ) by (host)"
for: 10ms
- expr: "sumSeries(time('foo.bar',10))"
`), false, "invalid rule")
f(&Group{
Name: "negative interval",
Interval: promutil.NewDuration(-1),
}, false, "interval shouldn't be lower than 0")
f([]byte(`
groups:
- name: negative interval
interval: -1ms
`), false, "interval shouldn't be lower than 0")
f(&Group{
Name: "too big eval_offset",
Interval: promutil.NewDuration(time.Minute),
EvalOffset: promutil.NewDuration(2 * time.Minute),
}, false, "eval_offset should be smaller than interval")
f([]byte(`
groups:
- name: too big eval_offset
interval: 1m
eval_offset: 2m
`), false, "eval_offset should be smaller than interval")
f(&Group{
Name: "too big negative eval_offset",
Interval: promutil.NewDuration(time.Minute),
EvalOffset: promutil.NewDuration(-2 * time.Minute),
}, false, "eval_offset should be smaller than interval")
f([]byte(`
groups:
- name: too big negative eval_offset
interval: 1m
eval_offset: -2m
`), false, "eval_offset should be smaller than interval")
limit := -1
f(&Group{
Name: "wrong limit",
Limit: &limit,
}, false, "invalid limit")
f([]byte(`
groups:
- name: wrong limit
limit: -1
`), false, "invalid limit")
f(&Group{
Name: "wrong concurrency",
Concurrency: -1,
}, false, "invalid concurrency")
f([]byte(`
groups:
- name: wrong concurrency
concurrency: -1
`), false, "invalid concurrency")
f(&Group{
Name: "test",
Rules: []Rule{
{
Alert: "alert",
Expr: "up == 1",
},
{
Alert: "alert",
Expr: "up == 1",
},
},
}, false, "duplicate")
f([]byte(`
groups:
- name: test
rules:
- alert: alert
expr: up == 1
- alert: alert
expr: up == 1
`), false, "duplicate")
f(&Group{
Name: "test",
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
}},
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
}},
},
}, false, "duplicate")
f([]byte(`
groups:
- name: test
rules:
- alert: alert
expr: up == 1
labels:
summary: "{{ value|query }}"
- alert: alert
expr: up == 1
labels:
summary: "{{ value|query }}"
`), false, "duplicate")
f(&Group{
Name: "test",
Rules: []Rule{
{Record: "record", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
}},
{Record: "record", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
}},
},
}, false, "duplicate")
f([]byte(`
groups:
- name: test
rules:
- record: record
expr: up == 1
labels:
summary: "{{ value|query }}"
- record: record
expr: up == 1
labels:
summary: "{{ value|query }}"
`), false, "duplicate")
f(&Group{
Name: "test",
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
}},
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"description": "{{ value|query }}",
}},
},
}, false, "duplicate")
f(&Group{
Name: "test",
Rules: []Rule{
{Record: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
}},
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
}},
},
}, false, "duplicate")
f(&Group{
Name: "test thanos",
Type: NewRawType("thanos"),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"description": "{{ value|query }}",
}},
},
}, true, "unknown datasource type")
f([]byte(`
groups:
- name: test thanos
type: thanos
rules:
- alert: alert
expr: up == 1
labels:
description: "{{ value|query }}"
`), true, "unknown datasource type")
// validate expressions
f(&Group{
Name: "test prometheus expr",
Type: NewPrometheusType(),
Rules: []Rule{
{
Record: "record",
Expr: "up | 0",
},
},
}, true, "bad MetricsQL expr")
f([]byte(`
groups:
- name: test prometheus expr
type: prometheus
rules:
- record: record
expr: "up | 0"
`), true, "bad MetricsQL expr")
f(&Group{
Name: "test graphite expr",
Type: NewGraphiteType(),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"description": "some-description",
}},
},
}, true, "bad GraphiteQL expr")
f([]byte(`
groups:
- name: test graphite expr
type: graphite
rules:
- alert: alert
expr: up == 1
labels:
description: some-description
`), true, "bad GraphiteQL expr")
f(&Group{
Name: "test vlogs expr",
Type: NewVLogsType(),
Rules: []Rule{
{Alert: "alert", Expr: "stats count(*) as requests"},
},
}, true, "bad LogsQL expr")
f([]byte(`
groups:
- name: test vlogs expr
type: vlogs
rules:
- alert: alert
expr: "stats count(*) as requests"
`), true, "bad LogsQL expr")
f(&Group{
Name: "test vlogs expr",
Type: NewVLogsType(),
Rules: []Rule{
{Alert: "alert", Expr: "_time: 1m | stats by (path, _time: 1m) count(*) as requests"},
},
}, true, "bad LogsQL expr")
f([]byte(`
groups:
- name: test vlogs expr multipart
type: vlogs
rules:
- alert: alert
expr: "_time: 1m | stats by (path, _time: 1m) count(*) as requests"
`), true, "bad LogsQL expr")
f(&Group{
Name: "test graphite with prometheus expr",
Type: NewGraphiteType(),
Rules: []Rule{
{
Record: "r1",
ID: 1,
Expr: "sumSeries(time('foo.bar',10))",
For: promutil.NewDuration(10 * time.Millisecond),
},
{
Record: "r2",
ID: 2,
Expr: "sum(up == 0 ) by (host)",
},
},
}, true, "bad GraphiteQL expr")
f([]byte(`
groups:
- name: test graphite with prometheus expr
type: graphite
rules:
- record: r1
expr: "sumSeries(time('foo.bar',10))"
for: 10ms
- record: r2
expr: "sum(up == 0 ) by (host)"
`), true, "bad GraphiteQL expr")
f(&Group{
Name: "test vlogs with prometheus exp",
Type: NewVLogsType(),
Rules: []Rule{
{
Record: "r1",
Expr: "sum(up == 0 ) by (host)",
For: promutil.NewDuration(10 * time.Millisecond),
},
},
}, true, "bad LogsQL expr")
f([]byte(`
groups:
- name: test vlogs with prometheus expr
type: vlogs
rules:
- record: r1
expr: "sum(up == 0 ) by (host)"
for: 10ms
`), true, "bad LogsQL expr")
f(&Group{
Name: "test prometheus with vlogs exp",
Type: NewPrometheusType(),
Rules: []Rule{
{
Record: "r1",
Expr: "* | stats by (path) count()",
For: promutil.NewDuration(10 * time.Millisecond),
},
},
}, true, "bad MetricsQL expr")
f([]byte(`
groups:
- name: test prometheus with vlogs expr
type: prometheus
rules:
- record: r1
expr: "* | stats by (path) count()"
for: 10ms
`), true, "bad MetricsQL expr")
}
func TestGroupValidate_Success(t *testing.T) {
f := func(group *Group, validateAnnotations, validateExpressions bool) {
f := func(data []byte, validateAnnotations, validateExpressions bool) {
t.Helper()
var validateTplFn ValidateTplFn
if validateAnnotations {
validateTplFn = notifier.ValidateTemplates
}
err := group.Validate(validateTplFn, validateExpressions)
_, err := parse(map[string][]byte{"test.yaml": data}, validateTplFn, validateExpressions)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
}
f(&Group{
Name: "test",
Rules: []Rule{
{
Record: "record",
Expr: "up | 0",
},
},
}, false, false)
f([]byte(`
groups:
- name: test
rules:
- record: record
expr: "up | 0"
`), false, false)
f(&Group{
Name: "test",
Rules: []Rule{
{
Alert: "alert",
Expr: "up == 1",
Labels: map[string]string{
"summary": "{{ value|query }}",
},
},
},
}, false, false)
f([]byte(`
groups:
- name: test
rules:
- alert: alert
expr: up == 1
labels:
summary: "{{ value|query }}"
`), false, false)
// validate annotations
f(&Group{
Name: "test",
Rules: []Rule{
{
Alert: "alert",
Expr: "up == 1",
Labels: map[string]string{
"summary": `
{{ with printf "node_memory_MemTotal{job='node',instance='%s'}" "localhost" | query }}
{{ . | first | value | humanize1024 }}B
{{ end }}`,
},
},
},
}, true, false)
f([]byte(`
groups:
- name: test
rules:
- alert: alert
expr: up == 1
labels:
summary: "\n{{ with printf \"node_memory_MemTotal{job='node',instance='%s'}\" \"localhost\" | query }}\n {{ . | first | value | humanize1024 }}B\n{{ end }}"
`), true, false)
// validate expressions
f(&Group{
Name: "test prometheus",
Type: NewPrometheusType(),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"description": "{{ value|query }}",
}},
},
}, false, true)
f(&Group{
Name: "test victorialogs",
Type: NewVLogsType(),
Rules: []Rule{
{Alert: "alert", Expr: " _time: 1m | stats count(*) as requests", Labels: map[string]string{
"description": "{{ value|query }}",
}},
},
}, false, true)
f([]byte(`
groups:
- name: test prometheus
type: prometheus
rules:
- alert: alert
expr: up == 1
labels:
description: "{{ value|query }}"
`), false, true)
f([]byte(`
groups:
- name: test victorialogs
type: vlogs
rules:
- alert: alert
expr: " _time: 1m | stats count(*) as requests"
labels:
description: "{{ value|query }}"
`), false, true)
}
func TestHashRule_NotEqual(t *testing.T) {

View File

@@ -315,6 +315,11 @@ func configReload(ctx context.Context, m *manager, groupsCfg []config.Group, sig
parseFn := config.Parse
for {
select {
case <-ctx.Done():
return
default:
}
select {
case <-ctx.Done():
return

View File

@@ -457,12 +457,10 @@ func TestSetIntervalAsTimeFilter(t *testing.T) {
f(`* | count()`, "vlogs", true)
f(`error OR _time:5m | count()`, "vlogs", true)
f(`(_time: 5m AND error) OR (_time: 5m AND warn) | count()`, "vlogs", true)
f(`* | error OR _time:5m | count()`, "vlogs", true)
f(`_time:5m | count()`, "vlogs", false)
f(`_time:2023-04-25T22:45:59Z | count()`, "vlogs", false)
f(`error AND _time:5m | count()`, "vlogs", false)
f(`* | error AND _time:5m | count()`, "vlogs", false)
}
func TestRecordingRuleExec_Partial(t *testing.T) {

View File

@@ -840,6 +840,11 @@ func authConfigReloader(sighupCh <-chan os.Signal) {
}
for {
select {
case <-stopCh:
return
default:
}
select {
case <-stopCh:
return
@@ -906,7 +911,8 @@ func reloadAuthConfigData(data []byte) (bool, error) {
return false, fmt.Errorf("failed to parse auth config: %w", err)
}
jui, oidcDP, err := parseJWTUsers(ac)
oidcDP := &oidcDiscovererPool{}
jui, err := parseJWTUsers(ac, oidcDP)
if err != nil {
return false, fmt.Errorf("failed to parse JWT users from auth config: %w", err)
}

View File

@@ -140,6 +140,18 @@ users:
- "ProjectID: {{.MetricsProjectID}}"
url_prefix: "http://vminsert:8480/insert/prometheus"
# JWT-based routing that relies solely on custom claims.
# The `vm_access` claim is missing, default value will be used.
# e.g. {"role": "admin"}.
- name: jwt-custom-claims
jwt:
skip_verify: true
vm_default_access_claim:
metrics_account_id: 1
match_claims:
role: admin
url_prefix: "http://vmselect-admin:8481/select/0/prometheus"
# Requests without Authorization header are proxied according to `unauthorized_user` section.
# Requests are proxied in round-robin fashion between `url_prefix` backends.
# The deny_partial_response query arg is added to all the proxied requests.

View File

@@ -65,6 +65,8 @@ type JWTConfig struct {
MatchClaims map[string]string `yaml:"match_claims,omitempty"`
parsedMatchClaims []*jwt.Claim
DefaultVMAccessClaim *jwt.VMAccessClaim `yaml:"default_vm_access_claim,omitempty"`
// verifierPool is used to verify JWT tokens.
// It is initialized from PublicKeys and/or PublicKeyFiles.
// In this case, it is initialized once at config reload and never updated until next reload
@@ -72,9 +74,8 @@ type JWTConfig struct {
verifierPool atomic.Pointer[jwt.VerifierPool]
}
func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
func parseJWTUsers(ac *AuthConfig, oidcDP *oidcDiscovererPool) ([]*UserInfo, error) {
jui := make([]*UserInfo, 0, len(ac.Users))
oidcDP := &oidcDiscovererPool{}
uniqClaims := make(map[string]*UserInfo)
var sortedClaims []string
@@ -85,10 +86,10 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
}
if ui.AuthToken != "" || ui.BearerToken != "" || ui.Username != "" || ui.Password != "" {
return nil, nil, fmt.Errorf("auth_token, bearer_token, username and password cannot be specified if jwt is set")
return nil, fmt.Errorf("auth_token, bearer_token, username and password cannot be specified if jwt is set")
}
if len(jwtToken.PublicKeys) == 0 && len(jwtToken.PublicKeyFiles) == 0 && !jwtToken.SkipVerify && jwtToken.OIDC == nil {
return nil, nil, fmt.Errorf("jwt must contain at least a single public key, public_key_files, oidc or have skip_verify=true")
return nil, fmt.Errorf("jwt must contain at least a single public key, public_key_files, oidc or have skip_verify=true")
}
var claimsString string
sortedClaims = sortedClaims[:0]
@@ -97,7 +98,7 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
sortedClaims = append(sortedClaims, fmt.Sprintf("%s=%s", ck, cv))
pc, err := jwt.NewClaim(ck, cv)
if err != nil {
return nil, nil, fmt.Errorf("incorrect match claim, key=%q, value regex=%q: %w", ck, cv, err)
return nil, fmt.Errorf("incorrect match claim, key=%q, value regex=%q: %w", ck, cv, err)
}
parsedClaims = append(parsedClaims, pc)
}
@@ -106,7 +107,7 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
claimsString = strings.Join(sortedClaims, ",")
if oldUI, ok := uniqClaims[claimsString]; ok {
return nil, nil, fmt.Errorf("duplicate match claims=%q found for name=%q at idx=%d; the previous one is set for name=%q", claimsString, ui.Name, idx, oldUI.Name)
return nil, fmt.Errorf("duplicate match claims=%q found for name=%q at idx=%d; the previous one is set for name=%q", claimsString, ui.Name, idx, oldUI.Name)
}
uniqClaims[claimsString] = &ui
if len(jwtToken.PublicKeys) > 0 || len(jwtToken.PublicKeyFiles) > 0 {
@@ -115,7 +116,7 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
for i := range jwtToken.PublicKeys {
k, err := jwt.ParseKey([]byte(jwtToken.PublicKeys[i]))
if err != nil {
return nil, nil, err
return nil, err
}
keys = append(keys, k)
}
@@ -123,52 +124,52 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
for _, filePath := range jwtToken.PublicKeyFiles {
keyData, err := os.ReadFile(filePath)
if err != nil {
return nil, nil, fmt.Errorf("cannot read public key from file %q: %w", filePath, err)
return nil, fmt.Errorf("cannot read public key from file %q: %w", filePath, err)
}
k, err := jwt.ParseKey(keyData)
if err != nil {
return nil, nil, fmt.Errorf("cannot parse public key from file %q: %w", filePath, err)
return nil, fmt.Errorf("cannot parse public key from file %q: %w", filePath, err)
}
keys = append(keys, k)
}
vp, err := jwt.NewVerifierPool(keys)
if err != nil {
return nil, nil, err
return nil, err
}
jwtToken.verifierPool.Store(vp)
}
if jwtToken.OIDC != nil {
if len(jwtToken.PublicKeys) > 0 || len(jwtToken.PublicKeyFiles) > 0 || jwtToken.SkipVerify {
return nil, nil, fmt.Errorf("jwt with oidc cannot contain public keys or have skip_verify=true")
return nil, fmt.Errorf("jwt with oidc cannot contain public keys or have skip_verify=true")
}
if jwtToken.OIDC.Issuer == "" {
return nil, nil, fmt.Errorf("oidc issuer cannot be empty")
return nil, fmt.Errorf("oidc issuer cannot be empty")
}
isserURL, err := url.Parse(jwtToken.OIDC.Issuer)
if err != nil {
return nil, nil, fmt.Errorf("oidc issuer %q must be a valid URL", jwtToken.OIDC.Issuer)
return nil, fmt.Errorf("oidc issuer %q must be a valid URL", jwtToken.OIDC.Issuer)
}
if isserURL.Scheme != "https" && isserURL.Scheme != "http" {
return nil, nil, fmt.Errorf("oidc issuer %q must have http or https scheme", jwtToken.OIDC.Issuer)
return nil, fmt.Errorf("oidc issuer %q must have http or https scheme", jwtToken.OIDC.Issuer)
}
oidcDP.createOrAdd(ui.JWT.OIDC.Issuer, &ui.JWT.verifierPool)
}
if err := parseJWTPlaceholdersForUserInfo(&ui, true); err != nil {
return nil, nil, err
return nil, err
}
if err := ui.initURLs(); err != nil {
return nil, nil, err
return nil, err
}
metricLabels, err := ui.getMetricLabels()
if err != nil {
return nil, nil, fmt.Errorf("cannot parse metric_labels: %w", err)
return nil, fmt.Errorf("cannot parse metric_labels: %w", err)
}
ui.requests = ac.ms.GetOrCreateCounter(`vmauth_user_requests_total` + metricLabels)
ui.requestErrors = ac.ms.GetOrCreateCounter(`vmauth_user_request_errors_total` + metricLabels)
@@ -187,7 +188,7 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
rt, err := newRoundTripper(ui.TLSCAFile, ui.TLSCertFile, ui.TLSKeyFile, ui.TLSServerName, ui.TLSInsecureSkipVerify)
if err != nil {
return nil, nil, fmt.Errorf("cannot initialize HTTP RoundTripper: %w", err)
return nil, fmt.Errorf("cannot initialize HTTP RoundTripper: %w", err)
}
ui.rt = rt
@@ -200,7 +201,7 @@ func parseJWTUsers(ac *AuthConfig) ([]*UserInfo, *oidcDiscovererPool, error) {
return len(jui[i].JWT.MatchClaims) > len(jui[j].JWT.MatchClaims)
})
return jui, oidcDP, nil
return jui, nil
}
var tokenPool sync.Pool
@@ -433,7 +434,6 @@ func validateJWTPlaceholdersForURL(up *URLPrefix, isAllowed bool) error {
}
if strings.Contains(p, placeholderPrefix) {
return fmt.Errorf("invalid placeholder found in URL request path: %q, supported values are: %s", bu.Path, strings.Join(allPlaceholders, ", "))
}
}
for param, values := range bu.Query() {
@@ -488,7 +488,6 @@ func hasAnyPlaceholders(u *url.URL) bool {
return true
}
}
}
return false
}

View File

@@ -39,16 +39,14 @@ XOtclIk1uhc03oL9nOQ=
}
return
}
users, oidcDP, err := parseJWTUsers(ac)
oidcDP := &oidcDiscovererPool{}
users, err := parseJWTUsers(ac, oidcDP)
if err == nil {
t.Fatalf("expecting non-nil error; got %v", users)
}
if expErr != err.Error() {
t.Fatalf("unexpected error; got\n%q\nwant \n%q", err.Error(), expErr)
}
if oidcDP != nil {
t.Fatalf("expecting nil oidcDP; got %v", oidcDP)
}
}
// unauthorized_user cannot be used with jwt
@@ -326,7 +324,8 @@ XOtclIk1uhc03oL9nOQ=
t.Fatalf("unexpected error: %s", err)
}
jui, oidcDP, err := parseJWTUsers(ac)
oidcDP := &oidcDiscovererPool{}
jui, err := parseJWTUsers(ac, oidcDP)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}

View File

@@ -6,6 +6,7 @@ import (
"flag"
"fmt"
"io"
"math/rand/v2"
"net"
"net/http"
"net/textproto"
@@ -31,6 +32,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
)
var (
@@ -190,6 +192,10 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if tkn == nil {
logger.Panicf("BUG: unexpected nil jwt token for user %q", ui.name())
}
if !tkn.HasVMAccessClaim() && ui.JWT.DefaultVMAccessClaim == nil {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return true
}
defer putToken(tkn)
processUserRequest(w, r, ui, tkn)
return true
@@ -202,6 +208,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
invalidAuthTokenRequests.Inc()
slowdownUnauthorizedResponse(r)
if *logInvalidAuthTokens {
err := fmt.Errorf("cannot authorize request with auth tokens %q", ats)
err = &httpserver.ErrorWithStatusCode{
@@ -424,8 +431,12 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tkn *j
}
targetURL := bu.url
if tkn != nil {
vmac := tkn.VMAccess()
if !tkn.HasVMAccessClaim() {
vmac = ui.JWT.DefaultVMAccessClaim
}
// for security reasons allow templating only for configured url values and headers
targetURL, hc = replaceJWTPlaceholders(bu, hc, tkn.VMAccess())
targetURL, hc = replaceJWTPlaceholders(bu, hc, vmac)
}
if isDefault {
// Don't change path and add request_path query param for default route.
@@ -881,3 +892,20 @@ func debugInfo(u *url.URL, r *http.Request) string {
fmt.Fprint(s, ")")
return s.String()
}
// slowdownUnauthorizedResponse adds a random delay in the [2..3] seconds range before returning an unauthorized response.
// This reduces the effectiveness of brute-force.
//
// Recommended by OWASP Top10:
// https://owasp.org/Top10/2025/A07_2025-Authentication_Failures
func slowdownUnauthorizedResponse(r *http.Request) {
d := 2*time.Second + time.Duration(rand.IntN(1000))*time.Millisecond
t := timerpool.Get(d)
select {
case <-t.C:
case <-r.Context().Done():
}
timerpool.Put(t)
}

View File

@@ -739,6 +739,12 @@ users:
"vm_access": map[string]any{},
}, false)
// token without vm_access claim, but with a custom claim usable for routing
roleToken := genToken(t, map[string]any{
"exp": time.Now().Add(10 * time.Minute).Unix(),
"role": "admin",
}, true)
fullToken := genToken(t, map[string]any{
"exp": time.Now().Add(10 * time.Minute).Unix(),
"vm_access": map[string]any{
@@ -779,6 +785,26 @@ statusCode=401
Unauthorized`
f(simpleCfgStr, request, responseExpected)
// token without vm_access claim is accepted when it
request = httptest.NewRequest(`GET`, "http://some-host.com/abc", nil)
request.Header.Set(`Authorization`, `Bearer `+roleToken)
responseExpected = `
statusCode=200
path: /foo/abc
query:
headers:`
f(fmt.Sprintf(`
users:
- jwt:
public_keys:
- %q
default_vm_access_claim:
metrics_account_id: 10
metrics_project_id: 10
match_claims:
role: admin
url_prefix: {BACKEND}/foo`, string(publicKeyPEM)), request, responseExpected)
// expired token
request = httptest.NewRequest(`GET`, "http://some-host.com/abc", nil)
request.Header.Set(`Authorization`, `Bearer `+expiredToken)

View File

@@ -1,3 +1,3 @@
See vmctl docs [here](https://docs.victoriametrics.com/victoriametrics/vmctl/).
vmctl docs can be edited at [docs/vmctl.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmctl.md).
vmctl docs can be edited at [docs/vmctl.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/victoriametrics/vmctl/vmctl.md).

View File

@@ -131,16 +131,13 @@ func (ac *authContext) initFromBasicAuthConfig(ba *BasicAuthConfig) error {
if ba.Username == "" {
return fmt.Errorf("missing `username` in `basic_auth` section")
}
if ba.Password != "" {
ac.getAuthHeader = func() string {
// See https://en.wikipedia.org/wiki/Basic_access_authentication
token := ba.Username + ":" + ba.Password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
return "Basic " + token64
}
ac.authDigest = fmt.Sprintf("basic(username=%q, password=%q)", ba.Username, ba.Password)
return nil
ac.getAuthHeader = func() string {
// See https://en.wikipedia.org/wiki/Basic_access_authentication
token := ba.Username + ":" + ba.Password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
return "Basic " + token64
}
ac.authDigest = fmt.Sprintf("basic(username=%q, password=%q)", ba.Username, ba.Password)
return nil
}

View File

@@ -69,6 +69,8 @@ const (
vmAddr = "vm-addr"
vmUser = "vm-user"
vmPassword = "vm-password"
vmHeaders = "vm-headers"
vmBearerToken = "vm-bearer-token"
vmAccountID = "vm-account-id"
vmConcurrency = "vm-concurrency"
vmCompress = "vm-compress"
@@ -112,6 +114,16 @@ var (
Usage: "VictoriaMetrics password for basic auth",
EnvVars: []string{"VM_PASSWORD"},
},
&cli.StringFlag{
Name: vmHeaders,
Usage: "Optional HTTP headers to send with each request to the corresponding destination address. \n" +
"For example, --vm-headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding destination address. \n" +
"Multiple headers must be delimited by '^^': --vm-headers='header1:value1^^header2:value2'",
},
&cli.StringFlag{
Name: vmBearerToken,
Usage: "Optional bearer auth token to use for the corresponding --vm-addr",
},
&cli.StringFlag{
Name: vmAccountID,
Usage: "AccountID is an arbitrary 32-bit integer identifying namespace for data ingestion (aka tenant). \n" +

View File

@@ -259,7 +259,7 @@ func (cr *ChunkedResponse) Next() ([]int64, []float64, error) {
fieldValues, ok := r.values[cr.field]
if !ok {
return nil, nil, fmt.Errorf("response doesn't contain filed %q", cr.field)
return nil, nil, fmt.Errorf("response doesn't contain field %q", cr.field)
}
values := make([]float64, len(fieldValues))
for i, fv := range fieldValues {

View File

@@ -457,7 +457,7 @@ func main() {
auth.WithBearer(c.String(vmNativeDstBearerToken)),
auth.WithHeaders(c.String(vmNativeDstHeaders)))
if err != nil {
return fmt.Errorf("error initialize auth config for destination: %s", dstAddr)
return fmt.Errorf("error initialize auth config for destination: %s: %w", dstAddr, err)
}
// create TLS config
@@ -563,11 +563,11 @@ func main() {
}()
err = app.Run(os.Args)
pushmetrics.StopAndPush()
if err != nil {
log.Fatalln(err)
}
log.Printf("Total time: %v", time.Since(start))
pushmetrics.StopAndPush()
}
func initConfigVM(c *cli.Context) (vm.Config, error) {
@@ -596,11 +596,18 @@ func initConfigVM(c *cli.Context) (vm.Config, error) {
return vm.Config{}, fmt.Errorf("failed to create backoff object: %w", err)
}
authCfg, err := auth.Generate(
auth.WithBasicAuth(c.String(vmUser), c.String(vmPassword)),
auth.WithBearer(c.String(vmBearerToken)),
auth.WithHeaders(c.String(vmHeaders)))
if err != nil {
return vm.Config{}, fmt.Errorf("error initialize auth config for destination: %s: %w", addr, err)
}
return vm.Config{
Addr: addr,
Transport: tr,
User: c.String(vmUser),
Password: c.String(vmPassword),
AuthCfg: authCfg,
Concurrency: uint8(c.Int(vmConcurrency)),
Compress: c.Bool(vmCompress),
AccountID: c.String(vmAccountID),

View File

@@ -74,9 +74,9 @@ func wrapErr(vmErr *vm.ImportError, verbose bool) error {
verboseMsg = "(enable `--verbose` output to get more details)"
}
if vmErr.Err == nil {
return fmt.Errorf("%s\n\tLatest delivered batch for timestamps range %d - %d %s\n%s",
return fmt.Errorf("%w\n\tLatest delivered batch for timestamps range %d - %d %s\n%s",
vmErr.Err, minTS, maxTS, verboseMsg, errTS)
}
return fmt.Errorf("%s\n\tImporting batch failed for timestamps range %d - %d %s\n%s",
return fmt.Errorf("%w\n\tImporting batch failed for timestamps range %d - %d %s\n%s",
vmErr.Err, minTS, maxTS, verboseMsg, errTS)
}

View File

@@ -12,6 +12,7 @@ import (
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/auth"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/backoff"
@@ -27,6 +28,8 @@ type Config struct {
// --httpListenAddr value for single node version
// --httpListenAddr value of vmselect component for cluster version
Addr string
AuthCfg *auth.Config
// Transport allows specifying custom http.Transport
Transport *http.Transport
// Concurrency defines number of worker
@@ -40,10 +43,6 @@ type Config struct {
// BatchSize defines how many samples
// importer collects before sending the import request
BatchSize int
// User name for basic auth
User string
// Password for basic auth
Password string
// SignificantFigures defines the number of significant figures to leave
// in metric values before importing.
// Zero value saves all the significant decimal places
@@ -65,11 +64,10 @@ type Config struct {
// see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-import-time-series-data
type Importer struct {
addr string
authCfg *auth.Config
client *http.Client
importPath string
compress bool
user string
password string
close chan struct{}
input chan *TimeSeries
@@ -148,8 +146,7 @@ func NewImporter(ctx context.Context, cfg Config) (*Importer, error) {
client: client,
importPath: importPath,
compress: cfg.Compress,
user: cfg.User,
password: cfg.Password,
authCfg: cfg.AuthCfg,
rl: limiter.NewLimiter(cfg.RateLimit),
close: make(chan struct{}),
input: make(chan *TimeSeries, cfg.Concurrency*4),
@@ -304,8 +301,8 @@ func (im *Importer) Ping() error {
if err != nil {
return fmt.Errorf("cannot create request to %q: %w", im.addr, err)
}
if im.user != "" {
req.SetBasicAuth(im.user, im.password)
if im.authCfg != nil {
im.authCfg.SetHeaders(req, true)
}
resp, err := im.client.Do(req)
if err != nil {
@@ -334,8 +331,8 @@ func (im *Importer) Import(tsBatch []*TimeSeries) error {
im.importRequestsErrorsTotal.Inc()
return fmt.Errorf("cannot create request to %q: %w", im.addr, err)
}
if im.user != "" {
req.SetBasicAuth(im.user, im.password)
if im.authCfg != nil {
im.authCfg.SetHeaders(req, true)
}
if im.compress {
req.Header.Set("Content-Encoding", "gzip")

View File

@@ -405,7 +405,16 @@ func buildMatchWithFilter(filter string, metricName string) (string, error) {
if len(tf.Key) == 0 {
continue
}
a = append(a, tf.String())
switch {
case tf.IsNegative && tf.IsRegexp:
a = append(a, fmt.Sprintf("%s!~%q", tf.Key, tf.Value))
case tf.IsNegative:
a = append(a, fmt.Sprintf("%s!=%q", tf.Key, tf.Value))
case tf.IsRegexp:
a = append(a, fmt.Sprintf("%s=~%q", tf.Key, tf.Value))
default:
a = append(a, fmt.Sprintf("%s=%q", tf.Key, tf.Value))
}
}
a = append(a, nameFilter)
filters = append(filters, strings.Join(a, ","))

View File

@@ -6,8 +6,6 @@ import (
"flag"
"fmt"
"net/http"
nethttputil "net/http/httputil"
"net/url"
"strings"
"time"
@@ -29,6 +27,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vmalertproxy"
)
var (
@@ -38,7 +37,10 @@ var (
resetCacheAuthKey = flagutil.NewPassword("search.resetCacheAuthKey", "Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call. It could be passed via authKey query arg. It overrides -httpAuth.*")
logSlowQueryDuration = flag.Duration("search.logSlowQueryDuration", 5*time.Second, "Log queries with execution time exceeding this value. Zero disables slow query logging. "+
"See also -search.logQueryMemoryUsage")
vmalertProxyURL = flag.String("vmalert.proxyURL", "", "Optional URL for proxying requests to vmalert. For example, if -vmalert.proxyURL=http://vmalert:8880 , then alerting API requests such as /api/v1/rules from Grafana will be proxied to http://vmalert:8880/api/v1/rules")
vmalertProxyURL = flag.String("vmalert.proxyURL", "", "Optional URL for proxying requests to vmalert. For example, if -vmalert.proxyURL=http://vmalert:8880 , "+
"then alerting API requests such as /api/v1/rules from Grafana will be proxied to http://vmalert:8880/api/v1/rules . "+
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmalert")
)
var slowQueries = metrics.NewCounter(`vm_slow_queries_total`)
@@ -55,8 +57,8 @@ func Init(vmselectMaxConcurrentRequests int, vmselectMaxQueueDuration time.Durat
concurrencyLimitCh = make(chan struct{}, maxConcurrentRequests)
initVMUIConfig()
initVMAlertProxy()
vmalertproxy.Init(*vmalertProxyURL)
flagutil.RegisterSecretFlag("vmalert.proxyURL")
}
@@ -514,10 +516,11 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
if len(*vmalertProxyURL) == 0 {
w.WriteHeader(http.StatusBadRequest)
w.Header().Set("Content-Type", "application/json")
fmt.Fprintf(w, "%s", `{"status":"error","msg":"for accessing vmalert flag '-vmalert.proxyURL' must be configured"}`)
fmt.Fprintf(w, "%s", `{"status":"error","msg":"the '-vmalert.proxyURL' command-line must be configured; `+
`see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmalert"}`)
return true
}
proxyVMAlertRequests(w, r, path)
vmalertproxy.HandleRequest(w, r, path)
return true
}
@@ -555,7 +558,7 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
case "/api/v1/rules", "/rules":
rulesRequests.Inc()
if len(*vmalertProxyURL) > 0 {
proxyVMAlertRequests(w, r, path)
vmalertproxy.HandleRequest(w, r, path)
return true
}
// Return dumb placeholder for https://prometheus.io/docs/prometheus/latest/querying/api/#rules
@@ -565,7 +568,7 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
case "/api/v1/alerts", "/alerts":
alertsRequests.Inc()
if len(*vmalertProxyURL) > 0 {
proxyVMAlertRequests(w, r, path)
vmalertproxy.HandleRequest(w, r, path)
return true
}
// Return dumb placeholder for https://prometheus.io/docs/prometheus/latest/querying/api/#alerts
@@ -575,7 +578,7 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
case "/api/v1/notifiers", "/notifiers":
notifiersRequests.Inc()
if len(*vmalertProxyURL) > 0 {
proxyVMAlertRequests(w, r, path)
vmalertproxy.HandleRequest(w, r, path)
return true
}
w.Header().Set("Content-Type", "application/json")
@@ -722,48 +725,7 @@ var (
metricNamesStatsResetErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/api/v1/admin/status/metric_names_stats/reset"}`)
)
func proxyVMAlertRequests(w http.ResponseWriter, r *http.Request, path string) {
defer func() {
err := recover()
if err == nil || err == http.ErrAbortHandler {
// Suppress http.ErrAbortHandler panic.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
return
}
// Forward other panics to the caller.
panic(err)
}()
req := r.Clone(r.Context())
req.URL.Path = strings.TrimPrefix(path, "prometheus")
req.Host = vmalertProxyHost
if strings.HasPrefix(r.Header.Get(`User-Agent`), `Grafana`) {
// Grafana currently supports only Prometheus-style alerts. If other alert types
// (e.g. logs or traces) are returned, it may fail with "Error loading alerts".
//
// Grafana queries the vmalert API directly, bypassing the VictoriaMetrics datasource,
// so query params (such as datasource_type) cannot be enforced on the Grafana side.
//
// To ensure compatibility, we detect Grafana requests via the User-Agent and enforce
// `datasource_type=prometheus`.
//
// See:
// - https://github.com/VictoriaMetrics/victoriametrics-datasource/issues/329#issuecomment-3847585443
// - https://github.com/VictoriaMetrics/victoriametrics-datasource/issues/59
q := req.URL.Query()
q.Set("datasource_type", "prometheus")
req.URL.RawQuery = q.Encode()
req.RequestURI = ""
}
vmalertProxy.ServeHTTP(w, req)
}
var (
vmalertProxyHost string
vmalertProxy *nethttputil.ReverseProxy
vmuiConfig string
)
var vmuiConfig string
func initVMUIConfig() {
var cfg struct {
@@ -795,16 +757,3 @@ func initVMUIConfig() {
}
vmuiConfig = string(data)
}
// initVMAlertProxy must be called after flag.Parse(), since it uses command-line flags.
func initVMAlertProxy() {
if len(*vmalertProxyURL) == 0 {
return
}
proxyURL, err := url.Parse(*vmalertProxyURL)
if err != nil {
logger.Fatalf("cannot parse -vmalert.proxyURL=%q: %s", *vmalertProxyURL, err)
}
vmalertProxyHost = proxyURL.Host
vmalertProxy = nethttputil.NewSingleHostReverseProxy(proxyURL)
}

View File

@@ -15,7 +15,7 @@ See https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-m
currentItem := 0
%}
{% for _, row := range result %}
"{%s string(row.MetricFamilyName) %}": [
{%q= string(row.MetricFamilyName) %}: [
{
"type": {%q= row.Type.String() %},
{% if len(row.Unit) > 0 -%}

View File

@@ -35,12 +35,10 @@ func StreamMetadataResponse(qw422016 *qt422016.Writer, result []*metricsmetadata
//line app/vmselect/prometheus/metadata_response.qtpl:17
for _, row := range result {
//line app/vmselect/prometheus/metadata_response.qtpl:17
qw422016.N().S(`"`)
//line app/vmselect/prometheus/metadata_response.qtpl:18
qw422016.E().S(string(row.MetricFamilyName))
qw422016.N().Q(string(row.MetricFamilyName))
//line app/vmselect/prometheus/metadata_response.qtpl:18
qw422016.N().S(`": [{"type":`)
qw422016.N().S(`: [{"type":`)
//line app/vmselect/prometheus/metadata_response.qtpl:20
qw422016.N().Q(row.Type.String())
//line app/vmselect/prometheus/metadata_response.qtpl:20

View File

@@ -28,6 +28,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
@@ -525,6 +526,7 @@ func DeleteHandler(startTime time.Time, r *http.Request) error {
if deletedCount > 0 {
promql.ResetRollupResultCache()
}
logger.Infof("/api/v1/admin/tsdb/delete_series has been called for %q. Deleted %d series.", sq.FiltersString(), deletedCount)
return nil
}
@@ -954,6 +956,7 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, w http.Respo
start, end, step int64, r *http.Request, ct int64, etfs [][]storage.TagFilter) error {
deadline := searchutil.GetDeadlineForQuery(r, startTime)
mayCache := !httputil.GetBool(r, "nocache")
optimizeRepeatedBinaryOpSubexprs := httputil.GetBool(r, "optimize_repeated_binary_op_subexprs")
lookbackDelta, err := getMaxLookback(r)
if err != nil {
return err
@@ -975,18 +978,19 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, w http.Respo
}
ec := &promql.EvalConfig{
Start: start,
End: end,
Step: step,
MaxPointsPerSeries: *maxPointsPerTimeseries,
MaxSeries: 0, // let vmstorage use maxUniqueTimeseries by default
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline,
MayCache: mayCache,
LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilterss: etfs,
CacheTagFilters: etfs,
Start: start,
End: end,
Step: step,
MaxPointsPerSeries: *maxPointsPerTimeseries,
MaxSeries: 0, // let vmstorage use maxUniqueTimeseries by default
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline,
MayCache: mayCache,
OptimizeRepeatedBinaryOpSubexprs: optimizeRepeatedBinaryOpSubexprs,
LookbackDelta: lookbackDelta,
RoundDigits: getRoundDigits(r),
EnforcedTagFilterss: etfs,
CacheTagFilters: etfs,
GetRequestURI: func() string {
return httpserver.GetRequestURI(r)
},

View File

@@ -132,6 +132,9 @@ type EvalConfig struct {
// Whether the response can be cached.
MayCache bool
// Whether repeated cacheable binary op subexpressions can be optimized.
OptimizeRepeatedBinaryOpSubexprs bool
// LookbackDelta is analog to `-query.lookback-delta` from Prometheus.
LookbackDelta int64
@@ -171,6 +174,7 @@ func copyEvalConfig(src *EvalConfig) *EvalConfig {
ec.MaxPointsPerSeries = src.MaxPointsPerSeries
ec.Deadline = src.Deadline
ec.MayCache = src.MayCache
ec.OptimizeRepeatedBinaryOpSubexprs = src.OptimizeRepeatedBinaryOpSubexprs
ec.LookbackDelta = src.LookbackDelta
ec.RoundDigits = src.RoundDigits
ec.EnforcedTagFilterss = src.EnforcedTagFilterss
@@ -467,7 +471,8 @@ func isAggrFuncWithoutGrouping(e metricsql.Expr) bool {
}
func execBinaryOpArgs(qt *querytracer.Tracer, ec *EvalConfig, exprFirst, exprSecond metricsql.Expr, be *metricsql.BinaryOpExpr) ([]*timeseries, []*timeseries, error) {
if !canPushdownCommonFilters(be) {
canPushdown := canPushdownCommonFilters(be)
if !canPushdown && !shouldOptimizeRepeatedBinaryOpSubexprs(ec, exprFirst, exprSecond) {
// Execute exprFirst and exprSecond in parallel, since it is impossible to pushdown common filters
// from exprFirst to exprSecond.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2886
@@ -500,6 +505,25 @@ func execBinaryOpArgs(qt *querytracer.Tracer, ec *EvalConfig, exprFirst, exprSec
}
return tssFirst, tssSecond, nil
}
if !canPushdown {
qt = qt.NewChild("execute left and right sides of %q sequentially because repeated cacheable subexpression was found", be.Op)
defer qt.Done()
qtFirst := qt.NewChild("expr1")
tssFirst, err := evalExpr(qtFirst, ec, exprFirst)
qtFirst.Done()
if err != nil {
return nil, nil, err
}
qtSecond := qt.NewChild("expr2")
tssSecond, err := evalExpr(qtSecond, ec, exprSecond)
qtSecond.Done()
if err != nil {
return nil, nil, err
}
return tssFirst, tssSecond, nil
}
// Execute binary operation in the following way:
//
@@ -544,6 +568,78 @@ func execBinaryOpArgs(qt *querytracer.Tracer, ec *EvalConfig, exprFirst, exprSec
return tssFirst, tssSecond, nil
}
func shouldOptimizeRepeatedBinaryOpSubexprs(ec *EvalConfig, exprFirst, exprSecond metricsql.Expr) bool {
if !ec.OptimizeRepeatedBinaryOpSubexprs {
return false
}
if ec.Start == ec.End {
return false
}
if !ec.mayCache() {
return false
}
candidatesFirst := make(map[string]struct{}, 1)
var b []byte
visitOptimizedAggrs(exprFirst, func(ae *metricsql.AggrFuncExpr) {
if hasUnseededVolatileFunc(ae) {
return
}
b = ae.AppendString(b[:0])
candidatesFirst[string(b)] = struct{}{}
})
if len(candidatesFirst) == 0 {
return false
}
repeated := false
visitOptimizedAggrs(exprSecond, func(ae *metricsql.AggrFuncExpr) {
if repeated {
return
}
b = ae.AppendString(b[:0])
_, repeated = candidatesFirst[string(b)]
})
return repeated
}
func visitOptimizedAggrs(e metricsql.Expr, f func(ae *metricsql.AggrFuncExpr)) {
metricsql.VisitAll(e, func(expr metricsql.Expr) {
ae, ok := expr.(*metricsql.AggrFuncExpr)
if !ok {
return
}
if getIncrementalAggrFuncCallbacks(ae.Name) == nil {
return
}
fe, _ := tryGetArgRollupFuncWithMetricExpr(ae)
if fe == nil {
return
}
f(ae)
})
}
func hasUnseededVolatileFunc(e metricsql.Expr) bool {
found := false
metricsql.VisitAll(e, func(expr metricsql.Expr) {
if found {
return
}
fe, ok := expr.(*metricsql.FuncExpr)
if !ok {
return
}
switch strings.ToLower(fe.Name) {
case "now":
found = true
case "rand", "rand_normal", "rand_exponential":
found = len(fe.Args) == 0
}
})
return found
}
func getCommonLabelFilters(tss []*timeseries) []metricsql.LabelFilter {
if len(tss) == 0 {
return nil
@@ -1591,6 +1687,10 @@ func assertInstantValues(tss []*timeseries) {
var memoryIntensiveQueries = metrics.NewCounter(`vm_memory_intensive_queries_total`)
var _ = metrics.NewGauge(`vm_max_memory_per_query`, func() float64 {
return float64(maxMemoryPerQuery.N)
})
func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcName string, rf rollupFunc,
expr metricsql.Expr, me *metricsql.MetricExpr, iafc *incrementalAggrFuncContext, windowExpr *metricsql.DurationExpr,
) ([]*timeseries, error) {

View File

@@ -170,3 +170,87 @@ func TestGetSumInstantValues(t *testing.T) {
[]*timeseries{ts("foo", 100, 1)},
)
}
func TestShouldOptimizeRepeatedBinaryOpSubexprsGate(t *testing.T) {
e, err := metricsql.Parse(`count(count(vm_requests_total) by (action,addr,cluster,endpoint)) by (action,addr,cluster) / count(count(vm_requests_total) by (action,addr,cluster,endpoint))`)
if err != nil {
t.Fatalf("unexpected error in metricsql.Parse(): %s", err)
}
be, ok := e.(*metricsql.BinaryOpExpr)
if !ok {
t.Fatalf("unexpected expr type; got %T; want *metricsql.BinaryOpExpr", e)
}
f := func(name string, ec *EvalConfig, resultExpected bool) {
t.Helper()
result := shouldOptimizeRepeatedBinaryOpSubexprs(ec, be.Left, be.Right)
if result != resultExpected {
t.Fatalf("unexpected result for %q; got %v; want %v", name, result, resultExpected)
}
}
f("disabled optimization", &EvalConfig{
Start: 1000,
End: 2000,
Step: 1000,
}, false)
f("disabled cache", &EvalConfig{
Start: 1000,
End: 2000,
Step: 1000,
OptimizeRepeatedBinaryOpSubexprs: true,
}, false)
f("instant query", &EvalConfig{
Start: 1000,
End: 1000,
Step: 1000,
MayCache: true,
OptimizeRepeatedBinaryOpSubexprs: true,
}, false)
f("repeated cacheable aggregate subexpression", &EvalConfig{
Start: 1000,
End: 2000,
Step: 1000,
MayCache: true,
OptimizeRepeatedBinaryOpSubexprs: true,
}, true)
f("unaligned range query", &EvalConfig{
Start: 1001,
End: 2000,
Step: 1000,
MayCache: true,
OptimizeRepeatedBinaryOpSubexprs: true,
}, false)
}
func TestShouldOptimizeRepeatedBinaryOpSubexprsExpressions(t *testing.T) {
f := func(name, q string, resultExpected bool) {
t.Helper()
e, err := metricsql.Parse(q)
if err != nil {
t.Fatalf("unexpected error in metricsql.Parse(%q) for %q: %s", q, name, err)
}
be, ok := e.(*metricsql.BinaryOpExpr)
if !ok {
t.Fatalf("unexpected expr type for %q; got %T; want *metricsql.BinaryOpExpr", name, e)
}
ec := &EvalConfig{Start: 1000, End: 2000, Step: 1000, MayCache: true, OptimizeRepeatedBinaryOpSubexprs: true}
result := shouldOptimizeRepeatedBinaryOpSubexprs(ec, be.Left, be.Right)
if result != resultExpected {
t.Fatalf("unexpected result for %q; got %v; want %v; query: %q", name, result, resultExpected, q)
}
}
f("original issue query", `count(count(vm_requests_total) by (action,addr,cluster,endpoint)) by (action,addr,cluster) / count(count(vm_requests_total) by (action,addr,cluster,endpoint))`, true)
f("right side contains repeated count aggregate", `count(foo) by (job) / (count(foo) by (job) + 1)`, true)
f("same sum aggregate", `sum(rate(foo[5m])) by (job) / sum(rate(foo[5m])) by (job)`, true)
f("same inner rollup but different aggregates", `sum(rate(foo[5m])) by (job) / count(rate(foo[5m])) by (job)`, false)
f("different count aggregates", `count(foo) by (job) / count(bar) by (job)`, false)
f("bare metric selector", `foo / foo`, false)
f("bare rollup function", `rate(a[5m]) / rate(a[5m])`, false)
f("now at modifier", `sum(rate(foo[5m] @ now())) by (job) / sum(rate(foo[5m] @ now())) by (job)`, false)
f("unseeded rand at modifier", `sum(rate(foo[5m] @ rand())) by (job) / sum(rate(foo[5m] @ rand())) by (job)`, false)
f("unseeded rand_normal at modifier", `sum(rate(foo[5m] @ rand_normal())) by (job) / sum(rate(foo[5m] @ rand_normal())) by (job)`, false)
f("unseeded rand_exponential at modifier", `sum(rate(foo[5m] @ rand_exponential())) by (job) / sum(rate(foo[5m] @ rand_exponential())) by (job)`, false)
f("seeded rand at modifier", `sum(rate(foo[5m] @ rand(1))) by (job) / sum(rate(foo[5m] @ rand(1))) by (job)`, true)
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1 +0,0 @@
var e=Object.create,t=Object.defineProperty,n=Object.getOwnPropertyDescriptor,r=Object.getOwnPropertyNames,i=Object.getPrototypeOf,a=Object.prototype.hasOwnProperty,o=(e,t)=>()=>(e&&(t=e(e=0)),t),s=(e,t)=>()=>(t||e((t={exports:{}}).exports,t),t.exports),c=(e,n)=>{let r={};for(var i in e)t(r,i,{get:e[i],enumerable:!0});return n||t(r,Symbol.toStringTag,{value:`Module`}),r},l=(e,i,o,s)=>{if(i&&typeof i==`object`||typeof i==`function`)for(var c=r(i),l=0,u=c.length,d;l<u;l++)d=c[l],!a.call(e,d)&&d!==o&&t(e,d,{get:(e=>i[e]).bind(null,d),enumerable:!(s=n(i,d))||s.enumerable});return e},u=(n,r,a)=>(a=n==null?{}:e(i(n)),l(r||!n||!n.__esModule?t(a,`default`,{value:n,enumerable:!0}):a,n)),d=e=>a.call(e,`module.exports`)?e[`module.exports`]:l(t({},`__esModule`,{value:!0}),e);export{u as a,d as i,o as n,c as r,s as t};

View File

@@ -0,0 +1 @@
var e=Object.create,t=Object.defineProperty,n=Object.getOwnPropertyDescriptor,r=Object.getOwnPropertyNames,i=Object.getPrototypeOf,a=Object.prototype.hasOwnProperty,o=(e,t)=>()=>(e&&(t=e(e=0)),t),s=(e,t)=>()=>(t||(e((t={exports:{}}).exports,t),e=null),t.exports),c=(e,n)=>{let r={};for(var i in e)t(r,i,{get:e[i],enumerable:!0});return n||t(r,Symbol.toStringTag,{value:`Module`}),r},l=(e,i,o,s)=>{if(i&&typeof i==`object`||typeof i==`function`)for(var c=r(i),l=0,u=c.length,d;l<u;l++)d=c[l],!a.call(e,d)&&d!==o&&t(e,d,{get:(e=>i[e]).bind(null,d),enumerable:!(s=n(i,d))||s.enumerable});return e},u=(n,r,a)=>(a=n==null?{}:e(i(n)),l(r||!n||!n.__esModule?t(a,`default`,{value:n,enumerable:!0}):a,n)),d=e=>a.call(e,`module.exports`)?e[`module.exports`]:l(t({},`__esModule`,{value:!0}),e);export{u as a,d as i,o as n,c as r,s as t};

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -37,9 +37,9 @@
<meta property="og:title" content="UI for VictoriaMetrics">
<meta property="og:url" content="https://victoriametrics.com/">
<meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data">
<script type="module" crossorigin src="./assets/index-CoGukb-x.js"></script>
<link rel="modulepreload" crossorigin href="./assets/rolldown-runtime-COnpUsM8.js">
<link rel="modulepreload" crossorigin href="./assets/vendor-C8Kwp93_.js">
<script type="module" crossorigin src="./assets/index-CusQvJzs.js"></script>
<link rel="modulepreload" crossorigin href="./assets/rolldown-runtime-Cyuzqnbw.js">
<link rel="modulepreload" crossorigin href="./assets/vendor-B83wxFqK.js">
<link rel="stylesheet" crossorigin href="./assets/vendor-CnsZ1jie.css">
<link rel="stylesheet" crossorigin href="./assets/index-BBUnmLOr.css">
</head>

View File

@@ -10,8 +10,6 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
@@ -23,6 +21,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/stringsutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vminsertapi"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vmselectapi"
"github.com/VictoriaMetrics/metrics"
)
var (
@@ -153,7 +152,7 @@ func Init(vmselectMaxConcurrentRequests int, resetCacheIfNeeded func(mrs []stora
LogNewSeries: *logNewSeries,
}
strg := storage.MustOpenStorage(*storageDataPath, opts)
vmStorage = newVMStorageSingleNode(strg, vmselectMaxConcurrentRequests, resetCacheIfNeeded)
vmStorage = newVMStorage(strg, vmselectMaxConcurrentRequests, resetCacheIfNeeded)
var m storage.Metrics
strg.UpdateMetrics(&m)
@@ -175,15 +174,15 @@ func Init(vmselectMaxConcurrentRequests int, resetCacheIfNeeded func(mrs []stora
GetSearch = vmStorage.GetSearch
PutSearch = vmStorage.PutSearch
RequestHandler = vmStorage.requestHandler
DebugFlush = vmStorage.vms.s.DebugFlush
DebugFlush = vmStorage.s.DebugFlush
}
var storageMetrics *metrics.Set
var (
// vmStorageSingleNode is an instance of vmstorage used by vminsert and
// vmStorage is an instance of vmstorage used by vminsert and
// vmselect for writing and reading data.
vmStorage *VMStorageSingleNode
vmStorage *VMStorage
VMInsertAPI vminsertapi.API
VMSelectAPI vmselectapi.API
GetSearch func(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (*storage.Search, int, error)
@@ -209,15 +208,12 @@ func Stop() {
logger.Infof("the vmstorage has been stopped")
}
func (vmssn *VMStorageSingleNode) requestHandler(w http.ResponseWriter, r *http.Request) bool {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.requestHandler(w, r)
}
// requestHandler is a storage request handler.
// TODO(@rtm0): Move to a separate file, request_handler.go
func (vms *VMStorage) requestHandler(w http.ResponseWriter, r *http.Request) bool {
vms.wg.Add(1)
defer vms.wg.Done()
path := r.URL.Path
if path == "/internal/force_merge" {
if !httpserver.CheckAuthFlag(w, r, forceMergeAuthKey) {
@@ -361,15 +357,11 @@ var (
snapshotsDeleteAllErrorsTotal = metrics.NewCounter(`vm_http_request_errors_total{path="/snapshot/delete_all"}`)
)
// TODO(@rtm0): Move to metrics.go.
func (vmssn *VMStorageSingleNode) writeStorageMetrics(w io.Writer) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
vmssn.vms.writeStorageMetrics(w)
}
// TODO(@rtm0): Move to metrics.go.
func (vms *VMStorage) writeStorageMetrics(w io.Writer) {
vms.wg.Add(1)
defer vms.wg.Done()
strg := vms.s
var m storage.Metrics
strg.UpdateMetrics(&m)

View File

@@ -1,6 +1,7 @@
package vmstorage
import (
"errors"
"flag"
"fmt"
"sync"
@@ -14,6 +15,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricnamestats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/syncwg"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vmselectapi"
)
@@ -34,7 +36,7 @@ var (
// newVMStorage creates a new instance of of VMStorage.
//
// The created VMStorage instance takes ownership of s.
func newVMStorage(s *storage.Storage, vmselectMaxConcurrentRequests int) *VMStorage {
func newVMStorage(s *storage.Storage, vmselectMaxConcurrentRequests int, resetCacheIfNeeded func(mrs []storage.MetricRow)) *VMStorage {
if err := encoding.CheckPrecisionBits(uint8(*precisionBits)); err != nil {
logger.Fatalf("invalid -precisionBits=%d: %s", *precisionBits, err)
}
@@ -49,6 +51,8 @@ func newVMStorage(s *storage.Storage, vmselectMaxConcurrentRequests int) *VMStor
maxUniqueTimeseries: *maxUniqueTimeseries,
maxUniqueTimeSeriesCalculated: maxUniqueTimeseriesCalculated,
staleSnapshotsRemoverCh: make(chan struct{}),
wg: syncwg.WaitGroup{},
resetCacheIfNeeded: resetCacheIfNeeded,
}
vms.initStaleSnapshotsRemover()
return vms
@@ -78,6 +82,17 @@ type VMStorage struct {
maxUniqueTimeSeriesCalculated int
staleSnapshotsRemoverCh chan struct{}
staleSnapshotsRemoverWG sync.WaitGroup
// wg is used to wrap every storage call into wg.Add(1) ... wg.Done()
// for proper graceful shutdown when Stop is called.
//
// Use syncwg instead of sync, since Add is called from concurrent
// goroutines.
wg syncwg.WaitGroup
// resetCacheIfNeeded is a callback for automatic resetting of response
// cache if needed.
resetCacheIfNeeded func(mrs []storage.MetricRow)
}
func (vms *VMStorage) initStaleSnapshotsRemover() {
@@ -103,6 +118,7 @@ func (vms *VMStorage) initStaleSnapshotsRemover() {
func (vms *VMStorage) Stop() {
close(vms.staleSnapshotsRemoverCh)
vms.staleSnapshotsRemoverWG.Wait()
vms.wg.WaitAndBlock()
vms.s.MustClose()
}
@@ -111,6 +127,14 @@ func (vms *VMStorage) Stop() {
// The caller should limit the number of concurrent calls to WriteRows() in
// order to limit memory usage.
func (vms *VMStorage) WriteRows(rows []storage.MetricRow) error {
vms.wg.Add(1)
defer vms.wg.Done()
if vms.s.IsReadOnly() {
return errReadOnly
}
vms.resetCacheIfNeeded(rows)
vms.s.AddRows(rows, uint8(*precisionBits))
return nil
}
@@ -120,26 +144,41 @@ func (vms *VMStorage) WriteRows(rows []storage.MetricRow) error {
// The caller should limit the number of concurrent calls to WriteMetadata() in
// order to limit memory usage.
func (vms *VMStorage) WriteMetadata(rows []metricsmetadata.Row) error {
vms.wg.Add(1)
defer vms.wg.Done()
if vms.s.IsReadOnly() {
return errReadOnly
}
vms.s.AddMetadataRows(rows)
return nil
}
var errReadOnly = errors.New("the storage is in read-only mode; check -storage.minFreeDiskSpaceBytes command-line flag value")
// IsReadOnly returns true is the storage is in read-only mode.
func (vms *VMStorage) IsReadOnly() bool {
vms.wg.Add(1)
defer vms.wg.Done()
return vms.s.IsReadOnly()
}
func (vms *VMStorage) InitSearch(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (vmselectapi.BlockIterator, error) {
vms.wg.Add(1)
tr := sq.GetTimeRange()
maxMetrics := vms.getMaxMetrics(sq.MaxMetrics)
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
vms.wg.Done()
return nil, err
}
if len(tfss) == 0 {
vms.wg.Done()
return nil, fmt.Errorf("missing tag filters")
}
bi := getBlockIterator()
bi.wgDone = vms.wg.Done
bi.sr.Init(qt, vms.s, tfss, tr, maxMetrics, deadline)
if err := bi.sr.Error(); err != nil {
bi.MustClose()
@@ -161,8 +200,9 @@ func (vms *VMStorage) getMaxMetrics(searchQueryLimit int) int {
// blockIterator implements vmselectapi.BlockIterator
type blockIterator struct {
sr storage.Search
mb storage.MetricBlock
sr storage.Search
mb storage.MetricBlock
wgDone func()
}
var blockIteratorsPool sync.Pool
@@ -171,6 +211,8 @@ func (bi *blockIterator) MustClose() {
bi.sr.MustClose()
bi.mb.MetricName = nil
bi.mb.Block.Reset()
bi.wgDone()
bi.wgDone = nil
blockIteratorsPool.Put(bi)
}
@@ -197,8 +239,63 @@ func (bi *blockIterator) Error() error {
return bi.sr.Error()
}
// GetSearch sets up an instance of storage search and returns it to the caller
// along with the max series count that the search can return.
//
// This method is not part of the vmselectapi.API and must only be used by
// vmsingle HTTP handlers.
//
// Callers of this method must call PutSearch() once the search instance is not
// needed anymore.
func (vms *VMStorage) GetSearch(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (*storage.Search, int, error) {
vms.wg.Add(1)
tr := sq.GetTimeRange()
maxMetrics := vms.getMaxMetrics(sq.MaxMetrics)
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
vms.wg.Done()
return nil, 0, err
}
sr := getSearch()
maxSeriesCount := sr.Init(qt, vms.s, tfss, tr, sq.MaxMetrics, deadline)
return sr, maxSeriesCount, nil
}
// PutSearch resets the search once it is not needed anymore and puts it aside
// for future reuse.
//
// This method is not part of the vmselectapi.API and must only be used by
// vmsingle HTTP handlers.
//
// The method must only be used on search instances that have been created with
// GetSearch().
func (vms *VMStorage) PutSearch(sr *storage.Search) {
putSearch(sr)
vms.wg.Done()
}
func getSearch() *storage.Search {
v := ssPool.Get()
if v == nil {
return &storage.Search{}
}
return v.(*storage.Search)
}
func putSearch(sr *storage.Search) {
sr.MustClose()
ssPool.Put(sr)
}
var ssPool sync.Pool
// SearchMetricNames returns metric names for the given tfss on the given tr.
func (vms *VMStorage) SearchMetricNames(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) ([]string, error) {
vms.wg.Add(1)
defer vms.wg.Done()
tr := sq.GetTimeRange()
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
@@ -219,6 +316,9 @@ func (vms *VMStorage) SearchMetricNames(qt *querytracer.Tracer, sq *storage.Sear
// SearchLabelValues searches for label values for the given labelName, tfss and
// tr.
func (vms *VMStorage) LabelValues(qt *querytracer.Tracer, sq *storage.SearchQuery, labelName string, maxLabelValues int, deadline uint64) ([]string, error) {
vms.wg.Add(1)
defer vms.wg.Done()
tr := sq.GetTimeRange()
if maxLabelValues <= 0 || maxLabelValues > *maxTagValues {
maxLabelValues = *maxTagValues
@@ -244,6 +344,9 @@ func (vms *VMStorage) LabelValues(qt *querytracer.Tracer, sq *storage.SearchQuer
// similar APIs.
func (vms *VMStorage) TagValueSuffixes(qt *querytracer.Tracer, _, _ uint32, tr storage.TimeRange, tagKey, tagValuePrefix string, delimiter byte,
maxSuffixes int, deadline uint64) ([]string, error) {
vms.wg.Add(1)
defer vms.wg.Done()
if maxSuffixes <= 0 || maxSuffixes > *maxTagValueSuffixesPerSearch {
maxSuffixes = *maxTagValueSuffixesPerSearch
}
@@ -260,6 +363,9 @@ func (vms *VMStorage) TagValueSuffixes(qt *querytracer.Tracer, _, _ uint32, tr s
// SearchLabelNames searches for tag keys matching the given tfss on tr.
func (vms *VMStorage) LabelNames(qt *querytracer.Tracer, sq *storage.SearchQuery, maxLabelNames int, deadline uint64) ([]string, error) {
vms.wg.Add(1)
defer vms.wg.Done()
tr := sq.GetTimeRange()
if maxLabelNames <= 0 || maxLabelNames > *maxTagKeys {
maxLabelNames = *maxTagKeys
@@ -278,6 +384,8 @@ func (vms *VMStorage) LabelNames(qt *querytracer.Tracer, sq *storage.SearchQuery
}
func (vms *VMStorage) SeriesCount(_ *querytracer.Tracer, _, _ uint32, deadline uint64) (uint64, error) {
vms.wg.Add(1)
defer vms.wg.Done()
return vms.s.GetSeriesCount(deadline)
}
@@ -287,6 +395,9 @@ func (vms *VMStorage) Tenants(_ *querytracer.Tracer, _ storage.TimeRange, _ uint
// GetTSDBStatus returns TSDB status for given filters on the given date.
func (vms *VMStorage) TSDBStatus(qt *querytracer.Tracer, sq *storage.SearchQuery, focusLabel string, topN int, deadline uint64) (*storage.TSDBStatus, error) {
vms.wg.Add(1)
defer vms.wg.Done()
tr := sq.GetTimeRange()
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
@@ -306,6 +417,9 @@ func (vms *VMStorage) TSDBStatus(qt *querytracer.Tracer, sq *storage.SearchQuery
//
// Returns the number of deleted series.
func (vms *VMStorage) DeleteSeries(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (int, error) {
vms.wg.Add(1)
defer vms.wg.Done()
tr := sq.GetTimeRange()
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
@@ -324,17 +438,26 @@ func (vms *VMStorage) DeleteSeries(qt *querytracer.Tracer, sq *storage.SearchQue
}
func (vms *VMStorage) RegisterMetricNames(qt *querytracer.Tracer, mrs []storage.MetricRow, _ uint64) error {
vms.wg.Add(1)
defer vms.wg.Done()
vms.s.RegisterMetricNames(qt, mrs)
return nil
}
// GetMetricNamesUsageStats returns metric name usage stats.
func (vms *VMStorage) GetMetricNamesUsageStats(qt *querytracer.Tracer, _ *storage.TenantToken, limit, le int, matchPattern string, _ uint64) (metricnamestats.StatsResult, error) {
vms.wg.Add(1)
defer vms.wg.Done()
return vms.s.GetMetricNamesStats(qt, limit, le, matchPattern), nil
}
// ResetMetricNamesStats resets state for metric names usage tracker
func (vms *VMStorage) ResetMetricNamesUsageStats(qt *querytracer.Tracer, _ uint64) error {
vms.wg.Add(1)
defer vms.wg.Done()
vms.s.ResetMetricNamesStats(qt)
return nil
}
@@ -371,6 +494,8 @@ func (vms *VMStorage) setupTfss(qt *querytracer.Tracer, sq *storage.SearchQuery,
}
func (vms *VMStorage) GetMetadataRecords(qt *querytracer.Tracer, _ *storage.TenantToken, limit int, metricName string, _ uint64) ([]*metricsmetadata.Row, error) {
vms.wg.Add(1)
defer vms.wg.Done()
return vms.s.GetMetadataRows(qt, limit, metricName), nil
}

View File

@@ -1,213 +0,0 @@
package vmstorage
import (
"errors"
"fmt"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricnamestats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/syncwg"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vmselectapi"
)
// newVMStorageSingleNode creates a new instance of of VMStorage for vmsingle.
func newVMStorageSingleNode(s *storage.Storage, maxConcurrentRequests int, resetCacheIfNeeded func(mrs []storage.MetricRow)) *VMStorageSingleNode {
vms := newVMStorage(s, maxConcurrentRequests)
return &VMStorageSingleNode{
vms: vms,
wg: syncwg.WaitGroup{},
resetCacheIfNeeded: resetCacheIfNeeded,
}
}
type VMStorageSingleNode struct {
vms *VMStorage
// wg is used to wrap every storage call into wg.Add(1) ... wg.Done()
// for proper graceful shutdown when Stop is called.
//
// Use syncwg instead of sync, since Add is called from concurrent
// goroutines.
wg syncwg.WaitGroup
// resetCacheIfNeeded is a callback for automatic resetting of response
// cache if needed.
resetCacheIfNeeded func(mrs []storage.MetricRow)
}
func (vmssn *VMStorageSingleNode) Stop() {
vmssn.wg.WaitAndBlock()
vmssn.vms.Stop()
}
// WriteRows writes metric rows to the storage.
//
// Returns an error if the storage is in read-only mode.
//
// The caller should limit the number of concurrent calls to WriteRows() in
// order to limit memory usage.
func (vmssn *VMStorageSingleNode) WriteRows(rows []storage.MetricRow) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
if vmssn.vms.IsReadOnly() {
return errReadOnly
}
vmssn.resetCacheIfNeeded(rows)
return vmssn.vms.WriteRows(rows)
}
// WriteMetadata writes metrics metadata to storage.
//
// Returns an error if the storage is in read-only mode.
//
// The caller should limit the number of concurrent calls to WriteMetadata() in
// order to limit memory usage.
func (vmssn *VMStorageSingleNode) WriteMetadata(rows []metricsmetadata.Row) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
if vmssn.vms.IsReadOnly() {
return errReadOnly
}
return vmssn.vms.WriteMetadata(rows)
}
var errReadOnly = errors.New("the storage is in read-only mode; check -storage.minFreeDiskSpaceBytes command-line flag value")
func (vmssn *VMStorageSingleNode) IsReadOnly() bool {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.IsReadOnly()
}
func (vmssn *VMStorageSingleNode) InitSearch(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (vmselectapi.BlockIterator, error) {
return nil, fmt.Errorf("not implemented in vmsingle")
}
// GetSearch sets up an instance of storage search and returns it to the caller
// along with the max series count that the search can return.
//
// This method is not part of the vmselectapi.API and must only be used by
// vmsingle HTTP handlers.
//
// Callers of this method must call PutSearch() once the search instance is not
// needed anymore.
func (vmssn *VMStorageSingleNode) GetSearch(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (*storage.Search, int, error) {
vmssn.wg.Add(1)
tr := sq.GetTimeRange()
maxMetrics := vmssn.vms.getMaxMetrics(sq.MaxMetrics)
tfss, err := vmssn.vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
vmssn.wg.Done()
return nil, 0, err
}
sr := getSearch()
maxSeriesCount := sr.Init(qt, vmssn.vms.s, tfss, tr, sq.MaxMetrics, deadline)
return sr, maxSeriesCount, nil
}
// PutSearch resets the search once it is not needed anymore and puts it aside
// for future reuse.
//
// This method is not part of the vmselectapi.API and must only be used by
// vmsingle HTTP handlers.
//
// The method must only be used on search instances that have been created with
// GetSearch().
func (vmssn *VMStorageSingleNode) PutSearch(sr *storage.Search) {
putSearch(sr)
vmssn.wg.Done()
}
func getSearch() *storage.Search {
v := ssPool.Get()
if v == nil {
return &storage.Search{}
}
return v.(*storage.Search)
}
func putSearch(sr *storage.Search) {
sr.MustClose()
ssPool.Put(sr)
}
var ssPool sync.Pool
func (vmssn *VMStorageSingleNode) SearchMetricNames(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.SearchMetricNames(qt, sq, deadline)
}
func (vmssn *VMStorageSingleNode) LabelValues(qt *querytracer.Tracer, sq *storage.SearchQuery, labelName string, maxLabelValues int, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.LabelValues(qt, sq, labelName, maxLabelValues, deadline)
}
func (vmssn *VMStorageSingleNode) TagValueSuffixes(qt *querytracer.Tracer, accountID, projectID uint32, tr storage.TimeRange, tagKey, tagValuePrefix string, delimiter byte, maxSuffixes int, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.TagValueSuffixes(qt, accountID, projectID, tr, tagKey, tagValuePrefix, delimiter, maxSuffixes, deadline)
}
func (vmssn *VMStorageSingleNode) LabelNames(qt *querytracer.Tracer, sq *storage.SearchQuery, maxLabelNames int, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.LabelNames(qt, sq, maxLabelNames, deadline)
}
func (vmssn *VMStorageSingleNode) SeriesCount(qt *querytracer.Tracer, accountID, projectID uint32, deadline uint64) (uint64, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.SeriesCount(qt, accountID, projectID, deadline)
}
func (vmssn *VMStorageSingleNode) Tenants(qt *querytracer.Tracer, tr storage.TimeRange, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.Tenants(qt, tr, deadline)
}
func (vmssn *VMStorageSingleNode) TSDBStatus(qt *querytracer.Tracer, sq *storage.SearchQuery, focusLabel string, topN int, deadline uint64) (*storage.TSDBStatus, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.TSDBStatus(qt, sq, focusLabel, topN, deadline)
}
func (vmssn *VMStorageSingleNode) DeleteSeries(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (int, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.DeleteSeries(qt, sq, deadline)
}
func (vmssn *VMStorageSingleNode) RegisterMetricNames(qt *querytracer.Tracer, mrs []storage.MetricRow, deadline uint64) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.RegisterMetricNames(qt, mrs, deadline)
}
func (vmssn *VMStorageSingleNode) GetMetricNamesUsageStats(qt *querytracer.Tracer, tt *storage.TenantToken, limit, le int, matchPattern string, deadline uint64) (metricnamestats.StatsResult, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.GetMetricNamesUsageStats(qt, tt, limit, le, matchPattern, deadline)
}
func (vmssn *VMStorageSingleNode) ResetMetricNamesUsageStats(qt *querytracer.Tracer, deadline uint64) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.ResetMetricNamesUsageStats(qt, deadline)
}
func (vmssn *VMStorageSingleNode) GetMetadataRecords(qt *querytracer.Tracer, tt *storage.TenantToken, limit int, metricName string, deadline uint64) ([]*metricsmetadata.Row, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.GetMetadataRecords(qt, tt, limit, metricName, deadline)
}

View File

@@ -48,7 +48,7 @@ func TestGetMaxMetrics(t *testing.T) {
t.Helper()
*maxUniqueTimeseries = storageMaxUniqueTimeseries
s := storage.MustOpenStorage(t.Name(), storage.OpenOptions{})
vms := newVMStorage(s, maxConcurrentRequests)
vms := newVMStorage(s, maxConcurrentRequests, func(mrs []storage.MetricRow) {})
defer vms.Stop()
maxMetrics := vms.getMaxMetrics(searchQueryLimit)
if maxMetrics != expect {

View File

@@ -6,7 +6,7 @@ COPY web/ /build/
RUN GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o web-amd64 github.com/VictoriMetrics/vmui/ && \
GOOS=windows GOARCH=amd64 CGO_ENABLED=0 go build -o web-windows github.com/VictoriMetrics/vmui/
FROM alpine:3.23.4
FROM alpine:3.24.1
USER root
COPY --from=build-web-stage /build/web-amd64 /app/web

View File

@@ -5,7 +5,7 @@ import uPlot from "uplot";
import Button from "../../Main/Button/Button";
import { CloseIcon, DragIcon } from "../../Main/Icons";
import { SeriesItemStatsFormatted } from "../../../types";
import { STATS_ORDER } from "../../../constants/graph";
import { STATS_ORDER_TOOLTIP } from "../../../constants/graph";
export interface ChartTooltipProps {
u?: uPlot;
@@ -164,7 +164,7 @@ const ChartTooltip: FC<ChartTooltipProps> = ({
</div>
{statsFormatted && (
<table className="vm-chart-tooltip-stats">
{STATS_ORDER.map((key, i) => (
{STATS_ORDER_TOOLTIP.map((key, i) => (
<div
className="vm-chart-tooltip-stats-row"
key={i}

View File

@@ -61,7 +61,7 @@ const LegendConfigs: FC<Props> = ({ data, isCompact }) => {
label: "Hide Statistics",
value: hideStats,
onChange: onChangeStats,
info: "If enabled, hides the display of min, median, and max values.",
info: "If enabled, hides the display of min, median, max, and last values.",
}
];

View File

@@ -5,7 +5,7 @@ import "./style.scss";
import classNames from "classnames";
import { getFreeFields } from "./helpers";
import useCopyToClipboard from "../../../../../hooks/useCopyToClipboard";
import { STATS_ORDER } from "../../../../../constants/graph";
import { STATS_ORDER_LEGEND } from "../../../../../constants/graph";
import { useShowStats } from "../hooks/useShowStats";
import { useLegendFormat } from "../hooks/useLegendFormat";
import { getLabelAlias } from "../../../../../utils/metric";
@@ -80,7 +80,7 @@ const LegendItem: FC<LegendItemProps> = ({ legend, onChange, duplicateFields })
</div>
{!hideStats && showStats && (
<div className="vm-legend-item-stats">
{STATS_ORDER.map((key, i) => (
{STATS_ORDER_LEGEND.map((key, i) => (
<div
className="vm-legend-item-stats-row"
key={i}

View File

@@ -4,11 +4,11 @@ import "./style.scss";
import { LegendItemType } from "../../../../../types";
import { MouseEvent } from "react";
import classNames from "classnames";
import { STATS_ORDER } from "../../../../../constants/graph";
import { STATS_ORDER_LEGEND } from "../../../../../constants/graph";
import { useShowStats } from "../hooks/useShowStats";
import { getValueByPath } from "../../../../../utils/object";
const statsColumns = STATS_ORDER.map(k => ({
const statsColumns = STATS_ORDER_LEGEND.map(k => ({
key: `statsFormatted.${k}`,
title: k
}));

View File

@@ -26,4 +26,5 @@ export const GRAPH_SIZES: GraphSize[] = [
},
];
export const STATS_ORDER: (keyof SeriesItemStatsFormatted)[] = ["min", "median", "max"];
export const STATS_ORDER_LEGEND: (keyof SeriesItemStatsFormatted)[] = ["min", "median", "max", "last"];
export const STATS_ORDER_TOOLTIP: (keyof SeriesItemStatsFormatted)[] = ["min", "median", "max"];

View File

@@ -4,6 +4,7 @@ export interface SeriesItemStatsFormatted {
min: string,
max: string,
median: string,
last: string,
}
export interface SeriesItem extends Series {

View File

@@ -53,6 +53,7 @@ const getSeriesStatistics = (d: MetricResult) => {
min: formatPrettyNumber(min, min, max),
max: formatPrettyNumber(max, min, max),
median: formatPrettyNumber(median, min, max),
last: formatPrettyNumber(values.at(-1), min, max),
},
};
};

View File

@@ -4,7 +4,7 @@ The `apptest` package contains the integration tests for the VictoriaMetrics
applications (such as vmstorage, vminsert, and vmselect).
An integration test aims at verifying the behavior of an application as a whole,
as apposed to a unit test that verifies the behavior of a building block of an
as opposed to a unit test that verifies the behavior of a building block of an
application.
To achieve that an integration test starts an application in a separate process
@@ -19,10 +19,10 @@ work together as a system.
The package provides a collection of helpers to start applications and make
queries to them:
- `app.go` - contains the generic code for staring an application and should
- `app.go` - contains the generic code for starting an application and should
not be used by integration tests directly.
- `{vmstorage,vminsert,etc}.go` - build on top of `app.go` and provide the
code for staring a specific application.
code for starting a specific application.
- `client.go` - provides helper functions for sending HTTP requests to
applications.
@@ -36,7 +36,7 @@ the application binary files to be built and put into the `bin` directory. The
build rule used for running integration tests, `make apptest`,
accounts for that, it builds all application binaries before running the tests.
But if you want to run the tests without `make`, i.e. by executing
`go test ./app/apptest`, you will need to build the binaries first (for example,
`go test ./apptest/tests`, you will need to build the binaries first (for example,
by executing `make all`).
Not all binaries can be built from `master` branch, cluster binaries can be built

View File

@@ -156,14 +156,14 @@ func readAllAndClose(t *testing.T, responseBody io.ReadCloser) string {
//
// This type is expected to be embedded by the apps that serve metrics.
type metricsClient struct {
metricsCli *Client
url string
cli *Client
url string
}
func newMetricsClient(cli *Client, addr string) *metricsClient {
return &metricsClient{
metricsCli: cli,
url: fmt.Sprintf("http://%s/metrics", addr),
cli: cli,
url: fmt.Sprintf("http://%s/metrics", addr),
}
}
@@ -179,7 +179,7 @@ func (c *metricsClient) GetIntMetric(t *testing.T, metricName string) int {
func (c *metricsClient) GetMetric(t *testing.T, metricName string) float64 {
t.Helper()
metrics, statusCode := c.metricsCli.Get(t, c.url, nil)
metrics, statusCode := c.cli.Get(t, c.url, nil)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
@@ -205,7 +205,7 @@ func (c *metricsClient) GetMetricsByPrefix(t *testing.T, prefix string) []float6
values := []float64{}
metrics, statusCode := c.metricsCli.Get(t, c.url, nil)
metrics, statusCode := c.cli.Get(t, c.url, nil)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
@@ -234,7 +234,7 @@ func (c *metricsClient) GetMetricsByRegexp(t *testing.T, re *regexp.Regexp) []fl
values := []float64{}
metrics, statusCode := c.metricsCli.Get(t, c.url, nil)
metrics, statusCode := c.cli.Get(t, c.url, nil)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
@@ -270,7 +270,7 @@ func (c *metricsClient) rpcRowsSentTotal(t *testing.T) int {
}
type vmselectClient struct {
vmselectCli *Client
cli *Client
url func(op, path string, opts QueryOpts) string
metricNamesStatsResetURL string
tenantsURL string
@@ -287,7 +287,7 @@ func (c *vmselectClient) PrometheusAPIV1Export(t *testing.T, query string, opts
values := opts.asURLValues()
values.Add("match[]", query)
values.Add("format", "promapi")
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1QueryResponse(t, res)
}
@@ -302,7 +302,7 @@ func (c *vmselectClient) PrometheusAPIV1ExportNative(t *testing.T, query string,
values := opts.asURLValues()
values.Add("match[]", query)
values.Add("format", "promapi")
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return []byte(res)
}
@@ -315,7 +315,7 @@ func (c *vmselectClient) PrometheusAPIV1Query(t *testing.T, query string, opts Q
url := c.url("select", "prometheus/api/v1/query", opts)
values := opts.asURLValues()
values.Add("query", query)
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1QueryResponse(t, res)
}
@@ -329,7 +329,7 @@ func (c *vmselectClient) PrometheusAPIV1QueryRange(t *testing.T, query string, o
url := c.url("select", "prometheus/api/v1/query_range", opts)
values := opts.asURLValues()
values.Add("query", query)
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1QueryResponse(t, res)
}
@@ -342,7 +342,7 @@ func (c *vmselectClient) PrometheusAPIV1Series(t *testing.T, matchQuery string,
url := c.url("select", "prometheus/api/v1/series", opts)
values := opts.asURLValues()
values.Add("match[]", matchQuery)
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1SeriesResponse(t, res)
}
@@ -354,7 +354,7 @@ func (c *vmselectClient) PrometheusAPIV1SeriesCount(t *testing.T, opts QueryOpts
t.Helper()
url := c.url("select", "prometheus/api/v1/series/count", opts)
values := opts.asURLValues()
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1SeriesCountResponse(t, res)
}
@@ -367,7 +367,7 @@ func (c *vmselectClient) PrometheusAPIV1Labels(t *testing.T, matchQuery string,
url := c.url("select", "prometheus/api/v1/labels", opts)
values := opts.asURLValues()
values.Add("match[]", matchQuery)
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1LabelsResponse(t, res)
}
@@ -382,7 +382,7 @@ func (c *vmselectClient) PrometheusAPIV1LabelValues(t *testing.T, labelName, mat
url := c.url("select", path, opts)
values := opts.asURLValues()
values.Add("match[]", matchQuery)
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1LabelValuesResponse(t, res)
}
@@ -394,7 +394,7 @@ func (c *vmselectClient) PrometheusAPIV1Metadata(t *testing.T, metric string, li
values := opts.asURLValues()
values.Add("metric", metric)
values.Add("limit", strconv.Itoa(limit))
res, _ := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, _ := c.cli.PostForm(t, url, values, opts.Headers)
return NewPrometheusAPIV1Metadata(t, res)
}
@@ -408,7 +408,7 @@ func (c *vmselectClient) PrometheusAPIV1AdminTSDBDeleteSeries(t *testing.T, matc
url := c.url("delete", "prometheus/api/v1/admin/tsdb/delete_series", opts)
values := opts.asURLValues()
values.Add("match[]", matchQuery)
res, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusNoContent, res)
}
@@ -426,7 +426,7 @@ func (c *vmselectClient) PrometheusAPIV1StatusMetricNamesStats(t *testing.T, lim
values.Add("limit", limit)
values.Add("le", le)
values.Add("match_pattern", matchPattern)
res, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, res)
}
@@ -455,7 +455,7 @@ func (c *vmselectClient) PrometheusAPIV1StatusTSDB(t *testing.T, matchQuery stri
addNonEmpty("match[]", matchQuery)
addNonEmpty("topN", topN)
addNonEmpty("date", date)
res, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
res, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, res)
}
@@ -476,7 +476,7 @@ func (c *vmselectClient) GraphiteMetricsIndex(t *testing.T, opts QueryOpts) Grap
t.Helper()
url := c.url("select", "graphite/metrics/index.json", opts)
res, statusCode := c.vmselectCli.Get(t, url, opts.Headers)
res, statusCode := c.cli.Get(t, url, opts.Headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, res)
}
@@ -499,7 +499,7 @@ func (c *vmselectClient) GraphiteMetricsFind(t *testing.T, query string, opts Qu
url := c.url("select", "graphite/metrics/find", opts)
values := opts.asURLValues()
values.Add("query", query)
resText, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
resText, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, resText)
}
@@ -522,7 +522,7 @@ func (c *vmselectClient) GraphiteMetricsExpand(t *testing.T, query string, opts
url := c.url("select", "graphite/metrics/expand", opts)
values := opts.asURLValues()
values.Add("query", query)
resText, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
resText, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, resText)
}
@@ -546,7 +546,7 @@ func (c *vmselectClient) GraphiteRender(t *testing.T, target string, opts QueryO
values := opts.asURLValues()
values.Add("format", "json")
values.Add("target", target)
resText, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
resText, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, resText)
}
@@ -567,7 +567,7 @@ func (c *vmselectClient) GraphiteTagsTagSeries(t *testing.T, record string, opts
url := c.url("select", "graphite/tags/tagSeries", opts)
values := opts.asURLValues()
values.Add("path", record)
_, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
_, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if got, want := statusCode, http.StatusNotImplemented; got != want {
t.Fatalf("unexpected status code: got %d, want %d", got, want)
}
@@ -584,7 +584,7 @@ func (c *vmselectClient) GraphiteTagsTagMultiSeries(t *testing.T, records []stri
for _, rec := range records {
values.Add("path", rec)
}
_, statusCode := c.vmselectCli.PostForm(t, url, values, opts.Headers)
_, statusCode := c.cli.PostForm(t, url, values, opts.Headers)
if got, want := statusCode, http.StatusNotImplemented; got != want {
t.Fatalf("unexpected status code: got %d, want %d", got, want)
}
@@ -598,7 +598,7 @@ func (c *vmselectClient) GraphiteTagsTagMultiSeries(t *testing.T, records []stri
func (c *vmselectClient) PrometheusAPIV1AdminStatusMetricNamesStatsReset(t *testing.T, opts QueryOpts) {
t.Helper()
values := opts.asURLValues()
res, statusCode := c.vmselectCli.PostForm(t, c.metricNamesStatsResetURL, values, opts.Headers)
res, statusCode := c.cli.PostForm(t, c.metricNamesStatsResetURL, values, opts.Headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusNoContent, res)
}
@@ -608,7 +608,7 @@ func (c *vmselectClient) PrometheusAPIV1AdminStatusMetricNamesStatsReset(t *test
// /admin/tenants endpoint.
func (c *vmselectClient) APIV1AdminTenants(t *testing.T, opts QueryOpts) *AdminTenantsResponse {
t.Helper()
res, statusCode := c.vmselectCli.Get(t, c.tenantsURL, opts.Headers)
res, statusCode := c.cli.Get(t, c.tenantsURL, opts.Headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", statusCode, http.StatusOK, res)
}
@@ -622,7 +622,7 @@ func (c *vmselectClient) APIV1AdminTenants(t *testing.T, opts QueryOpts) *AdminT
}
type vminsertClient struct {
vminsertCli *Client
cli *Client
url func(op, path string, opts QueryOpts) string
openTSDBURL func(op, path string, opts QueryOpts) string
graphiteListenAddr string
@@ -647,7 +647,7 @@ func (c *vminsertClient) PrometheusAPIV1ImportCSV(t *testing.T, records []string
headers := opts.getHeaders()
headers.Set("Content-Type", "text/plain")
c.sendBlocking(t, len(records), func() {
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusNoContent)
}
@@ -671,7 +671,7 @@ func (c *vminsertClient) PrometheusAPIV1ImportNative(t *testing.T, data []byte,
headers := opts.getHeaders()
headers.Set("Content-Type", "text/plain")
c.sendBlocking(t, 1, func() {
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusNoContent)
}
@@ -693,7 +693,7 @@ func (c *vminsertClient) PrometheusAPIV1Write(t *testing.T, wr prompb.WriteReque
headers := opts.getHeaders()
headers.Set("Content-Type", "application/x-protobuf")
c.sendBlocking(t, recordsCount, func() {
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusNoContent)
}
@@ -745,7 +745,7 @@ func (c *vminsertClient) PrometheusAPIV1ImportPrometheus(t *testing.T, records [
headers := opts.getHeaders()
headers.Set("Content-Type", "text/plain")
c.sendBlocking(t, recordsCount, func() {
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusNoContent)
}
@@ -771,7 +771,7 @@ func (c *vminsertClient) InfluxWrite(t *testing.T, records []string, opts QueryO
headers.Set("Content-Type", "text/plain")
c.sendBlocking(t, len(records), func() {
t.Helper()
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusNoContent)
}
@@ -805,7 +805,7 @@ func (c *vminsertClient) OpentelemetryV1Metrics(t *testing.T, md otlppb.MetricsD
headers := opts.getHeaders()
headers.Set("Content-Type", "application/x-protobuf")
c.sendBlocking(t, recordsCount, func() {
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
@@ -830,7 +830,7 @@ func (c *vminsertClient) OpenTSDBAPIPut(t *testing.T, records []string, opts Que
headers := opts.getHeaders()
headers.Set("Content-Type", "application/json")
c.sendBlocking(t, len(records), func() {
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusNoContent {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusNoContent)
}
@@ -853,7 +853,7 @@ func (c *vminsertClient) ZabbixConnectorHistory(t *testing.T, records []string,
headers := opts.getHeaders()
headers.Set("Content-Type", "application/json")
c.sendBlocking(t, len(records), func() {
_, statusCode := c.vminsertCli.Post(t, url, data, headers)
_, statusCode := c.cli.Post(t, url, data, headers)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
@@ -867,11 +867,11 @@ func (c *vminsertClient) ZabbixConnectorHistory(t *testing.T, records []string,
// See https://docs.victoriametrics.com/victoriametrics/integrations/graphite/#ingesting
func (c *vminsertClient) GraphiteWrite(t *testing.T, records []string, _ QueryOpts) {
t.Helper()
c.vminsertCli.Write(t, c.graphiteListenAddr, records)
c.cli.Write(t, c.graphiteListenAddr, records)
}
type vmstorageClient struct {
vmstorageCli *Client
cli *Client
httpListenAddr string
}
@@ -881,7 +881,7 @@ func (c *vmstorageClient) ForceFlush(t *testing.T) {
t.Helper()
url := fmt.Sprintf("http://%s/internal/force_flush", c.httpListenAddr)
_, statusCode := c.vmstorageCli.Get(t, url, nil)
_, statusCode := c.cli.Get(t, url, nil)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
@@ -892,7 +892,7 @@ func (c *vmstorageClient) ForceMerge(t *testing.T) {
t.Helper()
url := fmt.Sprintf("http://%s/internal/force_merge", c.httpListenAddr)
_, statusCode := c.vmstorageCli.Get(t, url, nil)
_, statusCode := c.cli.Get(t, url, nil)
if statusCode != http.StatusOK {
t.Fatalf("unexpected status code: got %d, want %d", statusCode, http.StatusOK)
}
@@ -905,7 +905,7 @@ func (c *vmstorageClient) ForceMerge(t *testing.T) {
func (c *vmstorageClient) SnapshotCreate(t *testing.T) *SnapshotCreateResponse {
t.Helper()
data, statusCode := c.vmstorageCli.Post(t, c.SnapshotCreateURL(), nil, nil)
data, statusCode := c.cli.Post(t, c.SnapshotCreateURL(), nil, nil)
if got, want := statusCode, http.StatusOK; got != want {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", got, want, data)
}
@@ -931,7 +931,7 @@ func (c *vmstorageClient) APIV1AdminTSDBSnapshot(t *testing.T) *APIV1AdminTSDBSn
t.Helper()
url := fmt.Sprintf("http://%s/api/v1/admin/tsdb/snapshot", c.httpListenAddr)
data, statusCode := c.vmstorageCli.Post(t, url, nil, nil)
data, statusCode := c.cli.Post(t, url, nil, nil)
if got, want := statusCode, http.StatusOK; got != want {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", got, want, data)
}
@@ -952,7 +952,7 @@ func (c *vmstorageClient) SnapshotList(t *testing.T) *SnapshotListResponse {
t.Helper()
url := fmt.Sprintf("http://%s/snapshot/list", c.httpListenAddr)
data, statusCode := c.vmstorageCli.Get(t, url, nil)
data, statusCode := c.cli.Get(t, url, nil)
if got, want := statusCode, http.StatusOK; got != want {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", got, want, data)
}
@@ -973,7 +973,7 @@ func (c *vmstorageClient) SnapshotDelete(t *testing.T, snapshotName string) *Sna
t.Helper()
url := fmt.Sprintf("http://%s/snapshot/delete?snapshot=%s", c.httpListenAddr, snapshotName)
data, statusCode := c.vmstorageCli.Delete(t, url)
data, statusCode := c.cli.Delete(t, url)
wantStatusCodes := map[int]bool{
http.StatusOK: true,
http.StatusInternalServerError: true,
@@ -998,7 +998,7 @@ func (c *vmstorageClient) SnapshotDeleteAll(t *testing.T) *SnapshotDeleteAllResp
t.Helper()
url := fmt.Sprintf("http://%s/snapshot/delete_all", c.httpListenAddr)
data, statusCode := c.vmstorageCli.Post(t, url, nil, nil)
data, statusCode := c.cli.Post(t, url, nil, nil)
if got, want := statusCode, http.StatusOK; got != want {
t.Fatalf("unexpected status code: got %d, want %d, resp text=%q", got, want, data)
}

View File

@@ -45,11 +45,13 @@ func TestSingleMetricsMetadata(t *testing.T) {
{Labels: []prompb.Label{{Name: "__name__", Value: "metric_name_4"}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
{Labels: []prompb.Label{{Name: "__name__", Value: "metric_name_5"}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
{Labels: []prompb.Label{{Name: "__name__", Value: "metric_name_6"}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
{Labels: []prompb.Label{{Name: "__name__", Value: `metric_name_7_!@"_suffix`}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
},
Metadata: []prompb.MetricMetadata{
{MetricFamilyName: "metric_name_4", Help: "some help message", Type: prompb.MetricTypeSummary},
{MetricFamilyName: "metric_name_5", Help: "some help message", Type: prompb.MetricTypeSummary},
{MetricFamilyName: "metric_name_6", Help: "some help message", Type: prompb.MetricTypeStateset},
{MetricFamilyName: `metric_name_7_!@"_suffix`, Help: "some help message", Type: prompb.MetricTypeStateset},
},
}
@@ -59,12 +61,13 @@ func TestSingleMetricsMetadata(t *testing.T) {
expected := &apptest.PrometheusAPIV1Metadata{
Status: "success",
Data: map[string][]apptest.MetadataEntry{
"metric_name_1": {{Help: "some help message", Type: "gauge"}},
"metric_name_2": {{Help: "some help message", Type: "counter"}},
"metric_name_3": {{Help: "some help message", Type: "gauge"}},
"metric_name_4": {{Help: "some help message", Type: "summary"}},
"metric_name_5": {{Help: "some help message", Type: "summary"}},
"metric_name_6": {{Help: "some help message", Type: "stateset"}},
"metric_name_1": {{Help: "some help message", Type: "gauge"}},
"metric_name_2": {{Help: "some help message", Type: "counter"}},
"metric_name_3": {{Help: "some help message", Type: "gauge"}},
"metric_name_4": {{Help: "some help message", Type: "summary"}},
"metric_name_5": {{Help: "some help message", Type: "summary"}},
"metric_name_6": {{Help: "some help message", Type: "stateset"}},
`metric_name_7_!@"_suffix`: {{Help: "some help message", Type: "stateset"}},
},
}
gotStats := sut.PrometheusAPIV1Metadata(t, "", 0, apptest.QueryOpts{})
@@ -154,11 +157,13 @@ func TestClusterMetricsMetadata(t *testing.T) {
{Labels: []prompb.Label{{Name: "__name__", Value: "metric_name_4"}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
{Labels: []prompb.Label{{Name: "__name__", Value: "metric_name_5"}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
{Labels: []prompb.Label{{Name: "__name__", Value: "metric_name_6"}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
{Labels: []prompb.Label{{Name: "__name__", Value: `metric_name_7_!@"_suffix`}}, Samples: []prompb.Sample{{Value: 40, Timestamp: ingestTimestamp}}},
},
Metadata: []prompb.MetricMetadata{
{MetricFamilyName: "metric_name_4", Help: "some help message", Type: prompb.MetricTypeSummary},
{MetricFamilyName: "metric_name_5", Help: "some help message", Type: prompb.MetricTypeSummary},
{MetricFamilyName: "metric_name_6", Help: "some help message", Type: prompb.MetricTypeStateset},
{MetricFamilyName: `metric_name_7_!@"_suffix`, Help: "some help message", Type: prompb.MetricTypeStateset},
},
}
@@ -171,12 +176,13 @@ func TestClusterMetricsMetadata(t *testing.T) {
expected := &apptest.PrometheusAPIV1Metadata{
Status: "success",
Data: map[string][]apptest.MetadataEntry{
"metric_name_1": {{Help: "some help message", Type: "gauge"}},
"metric_name_2": {{Help: "some help message", Type: "counter"}},
"metric_name_3": {{Help: "some help message", Type: "gauge"}},
"metric_name_4": {{Help: "some help message", Type: "summary"}},
"metric_name_5": {{Help: "some help message", Type: "summary"}},
"metric_name_6": {{Help: "some help message", Type: "stateset"}},
"metric_name_1": {{Help: "some help message", Type: "gauge"}},
"metric_name_2": {{Help: "some help message", Type: "counter"}},
"metric_name_3": {{Help: "some help message", Type: "gauge"}},
"metric_name_4": {{Help: "some help message", Type: "summary"}},
"metric_name_5": {{Help: "some help message", Type: "summary"}},
"metric_name_6": {{Help: "some help message", Type: "stateset"}},
`metric_name_7_!@"_suffix`: {{Help: "some help message", Type: "stateset"}},
},
}
gotStats := vmselect.PrometheusAPIV1Metadata(t, "", 0, apptest.QueryOpts{Tenant: tenantID})

View File

@@ -8,6 +8,7 @@ import (
"net/http/httptest"
"strings"
"sync"
"sync/atomic"
"testing"
"time"
@@ -332,13 +333,11 @@ func TestSingleVMAgentDropOnOverload(t *testing.T) {
vmagent.APIV1ImportPrometheusNoWaitFlush(t, []string{
"foo_bar 1 1652169600000", // 2022-05-10T08:00:00Z
}, apptest.QueryOpts{})
waitFor(
func() bool {
return vmagent.RemoteWriteRequests(t, url1) == 1 && vmagent.RemoteWriteRequests(t, url2) == 1
},
)
// Send 2 more requests, the first RW endpoint should receive everything, the second should add them to the queue
// since worker is busy with the first request.
for i := range 2 {
@@ -641,3 +640,116 @@ func TestSingleVMAgentMultitenancy(t *testing.T) {
t.Fatalf("expected vmagent_tenant_inserted_rows_total to have value 1 for accountID=5, projectID=0")
}
}
func TestSingleVMAgentPriorizeRecentData(t *testing.T) {
tc := apptest.NewTestCase(t)
defer tc.Stop()
remoteWriteSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusNoContent)
}))
defer remoteWriteSrv.Close()
var mustRW2ReturnError atomic.Bool
mustRW2ReturnError.Store(true)
remoteWriteSrv2 := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if mustRW2ReturnError.Load() {
w.WriteHeader(http.StatusServiceUnavailable)
return
}
w.WriteHeader(http.StatusNoContent)
}))
defer remoteWriteSrv2.Close()
vmagent := tc.MustStartDefaultRWVmagent("vmagent", []string{
fmt.Sprintf(`-remoteWrite.url=%s/api/v1/write`, remoteWriteSrv.URL),
fmt.Sprintf(`-remoteWrite.url=%s/api/v1/write`, remoteWriteSrv2.URL),
"-remoteWrite.disableOnDiskQueue=true",
// use only 1 worker to get a full queue faster
"-remoteWrite.queues=1",
"-remoteWrite.flushInterval=1ms",
"-remoteWrite.inmemoryQueues=1",
// fastqueue size is roughly memory.Allowed() / len(urls) / *maxRowsPerBlock / 100
// Use very large maxRowsPerBlock to get fastqueue of minimal length(2).
// See initRemoteWriteCtxs function in remotewrite.go for details.
"-remoteWrite.maxRowsPerBlock=1000000000",
"-remoteWrite.tmpDataPath=" + tc.Dir() + "/vmagent",
// Delay retry logic to avoid race conditions with waitFor assertions.
// It improves the test stability on resource-constrained runners.
"-remoteWrite.retryMinInterval=3s",
"-remoteWrite.retryMaxTime=3s",
})
const (
retries = 20
period = 200 * time.Millisecond
)
waitFor := func(f func() bool) {
t.Helper()
for range retries {
if f() {
return
}
time.Sleep(period)
}
t.Fatalf("timed out waiting for retry #%d", retries)
}
// Real remote write URLs are hidden in metrics
url1 := "1:secret-url"
url2 := "2:secret-url"
// Wait until first request got flushed to remote write server
vmagent.APIV1ImportPrometheusNoWaitFlush(t, []string{
"foo_bar 1 1652169600000", // 2022-05-10T08:00:00Z
}, apptest.QueryOpts{})
waitFor(
func() bool {
return vmagent.RemoteWriteRequests(t, url1) == 1 && vmagent.RemoteWriteRequests(t, url2) == 1
},
)
// Wait until second request got flushed to remote write server
// since there are 2 independent queues (general and in-memory) with minimal capacity of 1
vmagent.APIV1ImportPrometheusNoWaitFlush(t, []string{
"foo_bar 1 1652169600000", // 2022-05-10T08:00:00Z
}, apptest.QueryOpts{})
waitFor(
func() bool {
return vmagent.RemoteWriteRequests(t, url1) == 2 && vmagent.RemoteWriteRequests(t, url2) == 2
},
)
// Send 2 more requests, the first RW endpoint should receive everything, the second should add them to the queue
// since worker is busy with the first request.
for i := range 2 {
vmagent.APIV1ImportPrometheusNoWaitFlush(t, []string{
"foo_bar 1 1652169600000", // 2022-05-10T08:00:00Z
}, apptest.QueryOpts{})
waitFor(
func() bool {
return vmagent.RemoteWriteRequests(t, url1) == 3+i && vmagent.RemoteWritePendingInmemoryBlocks(t, url2) == 1+i
},
)
}
// Send one more request.
vmagent.APIV1ImportPrometheusNoWaitFlush(t, []string{
"foo_bar 1 1652169600000", // 2022-05-10T08:00:00Z
}, apptest.QueryOpts{})
waitFor(
func() bool {
return vmagent.RemoteWriteRequests(t, url1) == 5 && vmagent.RemoteWriteSamplesDropped(t, url2) > 0
},
)
mustRW2ReturnError.Store(false)
// ensure that inmemory data correctly flushed to the remote write
waitFor(
func() bool {
return vmagent.RemoteWritePendingInmemoryBlocks(t, url2) == 0
},
)
}

View File

@@ -77,7 +77,7 @@ type vminsertRuntimeValues struct {
func newVminsert(app *app, cli *Client, rt vminsertRuntimeValues) *Vminsert {
metricsClient := newMetricsClient(cli, rt.httpListenAddr)
vminsertClient := &vminsertClient{
vminsertCli: cli,
cli: cli,
url: func(op, path string, opts QueryOpts) string {
return getClusterPath(rt.httpListenAddr, op, path, opts)
},

View File

@@ -48,7 +48,7 @@ func newVmselect(app *app, cli *Client, rt vmselectRuntimeValues) *Vmselect {
app: app,
metricsClient: newMetricsClient(cli, rt.httpListenAddr),
vmselectClient: &vmselectClient{
vmselectCli: cli,
cli: cli,
url: func(op, path string, opts QueryOpts) string {
return getClusterPath(rt.httpListenAddr, op, path, opts)
},

View File

@@ -58,11 +58,11 @@ func newVmsingle(app *app, cli *Client, rt vmsingleRuntimeValues) *Vmsingle {
app: app,
metricsClient: newMetricsClient(cli, rt.httpListenAddr),
vmstorageClient: &vmstorageClient{
vmstorageCli: cli,
cli: cli,
httpListenAddr: rt.httpListenAddr,
},
vmselectClient: &vmselectClient{
vmselectCli: cli,
cli: cli,
url: func(op, path string, opts QueryOpts) string {
return fmt.Sprintf("http://%s/%s", rt.httpListenAddr, path)
},
@@ -70,7 +70,7 @@ func newVmsingle(app *app, cli *Client, rt vmsingleRuntimeValues) *Vmsingle {
tenantsURL: "vmsingle-does-not-serve-tenants",
},
vminsertClient: &vminsertClient{
vminsertCli: cli,
cli: cli,
url: func(_, path string, _ QueryOpts) string {
return fmt.Sprintf("http://%s/%s", rt.httpListenAddr, path)
},

View File

@@ -63,7 +63,7 @@ func newVmstorage(app *app, cli *Client, rt vmstorageRuntimeValues) *Vmstorage {
app: app,
metricsClient: newMetricsClient(cli, rt.httpListenAddr),
vmstorageClient: &vmstorageClient{
vmstorageCli: cli,
cli: cli,
httpListenAddr: rt.httpListenAddr,
},
storageDataPath: rt.storageDataPath,

View File

@@ -2083,7 +2083,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -2388,7 +2388,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -2084,7 +2084,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -2389,7 +2389,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -2165,7 +2165,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -6201,7 +6201,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "The rate of ignored samples during aggregation. \nStream aggregation will drop samples with NaN values, or samples with too old timestamps. See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#ignoring-old-samples ",
"description": "The rate of dropped samples during aggregation. \nStream aggregation will drop samples with NaN values, too old timestamps or samples identified as duplicates during deduplication. See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#ignoring-old-samples ",
"fieldConfig": {
"defaults": {
"color": {
@@ -6282,14 +6282,14 @@
"uid": "$ds"
},
"editorMode": "code",
"expr": "sum(rate(vm_streamaggr_ignored_samples_total{job=~\"$job\",instance=~\"$instance\", url=~\"$url\"}[$__rate_interval]) > 0) without (instance, pod)",
"expr": "sum(rate({__name__=~\"vm_streamaggr_ignored_samples_total|vm_streamaggr_dedup_dropped_samples_total\", job=~\"$job\",instance=~\"$instance\", url=~\"$url\"}[$__rate_interval]) > 0) without (instance, pod)",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Ignored samples ($instance)",
"title": "Dropped samples ($instance)",
"type": "timeseries"
},
{

View File

@@ -1840,7 +1840,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -2164,7 +2164,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -6200,7 +6200,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "The rate of ignored samples during aggregation. \nStream aggregation will drop samples with NaN values, or samples with too old timestamps. See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#ignoring-old-samples ",
"description": "The rate of dropped samples during aggregation. \nStream aggregation will drop samples with NaN values, too old timestamps or samples identified as duplicates during deduplication. See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#ignoring-old-samples ",
"fieldConfig": {
"defaults": {
"color": {
@@ -6281,14 +6281,14 @@
"uid": "$ds"
},
"editorMode": "code",
"expr": "sum(rate(vm_streamaggr_ignored_samples_total{job=~\"$job\",instance=~\"$instance\", url=~\"$url\"}[$__rate_interval]) > 0) without (instance, pod)",
"expr": "sum(rate({__name__=~\"vm_streamaggr_ignored_samples_total|vm_streamaggr_dedup_dropped_samples_total\", job=~\"$job\",instance=~\"$instance\", url=~\"$url\"}[$__rate_interval]) > 0) without (instance, pod)",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Ignored samples ($instance)",
"title": "Dropped samples ($instance)",
"type": "timeseries"
},
{

View File

@@ -1839,7 +1839,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSee major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -3,9 +3,9 @@
DOCKER_REGISTRIES ?= docker.io quay.io
DOCKER_NAMESPACE ?= victoriametrics
ROOT_IMAGE ?= alpine:3.23.4
ROOT_IMAGE ?= alpine:3.24.1
ROOT_IMAGE_SCRATCH ?= scratch
CERTS_IMAGE := alpine:3.23.4
CERTS_IMAGE := alpine:3.24.1
GO_BUILDER_IMAGE := golang:1.26.4

View File

@@ -3,7 +3,7 @@ services:
# It scrapes targets defined in --promscrape.config
# And forward them to --remoteWrite.url
vmagent:
image: victoriametrics/vmagent:v1.145.0
image: victoriametrics/vmagent:v1.146.0
depends_on:
- "vmauth"
ports:
@@ -42,14 +42,14 @@ services:
# vmstorage shards. Each shard receives 1/N of all metrics sent to vminserts,
# where N is number of vmstorages (2 in this case).
vmstorage-1:
image: victoriametrics/vmstorage:v1.145.0-cluster
image: victoriametrics/vmstorage:v1.146.0-cluster
volumes:
- strgdata-1:/storage
command:
- "--storageDataPath=/storage"
restart: always
vmstorage-2:
image: victoriametrics/vmstorage:v1.145.0-cluster
image: victoriametrics/vmstorage:v1.146.0-cluster
volumes:
- strgdata-2:/storage
command:
@@ -59,7 +59,7 @@ services:
# vminsert is ingestion frontend. It receives metrics pushed by vmagent,
# pre-process them and distributes across configured vmstorage shards.
vminsert-1:
image: victoriametrics/vminsert:v1.145.0-cluster
image: victoriametrics/vminsert:v1.146.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -68,7 +68,7 @@ services:
- "--storageNode=vmstorage-2:8400"
restart: always
vminsert-2:
image: victoriametrics/vminsert:v1.145.0-cluster
image: victoriametrics/vminsert:v1.146.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -80,7 +80,7 @@ services:
# vmselect is a query fronted. It serves read queries in MetricsQL or PromQL.
# vmselect collects results from configured `--storageNode` shards.
vmselect-1:
image: victoriametrics/vmselect:v1.145.0-cluster
image: victoriametrics/vmselect:v1.146.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -90,7 +90,7 @@ services:
- "--vmalert.proxyURL=http://vmalert:8880"
restart: always
vmselect-2:
image: victoriametrics/vmselect:v1.145.0-cluster
image: victoriametrics/vmselect:v1.146.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -105,7 +105,7 @@ services:
# read requests from Grafana, vmui, vmalert among vmselects.
# It can be used as an authentication proxy.
vmauth:
image: victoriametrics/vmauth:v1.145.0
image: victoriametrics/vmauth:v1.146.0
depends_on:
- "vmselect-1"
- "vmselect-2"
@@ -119,7 +119,7 @@ services:
# vmalert executes alerting and recording rules
vmalert:
image: victoriametrics/vmalert:v1.145.0
image: victoriametrics/vmalert:v1.146.0
depends_on:
- "vmauth"
ports:

View File

@@ -3,7 +3,7 @@ services:
# It scrapes targets defined in --promscrape.config
# And forward them to --remoteWrite.url
vmagent:
image: victoriametrics/vmagent:v1.145.0
image: victoriametrics/vmagent:v1.146.0
depends_on:
- "victoriametrics"
ports:
@@ -18,7 +18,7 @@ services:
# VictoriaMetrics instance, a single process responsible for
# storing metrics and serve read requests.
victoriametrics:
image: victoriametrics/victoria-metrics:v1.145.0
image: victoriametrics/victoria-metrics:v1.146.0
ports:
- 8428:8428
- 8089:8089
@@ -59,7 +59,7 @@ services:
# vmalert executes alerting and recording rules
vmalert:
image: victoriametrics/vmalert:v1.145.0
image: victoriametrics/vmalert:v1.146.0
depends_on:
- "victoriametrics"
- "alertmanager"

View File

@@ -223,4 +223,16 @@ groups:
Unexpected TSID misses for \"{{ $labels.job }}\" ({{ $labels.instance }}) for the last 15 minutes.
If this happens after unclean shutdown of VictoriaMetrics process (via \"kill -9\", OOM or power off),
then this is OK - the alert must go away in a few minutes after the restart.
Otherwise this may point to the corruption of index data.
Otherwise this may point to the corruption of index data.
- alert: VMSelectConcurrentQueriesExceedMemoryLimit
expr: (vm_max_memory_per_query * on(job, instance) vm_concurrent_select_capacity) > on(job, instance) vm_available_memory_bytes
for: 5m
labels:
severity: warning
annotations:
summary: "vmselect ({{ $labels.instance }}) concurrent query memory may exceed pod limit"
description: "Current concurrent queries ({{ $value | humanize1024 }} combined max memory) exceed
the available memory on instance {{ $labels.instance }}.
This may result in OOM kills. Consider reducing -maxConcurrentRequests,
lowering -maxMemoryPerQuery, or scaling up pod memory limits."

View File

@@ -1,6 +1,6 @@
services:
vmagent:
image: victoriametrics/vmagent:v1.145.0
image: victoriametrics/vmagent:v1.146.0
depends_on:
- "victoriametrics"
ports:
@@ -14,7 +14,7 @@ services:
restart: always
victoriametrics:
image: victoriametrics/victoria-metrics:v1.145.0
image: victoriametrics/victoria-metrics:v1.146.0
ports:
- 8428:8428
volumes:
@@ -40,7 +40,7 @@ services:
restart: always
vmalert:
image: victoriametrics/vmalert:v1.145.0
image: victoriametrics/vmalert:v1.146.0
depends_on:
- "victoriametrics"
ports:
@@ -59,7 +59,7 @@ services:
- '--external.alert.source=explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr": },{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]'
restart: always
vmanomaly:
image: victoriametrics/vmanomaly:v1.29.4
image: victoriametrics/vmanomaly:v1.29.7
depends_on:
- "victoriametrics"
ports:

View File

@@ -43,7 +43,7 @@
"content": "If you don't observe any data initially, please wait a few minutes for it to appear. \n\nUpon the first running the guide (if there is not enough node_exporter monitoring data collected in your system), you may notice a significant number of false positive anomalies found. The predictions will become more accurate with at least two weeks' (full `fit_window`) worth of data provided to vmanomaly.\n\nEach row displays information for a distinct mode. The query used for anomaly detection is `sum(rate(node_cpu_seconds_total[5m])) by (mode, instance, job)`.\n",
"mode": "markdown"
},
"pluginVersion": "10.2.1",
"pluginVersion": "12.2.0",
"title": "Overview",
"type": "text"
},

View File

@@ -32,6 +32,17 @@ docs-image:
--platform $(DOCKER_PLATFORM) \
vmdocs
docs-check-links: docs-image
rm -rf vmdocs/public
docker run \
--rm \
--platform $(DOCKER_PLATFORM) \
-v ./vmdocs:/opt/docs \
$(shell for d in ./docs/*/; do printf ' -v %s:/opt/docs/content/%s' "$${d}" "$$(basename $${d})"; done) \
--entrypoint /bin/sh \
vmdocs-docker-package \
-c "yarn install && hugo --minify && yarn run check-links"
docs-debug: docs docs-image
docker run \
--rm \

View File

@@ -14,6 +14,39 @@ aliases:
---
Please find the changelog for VictoriaMetrics Anomaly Detection below.
## v1.29.7
Released: 2026-06-25
- UI: updated [vmanomaly UI](https://docs.victoriametrics.com/anomaly-detection/ui/) from [v1.7.1](https://docs.victoriametrics.com/anomaly-detection/ui/#v171) to [v1.7.2](https://docs.victoriametrics.com/anomaly-detection/ui/#v172), see respective [release notes](https://docs.victoriametrics.com/anomaly-detection/ui/#v172) for details. Notable mentions include `api/v1/server/model` endpoint for accessing production models config and queries from UI, manually or through [AI assistant](https://docs.victoriametrics.com/anomaly-detection/ui/#ai-assistance).
- IMPROVEMENT: Increased high-cardinality inference scaling by optionally scattering periodic infer jobs to reduce contention on shared resources (e.g. datasource, CPU, RAM) when `settings.n_workers > 1` and `scheduler.infer_every` is smaller than the total time to fetch and process all queries. This is controlled by new `scatter_infer_jobs` boolean argument of [Periodic Scheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#parameters-1) (default: `false`).
- IMPROVEMENT: Optimized internal batching for reader post-fetch series processing, exposing reader processing queue depth (`vmanomaly_reader_processing_tasks_queued` [metric](https://docs.victoriametrics.com/anomaly-detection/components/monitoring/#reader-behaviour-metrics)), and clarifying inference skip logs after data fetch timeouts. See `series_processing_batch_size` argument of [VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader) and [VLogsReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#victorialogs-reader) for details.
- IMPROVEMENT: Refined `VmReader` and `VLogsReader` logging after datasource request failures by suppressing the follow-up generic "No data" or "No unseen data" warning for failed fetches. Failed requests now keep the original datasource error while empty successful responses still emit the no-data warning.
## v1.29.6
Released: 2026-06-17
- BUGFIX: Fixed `VLogsReader` startup and query execution when `tenant_id` is omitted or provided in short account-only form such as `"0"`. Omitted or empty tenant IDs are treated as single-node/no-tenant mode, and account-only tenant IDs are expanded to `accountID:0` before adding VictoriaLogs `AccountID`/`ProjectID` params or VM tenant labels.
- BUGFIX: Hardened [`OnlineMADModel`](https://docs.victoriametrics.com/anomaly-detection/components/models/#online-mad) anomaly scoring for perfectly constant time series (all values identical). The model now keeps a small deterministic prediction interval when the learned MAD is zero, so values deviating from an unknown constant baseline can produce `anomaly_score > 1` (previously, all anomaly scores were `0`).
## v1.29.5
Released: 2026-06-11
- UI: Updated [vmanomaly UI](https://docs.victoriametrics.com/anomaly-detection/ui/) from [v1.7.0](https://docs.victoriametrics.com/anomaly-detection/ui/#v170) to [v1.7.1](https://docs.victoriametrics.com/anomaly-detection/ui/#v171), see respective [release notes](https://docs.victoriametrics.com/anomaly-detection/ui/#v171) for details.
- IMPROVEMENT: Redesigned [hot reload](https://docs.victoriametrics.com/anomaly-detection/components/#hot-reload) config change detection to content-based polling with configurable `-configCheckInterval`, improving reliability for Kubernetes ConfigMap symlink rotations and other filesystems where event delivery can be inconsistent.
- IMPROVEMENT: Refined config validation errors for broken or invalid config sections, so startup and reload failures point to the affected section more clearly (e.g. YAML indentation typos).
- IMPROVEMENT: Tightened config validation for [`PeriodicScheduler`](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler) `infer_every` and [`IsolationForestModel`](https://docs.victoriametrics.com/anomaly-detection/components/models/#isolation-forest-multivariate) `contamination`, including clearer handling of missing scheduler intervals, numeric contamination strings, and invalid non-finite values.
- BUGFIX: Fixed a multiprocessing startup issue with `settings.n_workers > 1` that could leave scheduled data fetch or successive inference jobs stuck and repeatedly skipped by internal scheduler.
- BUGFIX: Bounded [`VmReader`](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader) and [`VLogsReader`](https://docs.victoriametrics.com/anomaly-detection/components/reader/#victorialogs-reader) data fetch and post-fetch processing waits so stalled datasource reads or multiprocessing dataframe creation no longer keep [`PeriodicScheduler`](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler) `data_fetch` jobs running indefinitely. Previously, such stuck jobs could keep internal scheduler's `max_instances=1` slot per (scheduler, query) pair occupied, causing future data fetch, fit, or infer runs to be skipped until vmanomaly was restarted. The config validator now also warns when the configured reader timeout budget can exceed the connected scheduler interval.
## v1.29.4
Released: 2026-05-15

View File

@@ -423,7 +423,7 @@ services:
# ...
vmanomaly:
container_name: vmanomaly
image: victoriametrics/vmanomaly:v1.29.4
image: victoriametrics/vmanomaly:v1.29.7
# ...
restart: always
volumes:
@@ -641,7 +641,7 @@ options:
Heres an example of using the config splitter to divide configurations based on the `extra_filters` argument from the reader section:
```sh
docker pull victoriametrics/vmanomaly:v1.29.4 && docker image tag victoriametrics/vmanomaly:v1.29.4 vmanomaly
docker pull victoriametrics/vmanomaly:v1.29.7 && docker image tag victoriametrics/vmanomaly:v1.29.7 vmanomaly
```
```sh

View File

@@ -45,8 +45,7 @@ There are 2 types of compatibility to consider when migrating in stateful mode:
| Group start | Group end | Compatibility | Notes |
|---------|--------- |------------|-------|
| [v1.29.3](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1293) | Latest* | Fully Compatible | Just a placeholder for new releases |
| [v1.29.1](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1291) | [v1.29.3](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1293) | Fully Compatible | - |
| [v1.29.1](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1291) | [v1.29.7](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1297) | Fully Compatible | - |
| [v1.28.7](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1287) | [v1.29.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1290) | Partially compatible* | Dumped models of class [prophet](https://docs.victoriametrics.com/anomaly-detection/components/models/#prophet) and [seasonal quantile](https://docs.victoriametrics.com/anomaly-detection/components/models/#online-seasonal-quantile) have problems with loading to [v1.29.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1290) due to dropped `pytz` library. **Upgrading directly from v1.28.7 to [v1.29.1](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1291) with a fix is suggested** |
| [v1.26.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1262) | [v1.28.7](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1287) | Fully Compatible | [v1.28.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1280) introduced [rolling](https://docs.victoriametrics.com/anomaly-detection/components/models/#rolling-models) model class drop in favor of [online](https://docs.victoriametrics.com/anomaly-detection/components/models/#online-models) models (`rolling_quantile` and `std` models), however, it does not impact compatibility, as artifacts were not produced by default for rolling models. Also, offline `mad` and `zscore` models are redirecting to their respective online counterparts since [v1.28.4](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1284). |
| [v1.25.3](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1253) | [v1.26.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1270) | Partially Compatible* | [v1.25.3](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1253) introduced `forecast_at` argument for base [univariate](https://docs.victoriametrics.com/anomaly-detection/components/models/#univariate-models) and `Prophet` [models](https://docs.victoriametrics.com/anomaly-detection/components/models/#prophet), however, itself remains backward-reversible from newer states like [v1.26.2](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1262), [v1.27.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1270). (All models except `isolation_forest_multivariate` class will be dropped) |

View File

@@ -37,29 +37,39 @@ The `vmanomaly` service supports a set of command-line arguments to configure it
> Single-dashed command-line argument {{% available_from "v1.23.3" anomaly %}} format can be used, e.g. `-license.forceOffline` in addition to `--license.forceOffline`. This aligns better with other VictoriaMetrics ecosystem components. Mixing the two styles is also supported, e.g. `-license.forceOffline --loggerLevel INFO`.
```shellhelp
usage: vmanomaly.py [-h] [--license STRING | --licenseFile PATH] [--license.forceOffline] [--loggerLevel {DEBUG,WARNING,FATAL,ERROR,INFO}] [--watch] [--dryRun] [--outputSpec PATH] config [config ...]
usage: vmanomaly.py [-h] [--license STRING | --licenseFile PATH] [--license.forceOffline] [--loggerLevel {DEBUG,INFO,WARNING,ERROR,FATAL}] [--watch] [-configCheckInterval DURATION] [--dryRun] [--outputSpec PATH] config [config ...]
VictoriaMetrics Anomaly Detection Service
positional arguments:
config YAML config file(s) or directories containing YAML files. Multiple files will recursively merge each other values so multiple configs can be combined. If a directory is provided,
all `.yaml` files inside will be merged, without recursion. Default: vmanomaly.yaml is expected in the current directory.
config YAML config file(s) or directories containing YAML files. Multiple files will recursively merge each other
values so multiple configs can be combined. If a directory is provided, all `.yaml` files inside will be
merged, without recursion. Default: vmanomaly.yaml is expected in the current directory.
options:
-h Show this help message and exit
--license STRING License key for VictoriaMetrics Enterprise. See https://victoriametrics.com/products/enterprise/trial/ to obtain a trial license.
--licenseFile PATH Path to file with license key for VictoriaMetrics Enterprise. See https://victoriametrics.com/products/enterprise/trial/ to obtain a trial license.
--license STRING License key for VictoriaMetrics Enterprise. See https://victoriametrics.com/products/enterprise/trial/ to
obtain a trial license.
--licenseFile PATH Path to file with license key for VictoriaMetrics Enterprise. See
https://victoriametrics.com/products/enterprise/trial/ to obtain a trial license.
--license.forceOffline
Whether to force offline verification for VictoriaMetrics Enterprise license key, which has been passed either via -license or via -licenseFile command-line flag. The issued
license key must support offline verification feature. Contact info@victoriametrics.com if you need offline license verification.
--loggerLevel {DEBUG,WARNING,FATAL,ERROR,INFO}
Minimum level to log. Possible values: DEBUG, INFO, WARNING, ERROR, FATAL.
--watch Watch config files for changes and trigger hot reloads. Watches the specified config file or directory for modifications, deletions, or additions. Upon detecting changes,
triggers config reload. If new config validation fails, continues with previous valid config and state.
--dryRun Validate only: parse + merge all YAML(s) and run schema checks, then exit. Does not require a license to run. Does not expose metrics, or launch vmanomaly service(s).
Whether to force offline verification for VictoriaMetrics Enterprise license key, which has been passed either
via -license or via -licenseFile command-line flag. The issued license key must support offline verification
feature. Contact info@victoriametrics.com if you need offline license verification.
--loggerLevel {DEBUG,INFO,WARNING,ERROR,FATAL}
Minimum level to log. Possible values: {'DEBUG', 'INFO', 'WARNING', 'ERROR', 'FATAL'}.
--watch Watch config files for changes and trigger hot reloads. Watches the specified config file or directory for
modifications, deletions, or additions. Upon detecting changes, triggers config reload. If new config
validation fails, continues with previous valid config and state.
-configCheckInterval DURATION
Interval for checking watched config files for content changes. Default: 30s.
--dryRun Validate only: parse + merge all YAML(s) and run schema checks, then exit. Does not require a license to run.
Does not expose metrics, or launch vmanomaly service(s).
--outputSpec PATH Target location of .yaml output spec.
```
{{% available_from "v1.29.5" anomaly %}} When `--watch` is enabled, config changes are detected by fixed-interval content polling instead of filesystem event delivery. The polling frequency is controlled by `-configCheckInterval` (default: `30s`). The same option can also be passed as `--configCheckInterval`, `--config.check.interval`, `--config-check-interval`, `--config_check_interval`, or in key-value form such as `configCheckInterval=30s`.
You can specify these options when running `vmanomaly` to fine-tune logging levels or handle licensing configurations, as per your requirements.
### Licensing
@@ -122,7 +132,7 @@ Below are the steps to get `vmanomaly` up and running inside a Docker container:
1. Pull Docker image:
```sh
docker pull victoriametrics/vmanomaly:v1.29.4
docker pull victoriametrics/vmanomaly:v1.29.7
```
2. Create the license file with your license key.
@@ -142,7 +152,7 @@ docker run -it \
-v ./license:/license \
-v ./config.yaml:/config.yaml \
-p 8490:8490 \
victoriametrics/vmanomaly:v1.29.4 \
victoriametrics/vmanomaly:v1.29.7 \
/config.yaml \
--licenseFile=/license \
--loggerLevel=INFO \
@@ -159,7 +169,7 @@ docker run -it \
-e VMANOMALY_DATA_DUMPS_DIR=/tmp/vmanomaly/data \
-e VMANOMALY_MODEL_DUMPS_DIR=/tmp/vmanomaly/models \
-p 8490:8490 \
victoriametrics/vmanomaly:v1.29.4 \
victoriametrics/vmanomaly:v1.29.7 \
/config.yaml \
--licenseFile=/license \
--loggerLevel=INFO \
@@ -172,7 +182,7 @@ services:
# ...
vmanomaly:
container_name: vmanomaly
image: victoriametrics/vmanomaly:v1.29.4
image: victoriametrics/vmanomaly:v1.29.7
# ...
restart: always
volumes:
@@ -257,6 +267,7 @@ schedulers:
# https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler
class: 'periodic'
infer_every: '5m'
scatter_infer_jobs: true
fit_every: '1d'
fit_window: '4w'
@@ -288,6 +299,7 @@ reader:
datasource_url: "https://play.victoriametrics.com/" # [YOUR_DATASOURCE_URL]
tenant_id: '0:0'
sampling_period: "5m"
series_processing_batch_size: 8 # number of time series to process together while preparing data for fit or infer stages
queries:
# define your queries with MetricsQL - https://docs.victoriametrics.com/victoriametrics/metricsql/
cpu_user:
@@ -403,11 +415,13 @@ For optimal service behavior, consider the following tweaks when configuring `vm
- Configure the **inference frequency** in the [scheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/) section of the configuration file.
- Ensure that `infer_every` aligns with your **minimum required alerting frequency**.
- For example, if receiving **alerts every 15 minutes** is sufficient (when `anomaly_score > 1`), set `infer_every` to match `reader.sampling_period` or override it per query via `reader.queries.query_xxx.step` for an optimal setup.
- Set `scheduler.scatter_infer_jobs` {{% available_from "v1.29.7" anomaly %}} [arg](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#parameters-1) to `true` to allow for equal distribution of inference jobs across `infer_every` intervals, which can further enhance parallel processing efficiency and reduce resource contention when `reader.queries` contains a large number of queries.
**Reader**:
- Setup the datasource to read data from in the [reader](https://docs.victoriametrics.com/anomaly-detection/components/reader/) section. Include tenant ID if using a [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) (`multitenant` value {{% available_from "v1.16.2" anomaly %}} can be also used here).
- Define queries for input data using [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/) under `reader.queries` section. Note, it's possible to override reader-level arguments at query level for increased flexibility, e.g. specifying per-query [timezone](https://docs.victoriametrics.com/anomaly-detection/faq/#handling-timezones) or [sampling period](https://docs.victoriametrics.com/anomaly-detection/components/reader/#config-parameters).
- For longer `fit_window` intervals in scheduler, consider splitting queries into smaller time ranges to avoid excessive memory usage, timeouts and hitting server-side constraints, so they can be queried separately and reconstructed on `vmanomaly` side. Please refer to this [example](https://docs.victoriametrics.com/anomaly-detection/faq/#handling-large-queries-in-vmanomaly) for more details.
- Set `reader.series_processing_batch_size` {{% available_from "v1.29.7" anomaly %}} [arg](https://docs.victoriametrics.com/anomaly-detection/components/reader/#config-parameters) to a reasonable value (4-16, default is 8) to balance between memory usage and processing speed when preparing data for fit or infer stages.
> If applicable - consider [`VLogsReader`](https://docs.victoriametrics.com/anomaly-detection/components/reader/#victorialogs-reader) {{% available_from "v1.26.0" anomaly %}} to perform anomaly detection on **log-derived metrics**. This is particularly useful for scenarios where log data needs to be analyzed for unusual patterns or behaviors, such as error rates or request latencies.

View File

@@ -315,7 +315,7 @@ docker run -it --rm \
-e VMANOMALY_MCP_SERVER_URL=http://mcp-vmanomaly:8081/mcp \
-p 8080:8080 \
-p 8490:8490 \
victoriametrics/vmanomaly:v1.29.4 \
victoriametrics/vmanomaly:v1.29.7 \
vmanomaly_config.yaml
```
@@ -640,6 +640,32 @@ If the **results** look good and the **model configuration should be deployed in
## Changelog
### v1.7.2
Released: 2026-06-25
vmanomaly version: [v1.29.7](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1297)
- FEATURE: Added controls for selecting server-configured scheduled models (drop-down inside [model wizard](#model-panel)) and browsing scheduled queries from the running vmanomaly instance ("Queries" button, "scheduled queries" tab).
- IMPROVEMENT: Surfaced datasource fetch failures from ad-hoc VMUI raw queries as query-level errors instead of returning a successful empty result that triggers a generic "No match" warning. Now the user can see the actual error message from the datasource (e.g. "unauthorized", "not found", etc.) and take appropriate action.
- BUGFIX: Fixed [UI/query-server](#settings-panel) handling of VictoriaMetrics datasource URLs that already include `/select/multitenant/prometheus`. Such URLs are now recognized as cluster datasource URLs, preserving the multitenant path when proxying VMUI requests and allowing `server.use_reader_connection_settings` to reuse [configured reader credentials for authenticated datasources](#authentication).
- BUGFIX: Fixed [settings](#settings-panel) inputs for server and datasource URLs so editing, deleting, or pasting text is no longer immediately reverted to the previous value before applying changes.
- BUGFIX: Fixed [model wizard](#model-panel) settings for [`IsolationForestModel`](https://docs.victoriametrics.com/anomaly-detection/components/models/#isolation-forest-multivariate) `contamination`, allowing decimal float values such as `0.1` or `0,1` to be typed or pasted without being collapsed to `0`, while preserving the `"auto"` value.
### v1.7.1
Released: 2026-06-11
vmanomaly version: [v1.29.5](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1295)
- FEATURE: Added bulk Apply/Decline actions for [Copilot](#ai-assistance) chat suggestions.
- BUGFIX: Fixed modal windows closing when the mouse is released outside the window during text selection.
- BUGFIX: Fixed tooltip hover behavior so tooltips do not disappear while the cursor moves into the hover content.
### v1.7.0
Released: 2026-05-15

View File

@@ -49,6 +49,7 @@ schedulers:
periodic_online: # alias
class: 'periodic' # scheduler class
infer_every: "30s" # how often to produce anomaly scores for new data
scatter_infer_jobs: true # distribute infer jobs evenly across the infer interval to reduce synchronized bursts
fit_every: "365d" # how often to re-fit the models, for online models used effectively once, then they are updated with new data and won't require re-fit
fit_window: "3d" # how much historical data to use for fit stage
start_from: "00:00" # start from specified time, i.e. 00:00 given timezone and do daily fits as `fit_every` is 1 day
@@ -56,6 +57,7 @@ schedulers:
periodic_offline_1w:
class: 'periodic'
infer_every: "15m"
scatter_infer_jobs: true
fit_every: "24h"
fit_window: "14d"
# if no start_from is specified, jobs will start immediately after service starts
@@ -135,6 +137,7 @@ server:
port: 8490
path_prefix: '/vmanomaly' # optional path prefix for all HTTP routes
max_concurrent_tasks: 4 # maximum number of concurrent anomaly detection tasks processed by backend
use_reader_connection_settings: True # if True, use reader's datasource_url and credentials for UI requests to datasource
uvicorn_config: # optional Uvicorn server configuration
log_level: 'warning'
```
@@ -143,11 +146,14 @@ server:
> This feature is better used in conjunction with [stateful service](https://docs.victoriametrics.com/anomaly-detection/components/settings/#state-restoration) to preserve the state of the models and schedulers between restarts and reuse what can be reused, thus avoiding unnecessary re-training of models, re-initialization of schedulers and re-reading of data.
{{% available_from "v1.25.0" anomaly %}} Service supports hot reload of configuration files, which allows for automatic reloading of configurations on config files change filesystem events without the need of explicit service restart. This can be enabled via the `--watch` [CLI argument](https://docs.victoriametrics.com/anomaly-detection/quickstart/#command-line-arguments). `vmanomaly_config_reload_enabled` flag in [self-monitoring metrics](https://docs.victoriametrics.com/anomaly-detection/components/monitoring/#startup-metrics) will be set to 1 (if enabled) or 0 (if disabled).
{{% available_from "v1.25.0" anomaly %}} Service supports hot reload of configuration files, which allows for automatic reloading of configurations on config files change without the need of explicit service restart. This can be enabled via the `--watch` [CLI argument](https://docs.victoriametrics.com/anomaly-detection/quickstart/#command-line-arguments). `vmanomaly_config_reload_enabled` flag in [self-monitoring metrics](https://docs.victoriametrics.com/anomaly-detection/components/monitoring/#startup-metrics) will be set to 1 (if enabled) or 0 (if disabled).
> [!NOTE]
> {{% deprecated_from "v1.29.5" anomaly %}} File system event-based hot reload has been deprecated in favor of content-based polling with configurable `-configCheckInterval` due to reliability issues with Kubernetes ConfigMap symlink rotations and other filesystems where event delivery can be inconsistent. If you were using file system event-based hot reload, please switch to content-based polling by enabling `--watch` flag and configuring `-configCheckInterval` as needed.
### How it works
It works by watching for file system events, such as modifications, creations, or deletions of `.yml|.yaml` files in the specified directories. When a change is detected, the service will attempt to reload the configuration files, rebuild the [global config](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#global-configuration) and reinitialize the components. If the reload is successful, the `vmanomaly_config_reloads_total` metric will be incremented for `status="success"` label, otherwise it will be incremented with `status="failure"` label and a respective error message on config validation failure(s) will be logged.
It works by checking watched `.yml|.yaml` file contents in the specified files or directories on the configured interval `-configCheckInterval` (default is `30s`) {{% available_from "v1.29.5" anomaly %}}. When a content change is detected, the service will attempt to reload the configuration files after the existing debounce window, rebuild the [global config](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#global-configuration) and reinitialize the components. If the reload is successful, the `vmanomaly_config_reloads_total` metric will be incremented for `status="success"` label, otherwise it will be incremented with `status="failure"` label and a respective error message on config validation failure(s) will be logged.
> If the reload fails, the service will log an error message indicating the reason for the failure, and the **previous configuration will remain active until a successful reload occurs** to preserve the service's stability. This means that if there are errors in the new configuration, the service will continue to operate with the last valid configuration until the issues are resolved.

View File

@@ -449,9 +449,9 @@ models:
> The `decay` argument works only in combination with [online models](#online-models) like [`ZScoreOnlineModel`](#online-z-score) or [`OnlineQuantileModel`](#online-seasonal-quantile).
The `decay` {{% available_from "v1.23.0" anomaly %}} argument is used to control the (exponential) **decay factor** for online models, which determines how quickly the model adapts to new data. It is a float value between `0.0` and `1.0`, where:
- `1.0` means no decay (the model treats all data equally, without giving more weight to recent data). This is the default value for backward compatibility.
- Less than `1.0` means that the model will give more weight to recent data, effectively "forgetting" older data over time.
The `decay` {{% available_from "v1.23.0" anomaly %}} argument is used to control the (exponential) **decay factor** for online models, which determines how quickly the model adapts to new data. It is a positive float value from `(0.0, 1.0]` interval, where:
- Value `1.0` means no decay (the model treats all data points equally, without giving more weight to recent ones). This is the default value for backward compatibility.
- Values less than `1.0` mean that the model will give more weight to recent data, effectively "forgetting" older data over time.
Roughly speaking, for the recent N datapoints model processes `decay` = `d` means that these datapoints will contribute to the model as [1 - d^X] percent of total importance, for example decay of
- `0.99` means that 100 recent datapoints will contribute as [1 - 0.99^100] = 63.23% of total importance
@@ -998,7 +998,7 @@ Here we use Isolation Forest implementation from `scikit-learn` [library](https:
* `class` (string) - model class name `"model.isolation_forest.IsolationForestMultivariateModel"` (or `isolation_forest_multivariate` with class alias support {{% available_from "v1.13.0" anomaly %}})
* `contamination` (float or string, optional) - The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples. Default value - "auto". Should be either `"auto"` or be in the range (0.0, 0.5].
* `contamination` (float or string, optional) - The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples. Default value - "auto". Should be either `"auto"` or be in the range (0.0, 0.5]. {{% available_from "v1.29.5" anomaly %}} Numeric strings, such as `"0.01"`, are accepted, while invalid non-finite values, such as `nan`, `inf`, and `-inf`, are rejected during config validation.
* `seasonal_features` (list of string) - List of seasonality to encode through [cyclical encoding](https://towardsdatascience.com/cyclical-features-encoding-its-about-time-ce23581845ca), i.e. `dow` (day of week). **Introduced in [1.12.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1120)**.
- Empty by default for backward compatibility.
@@ -1265,7 +1265,7 @@ monitoring:
Let's pull the docker image for `vmanomaly`:
```sh
docker pull victoriametrics/vmanomaly:v1.29.4
docker pull victoriametrics/vmanomaly:v1.29.7
```
Now we can run the docker container putting as volumes both config and model file:
@@ -1279,7 +1279,7 @@ docker run -it \
-v $(PWD)/license:/license \
-v $(PWD)/custom_model.py:/vmanomaly/model/custom.py \
-v $(PWD)/custom.yaml:/config.yaml \
victoriametrics/vmanomaly:v1.29.4 /config.yaml \
victoriametrics/vmanomaly:v1.29.7 /config.yaml \
--licenseFile=/license
--watch
```

View File

@@ -458,6 +458,21 @@ Label names [description](#labelnames)
<td>The total number of datapoints received from VictoriaMetrics for the `query_key` query within the specified scheduler `scheduler_alias`, in the `vmanomaly` service running in `preset` mode.</td>
<td>
`url`, `query_key`, `scheduler_alias`, `preset`
</td>
</tr>
<tr>
<td>
<span style="white-space: nowrap;">`vmanomaly_reader_processing_tasks_queued`</span>
</td>
<td>
`Gauge`
</td>
<td>The total number of queued processing tasks {{% available_from "v1.29.7" anomaly %}} (timeseries batches of size `series_processing_batch_size`) for the `query_key` query within the specified scheduler `scheduler_alias`, in the `vmanomaly` service running in `preset` mode. If continuously >0, it may lead to skipped infer runs due to resource contention and timeouts.</td>
<td>
`url`, `query_key`, `scheduler_alias`, `preset`
</td>
</tr>

View File

@@ -421,7 +421,20 @@ Optional argument{{% available_from "v1.18.1" anomaly %}} allows defining **vali
`60s`
</td>
<td>
Optional argument{{% available_from "v1.25.3" anomaly %}} allows specifying a time offset for all queries in `queries`. Defaults to `0s` (0) if not set and can be overridden on a [per-query basis](#per-query-parameters).
Optional argument {{% available_from "v1.25.3" anomaly %}}, allows specifying a time offset for all queries in `queries`. Defaults to `0s` (0) if not set and can be overridden on a [per-query basis](#per-query-parameters).
</td>
</tr>
<tr>
<td>
<span style="white-space: nowrap;">`series_processing_batch_size`</span>
</td>
<td>
`8`
</td>
<td>
Optional argument {{% available_from "v1.29.7" anomaly %}}, allows specifying the number of time series to process together while preparing data for fit or infer stages. Defaults to `8`. Suggested values are 4-16 for high-cardinality queries.
</td>
</tr>
</tbody>
@@ -450,6 +463,7 @@ reader:
sampling_period: '1m'
query_from_last_seen_timestamp: True # false by default
latency_offset: '1ms'
series_processing_batch_size: 8
```
### MetricsQL Playground
@@ -879,6 +893,19 @@ If a path to a CA bundle file (like `ca.crt`), it will verify the certificate us
(Optional) Password for authentication. If set, it will be used to authenticate the request.
</td>
</tr>
<tr>
<td>
<span style="white-space: nowrap;">`series_processing_batch_size`</span>
</td>
<td>
`8`
</td>
<td>
Optional argument {{% available_from "v1.29.7" anomaly %}}, allows specifying the number of time series to process together while preparing data for fit or infer stages. Defaults to `8`. Suggested values are 4-16 for high-cardinality queries.
</td>
</tr>
</tbody>
</table>
@@ -897,6 +924,7 @@ reader:
# tenant_id: '0:0' # for cluster version only
sampling_period: '1m'
max_points_per_query: 10000
series_processing_batch_size: 8
data_range: [0, 'inf'] # reader-level
offset: '0s' # reader-level
timeout: '30s'

View File

@@ -74,40 +74,7 @@ options={`"scheduler.periodic.PeriodicScheduler"`, `"scheduler.oneoff.OneoffSche
### Parameters
For periodic scheduler parameters are defined as differences in times, expressed in difference units, e.g. days, hours, minutes, seconds.
Examples: `"50s"`, `"4m"`, `"3h"`, `"2d"`, `"1w"`.
<table class="params">
<thead>
<tr>
<th></th>
<th>Time granularity</th>
</tr>
</thead>
<tbody>
<tr>
<td>s</td>
<td>seconds</td>
</tr>
<tr>
<td>m</td>
<td>minutes</td>
</tr>
<tr>
<td>h</td>
<td>hours</td>
</tr>
<tr>
<td>d</td>
<td>days</td>
</tr>
<tr>
<td>w</td>
<td>weeks</td>
</tr>
</tbody>
</table>
For periodic scheduler parameters are defined as differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. Time granularity is defined by the last characters of a string. Examples: `"50s"` (seconds), `"4m"` (minutes), `"3h"` (hours), `"2d"` (days), `"1w"` (weeks).
<table class="params">
<thead>
@@ -188,6 +155,21 @@ Specifies when to initiate the first `fit_every` call. Accepts either an ISO 860
Defines the local timezone for the `start_from` parameter, if specified. Defaults to `UTC` if no timezone is provided.
</td>
</tr>
<tr>
<td>
<span style="white-space: nowrap;">`scatter_infer_jobs`{{% available_from "v1.29.7" anomaly %}}</span>
</td>
<td>bool, <span style="white-space: nowrap;">Optional</span></td>
<td>
`true` or `false`
</td>
<td>
If `true`, distribute infer jobs and their dependent data-fetch jobs evenly across the infer interval. This reduces synchronized read and inference bursts for high-scale configurations. Defaults to `false`. Useful when `settings.n_workers > 1`, `reader.queries` cardinality is high, and `scheduler.infer_every` is small.
</td>
</tr>
</tbody>
</table>
@@ -200,6 +182,7 @@ schedulers:
# (or class: "scheduler.periodic.PeriodicScheduler" for versions before v1.13.0, without class alias support)
fit_window: "14d"
infer_every: "1m"
scatter_infer_jobs: true # Distribute infer jobs evenly across the infer interval to reduce synchronized bursts.
fit_every: "1h"
start_from: "20:00" # If launched before 20:00 (local Kyiv time), the first run starts today at 20:00. Otherwise, it starts tomorrow at 20:00.
tz: "Europe/Kyiv" # Defaults to 'UTC' if not specified.

View File

@@ -10,12 +10,12 @@ sitemap:
- To use *vmanomaly*, part of the enterprise package, a license key is required. Obtain your key [here](https://victoriametrics.com/products/enterprise/trial/) for this tutorial or for enterprise use.
- In the tutorial, we'll be using the following VictoriaMetrics components:
- [VictoriaMetrics Single-Node](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) (v1.137.0)
- [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/) (v1.137.0)
- [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) (v1.137.0)
- [Grafana](https://grafana.com/) (v.10.2.1)
- [VictoriaMetrics Single-Node](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) (v1.146.0)
- [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/) (v1.146.0)
- [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) (v1.146.0)
- [Grafana](https://grafana.com/) (v12.2.0)
- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/)
- [Node exporter](https://github.com/prometheus/node_exporter#node-exporter) (v1.7.0) and [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) (v0.27.0)
- [Node exporter](https://github.com/prometheus/node_exporter#node-exporter) (v1.9.1) and [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) (v0.28.1)
![typical setup diagram](guide-vmanomaly-vmalert_overview.webp)
@@ -323,7 +323,7 @@ Let's wrap it all up together into the `docker-compose.yml` file.
services:
vmagent:
container_name: vmagent
image: victoriametrics/vmagent:v1.137.0
image: victoriametrics/vmagent:v1.146.0
depends_on:
- "victoriametrics"
ports:
@@ -340,7 +340,7 @@ services:
victoriametrics:
container_name: victoriametrics
image: victoriametrics/victoria-metrics:v1.137.0
image: victoriametrics/victoria-metrics:v1.146.0
ports:
- 8428:8428
volumes:
@@ -356,7 +356,7 @@ services:
grafana:
container_name: grafana
image: grafana/grafana-oss:10.2.1
image: grafana/grafana:12.2.0
depends_on:
- "victoriametrics"
ports:
@@ -373,7 +373,7 @@ services:
vmalert:
container_name: vmalert
image: victoriametrics/vmalert:v1.137.0
image: victoriametrics/vmalert:v1.146.0
depends_on:
- "victoriametrics"
ports:
@@ -395,7 +395,7 @@ services:
restart: always
vmanomaly:
container_name: vmanomaly
image: victoriametrics/vmanomaly:v1.29.4
image: victoriametrics/vmanomaly:v1.29.7
depends_on:
- "victoriametrics"
ports:
@@ -412,7 +412,7 @@ services:
- "--licenseFile=/license"
alertmanager:
container_name: alertmanager
image: prom/alertmanager:v0.27.0
image: prom/alertmanager:v0.28.1
volumes:
- ./alertmanager.yml:/config/alertmanager.yml
command:
@@ -424,7 +424,7 @@ services:
restart: always
node-exporter:
image: quay.io/prometheus/node-exporter:v1.7.0
image: quay.io/prometheus/node-exporter:v1.9.1
container_name: node-exporter
ports:
- 9100:9100

View File

@@ -240,23 +240,23 @@ vmagent will write data into VictoriaMetrics single-node and cluster (with tenan
# compose.yaml
services:
vmsingle:
image: victoriametrics/victoria-metrics:v1.145.0
image: victoriametrics/victoria-metrics:v1.146.0
vmstorage:
image: victoriametrics/vmstorage:v1.145.0-cluster
image: victoriametrics/vmstorage:v1.146.0-cluster
vminsert:
image: victoriametrics/vminsert:v1.145.0-cluster
image: victoriametrics/vminsert:v1.146.0-cluster
command:
- -storageNode=vmstorage:8400
vmselect:
image: victoriametrics/vmselect:v1.145.0-cluster
image: victoriametrics/vmselect:v1.146.0-cluster
command:
- -storageNode=vmstorage:8401
vmagent:
image: victoriametrics/vmagent:v1.145.0
image: victoriametrics/vmagent:v1.146.0
volumes:
- ./scrape.yaml:/etc/vmagent/config.yaml
command:
@@ -308,7 +308,7 @@ Now add the vmauth service to `compose.yaml`:
# compose.yaml
services:
vmauth:
image: docker.io/victoriametrics/vmauth:v1.145.0
image: docker.io/victoriametrics/vmauth:v1.146.0
ports:
- 8427:8427
volumes:

View File

@@ -6,45 +6,348 @@ build:
sitemap:
disable: true
---
**Objective**
Setup Victoria Metrics Cluster with support of multiple retention periods within one installation.
> [VictoriaMetrics Enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) supports specifying multiple retentions for distinct sets of time series and tenants. If you are an Enterprise user, [configure multiple retentions directly through retention filters](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#retention-filters) instead of following this guide.
**Enterprise Solution**
This guide explains how to set up multiple retentions using an [open-source VictoriaMetrics Cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/).
[VictoriaMetrics Enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) supports specifying multiple retentions
for distinct sets of time series and [tenants](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy)
via [retention filters](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#retention-filters).
## Overview
**Open Source Solution**
VictoriaMetrics retains metrics by default for **1 month**. You can change data retention with the [`-retentionPeriod` command-line flag](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#retention), but this value applies to **all time series stored** on a given `vmstorage` node and cannot be customized per tenant or per metric in the open source version.
Community version of VictoriaMetrics supports only one retention period per `vmstorage` node via [-retentionPeriod](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#retention) command-line flag.
The core idea of this guide is to run **separate logic groups of storages** (or even clusters) with individual `-retentionPeriod` settings, while still providing a single unified write and read path via vmagent and vmselect.
A multi-retention setup can be implemented by dividing a [victoriametrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) into logical groups with different retentions.
## Multi-Retention Architecture
Example:
Setup should handle 3 different retention groups 3months, 1year and 3 years.
Solution contains 3 groups of vmstorages + vminserts and one group of vmselects. Routing is done by [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/)
by [splitting data streams](https://docs.victoriametrics.com/victoriametrics/vmagent/#splitting-data-streams-among-multiple-systems).
The [-retentionPeriod](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#retention) sets how long to keep the metrics.
To support multiple retentions with the open source version of VictoriaMetrics cluster, you can split the cluster into several logical groups of storage nodes. Each group is configured with a different `-retentionPeriod` and receives only the data that must follow that retention.
The diagram below shows a proposed solution
Each storage group is connected to a separate vminsert, while a shared vmselect layer queries across all storage groups so that dashboards and alerts continue to see a single unified VictoriaMetrics backend.
![Setup](setup.webp)
**Implementation Details**
In the example used throughout this guide, the cluster is divided into three groups:
1. Groups of vminserts A know about only vmstorages A and this is explicitly specified via `-storageNode` [configuration](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-setup).
1. Groups of vminserts B know about only vmstorages B and this is explicitly specified via `-storageNode` [configuration](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-setup).
1. Groups of vminserts C know about only vmstorages C and this is explicitly specified via `-storageNode` [configuration](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-setup).
1. vmselect reads data from all vmstorage nodes via `-storageNode` [configuration](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-setup)
with [deduplication](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#deduplication) setting equal to vmagent's scrape interval or minimum interval between collected samples.
1. vmagent routes incoming metrics to the given set of `vminsert` nodes using relabeling rules specified at `-remoteWrite.urlRelabelConfig` [configuration](https://docs.victoriametrics.com/victoriametrics/relabeling/).
- Group A: 3-month retention.
- Group B: 1-year retention.
- Group C: 3-year retention.
**Multi-Tenant Setup**
Metrics are routed to the appropriate vminsert group by splitting data streams in vmagent, so each time series is sent to exactly one retention group instead of being replicated to all groups. See [Deploying vmagent](https://docs.victoriametrics.com/guides/guide-vmcluster-multiple-retention-setup/#step3) for an example of labelbased routing that implements this split. An optional [vmauth](https://docs.victoriametrics.com/guides/guide-vmcluster-multiple-retention-setup/#additional-enhancements) layer can be added on top to restrict access to specific subclusters or tenants while still keeping a unified write and read path.
Every group of vmstorages can handle one tenant or multiple one. Different groups can have overlapping tenants. As vmselect reads from all vmstorage nodes, the data is aggregated on its level.
## Implementing Multi-Retention on Kubernetes
**Additional Enhancements**
In this section, we'll install and configure the components for a multi-retention deployment of the VictoriaMetrics cluster. See [Kubernetes monitoring with VictoriaMetrics Cluster](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster/) for details on running VictoriaMetrics in Kubernetes.
You can set up [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/) for routing data to the given vminsert group depending on the needed retention.
Run the following command to add the VictoriaMetrics Helm repository:
```shell
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update
```
### Step 1: Deploying storage groups {#step1}
We'll create three storage groups. Each has a different retention period and disk size. Read [Understand Your Setup Size](https://docs.victoriametrics.com/guides/understand-your-setup-size/) to estimate how much space you will need for each group. The following table is shown as an example:
| Group | Retention Period | Total disk size |
|--------------|------------------|-----------------------|
| `vmcluster-a` | 3 months (`3M`) | 80 Gi |
| `vmcluster-b` | 1 year (`1Y`) | 300 Gi |
| `vmcluster-c` | 3 years (`3Y`) | 900 Gi |
Create a Helm values file for Group A.
```shell
cat <<EOF > vmcluster-a.yaml
vmstorage:
enabled: true
replicaCount: 1
persistence:
size: 80Gi
extraArgs:
retentionPeriod: 3M
podLabels:
retention-group: a
vminsert:
enabled: true
podLabels:
retention-group: a
vmselect:
enabled: false
EOF
```
The values file above creates vminsert and vmstorage services while turning off vmselect, which we'll deploy separately. The `retentionPeriod` flag configures how long data is kept in this group.
Create the values files for Group B and Group C:
```shell
cat <<EOF > vmcluster-b.yaml
vmstorage:
enabled: true
replicaCount: 1
persistence:
size: 300Gi
extraArgs:
retentionPeriod: 1y
podLabels:
retention-group: b
vminsert:
enabled: true
podLabels:
retention-group: b
vmselect:
enabled: false
EOF
cat <<EOF > vmcluster-c.yaml
vmstorage:
enabled: true
replicaCount: 1
persistence:
size: 900Gi
extraArgs:
retentionPeriod: 3y
podLabels:
retention-group: c
vminsert:
enabled: true
podLabels:
retention-group: c
vmselect:
enabled: false
EOF
```
Deploy the three storage groups with:
```shell
helm upgrade --install vmcluster-a vm/victoria-metrics-cluster -f vmcluster-a.yaml
helm upgrade --install vmcluster-b vm/victoria-metrics-cluster -f vmcluster-b.yaml
helm upgrade --install vmcluster-c vm/victoria-metrics-cluster -f vmcluster-c.yaml
# Wait for all storage pods to be ready
kubectl rollout status statefulset -l app.kubernetes.io/instance=vmcluster-a
kubectl rollout status statefulset -l app.kubernetes.io/instance=vmcluster-b
kubectl rollout status statefulset -l app.kubernetes.io/instance=vmcluster-c
```
### Step 2: Deploying vmselect {#step2}
Next, we'll deploy a vmselect service to route queries to the storage groups.
Create a Helm values file with:
```shell
cat <<EOF >vmselect.yaml
vmstorage:
enabled: false
vminsert:
enabled: false
vmselect:
enabled: true
replicaCount: 1
suppressStorageFQDNsRender: true
extraArgs:
# Each list item is a single -storageNode flag. In this example, there is
# one vmstorage pod per retention group, so each entry contains a single host.
# If you run multiple pods per group, list them as comma-separated hosts
# in the same -storageNode value.
#
# The FQDN format is:
# <pod>.<svc>.default.svc
# where pod = <release>-victoria-metrics-cluster-vmstorage-<N>
# and svc = <release>-victoria-metrics-cluster-vmstorage
storageNode:
- "vmcluster-a-victoria-metrics-cluster-vmstorage-0.vmcluster-a-victoria-metrics-cluster-vmstorage.default.svc:8401"
- "vmcluster-b-victoria-metrics-cluster-vmstorage-0.vmcluster-b-victoria-metrics-cluster-vmstorage.default.svc:8401"
- "vmcluster-c-victoria-metrics-cluster-vmstorage-0.vmcluster-c-victoria-metrics-cluster-vmstorage.default.svc:8401"
EOF
```
Let's break down the file above:
- Deploys vmselect as a separate Helm release.
- Disables vminsert and vmstorage as these services were already deployed in Step 1.
- `suppressStorageFQDNsRender: true` turns off automatic FQDN generation for storage nodes. By default, the Helm chart auto-generates `-storageNodes` flags, but since `vmstorage` has been disabled, we need to supply them manually in `extraArgs`.
- In `extraArgs.storageNode:` we define the vmstorage endpoints for queries. On querying, vmselect merges results across all the specified vmstorages to provide a unified view of the data.
Deploy the `vmselect` release with:
```shell
helm upgrade --install vmselect vm/victoria-metrics-cluster -f vmselect.yaml
```
### Step 3: Deploying vmagent {#step3}
We'll use `vmagent` to route incoming metrics to the correct retention group. For example, we can use a `retention` label for mapping metrics to storage groups in the following way:
| `retention` label | Storage Group |
|-------------------|--------------|
| `"3mo"` | `vmcluster-a` |
| `"1yr"` | `vmcluster-b` |
| `"3yr"` | `vmcluster-c` |
Create the values file for vmagent:
```shell
cat <<EOF >vmagent.yaml
service:
enabled: true
remoteWrite:
# Group A: receives metrics with retention="3mo"
- url: http://vmcluster-a-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus/api/v1/write
urlRelabelConfig:
- if: '{retention="3mo"}'
action: keep
# Group B: receives metrics with retention="1yr"
- url: http://vmcluster-b-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus/api/v1/write
urlRelabelConfig:
- if: '{retention="1yr"}'
action: keep
# Group C: receives metrics with retention="3yr"
- url: http://vmcluster-c-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus/api/v1/write
urlRelabelConfig:
- if: '{retention="3yr"}'
action: keep
EOF
```
> Metrics without a matching `retention` label are silently dropped by the `keep` rules. You must ensure that every metric is labeled, or use a different routing configuration.
Now deploy the vmagent release:
```shell
helm upgrade --install vmagent vm/victoria-metrics-agent -f vmagent.yaml
```
Wait for vmagent to become ready:
```shell
kubectl rollout status deploy/vmagent-victoria-metrics-agent
```
### Step 4: Verification
We can send test data to verify that the data is flowing to the correct storage group.
First, port-forward vmagent and vmselect:
```shell
VMAGENT_SVC=$(kubectl get svc -l app.kubernetes.io/instance=vmagent -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward "svc/$VMAGENT_SVC" 8429 &
VMSELECT_SVC=$(kubectl get svc -l app.kubernetes.io/instance=vmselect -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward "svc/$VMSELECT_SVC" 8481 &
```
Send test metrics directly to vmagent's HTTP endpoint to exercise all three retention labels:
```shell
POD=$(kubectl get pod -l app.kubernetes.io/instance=vmagent -o jsonpath='{.items[0].metadata.name}')
for retention in 3mo 1yr 3yr; do
kubectl exec "$POD" -- wget -qO- --post-data="test_routing{retention=\"${retention}\"} 1.0" \
"http://127.0.0.1:8429/api/v1/import/prometheus"
done
```
Query the data back from vmselect (it may take around 30-60 seconds for new data to be available for queries):
```shell
for retention in 3mo 1yr 3yr; do
echo "-> retention=${retention}"
curl -s "http://localhost:8481/select/0/prometheus/api/v1/query" \
--data-urlencode "query=test_routing{retention=\"${retention}\"}"
echo
done
```
You can also check that vmagent is forwarding data to all three groups:
```shell
curl -s http://localhost:8429/metrics | grep vmagent_remotewrite_blocks_sent_total
```
Each `url="N:secret-url"` corresponds to one `remoteWrite` entry (N=1 for Group A, N=2 for Group B, N=3 for Group C). Non-zero values confirm data is flowing.
## Alternative Routing by Existing Labels
The example setup above relies on a synthetic `retention` label to exist in every incoming metric.
If having a `retention` label in every metric isn't practical, you can, as an alternative, rely on existing labels to map data to the correct storage group.
The following example configures vmagent to route metrics based on the `environment` and `team` labels:
```yaml
# vmagent.yaml
remoteWrite:
# send dev and staging data to Group A
- url: "http://vmcluster-a-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus/api/v1/write"
urlRelabelConfig:
- if: {environment=~"dev|staging"}
action: keep
# send prod data to Group B
- url: "http://vmcluster-b-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus/api/v1/write"
urlRelabelConfig:
- if: {environment=~"prod|production"}
action: keep
# send data from Infra and SRE teams to Group C
- url: "http://vmcluster-c-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus/api/v1/write"
urlRelabelConfig:
- if: {team=~"infra|sre"}
action: keep
```
> Metrics that do not match any of the `keep` rules are dropped in the configuration above.
## Additional Enhancements
You can set up [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/) to route data to the specified vminsert group based on the required retention or to restrict which data different users can query.
The following [`-auth.config`](https://docs.victoriametrics.com/victoriametrics/vmauth/#quick-start) example exposes the same vmselect backend via vmauth with two users using basic auth:
- `admin`: can query **all** data across all retention groups.
- `dev`: can query **only** time series that have `team="dev"` label, enforced via the `extra_label` query argument.
```yaml
users:
# User with access to all data across all retention groups
- username: "admin"
password: "foo"
url_map:
- src_paths:
- "/api/v1/query"
- "/api/v1/query_range"
- "/api/v1/series"
- "/api/v1/labels"
- "/api/v1/label/.+/values"
# vmselect service that aggregates all vmstorage groups
url_prefix: "http://vmselect-victoria-metrics-cluster-vmselect:8481/select/0/prometheus"
# User restricted to Dev team data only
- username: "dev"
password: "bar"
url_map:
- src_paths:
- "/api/v1/query"
- "/api/v1/query_range"
- "/api/v1/series"
- "/api/v1/labels"
- "/api/v1/label/.+/values"
# Same vmselect backend, but enforce label filter at query time
# by adding extra_label=team=dev to every proxied request
url_prefix: "http://vmselect-victoria-metrics-cluster-vmselect:8481/select/0/prometheus/?extra_label=team=dev"
```
This is useful for restricting access by team, environment, or tenant without changing the underlying storage topology.

View File

@@ -155,15 +155,15 @@ These services will store and query the metrics scraped by vmagent.
# compose.yaml
services:
vmstorage:
image: victoriametrics/vmstorage:v1.145.0-cluster
image: victoriametrics/vmstorage:v1.146.0-cluster
vminsert:
image: victoriametrics/vminsert:v1.145.0-cluster
image: victoriametrics/vminsert:v1.146.0-cluster
command:
- -storageNode=vmstorage:8400
vmselect:
image: victoriametrics/vmselect:v1.145.0-cluster
image: victoriametrics/vmselect:v1.146.0-cluster
command:
- -storageNode=vmstorage:8401
ports:
@@ -196,7 +196,7 @@ Add the vmauth service to `compose.yaml`:
# compose.yaml
services:
vmauth:
image: victoriametrics/vmauth:v1.145.0-enterprise
image: victoriametrics/vmauth:v1.146.0-enterprise
ports:
- 8427:8427
volumes:
@@ -251,7 +251,7 @@ Add the vmagent service to `compose.yaml` with OAuth2 configuration:
# compose.yaml
services:
vmagent:
image: victoriametrics/vmagent:v1.145.0
image: victoriametrics/vmagent:v1.146.0
volumes:
- ./scrape.yaml:/etc/vmagent/config.yaml
command:

View File

@@ -107,7 +107,7 @@ The final piece is the Docker Compose file. This ties all the services together
# compose.yml
services:
victoriametrics:
image: victoriametrics/victoria-metrics:v1.145.0
image: victoriametrics/victoria-metrics:v1.146.0
command:
- "--storageDataPath=/victoria-metrics-data"
- "--selfScrapeInterval=10s"
@@ -128,7 +128,7 @@ services:
- ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
vmalert:
image: victoriametrics/vmalert:v1.145.0
image: victoriametrics/vmalert:v1.146.0
depends_on:
- victoriametrics
- alertmanager

View File

@@ -19,6 +19,7 @@ See also [case studies](https://docs.victoriametrics.com/victoriametrics/casestu
* [Datanami: Why Roblox Picked VictoriaMetrics for Observability Data Overhaul](https://www.hpcwire.com/bigdatawire/2023/05/30/why-roblox-picked-victoriametrics-for-observability-data-overhaul/)
* [Cloudflare: Introducing notifications for HTTP Traffic Anomalies](https://blog.cloudflare.com/introducing-http-traffic-anomalies-notifications/)
* [Grammarly: Better, Faster, Cheaper: How Grammarly Improved Monitoring by Over 10x with VictoriaMetrics](https://www.grammarly.com/blog/engineering/monitoring-with-victoriametrics/)
* [Xata: How we rebuilt PostgreSQL branch metrics on VictoriaMetrics, per cell](https://xata.io/blog/how-we-rebuilt-postgresql-branch-metrics-on-victoriametrics-per-cell)
* [CERN: CMS monitoring R&D: Real-time monitoring and alerts](https://indico.cern.ch/event/877333/contributions/3696707/attachments/1972189/3281133/CMS_mon_RD_for_opInt.pdf)
* [CERN: The CMS monitoring infrastructure and applications](https://arxiv.org/pdf/2007.03630.pdf)
* [Forbes: The (Almost) Infinitely Scalable Open Source Monitoring Dream](https://www.forbes.com/sites/adrianbridgwater/2022/08/16/the-almost-infinitely-scalable-open-source-monitoring-dream/)

View File

@@ -28,7 +28,7 @@ If you like VictoriaMetrics and want to contribute, then it would be great:
## Issues
When making a new issue, make sure to create no duplicates. Use GitHub search to find whether similar issues exist already.
The new issue should be written in English and contain concise description of the problem and environment where it exists.
The new issue should be written in English and contain a concise description of the problem and the environment where it exists.
We'd very much prefer to have a specific use-case included in the description, since it could have workaround or alternative solutions.
When looking for an issue to contribute, always prefer working on [bugs](https://github.com/VictoriaMetrics/VictoriaMetrics/issues?q=is%3Aopen+is%3Aissue+label%3Abug)
@@ -48,7 +48,7 @@ We use [labels](https://docs.github.com/en/issues/using-labels-and-milestones-to
1. `need more info`, assigned to issues that require elaboration from the issue creator.
For example, if we weren't able to reproduce the reported bug based on the ticket description then we ask additional
questions which could help to reproduce the issue and add `need more info` label. This label helps other maintainers
to understand that this issue wasn't forgotten but waits for the feedback from user.
to understand that this issue wasn't forgotten but waits for the feedback from the user.
1. `completed`, assigned to issues that required code changes and those changes were merged to upstream, but not released yet.
Once a release is made, maintainers go through all labeled issues, leave a comment about the new release, and close the issue.
1. `vmui`, assigned to issues related to [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui) or [VictoriaLogs webui](https://docs.victoriametrics.com/victorialogs/querying/#web-ui)
@@ -63,32 +63,46 @@ Pull requests requirements:
1. Don't use `master` branch for making PRs, as it makes it impossible for reviewers to modify the changes.
1. All commits need to be [signed](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits).
1. Pull request title should be prefixed with `<dir>/<component>:` to show what component has been changed, i.e. `app/vmalert: fix...`.
Pull request description should contain clear and concise description of what was done, why it is needed and for what purpose.
Pull request description should contain a clear and concise description of what was done, why it is needed and for what purpose.
Use clear language, so reviewers can quickly understand the change and its impact.
1. A link to the issue(s) related to the change, if any. Use `Fixes [issue link]` if the PR resolves the issue, or `Related to [issue link]` for reference.
1. Tests proving that the change is effective. Tests are expected for non-trivial new functionality or non-trivial modifications.
Bug fixes must include tests unless a maintainer explicitly agrees otherwise.
See [this style guide](https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e) for tests.
To run tests and code checks locally, execute commands `make test-full` and `make check-all`.
See [this style guide](https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e) for tests. See [this section](#testing) for how to run tests.
1. Try to not extend the scope of the pull requests outside the issue, do not make unrelated changes.
1. Update [docs](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/docs) if needed. For example, adding a new flag or changing behavior of existing flags or features
1. Update [docs](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/docs) if needed. For example, adding a new flag or changing the behavior of existing flags or features
requires reflecting these changes in the documentation. For new features add `{{%/* available_from "#" */%}}` shortcode to the documentation.
It will be later automatically replaced with an actual release version.
1. A line in the [changelog](https://docs.victoriametrics.com/victoriametrics/changelog/#tip) mentioning the change and related issue in a way
that would be clear to other readers even if they don't have the full context.
1. Avoid modifying code in the `/vendor` folder manually, even when the vendored package originates are from the VictoriaMetrics GitHub organization.
1. Avoid modifying code in the `/vendor` folder manually, even when the vendored package originates from the VictoriaMetrics GitHub organization.
For instance, VictoriaLogs vendors packages under the `/lib` folder from VictoriaMetrics, and VictoriaTraces vendors the `/lib/logstorage` package from VictoriaLogs.
Submit a pull request to the upstream repository first. Afterward, a separate pull request can be opened to update the version of the vendored folder in downstream repository.
Submit a pull request to the upstream repository first. Afterward, a separate pull request can be opened to update the version of the vendored folder in the downstream repository.
* For common packages, the vendored package can be updated with this command: `go get <dependency>@vX.Y.Z`.
* For VictoriaMetrics packages, use `go get <dependency>@canonical_commit_hash`.
Finally, run `go mod tidy` and `go mod vendor` to update `go.mod`, `go.sum`, and `/vendor`.
1. Ping reviewers who you think have the best expertise on the matter.
See good example of a [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6487).
See a good example of a [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6487).
## AI policy
You are free to use any AI tools when working on a contribution, on code,
documentation, issues, or anything else. You do not need to disclose whether or
how you used them.
With or without the help of AI, you are responsible for the changes you submit.
Take the effort to understand the code base and every change in your pull request,
and clean up any AI slop before sending it. Do not use AI to automate your
responses to maintainers.
We review contributions on their quality, regardless of how they were produced. A
pull request or issue that looks like unreviewed AI output, with low-quality or
broken changes, may be closed without a detailed review or triage.
## Merging Pull Request
The person who merges the Pull Request is responsible for satisfying requirements below:
The person who merges the Pull Request is responsible for satisfying the requirements below:
1. Make sure that PR satisfies [Pull Request checklist](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist),
it is approved by at least one reviewer, all CI checks are green.
@@ -97,9 +111,9 @@ The person who merges the Pull Request is responsible for satisfying requirement
1. If applicable, cherry-pick the change to [LTS release lines](https://docs.victoriametrics.com/victoriametrics/lts-releases/)
and mention in the PR comment what was or wasn't cherry-picked.
1. Update related issues with a meaningful message of what has changed and when it will be
released. _This helps users to understand the change without reading PR._
released. _This helps users to understand the change without reading the PR._
1. Add label `completed` to related issues.
1. Do not close related tickets until release is made. If ticket was auto-closed by GitHub or user - re-open it.
1. Do not close related tickets until the release is made. If the ticket was auto-closed by GitHub or a user - re-open it.
## KISS principle
@@ -115,9 +129,9 @@ We are open to third-party pull requests provided they follow [KISS design princ
- Minimize the number of moving parts in the distributed system.
- Avoid automated decisions, which may hurt cluster availability, consistency, performance or debuggability.
Adhering to `KISS` principle, simplifies the resulting code and architecture so it can be reviewed, understood and debugged by a wider audience.
Adhering to the `KISS` principle, simplifies the resulting code and architecture so it can be reviewed, understood and debugged by a wider audience.
Due to `KISS`, [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) has none of the following "features" popular in distributed computing world:
Due to `KISS`, [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) has none of the following "features" popular in distributed computing:
- Fragile gossip protocols. See [failed attempt in Thanos](https://github.com/improbable-eng/thanos/blob/030bc345c12c446962225221795f4973848caab5/docs/proposals/completed/201809_gossip-removal.md).
- Hard-to-understand-and-implement-properly [Paxos protocols](https://www.quora.com/In-distributed-systems-what-is-a-simple-explanation-of-the-Paxos-algorithm).
@@ -126,3 +140,17 @@ Due to `KISS`, [cluster version of VictoriaMetrics](https://docs.victoriametrics
- Automatic cluster resizing, which may cost you a lot of money if improperly configured.
- Automatic discovering and addition of new nodes in the cluster, which may mix data between dev and prod clusters :)
- Automatic leader election, which may result in split brain disaster on network errors.
## Testing
We recommend running the following sequence of checks and tests before submitting a pull request:
```sh
# run static checks
make check-all
# run unit test
make test-full
# run integration tests
make apptest
```

View File

@@ -61,9 +61,9 @@ Download the newest available [VictoriaMetrics release](https://docs.victoriamet
from [DockerHub](https://hub.docker.com/r/victoriametrics/victoria-metrics) or [Quay](https://quay.io/repository/victoriametrics/victoria-metrics?tab=tags):
```sh
docker pull victoriametrics/victoria-metrics:v1.145.0
docker pull victoriametrics/victoria-metrics:v1.146.0
docker run -it --rm -v `pwd`/victoria-metrics-data:/victoria-metrics-data -p 8428:8428 \
victoriametrics/victoria-metrics:v1.145.0 --selfScrapeInterval=5s -storageDataPath=victoria-metrics-data
victoriametrics/victoria-metrics:v1.146.0 --selfScrapeInterval=5s -storageDataPath=victoria-metrics-data
```
_For Enterprise images, see [this link](https://docs.victoriametrics.com/victoriametrics/enterprise/#docker-images)._

View File

@@ -245,7 +245,7 @@ The following steps must be performed during the upgrade / downgrade procedure:
* Wait until the process stops. This can take a few seconds.
* Start the upgraded VictoriaMetrics.
Prometheus doesn't drop data during VictoriaMetrics restart. See [this article](https://grafana.com/blog/2019/03/25/whats-new-in-prometheus-2.8-wal-based-remote-write/) for details. The same applies also to [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/).
Prometheus doesn't drop data during VictoriaMetrics restart. See [this article](https://grafana.com/blog/whats-new-in-prometheus-2-8-wal-based-remote-write/) for details. The same applies also to [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/).
> If you'd prefer not to manage upgrades yourself, [VictoriaMetrics Cloud](https://console.victoriametrics.cloud/signUp?utm_source=website&utm_campaign=docs_vm_single_upgrade)
> performs version upgrades automatically during maintenance windows with no action required on your part.
@@ -417,7 +417,7 @@ VictoriaMetrics is configured via command-line flags, so it must be restarted wh
* Wait until the process stops. This can take a few seconds.
* Start VictoriaMetrics with the new command-line flags.
Prometheus doesn't drop data during VictoriaMetrics restart. See [this article](https://grafana.com/blog/2019/03/25/whats-new-in-prometheus-2.8-wal-based-remote-write/) for details. The same applies also to [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/).
Prometheus doesn't drop data during VictoriaMetrics restart. See [this article](https://grafana.com/blog/whats-new-in-prometheus-2-8-wal-based-remote-write/) for details. The same applies also to [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/).
## How to scrape Prometheus exporters such as [node-exporter](https://github.com/prometheus/node_exporter)
@@ -478,6 +478,11 @@ and [/api/v1/query_range](https://docs.victoriametrics.com/victoriametrics/keyco
to the given number of digits after the decimal point.
For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
VictoriaMetrics accepts `optimize_repeated_binary_op_subexprs=1` query arg for [/api/v1/query_range](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query)
handler. It allows `vmselect` to execute left and right sides of binary operators sequentially when they contain the same
optimized aggregate rollup result expression, so the second side may reuse the rollup result cache populated by the first side.
The optimization is disabled by default and applies only when rollup result cache can be used for the request.
VictoriaMetrics accepts `limit` query arg for [/api/v1/labels](https://docs.victoriametrics.com/victoriametrics/url-examples/#apiv1labels)
and [`/api/v1/label/<labelName>/values`](https://docs.victoriametrics.com/victoriametrics/url-examples/#apiv1labelvalues) handlers for limiting the number of returned entries.
For example, the query to `/api/v1/labels?limit=5` returns a sample of up to 5 unique labels, while ignoring the rest of labels.
@@ -1712,64 +1717,63 @@ The following versions of VictoriaMetrics receive regular security fixes:
| [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/) | ✅ |
| other releases | ❌ |
### Software Bill of Materials (SBOM)
Every VictoriaMetrics container{{% available_from "v1.137.0" %}} image published to
[Docker Hub](https://hub.docker.com/u/victoriametrics) and [Quay.io](https://quay.io/organization/victoriametrics) include an [SPDX](https://spdx.dev/) SBOM attestation generated automatically by BuildKit during `docker buildx build`.
To inspect the SBOM for an image:
```sh
docker buildx imagetools inspect \
docker.io/victoriametrics/victoria-metrics:latest \
--format "{{ json .SBOM }}"
```
To scan an image using its SBOM attestation with [Trivy](https://github.com/aquasecurity/trivy):
```sh
trivy image --sbom-sources oci \
docker.io/victoriametrics/victoria-metrics:latest
```
### Reporting a Vulnerability
Please report any security issues to <security@victoriametrics.com>
### CVE handling policy
**Source code:** Go dependencies are scanned by [govulncheck](https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck) in CI.
All vulnerabilities must be fixed before the next scheduled release and backported to [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/).
**Docker images:** CVE findings in the [Alpine](https://security.alpinelinux.org/) base image pose minimal risk since VictoriaMetrics binaries are statically compiled with no OS dependencies.
When detected, only the Alpine base tag is updated.
Releases proceed as planned even if upstream fixes are not yet available.
For maximum security, hardened [scratch](https://hub.docker.com/_/scratch)-based images are also provided.
All images are continuously scanned by Docker Hub and verified before release using [grype](https://github.com/anchore/grype).
### General security recommendations:
* All the VictoriaMetrics components must run in protected private networks without direct access from untrusted networks such as Internet.
* All VictoriaMetrics components must run in protected private networks without direct access from untrusted networks such as the Internet.
The exception is [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/) and [vmgateway](https://docs.victoriametrics.com/victoriametrics/vmgateway/),
which are intended for serving public requests and performing authorization with [TLS termination](https://en.wikipedia.org/wiki/TLS_termination_proxy).
* All the requests from untrusted networks to VictoriaMetrics components must go through auth proxy such as [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/)
* All the requests from untrusted networks to VictoriaMetrics components must go through an auth proxy, such as [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/)
or [vmgateway](https://docs.victoriametrics.com/victoriametrics/vmgateway/). The proxy must be set up with proper authentication and authorization.
* Prefer using lists of allowed API endpoints, while disallowing access to other endpoints when configuring [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/)
in front of VictoriaMetrics components.
* Set reasonable [`Strict-Transport-Security`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security) header value to all the components to mitigate [MitM attacks](https://en.wikipedia.org/wiki/Man-in-the-middle_attack), for example: `max-age=31536000; includeSubDomains`. See `-http.header.hsts` flag.
* Set a reasonable [`Strict-Transport-Security`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security) header value on all the components to mitigate [MitM attacks](https://en.wikipedia.org/wiki/Man-in-the-middle_attack), for example: `max-age=31536000; includeSubDomains`. See `-http.header.hsts` flag.
* Set reasonable [`Content-Security-Policy`](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) header value to mitigate [XSS attacks](https://en.wikipedia.org/wiki/Cross-site_scripting). See `-http.header.csp` flag.
* Set reasonable [`X-Frame-Options`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options) header value to mitigate [clickjacking attacks](https://en.wikipedia.org/wiki/Clickjacking), for example `DENY`. See `-http.header.frameOptions` flag.
VictoriaMetrics provides the following security-related command-line flags:
The following security-related command-line flags are available for all components with HTTP API:
* `-tls`, `-tlsCertFile` and `-tlsKeyFile` for switching from HTTP to HTTPS at `-httpListenAddr` (TCP port 8428 is listened by default).
* `-tls`, `-tlsCertFile` and `-tlsKeyFile` for switching from HTTP to HTTPS at `-httpListenAddr`.
[Enterprise version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/enterprise/) supports automatic issuing of TLS certificates.
See [these docs](#automatic-issuing-of-tls-certificates).
* `-mtls` and `-mtlsCAFile` for enabling [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication) for requests to `-httpListenAddr`. See [these docs](#mtls-protection).
* `-httpAuth.username` and `-httpAuth.password` for protecting all the HTTP endpoints
with [HTTP Basic Authentication](https://en.wikipedia.org/wiki/Basic_access_authentication).
* `-http.header.hsts`, `-http.header.csp`, and `-http.header.frameOptions` for serving `Strict-Transport-Security`, `Content-Security-Policy`
and `X-Frame-Options` HTTP response headers.
### Protecting service endpoints
All VictoriaMetrics components expose internal metrics in Prometheus exposition format at the `/metrics` page for [#Monitoring](https://docs.victoriametrics.com/victoriametrics/#monitoring).
Consider limiting access to the `/metrics` page to trusted networks only.
The following service endpoints may require protection:
* `-deleteAuthKey` for protecting the `/api/v1/admin/tsdb/delete_series` endpoint. See [how to delete time series](#how-to-delete-time-series).
* `-snapshotAuthKey` for protecting the `/snapshot*` endpoints. See [how to work with snapshots](#how-to-work-with-snapshots).
* `-forceFlushAuthKey` for protecting the `/internal/force_flush` endpoint. See [force flush docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#forced-flush).
* `-forceMergeAuthKey` for protecting the `/internal/force_merge` endpoint. See [force merge docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#forced-merge).
* `-search.resetCacheAuthKey` for protecting the `/internal/resetRollupResultCache` endpoint. See [backfilling](#backfilling) for more details.
* `-reloadAuthKey` for protecting the `/-/reload` endpoint, which is used for force reloading of [`-promscrape.config`](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* `-reloadAuthKey` for protecting the `/-/reload` endpoint, which is used to force reload the [`-promscrape.config`](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* `-configAuthKey` for protecting the `/config` endpoint, since it may contain sensitive information such as passwords.
* `-flagsAuthKey` for protecting the `/flags` endpoint.
* `-pprofAuthKey` for protecting the `/debug/pprof/*` endpoints, which can be used for [profiling](#profiling).
* `-metricNamesStatsResetAuthKey` for protecting the `/api/v1/admin/status/metric_names_stats/reset` endpoint, used for [Metric Names Tracker](#track-ingested-metrics-usage).
* `-denyQueryTracing` for disallowing [query tracing](#query-tracing).
* `-http.header.hsts`, `-http.header.csp`, and `-http.header.frameOptions` for serving `Strict-Transport-Security`, `Content-Security-Policy`
and `X-Frame-Options` HTTP response headers.
Explicitly set internal network interface for TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats.
For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<internal_iface_ip>:2003`. This protects from unexpected requests from untrusted network interfaces.
@@ -1777,17 +1781,6 @@ For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<i
See also [security recommendation for VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#security)
and [the general security page at VictoriaMetrics website](https://victoriametrics.com/security/).
### CVE handling policy
**Source code:** Go dependencies are scanned by [govulncheck](https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck) in CI.
All vulnerabilities must be fixed before next scheduled release and backported to [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/).
**Docker images:** CVE findings in [Alpine](https://security.alpinelinux.org/) base image pose minimal risk since VictoriaMetrics binaries are statically compiled with no OS dependencies.
When detected, only the Alpine base tag is updated.
Releases proceed as planned even if upstream fixes are not yet available.
For maximum security, hardened [scratch](https://hub.docker.com/_/scratch)-based images are also provided.
All images are continuously scanned by Docker Hub and verified before release using [grype](https://github.com/anchore/grype).
### mTLS protection
By default `VictoriaMetrics` accepts http requests at `8428` port (this port can be changed via `-httpListenAddr` command-line flags).
@@ -1817,19 +1810,39 @@ This functionality can be evaluated for free according to [these docs](https://d
See also [security recommendations](#security).
### Software Bill of Materials (SBOM)
Every VictoriaMetrics container{{% available_from "v1.137.0" %}} image published to
[Docker Hub](https://hub.docker.com/u/victoriametrics) and [Quay.io](https://quay.io/organization/victoriametrics) include an [SPDX](https://spdx.dev/) SBOM attestation generated automatically by BuildKit during `docker buildx build`.
To inspect the SBOM for an image:
```sh
docker buildx imagetools inspect \
docker.io/victoriametrics/victoria-metrics:latest \
--format "{{ json .SBOM }}"
```
To scan an image using its SBOM attestation with [Trivy](https://github.com/aquasecurity/trivy):
```sh
trivy image --sbom-sources oci \
docker.io/victoriametrics/victoria-metrics:latest
```
## Tuning
* No need in tuning for VictoriaMetrics - it uses reasonable defaults for command-line flags,
* No need to tune for VictoriaMetrics - it uses reasonable defaults for command-line flags,
which are automatically adjusted for the available CPU and RAM resources.
* No need in tuning for Operating System - VictoriaMetrics is optimized for default OS settings.
* No need to tune for Operating System - VictoriaMetrics is optimized for default OS settings.
The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a).
The recommendation is not specific for VictoriaMetrics only but also for any service which handles many HTTP connections and stores data on disk.
* VictoriaMetrics is a write-heavy application and its performance depends on disk performance. So be careful with other
The recommendation is not specific to VictoriaMetrics only, but also for any service that handles many HTTP connections and stores data on disk.
* VictoriaMetrics is a write-heavy application, and its performance depends on disk performance. So be careful with other
applications or utilities (like [fstrim](https://manpages.ubuntu.com/manpages/lunar/en/man8/fstrim.8.html))
which could [exhaust disk resources](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1521).
* The recommended filesystem is `ext4`, the recommended persistent storage is [persistent HDD-based disk on GCP](https://cloud.google.com/compute/docs/disks/#pdspecs),
since it is protected from hardware failures via internal replication and it can be [resized on the fly](https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
If you plan to store more than 1TB of data on `ext4` partition, then the following options are recommended to pass to `mkfs.ext4`:
since it is protected from hardware failures via internal replication, and it can be [resized on the fly](https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
If you plan to store more than 1TB of data on an `ext4` partition, then the following options are recommended to pass to `mkfs.ext4`:
```sh
mkfs.ext4 ... -O 64bit,huge_file,extent -T huge

View File

@@ -26,7 +26,48 @@ See also [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-rel
## tip
* BUGFIX: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): fix issue with producing aggregated samples with identical timestamps between flushes. See PR [#10808](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10808) for details.
* SECURITY: upgrade base docker image (Alpine) from 3.23.4 to 3.24.1. See [Alpine 3.24.1 release notes](https://www.alpinelinux.org/posts/Alpine-3.24.1-released.html).
* FEATURE: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): expose `vm_max_memory_per_query` metric reflecting the `-search.maxMemoryPerQuery` limit. Create `VMSelectConcurrentQueriesExceedMemoryLimit` alert to warn when OOMs are possible due to misconfiguration of `-search.maxMemoryPerQuery` and max concurrent queries.
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): add `default_vm_access_claim` field into `jwt` section of auth config. It could be used at [JWT claim placeholders](https://docs.victoriametrics.com/victoriametrics/vmauth/#jwt-claim-based-request-templating), if `JWT` token doesn't have `vm_access` claim. See [#11054](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11054).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): reduces CPU usage by 10% at [sharding among remote storages](https://docs.victoriametrics.com/victoriametrics/vmagent/#sharding-among-remote-storages). See [#11113](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11113). Thanks to @bennf for contribution.
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): add `optimize_repeated_binary_op_subexprs=1` query arg to [/api/v1/query_range](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query) for executing binary operator sides sequentially when they share the same optimized aggregate rollup result expression. This allows the second side to reuse rollup result cache populated by the first side. See [#10575](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10575).
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): prevent possible password brute-force attacks with an artificial 2-3 second delay as recommended by [OWASP](https://owasp.org/Top10/2025/A07_2025-Authentication_Failures). See [#11180](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11180).
* BUGFIX: all VictoriaMetrics components: cancel in-flight HTTP requests shortly before `-http.maxGracefulShutdownDuration` elapses during graceful shutdown, so they can drain and the shutdown completes cleanly within that window instead of timing out and exiting via `logger.Fatalf` -> `os.Exit`. This prevents skipping the storage flush and losing in-memory data when long-lived requests are in flight (such as VictoriaLogs live tailing). See [#1502](https://github.com/VictoriaMetrics/VictoriaLogs/issues/1502).
* BUGFIX: `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fixes unexpected rare rerouting. See [#11162](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11162).
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): propagate cache reset operation to `selectNode` when `/internal/resetRollupResultCache` is called. Previously, the propagation only happened when the `delete_series` API was called. See [#11112](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11112).
* BUGFIX: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): fix possible unexpected increases in `rate_avg` and `rate_sum` if an out-of-order sample is ingested after the previous flush. See [#11140](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11140).
## [v1.146.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.146.0)
Released at 2026-06-22
* FEATURE: all VictoriaMetrics components: add `-http.header.disableServerHostname` command-line flag for disabling the `X-Server-Hostname` HTTP response header. See [#11067](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11067). Thanks to @zasdaym for contribution.
* FEATURE: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): expose `vm_streamaggr_dedup_dropped_samples_total` to allow tracking dropped old samples during [deduplication](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication).
* FEATURE: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): use the aggregation rule interval as the default [staleness_interval](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#staleness) instead of `2*interval`, to reduce spikes when there are gaps between received samples. See [#11102](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11102).
* FEATURE: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): add new aggregation output [sum_samples_total](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#sum_samples_total) for summing input delta values into a cumulative counter. See issues [#11002](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11002) and [#4843](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4843).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add a new flag `-remoteWrite.inmemoryQueues` to prioritize recently ingested data over historical data stored at file-based [persistent queue](https://docs.victoriametrics.com/victoriametrics/vmagent/#on-disk-persistence-and-data-processing-order). See [#8833](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8833)
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add `-promscrape.cluster.shardByLabels` command-line flag for selecting target labels used for sharding scrape targets among `vmagent` instances in cluster mode. See [#11044](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11044).
* FEATURE: [vmctl](https://docs.victoriametrics.com/victoriametrics/vmctl/): add `-vm-headers` and `-vm-bearer-token` flags for authenticating requests to the VictoriaMetrics import destination. The flags are available in `opentsdb`, `influx`, `remote-read`, `prometheus`, `mimir`, and `thanos` vmctl sub-commands. See [#8897](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8897).
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): log calls to [/api/v1/admin/tsdb/delete_series](https://docs.victoriametrics.com/victoriametrics/url-examples/#apiv1admintsdbdelete_series) API handler. This should help to identify events of metrics deletion from the database. See [#11104](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11104).
* FEATURE: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): add the `last` value to graph legend statistics. See [#10759](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10759).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): add support for [Monitoring Data eXchange (MDX)](https://docs.victoriametrics.com/victoriametrics/vmagent/#monitoring-data-exchange): the ability to route only metrics from VictoriaMetrics services to a specific `-remoteWrite.url`. MDX is useful for building monitoring-of-monitoring where one remote storage should receive the full metric stream and another should receive only VictoriaMetrics metrics. Enable per destination with `-remoteWrite.mdx.enable=true`. See [#10600](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10600).
* BUGFIX: [enterprise](https://docs.victoriametrics.com/enterprise/) [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly expose metric `vm_retention_filters_partitions_scheduled_rows`. See [#11138](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11138)
* BUGFIX: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): fix issue with producing aggregated samples with identical timestamps between flushes. See [#10808](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10808).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): fix potential corruption of remote-write metadata `Unit` values. See [#11120](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11120). Thanks for @fxrlv for the contribution.
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/),[vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/),[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): fix rare unbounded shutdown delay when config reload takes longer than `-configCheckInterval`. See [#11107](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11107). Thanks to @PleasingFungus for contribution.
* BUGFIX: [vmctl](https://docs.victoriametrics.com/victoriametrics/vmctl/): push metrics to configured `-pushmetrics.url` on shutdown when migration fails. Previously, metrics were not pushed if vmctl exited with an error. See [#11081](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11081). Thanks to @zasdaym for contribution.
* BUGFIX: [vmrestore](https://docs.victoriametrics.com/victoriametrics/vmrestore/): disallow restoring parts outside the configured `-storageDataPath` directory. See [710c920d](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/710c920d6083327042a309e449fae4383617d817).
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): correctly apply long tenant filters. Previously, such filters could be truncated, causing tenants to be matched incorrectly. See [#11096](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11096). Thanks for @fxrlv for the contribution.
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fix corrupted metrics metadata when a response contains multiple rows. See [#11115](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11115). Thanks for @fxrlv for the contribution.
* BUGFIX: [vmbackup](https://docs.victoriametrics.com/vmbackup/), [vmbackupmanager](https://docs.victoriametrics.com/victoriametrics/vmbackupmanager/): do not fail backup list if directory is absent while using `fs://` destination to align with other protocols. See [6c3c548](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/6c3c548ddb0385b749e731f52276f130e2a4e4a8)
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): don't cache empty responses for tenant IDs discovery during [multitenant queries](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenant-reads). This problem was visible during integration tests when multitenant queries were executed before the first ingestion happened. See [#10982](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10982)
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly escape `metricFamilyName` at metrics metadata response. See [#11129](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11129). Thanks for @fxrlv for the contribution.
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): prevent more cases of panic during directory deletion on `NFS`-based mounts. See [#11060](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11060).
## [v1.145.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.145.0)
@@ -266,6 +307,25 @@ It enables back `Discovered targets` debug UI by default.
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly apply `extra_filters[]` filter when querying `vm_account_id` or `vm_project_id` labels via [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) request for `/api/v1/label/…/values` API. Before, `extra_filters` was ignored. See [#10503](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10503).
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): revert the use of rollup result cache for [instant queries](https://docs.victoriametrics.com/keyConcepts.html#instant-query) that contain [`rate`](https://docs.victoriametrics.com/MetricsQL.html#rate) function with a lookbehind window larger than `-search.minWindowForInstantRollupOptimization`. The cache usage was removed since [v1.132.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.132.0). See [#10098](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10098#issuecomment-3895011084) for more details.
## [v1.136.12](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.136.12)
Released at 2026-06-19
**v1.136.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
The v1.136.x line will be supported for at least 12 months since [v1.136.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11360) release**
* BUGFIX: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): fix issue with producing aggregated samples with identical timestamps between flushes. See [#10808](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10808).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/): fix potential corruption of remote-write metadata `Unit` values. See [#11120](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11120). Thanks for @fxrlv for the contribution.
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/),[vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/),[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): fix rare unbounded shutdown delay when config reload takes longer than `-configCheckInterval`. See [#11107](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11107). Thanks to @PleasingFungus for contribution.
* BUGFIX: [vmbackup](https://docs.victoriametrics.com/vmbackup/), [vmbackupmanager](https://docs.victoriametrics.com/victoriametrics/vmbackupmanager/): do not fail backup list if directory is absent while using `fs://` destination to align with other protocols. See [6c3c548d](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/6c3c548ddb0385b749e731f52276f130e2a4e4a8)
* BUGFIX: [vmctl](https://docs.victoriametrics.com/victoriametrics/vmctl/): push metrics to configured `-pushmetrics.url` on shutdown when migration fails. Previously, metrics were not pushed if vmctl exited with an error. See [#11081](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11081). Thanks to @zasdaym for contribution.
* BUGFIX: [vmrestore](https://docs.victoriametrics.com/victoriametrics/vmrestore/): disallow restoring parts outside the configured `-storageDataPath` directory. See [710c920d](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/710c920d6083327042a309e449fae4383617d817).
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fix corrupted metrics metadata when a response contains multiple rows. See [#11115](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11115). Thanks for @fxrlv for the contribution.
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly escape `metricFamilyName` at metrics metadata response. See [#11129](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11129). Thanks for @fxrlv for the contribution.
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): correctly apply long tenant filters. Previously, such filters could be truncated, causing tenants to be matched incorrectly. See [#11096](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11096). Thanks for @fxrlv for the contribution.
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): prevent more cases of panic during directory deletion on `NFS`-based mounts. See [#11060](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11060).
## [v1.136.11](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.136.11)
Released at 2026-06-05
@@ -627,6 +687,20 @@ See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/ch
See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/changelog_2025/#v11230)
## [v1.122.25](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.122.25)
Released at 2026-06-19
**v1.122.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
The v1.122.x line will be supported for at least 12 months since [v1.122.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11220) release**
* BUGFIX: [stream aggregation](https://docs.victoriametrics.com/victoriametrics/stream-aggregation/): fix issue with producing aggregated samples with identical timestamps between flushes. See [#10808](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10808).
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/),[vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/),[vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): fix rare unbounded shutdown delay when config reload takes longer than `-configCheckInterval`. See [#11107](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11107). Thanks to @PleasingFungus for contribution.
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fix corrupted metrics metadata when a response contains multiple rows. See [#11115](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11115). Thanks for @fxrlv for the contribution.
* BUGFIX: [vmbackup](https://docs.victoriametrics.com/vmbackup/), [vmbackupmanager](https://docs.victoriametrics.com/victoriametrics/vmbackupmanager/): do not fail backup list if directory is absent while using `fs://` destination to align with other protocols. See [6c3c548d](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/6c3c548ddb0385b749e731f52276f130e2a4e4a8)
* BUGFIX: [vmrestore](https://docs.victoriametrics.com/victoriametrics/vmrestore/): disallow restoring parts outside the configured `-storageDataPath` directory. See [710c920d](https://github.com/VictoriaMetrics/VictoriaMetrics/commit/710c920d6083327042a309e449fae4383617d817).
## [v1.122.24](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.122.24)
Released at 2026-06-05

View File

@@ -121,7 +121,7 @@ It is allowed to run Enterprise components in [cases listed here](https://docs.v
Binary releases of Enterprise components are available at [the releases page for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest),
[the releases page for VictoriaLogs](https://github.com/VictoriaMetrics/VictoriaLogs/releases/latest)
and [the releases page for VictoriaTraces](https://github.com/VictoriaMetrics/VictoriaTraces/releases/latest).
Enterprise binaries and packages have `enterprise` suffix in their names. For example, `victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz`.
Enterprise binaries and packages have `enterprise` suffix in their names. For example, `victoria-metrics-linux-amd64-v1.146.0-enterprise.tar.gz`.
In order to run binary release of Enterprise component, please download the `*-enterprise.tar.gz` archive for your OS and architecture
from the corresponding releases page and unpack it. Then run the unpacked binary.
@@ -139,8 +139,8 @@ For example, the following command runs VictoriaMetrics Enterprise binary with t
obtained at [this page](https://victoriametrics.com/products/enterprise/trial/):
```sh
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.145.0/victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz
tar -xzf victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.146.0/victoria-metrics-linux-amd64-v1.146.0-enterprise.tar.gz
tar -xzf victoria-metrics-linux-amd64-v1.146.0-enterprise.tar.gz
./victoria-metrics-prod -license=BASE64_ENCODED_LICENSE_KEY
```
@@ -155,7 +155,7 @@ Alternatively, VictoriaMetrics Enterprise license can be stored in the file and
It is allowed to run Enterprise components in [cases listed here](https://docs.victoriametrics.com/victoriametrics/enterprise/#valid-cases-for-victoriametrics-enterprise).
Docker images for Enterprise components are available at [VictoriaMetrics Docker Hub](https://hub.docker.com/u/victoriametrics) and [VictoriaMetrics Quay](https://quay.io/organization/victoriametrics).
Enterprise docker images have `enterprise` suffix in their names. For example, `victoriametrics/victoria-metrics:v1.145.0-enterprise`.
Enterprise docker images have `enterprise` suffix in their names. For example, `victoriametrics/victoria-metrics:v1.146.0-enterprise`.
In order to run Docker image of VictoriaMetrics Enterprise component, it is required to provide the license key via the command-line
flag as described in the [binary-releases](https://docs.victoriametrics.com/victoriametrics/enterprise/#binary-releases) section.
@@ -165,13 +165,13 @@ Enterprise license key can be obtained at [this page](https://victoriametrics.co
For example, the following command runs VictoriaMetrics Enterprise Docker image with the specified license key:
```sh
docker run --name=victoria-metrics victoriametrics/victoria-metrics:v1.145.0-enterprise -license=BASE64_ENCODED_LICENSE_KEY
docker run --name=victoria-metrics victoriametrics/victoria-metrics:v1.146.0-enterprise -license=BASE64_ENCODED_LICENSE_KEY
```
Alternatively, the license code can be stored in the file and then referred via `-licenseFile` command-line flag:
```sh
docker run --name=victoria-metrics -v /vm-license:/vm-license victoriametrics/victoria-metrics:v1.145.0-enterprise -licenseFile=/path/to/vm-license
docker run --name=victoria-metrics -v /vm-license:/vm-license victoriametrics/victoria-metrics:v1.146.0-enterprise -licenseFile=/path/to/vm-license
```
Example docker-compose configuration:
@@ -181,7 +181,7 @@ version: "3.5"
services:
victoriametrics:
container_name: victoriametrics
image: victoriametrics/victoria-metrics:v1.145.0
image: victoriametrics/victoria-metrics:v1.146.0
ports:
- 8428:8428
volumes:
@@ -213,7 +213,7 @@ is used to provide the license key in plain-text:
```yaml
server:
image:
tag: v1.145.0-enterprise
tag: v1.146.0-enterprise
license:
key: {BASE64_ENCODED_LICENSE_KEY}
@@ -224,7 +224,7 @@ In order to provide the license key via existing secret, the following values fi
```yaml
server:
image:
tag: v1.145.0-enterprise
tag: v1.146.0-enterprise
license:
secret:
@@ -274,7 +274,7 @@ spec:
license:
key: {BASE64_ENCODED_LICENSE_KEY}
image:
tag: v1.145.0-enterprise
tag: v1.146.0-enterprise
```
In order to provide the license key via an existing secret, the following custom resource is used:
@@ -291,7 +291,7 @@ spec:
name: vm-license
key: license
image:
tag: v1.145.0-enterprise
tag: v1.146.0-enterprise
```
Example secret with license key:
@@ -342,7 +342,7 @@ Builds are available for amd64 and arm64 architectures.
Example archive:
`victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz`
`victoria-metrics-linux-amd64-v1.146.0-enterprise.tar.gz`
Includes:
@@ -351,7 +351,7 @@ Includes:
Example Docker image:
`victoriametrics/victoria-metrics:v1.145.0-enterprise-fips` uses the FIPS-compatible binary and based on `scratch` image.
`victoriametrics/victoria-metrics:v1.146.0-enterprise-fips` uses the FIPS-compatible binary and based on `scratch` image.
## What Happens to Licensed Components When a License Expires

Some files were not shown because too many files have changed in this diff Show More