Compare commits


303 Commits

Author SHA1 Message Date
Andrii Chubatiuk
305f1c91f8 lib/{fs,filestream}: use single ParallelExecutor for fs and filestream tasks 2025-12-31 11:51:32 +02:00
JAYICE
74b03c93a6 makefile: support vmauth in docs-update-flags command (#10222)
### Describe Your Changes

Implement https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10221

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-12-30 19:14:06 +02:00
Max Kotliar
0e9bb5a42d docs: sync flags in docs with actual binaries 2025-12-30 18:59:33 +02:00
Max Kotliar
f1a88e57cf docs/changelog: fix link to PR
Follow-up for 1792b6bd9a
2025-12-30 17:38:48 +02:00
Max Kotliar
76176ac1d3 app/vmauth: increase concurrency limit reached before waiting in queue
Follow up on
c9596a0364 (r173413964)

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
2025-12-30 17:23:10 +02:00
Max Kotliar
c08adb31bb docs: remove available from placeholder from code block
The {{% available_from "#" %}} placeholder does not work inside code
blocks. Replace it with a hard-coded value.

Introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10168.

See comment
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10168/files#r2651440620
for more details.
2025-12-30 16:10:55 +02:00
Artem Fetishev
b49b0471ef lib/storage: move legacy code to legacy files (#10215)
Follow-up for f97f627 (#8134)

The code was moved as is, no changes were made to moved code.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-30 13:16:24 +01:00
Artem Fetishev
13102045a7 changelog: update v1.132.0 release notes with a note on ungraceful shutdown
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-30 10:29:24 +01:00
Artem Fetishev
d226e5b95f lib/ingestserver: Actually close the first vminsert connection (#10224)
Since the first connection is not closed, vmstorage never terminates
gracefully, which causes all caches to be reset on the next
start-up.

Follow-up for 244769a00d (#10136)

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-29 15:13:30 +01:00
Hui Wang
30bbb5660b docs: clarify recording rule labels do not support templating (#10186)
fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10183
2025-12-29 15:29:45 +02:00
Max Kotliar
1792b6bd9a docs/changelog: Add PR/issue links, fix typo in tip section 2025-12-29 12:58:07 +02:00
Artem Fetishev
f97f627f79 lib/storage: implement partition index (#8134)
This should reduce disk space occupied by indexDBs, since they get deleted along
with the corresponding partitions once those partitions fall outside the
retention window.

- Motivation: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7599
- What to expect: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/8134

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Andrei Baidarov <baidarov@nebius.com>
2025-12-24 18:53:49 +01:00
Phuong Le
785c1fd053 issues/question-template: fix typos (#947) 2025-12-24 11:37:34 +01:00
Aliaksandr Valialkin
697bfd5cee app/vmauth: properly verify whether the request has been canceled by the client in handleConcurrecnyLimitError()
The `err` may contain information about request cancelation performed by the server code.
In such cases the error must be logged. The error must be ignored only if the client canceled the request.

This is a follow-up for the commit c9596a0364

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
2025-12-24 11:31:36 +01:00
Artem Fetishev
f0ac6d9ac9 lib/storage: log the beginning and end of saving metric name usage stats to file (#10205)
This is to debug cases when the metric name tracker resets the tsid cache
after a restart. This could happen because vmstorage does not have enough time
to stop gracefully. The logs should provide this info.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-23 17:25:43 +01:00
Artem Fetishev
f0b251d967 lib/storage: fix per-idb cache stats (#10204)
This fixes the following corner case: if all instances of a cache have
zero size, the stats won't be set at all. This results in some weird
graphs if the cache is reset very often (such as tfssCache): the cache
sizeMaxBytes alternates between the actual value and zero.

Follow-up for f62893c151

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-23 17:06:10 +01:00
Nikolay
c3346ae8fd app/victoria-metrics: properly add prometheus metrics metadata (#10192)
Commit 5a587f2006 was not properly ported
to the single-node branch. Since single-node VictoriaMetrics performs both
promscrape and self-scrape, metadata must be added on both of
those paths.

This commit adds the missing metadata writes to the storage.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10175
2025-12-23 13:57:19 +01:00
Jinlin
0ffb3fdfce lib/storage: fix log typo 2025-12-23 13:50:40 +01:00
Zakhar Bessarab
4e234ccbd1 docs/enterprise: add description of license key update (#10194)
Describe Your Changes:

- describe options of updating the enterprise license key
- fix a few typos
2025-12-23 13:37:36 +01:00
Alexander Frolov
943589ca31 lib/promscrape: fix isAutoMetric to recognize all auto-generated metrics
Previously, `scrape_labels_limit` was missing from the check.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10197
2025-12-23 13:36:50 +01:00
Aliaksandr Valialkin
c9596a0364 app/vmauth: add -maxQueueDuration command-line flag for graceful handling of short spikes in the number of concurrent requests
Previously a short spike in the number of concurrent requests immediately led to `429 Too Many Requests` errors
when the number of concurrent requests exceeded -maxConcurrentRequests or -maxConcurrentPerUserRequests.

This commit allows processing short spikes in the number of concurrent requests during the -maxQueueDuration timeout.
The requests are rejected only if they couldn't be served according to the concurrency limits during the -maxQueueDuration.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10112
2025-12-22 16:39:01 +01:00
Aliaksandr Valialkin
e7b0a00493 app/vmauth: follow-up for the commit 7f689df824
- Introduce backendURLs struct, which holds all the backend urls and allows stopping
  all the health checkers across all the backend urls with a single call to backendURLs.stopHealthChecks().

- Immediately cancel the pending Dial call to the backend when backendURLs.stopHealthChecks() is called.
  Use lib/netutil.Dialer.DialContext() for this.

- Replace the fragile closing of stopHealthCheckCh channel via stopHealthCheckOnce.Do()
  with an easier-to-maintain call of cancel() func for the corresponding healthChecksContext.

- Wait until health checker goroutines are finished before returning from UserInfo.stopHealthChecks().
  Previously the health checker goroutines could run for some time trying to dial the backend
  after the return from UserInfo.stopHealthChecks().

- Try dialing the broken backend for https urls too. It is better to log the dial error for the broken backend
  than to route client requests to it.

- Log dial errors for the broken backend, so users can troubleshoot backend connectivity issues in more detail.

- Refer to the correct issue - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997 -
  in the comments explaining why periodic dialing of the broken backend is needed.
  Previously the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9890 was incorrectly referred.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10147
2025-12-22 15:20:51 +01:00
Hui Wang
be0fe546e5 vmauth: skip a redundant request if all backends are broken with least_loaded policy (#10202)
similar to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10170
2025-12-22 13:06:12 +01:00
Hui Wang
13911db316 vmauth: add new counters to track the number of user request errors
follow up https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10177

Add `vmauth_user_request_backend_requests_total` and
`vmauth_unauthorized_user_request_backend_requests_total`, which track
the number of user request errors and are aligned with
`vmauth_user_requests_total`.

The existing `vmauth_http_request_errors_total` currently only counts
requests with `invalid_auth_token`. Once authorization has passed, any
subsequent request errors are tracked under
`xxx_user_request_backend_requests_total`.
2025-12-22 13:05:54 +01:00
Artem Fetishev
0cb90f91fc lib/storage: follow-up for d9c07dbc0b (#10169) - fix changelog
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-19 08:44:10 +01:00
Alexander Frolov
bdf65dde88 app/vmagent: make sure vmagent_rows_inserted_total counts samples (#10191)
As vminsert does

4d9b69b5a6/app/vminsert/newrelic/request_handler.go (L68)

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10191
2025-12-18 16:37:37 +01:00
Max Kotliar
4d9b69b5a6 docs/changelog: add known issue note related to memory leak on OpenTelemetry parsing code. 2025-12-18 12:39:12 +02:00
Nikolay
692a9be5fa lib/storage: check indexDB refCount at MustClose
In order to gracefully stop indexDB, refCount must be checked during
storage graceful shutdown.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10063
2025-12-17 18:48:53 +01:00
Kirill Kobylyanskiy
c8742ab120 lib/promscrape: add global sampleLimit support
This commit introduces the global `sampleLimit` setting to restrict the number
of samples accepted per scrape target, mirroring the behavior of
Prometheus.

Motivation:
1) The existing `-promscrape.seriesLimitPerTarget` flag currently takes
precedence over any `sample_limit` setting defined directly on the
scrape target. The new `sampleLimit` implementation ensures that the
target configuration is able to override the global setting, allowing
users to define specific limits per target.
2) The existing series limit flag uses memory-intensive Bloom filters,
resulting in high RAM consumption under high-cardinality scraping
scenarios. The `sampleLimit` provides a much simpler, low-overhead
alternative.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10145
2025-12-17 18:47:05 +01:00
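The override rule can be sketched as a simple precedence check, assuming 0 means "unset" for both values. The function names are hypothetical, and the whole-scrape rejection follows Prometheus `sample_limit` semantics; whether VictoriaMetrics reacts identically is an assumption.

```go
package main

import "fmt"

// effectiveSampleLimit picks the per-scrape sample limit (hypothetical helper).
// targetLimit is the sample_limit from the target's scrape config
// (0 means unset); globalLimit is the global default.
// A per-target value overrides the global one.
func effectiveSampleLimit(targetLimit, globalLimit int) int {
	if targetLimit > 0 {
		return targetLimit
	}
	return globalLimit
}

// checkSampleLimit rejects the scrape when it returns more samples than
// the limit, as Prometheus does with sample_limit; 0 disables the check.
func checkSampleLimit(samples, limit int) error {
	if limit > 0 && samples > limit {
		return fmt.Errorf("the number of samples %d exceeds sample_limit=%d", samples, limit)
	}
	return nil
}

func main() {
	fmt.Println(effectiveSampleLimit(0, 1000))   // 1000: global default applies
	fmt.Println(effectiveSampleLimit(50, 1000))  // 50: target config overrides
	fmt.Println(checkSampleLimit(60, 50) != nil) // true
}
```

Unlike a series limit, this check only needs a counter per scrape, which is why it is cheap compared to tracking unique series.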
Aliaksandr Valialkin
b6f8128273 Makefile: update golangci-lint from v2.4.0 to v2.7.2
See https://github.com/golangci/golangci-lint/releases/tag/v2.7.2
2025-12-17 16:59:02 +01:00
Aliaksandr Valialkin
bed7cbd0a4 all: consistently use encoding.DecompressZSTD* instead of zstd.Decompress* across the codebase
The encoding.DecompressZSTD* consistently updates the vm_zstd_block_decompress_calls_total metric.

Also make the following improvements after the commit 10f7cd2ffc:

- Add encoding.DecompressZSTDLimited() function and use it instead of zstd.DecompressLimited,
  so it properly updates vm_zstd_block_decompress_calls_total metric.

- Clarify description for the encoding.DecompressZSTD* and zstd.Decompress* functions.
2025-12-17 16:48:06 +01:00
Artem Fetishev
d9c07dbc0b lib/storage: rotate dateMetricIDCache instead of resetting (#10169)
Currently, `dateMetricIDCache` is reset when it is full, and it is never
reset when it is not full, even if the data it stores is no longer needed. This leads
to the following problems:
- During regular data ingestion the cache sizeBytes may exceed the max
allowed size, and the cache gets reset, which may slow down
data ingestion (see #10064)
- The cache is per-indexDB. This means that in partition index (#8134)
there will be as many instances of this cache as the number of
partitions. If someone performs a backfill across all partitions, this
will fill all caches and they will never get reset even if no more
historical data is ingested.

So the solution is to periodically rotate the cache. After the first
rotation the data is not deleted but moved to `prev` storage. After the
second rotation `prev` gets deleted. This gives the cache an opportunity
to restore the `prev` data if it is still in use. Based on #10167.

This PR also removes the recently introduced
`-storage.cacheSizeIndexDBDateMetricID` flag (see #10135). This should
be safe since the flag is new and its use case is very niche, so it is
unlikely that anyone uses it.

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-17 15:43:05 +01:00
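The rotation scheme described above can be sketched as a two-generation cache. `rotatingCache` and its `uint64` keys are hypothetical (the real cache keys on date and metricID). Entries found in `prev` are copied back to `curr`, so only entries unused across two rotations disappear.

```go
package main

import (
	"fmt"
	"sync"
)

// rotatingCache is a minimal two-generation cache sketch.
// rotate() moves curr to prev and drops the old prev, so an entry
// survives one rotation and is deleted after the second one unless
// it is accessed in between (which copies it back to curr).
type rotatingCache struct {
	mu   sync.Mutex
	curr map[uint64]struct{}
	prev map[uint64]struct{}
}

func newRotatingCache() *rotatingCache {
	return &rotatingCache{curr: map[uint64]struct{}{}, prev: map[uint64]struct{}{}}
}

func (c *rotatingCache) Set(k uint64) {
	c.mu.Lock()
	c.curr[k] = struct{}{}
	c.mu.Unlock()
}

func (c *rotatingCache) Has(k uint64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.curr[k]; ok {
		return true
	}
	if _, ok := c.prev[k]; ok {
		// Still in use: restore it to curr so it survives the next rotation.
		c.curr[k] = struct{}{}
		return true
	}
	return false
}

// rotate would be called periodically, e.g. from a ticker goroutine.
func (c *rotatingCache) rotate() {
	c.mu.Lock()
	c.prev = c.curr
	c.curr = map[uint64]struct{}{}
	c.mu.Unlock()
}

func main() {
	c := newRotatingCache()
	c.Set(1)
	c.Set(2)
	c.rotate()
	fmt.Println(c.Has(1)) // true: found in prev and restored to curr
	c.rotate()
	fmt.Println(c.Has(1), c.Has(2)) // true false: 2 was never accessed
}
```

Memory is bounded by what was touched during the last two intervals, so idle per-partition caches shrink to empty on their own.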
Artem Fetishev
20ad9cd395 lib/storage: introduce metricIDCache
The cache serves the same purpose as `dateMetricIDCache` but is used for
caching metricIDs from global index.
The cache was introduced in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10167, and it was decided to add it in a separate commit to reduce the diff.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10167
2025-12-17 13:31:11 +01:00
Hui Wang
8b3fe9cdec app/vmauth: add new counters to track the number of requests sent to backends
We have `vmauth_user_requests_total` and
`vmauth_unauthorized_user_requests_total` to track requests from the
user side. However, in scenarios such as request timeouts or when the
response code matches `retry_status_code`, a single request may be
retried across multiple backends.

Exposing counters `vmauth_user_request_backend_requests_total` and
`vmauth_unauthorized_user_request_backend_requests_total` that track the
number of requests sent to backends provides insight into the routing
logic and can help identify if requests are being consistently retried,
which may contribute to increased request duration.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10171
2025-12-17 13:27:08 +01:00
Hui Wang
e1e367b3cb app/vmauth: properly increment metric xxx_user_request_backend_errors_total
Currently, backendErrors may be counted twice if a request to the
backend fails due to context.DeadlineExceeded.

9bc7a17d80/app/vmauth/main.go (L328)

9bc7a17d80/app/vmauth/main.go (L294)

The counter was also incremented inconsistently.
Given that the counter's name is `xxx_request_backend_errors_total`, it
should only increase when a backend request returns an error. This value
can exceed the user request error count if multiple backend requests
fail for a single user request.
The `xxx_request_backend_errors_total` counter should be used in
conjunction with the `xxx_request_backend_requests_total` introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10171.
2025-12-17 13:24:26 +01:00
Hui Wang
f40c6fcad1 app/vmauth: skip a redundant request if all backends are broken with first_available policy
There is no reason to send a request to the first backend if all
backends are marked as broken.
Also, the doc comment says:

> // getFirstAvailableBackendURL returns the first available backendURL, which isn't broken.

The fix only skips a redundant request when all backends are
unavailable. It doesn't introduce any changes from the user's perspective,
so I skipped the changelog.
2025-12-17 13:22:37 +01:00
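A sketch of the selection rule, assuming a simple `broken` flag per backend (the `backend` type is hypothetical): returning `nil` when every backend is broken lets the caller fail fast instead of sending a request that is likely to fail.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for a vmauth backend URL.
type backend struct {
	url    string
	broken bool
}

// getFirstAvailableBackend returns the first backend that isn't broken,
// or nil when all of them are broken, so no request is sent at all.
func getFirstAvailableBackend(bs []*backend) *backend {
	for _, b := range bs {
		if !b.broken {
			return b
		}
	}
	return nil
}

func main() {
	bs := []*backend{{url: "http://a", broken: true}, {url: "http://b"}}
	fmt.Println(getFirstAvailableBackend(bs).url) // http://b

	bs[1].broken = true
	fmt.Println(getFirstAvailableBackend(bs) == nil) // true
}
```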
Aliaksandr Valialkin
b6bc186013 docs/victoriametrics/Articles.md: add https://developer-friendly.blog/blog/2024/06/17/unlocking-the-power-of-victoriametrics-a-prometheus-alternative/ 2025-12-16 15:46:23 +01:00
Aliaksandr Valialkin
9bc7a17d80 lib/protoparser/opentelemetry: typo fix: wince -> since
This is a follow-up for the commit 293d80910c
2025-12-15 20:13:45 +01:00
f41gh7
9ce548dcb5 docs: update release version to latest 2025-12-15 10:37:35 +01:00
f41gh7
82e583338d docs: update LTS releases 2025-12-15 10:34:43 +01:00
Aliaksandr Valialkin
19009836c7 vendor: update github.com/valyala/fastjson from v1.6.5 to v1.6.7 2025-12-14 23:09:43 +01:00
Max Kotliar
c2362ab670 docs: review links in changelogs 2025-12-12 19:43:15 +02:00
f41gh7
d04a42e846 make vmui-update 2025-12-12 12:50:13 +01:00
f41gh7
0d930dda16 CHANGELOG.md: cut v1.132.0 release 2025-12-12 12:45:34 +01:00
Artem Fetishev
e026215701 lib/storage: Document post-delete cache resets (#10158)
When time series deletion is performed, some storage caches need to be
reset and some do not. This PR reviews all storage caches, documents why
each one is or is not reset, and places all the resetting logic (and
comments) in one place.
2025-12-12 11:11:30 +01:00
JAYICE
34a542c324 lib/storage: include last sample when query at the last millisecond of the day
One millisecond shouldn't be subtracted from the `tr.MaxTimestamp`, and
related test cases will be added

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9804
2025-12-12 11:01:06 +01:00
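The off-by-one can be illustrated by treating a time range as a closed interval, assuming `MaxTimestamp` is inclusive as the commit implies. `timeRange` and `dayRange` are hypothetical helpers working in UTC milliseconds.

```go
package main

import "fmt"

const msecPerDay = 24 * 3600 * 1000

// timeRange is a closed interval [MinTimestamp, MaxTimestamp] in milliseconds.
type timeRange struct {
	MinTimestamp, MaxTimestamp int64
}

// contains treats MaxTimestamp as inclusive. Subtracting one millisecond
// from MaxTimestamp would drop a sample written exactly at the last
// millisecond of the day.
func (tr timeRange) contains(ts int64) bool {
	return ts >= tr.MinTimestamp && ts <= tr.MaxTimestamp
}

// dayRange returns the closed range covering the whole UTC day of ts (ts >= 0).
func dayRange(ts int64) timeRange {
	start := ts - ts%msecPerDay
	return timeRange{MinTimestamp: start, MaxTimestamp: start + msecPerDay - 1}
}

func main() {
	lastMsec := int64(msecPerDay - 1) // 1970-01-01T23:59:59.999Z
	tr := dayRange(lastMsec)
	fmt.Println(tr.contains(lastMsec))   // true
	fmt.Println(tr.contains(msecPerDay)) // false: first msec of the next day
}
```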
Fred Navruzov
ff0aaa38b7 docs/vmanomaly: release v1.28.2 (#10160)
### Describe Your Changes

Update docs and assets (visualizations) for /anomaly-detection section
with `v1.28.2` release

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-12-11 20:56:59 +02:00
Max Kotliar
0e2f0ac95f lib/protoparser/opentelemetry: fix typo in code
# github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/pb
lib/protoparser/opentelemetry/pb/pb.go:1683:19: undefined: lctx

Bug introduced in
1dc71212f8
2025-12-11 18:37:04 +02:00
Max Kotliar
7f689df824 app/vmauth: validate backend with a dial check before marking it healthy (#10147)
### Describe Your Changes

Previously, a backend was considered healthy as soon as its
'bu.brokenDeadline' deadline expired, even if it was still unavailable.
This caused avoidable request failures and retries.

Now vmauth performs a TCP dial (1s timeout) before restoring the backend
to the healthy pool. This avoids routing traffic to backends that are still down.

The dial check also covers cases where a route to the backend cannot be
resolved. Without this check, user requests would hang until the
connection timeout, leading to long waits or errors. The new check fails
fast and doesn't impact real user requests.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997


### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-12-11 18:26:59 +02:00
Max Kotliar
bd725bdd69 dashboards: add useful links to dashboards
Dashboards:

- Add a link to proper docs section
- Add a link to troubleshooting page
- Add links to community and enterprise support

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9904
2025-12-11 18:11:07 +02:00
Aliaksandr Valialkin
712b7cfeeb lib/promscrape: allow scraping targets with responses equal to c.maxScrapeSize
Return "too big response size" error only for responses bigger than c.maxScrapeSize
(this option can be set either via max_scrape_size option inside scrape config
or via -promscrape.maxScrapeSize command-line flag).

Previously responses with sizes equal to c.maxScrapeSize were incorrectly rejected.
2025-12-11 16:15:46 +01:00
Aliaksandr Valialkin
1dc71212f8 lib/protoparser/opentelemetry/pb: reset the decoderContext.ls.Labels length to zero after clearing all the references to the original byte slice
This is a follow-up for 25f49e6f54
2025-12-11 15:39:32 +01:00
Aliaksandr Valialkin
25f49e6f54 lib/protoparser/opentelemetry: explicitly clear all the references to the underlying byte slice at decoderContext.ls.Labels up to its capacity
This should prevent excess memory usage caused by dangling references to source byte slices
from decoderContext.ls.Labels.

This is a change similar to 63a68edb05
2025-12-11 15:33:13 +01:00
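These two commits combine into a reset pattern: zero the slice up to its capacity (so hidden elements past `len` no longer pin the source buffer), then truncate it to length zero. The `label` type is hypothetical, and the built-in `clear` requires Go 1.21 or newer.

```go
package main

import "fmt"

// label is a hypothetical pair whose strings may point into a large
// source byte slice (e.g. via zero-copy string conversion during parsing).
type label struct {
	Name, Value string
}

// resetLabels zeroes every element up to cap(ls), not just up to len(ls),
// so no stale element keeps the source buffer reachable, and only then
// truncates the length to zero for reuse.
func resetLabels(ls []label) []label {
	clear(ls[:cap(ls)]) // requires Go 1.21+
	return ls[:0]
}

func main() {
	ls := make([]label, 0, 4)
	ls = append(ls, label{"job", "a"}, label{"instance", "b"}, label{"env", "c"})
	ls = ls[:1] // earlier truncation: elements 1 and 2 are hidden but still referenced

	ls = resetLabels(ls)
	full := ls[:cap(ls)]
	fmt.Println(len(ls), full[1] == label{}, full[2] == label{}) // 0 true true
}
```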
Max Kotliar
dcf9f0eb7b lib/promscrape: Add a warning to active targets panel if -dropOriginalLabels=true (some debug info not available)
Previously the original labels were preserved (-dropOriginalLabels=false). In
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9772 the default
behavior was changed, and now vmagent/vmsingle drop original labels. The
change created some confusion related to the UI. For example, the "Debug
relabeling" column is completely hidden when the labels are not available.
This caused a stream of questions.

This commit adds a warning similar to the one on the "Discovered
targets" tab and always shows the "Debug relabeling" column. When
there is no info for it, "N/A" is printed.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9901
Follow-up https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9772
2025-12-11 16:24:50 +02:00
Aliaksandr Valialkin
606382178b lib/protoparser/protoparserutil: do not store too big buffers to the pool at ReadUncompressedData if only a small part of the buffer is used last time
This should prevent excess memory usage caused by inefficiently used buffers.

This should help the case at https://github.com/VictoriaMetrics/VictoriaLogs/issues/869
2025-12-11 15:11:44 +01:00
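The pooling rule can be sketched as a guard in front of `sync.Pool.Put`. The 64KiB floor and the 1/4 usage ratio are hypothetical thresholds, not the values used in `ReadUncompressedData`.

```go
package main

import (
	"fmt"
	"sync"
)

var bufPool sync.Pool

// putBuffer returns buf to the pool only if a reasonable part of its
// capacity was used last time (hypothetical thresholds). It reports
// whether the buffer was pooled.
func putBuffer(buf []byte) bool {
	const minCapToCheck = 64 * 1024
	if cap(buf) > minCapToCheck && len(buf) < cap(buf)/4 {
		// Mostly unused big buffer: drop it and let GC reclaim the memory.
		return false
	}
	buf = buf[:0]
	bufPool.Put(&buf)
	return true
}

func main() {
	fmt.Println(putBuffer(make([]byte, 1000, 4096)))          // true: small buffer
	fmt.Println(putBuffer(make([]byte, 1000, 1024*1024)))     // false: 1MiB buffer, ~0.1% used
	fmt.Println(putBuffer(make([]byte, 900*1024, 1024*1024))) // true: mostly used
}
```

Dropping an oversized buffer costs one future allocation, while keeping it would pin its full capacity in the pool indefinitely.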
Artem Fetishev
220249f023 lib/storage: use lrucache to implement tagFilters loops cache
The tagFilters loop cache is per-indexDB which means that currently
there are two instances, one for idbCurr and one for idbPrev. When the
partition index (#8134) is released, there will be as many instances of
this cache as there will be partitions.

The cache is implemented using workingsetcache, which occupies at least
30MB even when unused. Given that only the latest indexDB is used most
of the time, a lot of memory can be wasted.

Therefore the cache implementation is changed to lrucache because it
does not consume memory when it is unused and also has timeout-based
eviction.

This is a follow-up for 4cd727a511
(https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10072).
2025-12-11 08:42:58 +01:00
Max Kotliar
c6731f964c dashboards: add memory usage breakdown panels into Drilldown sections
Right now we have two separate panels: RSS memory % usage and RSS
anonymous memory % usage. This makes trend comparison difficult because
one has to visually correlate two independent panels. Another problem
is that these panels don't show Go runtime allocations at all. The same
applies to memory allocated in C. There are allocations in C (zstd) that one
should account for, but there isn't even a metric exposing them.

The commit adds a Memory usage breakdown panel into the Drilldown section. It
provides insight into the Go Stack, Go Heap, Go Heap Released, Go Other,
Mmap: VM Cache and File cache memory distribution.

It should make it easier to spot memory trend changes by type or to investigate
issues such as #10069 and #10028.

Panel info:
This panel shows memory usage by category.

How to use:
- Start from the high-level RSS panel.
- Identify an instance with unexpected or abnormal memory growth.
- Filter to that instance to inspect the detailed breakdown here.

Interpretation:
- A steadily rising Go Heap usually indicates a memory leak. Collect a
pprof memory profile.
- A growing Go Stack commonly points to a goroutine leak.

<img width="1508" height="628" alt="Screenshot 2025-12-08 at 13 18 44"
src="https://github.com/user-attachments/assets/0e794324-e86d-468e-b926-8bb11f5a2043"
/>
<img width="1503" height="674" alt="Screenshot 2025-12-08 at 13 19 34"
src="https://github.com/user-attachments/assets/62fc3fff-33b3-4dfe-ad3f-ad0526a8a606"
/>

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10139
2025-12-11 08:39:00 +01:00
Sinotov Vladimir
859435a8df lib/protoparser: added push data with zabbix connector (#6087)
Support receiving data from the Zabbix connector with API `/zabbixconnector/api/v1/history`

Labels:
    - The metric name is added to the `__name__` label.
    - The host name is added to the `host` label.
    - The visible host name is added to the `hostname` label.

The returned response complies with the requirements of the Zabbix connector protocol.

See the following doc for the connector [protocol](https://www.zabbix.com/documentation/current/en/manual/config/export/streaming).

Useful links:
- Zabbix Streaming to external systems
(https://www.zabbix.com/documentation/current/en/manual/config/export/streaming)
- Zabbix Newline-delimited JSON export
(https://www.zabbix.com/documentation/current/en/manual/appendix/protocols/real_time_export)

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6087
2025-12-10 17:00:27 +01:00
Max Kotliar
5b12fd35d7 app/vminsert: improve slowness-based rerouting logic
Adjust slowness-based rerouting logic.

Rerouting now occurs only from the slowest node, and only if the cluster
as a whole has enough available capacity to handle the additional load.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9890
2025-12-10 16:25:44 +01:00
Aliaksandr Valialkin
293d80910c lib/protoparser/opentelemetry: eliminate memory allocations during parsing of samples sent via OpenTelemetry protocol
This increases the parser performance by 4x-6x.

This commit uses the technique similar to https://github.com/VictoriaMetrics/VictoriaLogs/pull/720

goos: linux
goarch: amd64
pkg: github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/stream
cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics
                                                    │   old.txt    │               new.txt               │
                                                    │    sec/op    │   sec/op     vs base                │
ParseStream/default-metrics-labels-formatting-16      15.565µ ± 1%   2.150µ ± 3%  -86.19% (p=0.000 n=10)
ParseStream/prometheus-metrics-labels-formatting-16   24.228µ ± 2%   4.355µ ± 1%  -82.02% (p=0.000 n=10)
ParseStream/prometheus-metrics-formatting-16          23.028µ ± 2%   3.395µ ± 1%  -85.26% (p=0.000 n=10)
geomean                                                20.55µ        3.168µ       -84.59%

                                                    │   old.txt    │                new.txt                 │
                                                    │     B/s      │      B/s       vs base                 │
ParseStream/default-metrics-labels-formatting-16      127.9Mi ± 1%    918.3Mi ± 3%  +617.82% (p=0.000 n=10)
ParseStream/prometheus-metrics-labels-formatting-16   82.19Mi ± 2%   453.32Mi ± 1%  +451.57% (p=0.000 n=10)
ParseStream/prometheus-metrics-formatting-16          86.47Mi ± 2%   581.56Mi ± 1%  +572.52% (p=0.000 n=10)
geomean                                               96.88Mi         623.3Mi       +543.34%

                                                    │   old.txt    │                 new.txt                  │
                                                    │     B/op     │    B/op      vs base                     │
ParseStream/default-metrics-labels-formatting-16      12.53Ki ± 0%   0.00Ki ± 0%  -100.00% (p=0.000 n=10)
ParseStream/prometheus-metrics-labels-formatting-16   21.15Ki ± 1%   0.00Ki ±  ?  -100.00% (p=0.000 n=10)
ParseStream/prometheus-metrics-formatting-16          20.74Ki ± 1%   0.00Ki ±  ?  -100.00% (p=0.000 n=10)
geomean                                               17.65Ki                     ?                       ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                                                    │  old.txt   │                new.txt                 │
                                                    │ allocs/op  │ allocs/op  vs base                     │
ParseStream/default-metrics-labels-formatting-16      426.0 ± 0%    0.0 ± 0%  -100.00% (p=0.000 n=10)
ParseStream/prometheus-metrics-labels-formatting-16   514.0 ± 0%    0.0 ± 0%  -100.00% (p=0.000 n=10)
ParseStream/prometheus-metrics-formatting-16          514.0 ± 0%    0.0 ± 0%  -100.00% (p=0.000 n=10)
geomean                                               482.8                   ?                       ¹ ²
2025-12-10 16:11:59 +01:00
Artem Fetishev
bc4d98b358 app/vmstorage: properly name dateMetricIDCache metrics
The following dmc metrics were given standard names, i.e.:

- vm_date_metric_id_cache_resets_total became
vm_cache_resets_total{type="indexdb/date_metricID"}
- vm_date_metric_id_cache_syncs_total became
vm_cache_syncs_total{type="indexdb/date_metricID"}

This change should be safe since these metrics are currently not used in
VictoriaMetrics Grafana dashboards.

Additionally, other cache metrics were reorganized within the code so that
they are declared in the same order for each cache.
2025-12-10 14:57:52 +01:00
Alexander Frolov
ad153f72ef lib/storage: utilize persisted hourMetricIDs cache to avoid redundant indexDB lookups after vmstorage restart
This commit optimizes the performance of the storage by improving the utilization of persisted hourMetricIDs cache to avoid redundant indexDB lookups after vmstorage restart. The change refactors the hour-based cache checking logic using a switch statement to handle multiple hour scenarios more efficiently.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10114
2025-12-10 14:56:05 +01:00
Vadim Rutkovsky
f2578a9764 docs/victoriametrics: update LTS-releases.md (#10153)
### Describe Your Changes

Doc update to mention fresh patch releases - 1.122.10 and 1.110.25

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-12-10 15:11:26 +02:00
Aliaksandr Valialkin
d5e19717b7 Makefile: use the correct -trim_path at pprof-cpu
It shouldn't end with @.

The `PPROF_FILE=/path/to/cpu.pprof make pprof-cpu` is good for investigating profiles received from production builds.
2025-12-10 13:41:10 +01:00
Max Kotliar
5c40328e5f docs: mention Grafana panel that can help with swap related issues 2025-12-10 14:08:43 +02:00
Yury Moladau
1117437456 app/vmui: improve legend auto-collapse threshold, warning and toggle (#10140)
### Describe Your Changes

This PR improves the legend auto-collapse behavior in vmui:
- Increase the legend auto-collapse threshold from `20` to `100` series.
- Add a warning message when the legend is collapsed by default, showing
the actual series count.
- Add a user setting to disable automatic legend collapsing (enabled by
default).

Related issue: #10075

<img width="352" alt="image"
src="https://github.com/user-attachments/assets/22ee2ef9-6369-47a8-87a1-c63a0e17fccd"
/>
<img width="1618" height="197" alt="image"
src="https://github.com/user-attachments/assets/791eb9b6-4397-476d-ad44-5152e50d1975"
/>


### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
2025-12-10 13:59:16 +02:00
Aliaksandr Valialkin
094a7cf3f9 lib/protoparser/opentelemetry/stream: benchmark cases when prometheus-compatible naming for metrics and labels is enabled 2025-12-10 11:50:46 +01:00
Artem Fetishev
538e489497 docs: Update cache tuning section (#10149)
- Remove mentions of the `Caches` section in Grafana dashboards since this section does not exist anymore.
- Slightly rewrite the description of cache panels in the Troubleshooting section.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-12-10 11:42:33 +01:00
Aliaksandr Valialkin
744aa3fe9f lib/protoparser/opentelemetry/stream: make the BenchmarkParseStream closer to real production cases
- Add more metrics to the protobuf to parse.
- Measure the scan speed of the original protobuf in bytes/sec. Previously, the number of ParseStream() calls per second was measured.
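A minimal sketch of measuring throughput in bytes/sec with Go's standard `testing` package. `parseStream` is a hypothetical stand-in, not the real ParseStream(). Calling `b.SetBytes` with the input size makes the result report MB/s in addition to ns/op; `testing.Benchmark` is used only so the sketch runs outside `go test`.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// parseStream is a hypothetical stand-in for a real stream parser:
// it only counts newline-terminated records in data.
func parseStream(data []byte) int {
	return bytes.Count(data, []byte("\n"))
}

// benchParse measures parseStream. Because of b.SetBytes, the result
// reports throughput over the input size instead of bare calls per second.
func benchParse(data []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(data)))
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			parseStream(data)
		}
	})
}

func main() {
	data := bytes.Repeat([]byte("metric_name{label=\"value\"} 42\n"), 1000)
	res := benchParse(data)
	// String() includes an MB/s column because SetBytes was called.
	fmt.Println(res.String())
}
```

Inside a regular `_test.go` file the same body would go into a `BenchmarkXxx(b *testing.B)` function.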
2025-12-10 11:21:57 +01:00
Aliaksandr Valialkin
44a3885f97 lib/protoparser/opentelemetry/stream: avoid memory allocations for bytes.NewBuffer() on every iteration of BenchmarkParseStream
Re-use benchReader for reading the same data on every iteration of BenchmarkParseStream.
2025-12-10 11:10:39 +01:00
Aliaksandr Valialkin
f43264f9f2 lib/ioutil: add missing package after the commit 2da010495c 2025-12-10 11:07:15 +01:00
Aliaksandr Valialkin
e07bc7a74e lib/prompb: move all the code related to WriteRequestUnmarshaler to a separate file - write_request_unmarshaler.go
This should improve code maintenance a bit.

This is a follow-up for the commit b98e592752
2025-12-10 10:45:59 +01:00
Aliaksandr Valialkin
d1680063f5 lib/prompb: rename MetricMetadataType to MetricType
Also rename MetricMetadata* constants to MetricType* constants.

This makes the code a bit more readable.

This is a follow-up for the commit 25cd5637bc

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9306
2025-12-10 01:18:44 +01:00
Aliaksandr Valialkin
2da010495c all: pool io.LimitedReader in order to save a memory allocation and reduce CPU usage a bit 2025-12-10 01:18:43 +01:00
Artem Fetishev
7c78f95f2e docs: Update flags (#10148)
Follow-up for dc5d7aa4ce
(https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10135)

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-09 17:48:40 +01:00
Kirill Yurkov
5bd67c5f49 docs: recommend disabling swap (#10113)
Add commands for disabling swap to the installation recommendations in order to
prevent performance issues.

---------

Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-12-09 15:50:13 +02:00
Max Kotliar
c618f471ca apptest: make results order stable in test special query regression
Sometimes the test fails with the following error:

--- FAIL: TestClusterSpecialQueryRegression (15.57s)
special_query_regression_test.go:76: unexpected /api/v1/export response (-want, +got):
          &apptest.PrometheusAPIV1QueryResponse{
          	... // 1 ignored field
          	Data: &apptest.QueryData{
          		... // 1 ignored field
          		Result: []*apptest.QueryResult{
          			&{
          				Metric: map[string]string{
          					"__name__": "prometheus.sensitiveRegex",
- 					"label": "SensitiveRegex",
+ 					"label": "sensitiveRegex",
          				},
          				Sample:  nil,
          				Samples: {&{Timestamp: 1707123456700, Value: 10}},
          			},
          			&{
          				Metric: map[string]string{
          					"__name__": "prometheus.sensitiveRegex",
- 					"label": "sensitiveRegex",
+ 					"label": "SensitiveRegex",
          				},
          				Sample:  nil,
          				Samples: {&{Timestamp: 1707123456700, Value: 10}},
          			},
          		},
          	},
          	ErrorType: "",
          	Error:     "",
          	IsPartial: false,
          }

FAIL
FAIL	github.com/VictoriaMetrics/VictoriaMetrics/apptest/tests	18.676s
FAIL
2025-12-09 15:20:01 +02:00
Artem Fetishev
f62893c151 lib/storage: report per-idb cache stats only once
`tagFiltersCache` and `dateMetricIDCache` are now per-indexDB. Currently
we have 2 instances of indexDB (prev and curr) and therefore 2 instances
of each cache.

When the storage stats are collected, the stats of individual caches are
added together. For example, if the `sizeMaxBytes` of each
tagFiltersCache is `100MB` and the `sizeBytes` of the two instances are
`10MB` and `99MB`, then the resulting stats will be `sizeMaxBytes ==
200MB, sizeBytes == 109MB`.

While this is accurate, these stats hide a potential problem. They say
that the cache utilization is slightly above `50%` (109/200) and
everything seems to be okay. But in reality one of the caches is
99% utilized and will soon start evicting existing records to make
room for new ones, potentially slowing down data retrieval. Ops
won't see it and will not take the necessary action.

The solution is to report stats only for one instance of cache whose
utilization is the highest.
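A minimal sketch of the selection rule described above, using a hypothetical, trimmed-down `cacheStats` struct rather than the real storage stats types: instead of summing both instances, report the one whose `sizeBytes/sizeMaxBytes` ratio is highest.

```go
package main

import "fmt"

// cacheStats is a hypothetical view of one cache instance's stats.
type cacheStats struct {
	SizeBytes    uint64
	SizeMaxBytes uint64
}

// utilization returns SizeBytes/SizeMaxBytes, or 0 when no limit is set.
func (cs cacheStats) utilization() float64 {
	if cs.SizeMaxBytes == 0 {
		return 0
	}
	return float64(cs.SizeBytes) / float64(cs.SizeMaxBytes)
}

// mostUtilized returns the stats of the instance with the highest utilization.
func mostUtilized(all []cacheStats) cacheStats {
	var best cacheStats
	for i, cs := range all {
		if i == 0 || cs.utilization() > best.utilization() {
			best = cs
		}
	}
	return best
}

func main() {
	const mb = 1 << 20
	prev := cacheStats{SizeBytes: 10 * mb, SizeMaxBytes: 100 * mb}
	curr := cacheStats{SizeBytes: 99 * mb, SizeMaxBytes: 100 * mb}
	s := mostUtilized([]cacheStats{prev, curr})
	fmt.Printf("reported utilization: %.0f%%\n", 100*s.utilization()) // 99%
}
```

Summing the same two instances would report 109MB of 200MB (about 55%), which is exactly the masking effect this commit avoids.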

Alternatives considered:
- #10123. Might work, but breaks the encapsulation and can potentially
be slower
- Do not aggregate the stats and report them per-indexDB. This increases
the number of metrics and makes it dependent on the number of indexDB
instances (which can be many once #8134 is released).

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/pull/8134
2025-12-09 12:43:50 +01:00
JAYICE
76f5def301 dashboard: fix page fault panel (#10141)
Add `[$__rate_interval]` to fix the page faults panel introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9977
2025-12-09 12:41:28 +01:00
Artem Fetishev
3be5ed0e32 Revert "lib/storage: after deleting series, reset tsid only once" (#10143)
This reverts commit dbe71700b5.

tsidCache is persistent and must be reset before deletedMetricID records
are added to the index. This is needed to handle ungraceful shutdowns
properly.
2025-12-09 10:57:08 +01:00
Aliaksandr Valialkin
4ac40d955b lib/prompb: use MetricMetadataType type for MetricMetadata.Type field
This eliminates the need for manual conversion between MetricMetadataType and uint32 / int32.
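A sketch of what a typed enum field looks like, assuming the numeric values of the `MetricMetadata.MetricType` enum from the Prometheus remote-write protobuf (UNKNOWN=0 through STATESET=7). The Go identifiers here are illustrative and not copied from lib/prompb; the raw wire value is converted to `MetricType` once at decode time, after which no caller casts between the type and uint32/int32.

```go
package main

import "fmt"

// MetricType mirrors the Prometheus remote-write MetricMetadata.MetricType enum.
type MetricType uint32

const (
	MetricTypeUnknown        MetricType = 0
	MetricTypeCounter        MetricType = 1
	MetricTypeGauge          MetricType = 2
	MetricTypeHistogram      MetricType = 3
	MetricTypeGaugeHistogram MetricType = 4
	MetricTypeSummary        MetricType = 5
	MetricTypeInfo           MetricType = 6
	MetricTypeStateset       MetricType = 7
)

// MetricMetadata stores Type as MetricType instead of a raw integer.
type MetricMetadata struct {
	Type             MetricType
	MetricFamilyName string
	Help             string
	Unit             string
}

// String returns the exposition-format name of the type.
func (t MetricType) String() string {
	switch t {
	case MetricTypeCounter:
		return "counter"
	case MetricTypeGauge:
		return "gauge"
	case MetricTypeHistogram:
		return "histogram"
	case MetricTypeGaugeHistogram:
		return "gaugehistogram"
	case MetricTypeSummary:
		return "summary"
	case MetricTypeInfo:
		return "info"
	case MetricTypeStateset:
		return "stateset"
	default:
		return "unknown"
	}
}

func main() {
	md := MetricMetadata{Type: MetricTypeCounter, MetricFamilyName: "http_requests_total"}
	fmt.Println(md.MetricFamilyName, md.Type) // http_requests_total counter
}
```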

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974

This is a follow-up for the commit 5a587f2006
2025-12-08 20:31:29 +01:00
Artem Fetishev
dc5d7aa4ce lib/storage: properly report dateMetricIDCache stats
A number of changes to `dateMetricIDCache` stats and configuration:

1. Export `SizeMaxBytes` metric and make the size configurable via a
flag
2. Fix `EntriesCount` and `SizeBytes` stats. Previously the cache
reported these stats for its immutable part only, whereas there are cases
when the number of entries in its mutable part is comparable with the
number in the immutable part. The stats from the mutable part remain
invisible until it is synced to the immutable part. It is also possible
that the cache gets reset after the sync because the cache size exceeds
the max allowed size. Reporting the stats for both mutable and immutable
parts should provide a clear picture of the cache utilization.

Together, SizeBytes and SizeMaxBytes should enable tracking the cache
utilization properly and taking appropriate actions if necessary (such as
adjusting the memory resources and/or the cache size limit via a flag).
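A simplified sketch of a two-part cache whose entry count covers both parts. `twoPartCache` is hypothetical and much simpler than the real dateMetricIDCache; it assumes a key lives in at most one part, so the two lengths can be added without double counting. `immutableEntries` shows the previous, immutable-only view for contrast.

```go
package main

import (
	"fmt"
	"sync"
)

// twoPartCache keeps a read-mostly immutable part and a mutable part that
// accumulates new keys until the next merge.
type twoPartCache struct {
	mu        sync.Mutex
	immutable map[uint64]struct{}
	mutable   map[uint64]struct{}
}

func newTwoPartCache() *twoPartCache {
	return &twoPartCache{
		immutable: map[uint64]struct{}{},
		mutable:   map[uint64]struct{}{},
	}
}

// Set adds k to the mutable part unless the immutable part already has it.
func (c *twoPartCache) Set(k uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.immutable[k]; ok {
		return
	}
	c.mutable[k] = struct{}{}
}

// merge moves all mutable keys into a new immutable part.
func (c *twoPartCache) merge() {
	c.mu.Lock()
	defer c.mu.Unlock()
	m := make(map[uint64]struct{}, len(c.immutable)+len(c.mutable))
	for k := range c.immutable {
		m[k] = struct{}{}
	}
	for k := range c.mutable {
		m[k] = struct{}{}
	}
	c.immutable = m
	c.mutable = map[uint64]struct{}{}
}

// immutableEntries is the old, incomplete view of the cache size.
func (c *twoPartCache) immutableEntries() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.immutable)
}

// EntriesCount reports entries from both parts.
func (c *twoPartCache) EntriesCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.immutable) + len(c.mutable)
}

func main() {
	c := newTwoPartCache()
	c.Set(1)
	c.Set(2)
	fmt.Println(c.immutableEntries(), c.EntriesCount()) // 0 2
}
```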

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10064
2025-12-08 14:17:58 +01:00
JAYICE
244769a00d vmstorage: skip last sleep when closing vminsertSrv connections
After closing the last connection to vminsert, vmstorage still waits for
an interval, so the actual shutdown time is always longer than
configured.

This commit skips the last sleep.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10136
2025-12-08 14:10:17 +01:00
Max Kotliar
8e81d54851 Revert "dashboards: add memory usage breakdown panels into Drilldown sections"
This reverts commit 5117cde8bc.
2025-12-08 13:42:10 +02:00
Max Kotliar
5117cde8bc dashboards: add memory usage breakdown panels into Drilldown sections
Right now we have two separate panels: RSS memory % usage and RSS
anonymous memory % usage. This makes trend comparison difficult because
one has to visually correlate two independent panels. Another problem
is that these panels don't show Go runtime allocations at all. The same
applies to memory allocated in C. There are allocations in C (zstd) that one
should account for, but there is not even a metric exposing them.

The commit adds a Memory usage breakdown panel to the Drilldown section. It
provides insight into the Go Stack, Go Heap, Go Heap Released, Go Other,
Mmap: VM Cache and File cache memory distribution.

It should help spot trend changes in memory by type and investigate
issues such as #10069 and #10028 more easily.

Panel info:
This panel shows memory usage by category.

How to use:
- Start from the high-level RSS panel.
- Identify an instance with unexpected or abnormal memory growth.
- Filter to that instance to inspect the detailed breakdown here.

Interpretation:
- A steadily rising Go Heap usually indicates a memory leak. Collect a
pprof memory profile.
- A growing Go Stack commonly points to a goroutine leak.
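For reference, the Go-runtime categories above can be approximated in-process with `runtime.ReadMemStats`. This is only a sketch: the mapping of panel categories to `MemStats` fields is an assumption (the dashboard itself works on exported metrics), and mmap'ed caches, file cache and C allocations such as zstd are not visible to the Go runtime at all.

```go
package main

import (
	"fmt"
	"runtime"
)

// memBreakdown is a hypothetical grouping of runtime.MemStats fields.
type memBreakdown struct {
	GoStack        uint64 // stack spans in use
	GoHeap         uint64 // heap spans in use
	GoHeapReleased uint64 // heap memory returned to the OS
	GoOther        uint64 // rest of the memory obtained from the OS by the Go runtime
}

func readMemBreakdown() memBreakdown {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	b := memBreakdown{
		GoStack:        ms.StackInuse,
		GoHeap:         ms.HeapInuse,
		GoHeapReleased: ms.HeapReleased,
	}
	// Sys covers heap, stacks and runtime metadata; attribute the remainder to "other".
	accounted := b.GoStack + b.GoHeap + b.GoHeapReleased
	if ms.Sys > accounted {
		b.GoOther = ms.Sys - accounted
	}
	return b
}

func main() {
	const mib = 1 << 20
	b := readMemBreakdown()
	fmt.Printf("Go Stack: %d MiB, Go Heap: %d MiB, Go Heap Released: %d MiB, Go Other: %d MiB\n",
		b.GoStack/mib, b.GoHeap/mib, b.GoHeapReleased/mib, b.GoOther/mib)
}
```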
2025-12-08 13:39:34 +02:00
Artem Fetishev
85367cae38 Idb blockcache metrics unittest (#10050)
indexDB has 3 block caches. These caches export metrics. Storage
collects these metrics for each indexDB it has (currently prev and curr only).

There is a potential problem:
- These caches are shared by all indexDBs.
- Each indexDB reports the block cache metrics.
- Storage collects the metrics of all indexDBs by adding them together.

I.e. it is possible to count block cache metrics several times.
This is not the case in the current implementation, because the block cache
metrics are intentionally not added together.

The added unit test 1) demonstrates that the resulting counts are
reported correctly and 2) protects from future unintentional changes in this
behavior.

Additionally a code comment is added to explain why block cache metrics
are not summed up.
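A minimal sketch of the counting rule, with hypothetical `blockCache`, `indexDB` and `collect` names: per-indexDB counters are summed, while the shared block cache is read only once, since adding it for every indexDB would count the same requests twice.

```go
package main

import "fmt"

// blockCache is a hypothetical cache shared by all indexDB instances.
type blockCache struct{ requests uint64 }

// indexDB has its own counter plus a pointer to the shared cache.
type indexDB struct {
	searches uint64
	bc       *blockCache
}

type metrics struct {
	Searches           uint64
	BlockCacheRequests uint64
}

// collect sums per-indexDB counters but reads the shared cache only once.
func collect(dbs []*indexDB) metrics {
	var m metrics
	for _, db := range dbs {
		m.Searches += db.searches
	}
	if len(dbs) > 0 {
		m.BlockCacheRequests = dbs[0].bc.requests
	}
	return m
}

func main() {
	shared := &blockCache{requests: 100}
	prev := &indexDB{searches: 3, bc: shared}
	curr := &indexDB{searches: 5, bc: shared}
	fmt.Printf("%+v\n", collect([]*indexDB{prev, curr})) // {Searches:8 BlockCacheRequests:100}
}
```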

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-06 18:14:52 +01:00
Aliaksandr Valialkin
159b71cabb lib/protoparser/influx: properly clean references to underlying byte slices from tagsPool and fieldsPool inside unmarshalContext
This should prevent memory leaks when unmarshalContext fields point to unused byte slices.
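A sketch of the general technique, using a hypothetical `tag` type and `resetTags` helper rather than the real influx parser types: zero every element before truncating a slice that goes back to a pool. Truncating alone (`s[:0]`) keeps the old elements in the backing array, and with them references to request buffers that can then never be freed.

```go
package main

import (
	"fmt"
	"sync"
)

// tag is a hypothetical parsed tag whose fields may point into a request buffer.
type tag struct {
	Key   []byte
	Value []byte
}

// tagsPool stores pointers to slices to avoid an allocation on Put.
var tagsPool = sync.Pool{
	New: func() any {
		s := make([]tag, 0, 16)
		return &s
	},
}

// resetTags drops all references held by s and returns it truncated to zero length.
func resetTags(s []tag) []tag {
	for i := range s {
		s[i] = tag{}
	}
	return s[:0]
}

func main() {
	body := []byte("host=a,region=b")
	sp := tagsPool.Get().(*[]tag)
	tags := append(*sp, tag{Key: body[0:4], Value: body[5:6]})
	fmt.Printf("%s=%s\n", tags[0].Key, tags[0].Value) // host=a
	*sp = resetTags(tags)
	tagsPool.Put(sp)
}
```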
2025-12-06 11:52:57 +01:00
Aliaksandr Valialkin
78b8c773ae docs/victoriametrics/: remove misleading statement about extending ext4 partition to 16TB+
It is enough to recommend the given format options for disks with 1TB+ sizes
2025-12-05 23:00:47 +01:00
Nikolay
aab92d3c0f protoparser/influx: reduce memory allocation (#10109)
Previously, the influx parser allocated a new byte slice when unescaping
Row fields. This added extra GC pressure and increased CPU usage.

This commit changes unescaping to in-place updates of the provided []byte.
Since the request being parsed is actually a []byte converted into a
string, it's safe to update it in place. To be able to work with the
[]byte directly, this commit changes the parser API to accept []byte
instead of string.
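A minimal sketch of in-place unescaping: the write index never runs ahead of the read index, so the input can be compacted without a second buffer. The set of escapable bytes (comma, space, equals sign, double quote, backslash) is an assumption for illustration, not the exact rule set of the VictoriaMetrics parser. This is only safe when the caller owns the []byte, which is the case the commit describes.

```go
package main

import (
	"bytes"
	"fmt"
)

// unescapeInPlace removes backslash escapes from b by compacting it in place
// and returns the shortened slice. No new buffer is allocated.
func unescapeInPlace(b []byte) []byte {
	if bytes.IndexByte(b, '\\') < 0 {
		return b // fast path: nothing to unescape
	}
	j := 0
	for i := 0; i < len(b); i++ {
		c := b[i]
		if c == '\\' && i+1 < len(b) {
			switch b[i+1] {
			case ',', ' ', '=', '"', '\\':
				i++ // drop the backslash, keep the escaped byte
				c = b[i]
			}
		}
		b[j] = c // j <= i always holds, so unread input is never overwritten
		j++
	}
	return b[:j]
}

func main() {
	line := []byte(`cpu\,total`)
	fmt.Println(string(unescapeInPlace(line))) // cpu,total
}
```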

Benchstat:
```
                                 │   before    │                after                │
                                 │   sec/op    │   sec/op     vs base                │
RowsUnmarshalUnescape-10           74.68n ± 4%   54.23n ± 5%  -27.38% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10   40.41n ± 2%   42.59n ± 1%   +5.39% (p=0.000 n=10)
geomean                            54.93n        48.06n       -12.51%

                                 │    before    │                after                 │
                                 │     B/s      │     B/s       vs base                │
RowsUnmarshalUnescape-10           1.035Gi ± 4%   1.425Gi ± 5%  +37.72% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10   1.613Gi ± 2%   1.531Gi ± 1%   -5.11% (p=0.000 n=10)
geomean                            1.292Gi        1.477Gi       +14.32%

                                 │   before    │                after                 │
                                 │    B/op     │    B/op     vs base                  │
RowsUnmarshalUnescape-10           149.00 ± 0%   96.00 ± 0%  -35.57% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10    80.00 ± 0%   80.00 ± 0%        ~ (p=1.000 n=10) ¹
geomean                             109.2        87.64       -19.73%
¹ all samples are equal

                                 │   before   │                after                 │
                                 │ allocs/op  │ allocs/op   vs base                  │
RowsUnmarshalUnescape-10           5.000 ± 0%   1.000 ± 0%  -80.00% (p=0.000 n=10)
RowsUnmarshalUnescapeNoEscape-10   1.000 ± 0%   1.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                            2.236        1.000       -55.28%
```

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10053

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2025-12-05 18:18:07 +01:00
Nikolay
2bef26288e lib/memory: add validation for remaining system memory
Previously, if the user-defined value of the `memory.allowedBytes` flag
exceeded the system memory limit, the remaining memory could become negative.
This resulted in incorrect memory auto-detection calculations for
various components, such as the vmstorage unique time series limit and parts
size.

This commit adds a negative value check and also logs the system memory
limit at start-up of VictoriaMetrics components.
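A sketch of the kind of guard described above. `remainingMemory` is hypothetical, and clamping to zero is an assumption about how the negative case is handled; the commit only states that a negative-value check was added. The warning text is also illustrative.

```go
package main

import (
	"fmt"
	"log"
)

// remainingMemory returns the system memory left after reserving allowedBytes.
// If allowedBytes exceeds the system limit, it logs a warning and returns 0
// instead of a negative value.
func remainingMemory(sysTotal, allowedBytes int64) int64 {
	r := sysTotal - allowedBytes
	if r < 0 {
		log.Printf("WARN: allowed memory (%d bytes) exceeds the system memory limit (%d bytes)", allowedBytes, sysTotal)
		return 0
	}
	return r
}

func main() {
	const gib = int64(1) << 30
	fmt.Println(remainingMemory(8*gib, 6*gib) / gib)  // 2
	fmt.Println(remainingMemory(8*gib, 16*gib) / gib) // 0, plus a warning on stderr
}
```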

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10083
2025-12-05 18:14:04 +01:00
Hui Wang
c14dbad33b vmselect: disable rollup result cache for instant queries that contain rate function
Previously, in order to cache results for `rate`, we treated
`rate(m[d])` as `(increase(m[d]) / d)` and cached the `increase` result.
However, in MetricsQL, `rate(m[d]) = (lastValue - firstValue) /
(lastTimestamp - firstTimestamp)`, so it does not equal
`increase(m[d])/d` if `d != (lastTimestamp - firstTimestamp)`.
Although the issue primarily arises when time series samples are not
continuous, the discrepancy is hard to debug and can confuse users:
since range queries don't use this optimization, recording rule results
may differ from raw query results in VMUI.
Therefore, it is better to disable this optimization and re-enable it only
once the results can be cached correctly.
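A small worked example of the discrepancy, using the formula quoted above. `increaseOverWindow` is a deliberate simplification (lastValue - firstValue, no counter resets, no extrapolation) and not the exact MetricsQL `increase` implementation; the sample data is made up for illustration.

```go
package main

import "fmt"

// sample is a (timestamp in seconds, value) pair.
type sample struct {
	ts    float64
	value float64
}

// rateFromSamples implements (lastValue - firstValue) / (lastTimestamp - firstTimestamp).
func rateFromSamples(s []sample) float64 {
	first, last := s[0], s[len(s)-1]
	return (last.value - first.value) / (last.ts - first.ts)
}

// increaseOverWindow is a simplified increase: lastValue - firstValue.
func increaseOverWindow(s []sample) float64 {
	return s[len(s)-1].value - s[0].value
}

func main() {
	// A 300s window that only has samples between t=60s and t=180s (a gap in the series).
	window := 300.0
	s := []sample{{60, 10}, {120, 20}, {180, 30}}
	fmt.Printf("rate:       %.4f\n", rateFromSamples(s))           // 0.1667
	fmt.Printf("increase/d: %.4f\n", increaseOverWindow(s)/window) // 0.0667
}
```

Here `d = 300s` but `lastTimestamp - firstTimestamp = 120s`, so the cached `increase/d` value would be 2.5 times smaller than `rate`.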

fixes https://github.com/VictoriaMetrics/victoriaMetrics/issues/10098
2025-12-05 17:38:32 +01:00
Artem Fetishev
dbe71700b5 lib/storage: after deleting series, reset tsid only once
As indexDBs became independent of each other, the tsidCache is now
reset more than once when the DeleteSeries() operation is performed, but
it needs to be reset only once. Thus, move the deletion from indexDB
to Storage.

Follow-up for 16d75ab0bd.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10119
2025-12-05 17:38:02 +01:00
Hui Wang
d4fa326659 vmselect: reset rollup result cache with -search.disableCache when necessary
There is no need to call `c.Reset()` for the rollup result cache if it is not
persisted (`-cacheDataPath` is not specified) or has already been cleared by
`-search.resetRollupResultCacheOnStartup`, since in both cases it is already
newly created.
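The condition can be summarized as a single predicate. `needsRollupCacheReset` is a hypothetical helper, and treating `-search.disableCache` as the trigger is an assumption based on the commit subject.

```go
package main

import "fmt"

// needsRollupCacheReset reports whether an explicit Reset() is useful:
// only when caching is disabled, the cache was loaded from disk
// (-cacheDataPath is set) and it was not already dropped on startup.
func needsRollupCacheReset(disableCache bool, cacheDataPath string, resetOnStartup bool) bool {
	if !disableCache {
		return false
	}
	return cacheDataPath != "" && !resetOnStartup
}

func main() {
	fmt.Println(needsRollupCacheReset(true, "/var/lib/vm-cache", false)) // true
	fmt.Println(needsRollupCacheReset(true, "", false))                  // false: in-memory cache is new
	fmt.Println(needsRollupCacheReset(true, "/var/lib/vm-cache", true))  // false: already reset
}
```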


Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10095
2025-12-05 17:37:30 +01:00
Andrei Baidarov
040ef931d1 vmalert: do not increment errors counter on cancel context errors
Follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10027

`vmalert_alerting_rules_errors_total` increments on any error


445f30a4a6/app/vmalert/rule/alerting.go (L455-L460)

while `vmalert_execution_errors_total` increments only on non-cancellation errors


445f30a4a6/app/vmalert/rule/group.go (L747-L756)

This commit ignores cancellation errors in
`vmalert_alerting_rules_errors_total` too
2025-12-05 17:36:42 +01:00
JAYICE
474009a7f1 dashboard: add page faults panel for vmsingle&vmcluster (#9977)
### Describe Your Changes
Add a page faults panel to the `Troubleshooting` section for vmcluster and
vmsingle. Fixes
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9974

The queries:
```
sum(rate(process_minor_pagefaults_total{job=~"$job", instance=~"$instance"})) by (job,instance)

sum(rate(process_major_pagefaults_total{job=~"$job", instance=~"$instance"})) by (job,instance)
```

<img width="1088" height="306" alt="image"
src="https://github.com/user-attachments/assets/4b4ac884-5372-4141-a429-ac0b296dc926"
/>
2025-12-05 18:04:44 +02:00
Nikolay
1b1442d91b app/vmgateway: properly handle proxy request errors
Previously, vmgateway didn't handle the http.Abort error.
It could lead to an unexpected panic in the web server.

This commit adds panic recovery to prevent the app from crashing.
2025-12-05 16:32:50 +01:00
Aliaksandr Valialkin
3e359dc920 lib/protoparser/influx: remove IgnoreErrors field from Rows and replace it with the explicit skipInvalidLines arg at Rows.Unmarshal()
This improves the maintainability of the code, since the caller of Rows.Unmarshal() always knows
whether invalid lines must be skipped.

While at it, add missing checks for errors returned from Rows.Unmarshal().

This is a follow-up for the commit daa7183749

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7090
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7165
2025-12-05 16:24:54 +01:00
Hui Wang
e41f642a59 add flag description for -selectNode (#10022) 2025-12-05 14:53:06 +02:00
Artem Fetishev
7a2cc7fbad lib/storage: use deadline instead of is.deadline
This makes SearchTSIDs() consistent with SearchMetricNames().

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-12-05 02:08:14 +01:00
Aliaksandr Valialkin
a7b99dd164 vendor: update github.com/VictoriaMetrics/easyproto from v0.1.4 to v1.0.0 2025-12-04 21:47:20 +01:00
Andrii Chubatiuk
647f107576 vmui: always add /prometheus prefix while generating backend url 2025-12-04 18:09:47 +02:00
dependabot[bot]
04f8296c85 build(deps-dev): bump js-yaml from 4.1.0 to 4.1.1 in /app/vmui/packages/vmui (#10017)
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md">js-yaml's
changelog</a>.</em></p>
<blockquote>
<h2>[4.1.1] - 2025-11-12</h2>
<h3>Security</h3>
<ul>
<li>Fix prototype pollution issue in yaml merge (&lt;&lt;)
operator.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="cc482e7759"><code>cc482e7</code></a>
4.1.1 released</li>
<li><a
href="50968b862e"><code>50968b8</code></a>
dist rebuild</li>
<li><a
href="d092d86603"><code>d092d86</code></a>
lint fix</li>
<li><a
href="383665ff42"><code>383665f</code></a>
fix prototype pollution in merge (&lt;&lt;)</li>
<li><a
href="0d3ca7a27b"><code>0d3ca7a</code></a>
README.md: HTTP =&gt; HTTPS (<a
href="https://redirect.github.com/nodeca/js-yaml/issues/678">#678</a>)</li>
<li><a
href="49baadd52a"><code>49baadd</code></a>
doc: 'empty' style option for !!null</li>
<li><a
href="ba3460eb9d"><code>ba3460e</code></a>
Fix demo link (<a
href="https://redirect.github.com/nodeca/js-yaml/issues/618">#618</a>)</li>
<li>See full diff in <a
href="https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=js-yaml&package-manager=npm_and_yarn&previous-version=4.1.0&new-version=4.1.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-04 17:42:17 +02:00
dependabot[bot]
1c3e64e9ad build(deps): bump actions/checkout from 4 to 6 (#10082)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to
6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>v6-beta by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2298">actions/checkout#2298</a></li>
<li>update readme/changelog for v6 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2311">actions/checkout#2311</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5.0.0...v6.0.0">https://github.com/actions/checkout/compare/v5.0.0...v6.0.0</a></p>
<h2>v6-beta</h2>
<h2>What's Changed</h2>
<p>Updated persist-credentials to store the credentials under
<code>$RUNNER_TEMP</code> instead of directly in the local git
config.</p>
<p>This requires a minimum Actions Runner version of <a
href="https://github.com/actions/runner/releases/tag/v2.329.0">v2.329.0</a>
to access the persisted credentials for <a
href="https://docs.github.com/en/actions/tutorials/use-containerized-services/create-a-docker-container-action">Docker
container action</a> scenarios.</p>
<h2>v5.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5...v5.0.1">https://github.com/actions/checkout/compare/v5...v5.0.1</a></p>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v4 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2305">actions/checkout#2305</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v4.3.1">https://github.com/actions/checkout/compare/v4...v4.3.1</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V6.0.0</h2>
<ul>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
</ul>
<h2>V5.0.1</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.1</h2>
<ul>
<li>Port v6 cleanup to v4 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2305">actions/checkout#2305</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1af3b93b68"><code>1af3b93</code></a>
update readme/changelog for v6 (<a
href="https://redirect.github.com/actions/checkout/issues/2311">#2311</a>)</li>
<li><a
href="71cf2267d8"><code>71cf226</code></a>
v6-beta (<a
href="https://redirect.github.com/actions/checkout/issues/2298">#2298</a>)</li>
<li><a
href="069c695914"><code>069c695</code></a>
Persist creds to a separate file (<a
href="https://redirect.github.com/actions/checkout/issues/2286">#2286</a>)</li>
<li><a
href="ff7abcd0c3"><code>ff7abcd</code></a>
Update README to include Node.js 24 support details and requirements (<a
href="https://redirect.github.com/actions/checkout/issues/2248">#2248</a>)</li>
<li><a
href="08c6903cd8"><code>08c6903</code></a>
Prepare v5.0.0 release (<a
href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li><a
href="9f265659d3"><code>9f26565</code></a>
Update actions checkout to use node 24 (<a
href="https://redirect.github.com/actions/checkout/issues/2226">#2226</a>)</li>
<li>See full diff in <a
href="https://github.com/actions/checkout/compare/v4...v6">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4&new-version=6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-04 17:40:42 +02:00
Hui Wang
4212491031 vmalert: clarify templating in alerting rule labels (#10121)
follow up
38dd971f58.

Labels only support limited templating variables in
https://docs.victoriametrics.com/victoriametrics/vmalert/#templating,
including `$labels`, `$value` and `$expr`, to avoid breaking alert states
or causing cardinality issues in the results.
2025-12-04 17:35:27 +02:00
Zakhar Bessarab
f76bc956ca app/vmctl: respect context cancellation during user prompts
Previously, context cancellation was ignored while reading the user's response
to a prompt. As a result, vmctl ignored Ctrl+C and other termination
signals until the user finished typing.

Fix that by properly propagating the context and respecting
its cancellation.


Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-12-04 15:57:31 +04:00
Aliaksandr Valialkin
655074c3e0 lib/protoparser/opentelemetry/pb: remove code related to parsing logs in OTEL format
This code is no longer needed after the commit 4ffb74448d

See https://github.com/VictoriaMetrics/VictoriaLogs/pull/720
2025-12-04 00:49:54 +01:00
Aliaksandr Valialkin
5e95fdf23e docs/victoriametrics/FAQ.md: add a link to the guide on how to calculate the needed disk space for VictoriaLogs to the "Why is indexdb size so large?" chapter
This is a follow-up for 68f670cbc5
2025-12-03 15:51:40 +01:00
Aliaksandr Valialkin
ffcfb74b17 deployment: update Go builder from v1.25.4 to v1.25.5
See https://github.com/golang/go/issues?q=milestone%3AGo1.25.5%20label%3ACherryPickApproved
2025-12-03 15:20:11 +01:00
Max Kotliar
fe803bfc6e Capitalize titles in operator.json
Signed-off-by: d3spair <git@agrshv.dev>
2025-12-03 13:43:39 +02:00
Andrii Chubatiuk
8ee466ab06 dashboard: add panels for operator flags and global params 2025-12-03 13:28:59 +02:00
Sylvain Rabot
6ca48d5025 lib/vmbackup/s3backup: support custom SSE KMS key id and ACL
Add more S3 configurations.

- SSES3KeyID allows pushing to a bucket that belongs to a different account
than the KMS key used for server-side encryption.
- ACL allows configuring which permissions are granted on objects
uploaded to the bucket (useful when the bucket policy expects a specific
permission such as `bucket-owner-full-control`).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <andrew.chubatiuk@gmail.com>
2025-12-03 10:06:57 +04:00
Fred Navruzov
70eb9d39d5 docs/vmanomaly: release v1.28.1 (#10111)
### Describe Your Changes

Updates of docs and examples to `vmanomaly` v1.28.1

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-12-02 21:08:17 +02:00
Zakhar Bessarab
1985c79a4d deployment: update references to the latest release 2025-12-01 21:16:14 +04:00
Zakhar Bessarab
f0dafacfd3 docs: update references to the latest release 2025-12-01 21:15:13 +04:00
Zakhar Bessarab
6c01f5d50f docs/changelog: backport LTS changelogs 2025-12-01 20:44:43 +04:00
f41gh7
84658e77da docs/changelog: sort changelog entries
Signed-off-by: f41gh7 <nik@victoriametrics.com>
2025-12-01 11:11:54 +01:00
Zakhar Bessarab
4dc32ff1d7 app/vminsert/netstorage: fix list of nodes used for SD
Previously, vminsert used the original list of addresses instead of the
discovered addresses. Now it properly uses the discovered list.
2025-12-01 11:11:53 +01:00
Artem Fetishev
08a1b2e75c lib/lrucache: do not reset requests and misses after cache reset
Follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10072.

Do not reset the requests and misses metrics, since a cache reset is meant
to clear only the stored entries.
2025-12-01 10:12:47 +01:00
Zakhar Bessarab
7e5b68fc1f docs/changelog: cut v1.131.0 2025-11-28 20:20:08 +04:00
Zakhar Bessarab
dcc130603c docs: update available from tags 2025-11-28 20:13:42 +04:00
Zakhar Bessarab
9842ad2299 app/vmselect: run make vmui-update 2025-11-28 20:01:08 +04:00
Aliaksandr Valialkin
63c0cf673f Makefile: generate quicktemplate output files only at lib and app directories
Previously the output files were incorrectly generated inside unexpected directories such as vendor
2025-11-28 16:07:22 +01:00
Nowa Ammerlaan
7f51bb4ce7 protoparser/influx: account for excess white spaces before timestamp
Some Influx clients (such as the nimon monitoring client) add excess whitespace to the line and do not set a
timestamp. The Influx line protocol requires whitespace before the timestamp only when the timestamp is set, so a line may end without one. Extra whitespace before an omitted timestamp confused the parser.

This commit adds a check for the omitted timestamp and a test case for it.

Fixes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10049
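
A simplified Go sketch of the fix, assuming a hypothetical helper that receives the part of a single Influx line after the tag set (without the trailing newline). It tolerates extra spaces and treats a whitespace-only remainder as an omitted timestamp; unlike a real parser, it ignores escaped or quoted spaces inside field values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitFieldsAndTimestamp splits the tail of an Influx line into the field set
// and an optional timestamp, tolerating excess whitespace. Hypothetical helper.
func splitFieldsAndTimestamp(tail string) (fields string, ts int64, hasTS bool, err error) {
	tail = strings.TrimLeft(tail, " ")
	n := strings.IndexByte(tail, ' ')
	if n < 0 {
		// No whitespace at all: the line has no timestamp.
		return tail, 0, false, nil
	}
	fields = tail[:n]
	rest := strings.TrimSpace(tail[n+1:])
	if rest == "" {
		// Only whitespace after the field set: the timestamp was omitted.
		return fields, 0, false, nil
	}
	ts, err = strconv.ParseInt(rest, 10, 64)
	if err != nil {
		return "", 0, false, fmt.Errorf("cannot parse timestamp %q: %w", rest, err)
	}
	return fields, ts, true, nil
}

func main() {
	fields, ts, hasTS, err := splitFieldsAndTimestamp("value=1.5   ")
	fmt.Println(fields, ts, hasTS, err) // value=1.5 0 false <nil>
}
```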
2025-11-28 14:36:35 +01:00
Nikolay
38df52ea08 app/vmselect: improve performance for multi-level requests
Previously, the proxy vmselect (aka 1st-level vmselect) parsed each
MetricBlock received from vmstorage before forwarding it to the top-level vmselect. This required additional CPU and memory, which significantly slowed down query requests.

This commit changes the lib/vmselectapi iterator API: instead of a MetricBlock, it returns the encoded MetricBlock as a byte slice.
This saves CPU and memory at the proxy vmselect by eliminating the need to decode MetricBlocks received from storage.

In addition, it adds the following optimizations for the proxy vmselect:
* reduces memory allocations by using an iterator pool
* adds a per-storageNode workerItem for the iterator

It also optimizes vmstorage, which no longer performs an extra memory copy of MetricName for each MetricBlock.

The vmselect and vmstorage metrics `vm_vmselect_metric_rows_read_total` and `vm_metric_rows_read_total` were removed, since they aren't used in any dashboards or rules and the new iterator API doesn't support them.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9899
2025-11-28 13:04:55 +01:00
Max Kotliar
023a13435c dashboards: make dashboards-sync 2025-11-27 16:52:45 +02:00
Max Kotliar
1ddcbed6d7 dashboards: Show "Disk space usage % by type" as stacked graph in Cluster dashboard. (#10089)
### Describe Your Changes

VictoriaMetrics - cluster dashboard.

vmstorage -> Disk space usage % by type panel.

Switch panel to 100% stacked view to show space distribution.

The goal is to highlight how space is split between the datapoints and
indexdb types; plain time-series lines made this hard to see. A 100%
stacked layout makes the distribution immediately visible.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9932

was: <img width="1201" height="609" alt="Image"
src="https://github.com/user-attachments/assets/1d199e65-5a20-4c63-a251-b7087020f42a"
/>


now: 
<img width="1208" height="608" alt="Screenshot 2025-11-27 at 13 14 51"
src="https://github.com/user-attachments/assets/96aa32f3-1243-486b-bac8-2d3c0f4bdb7a"
/>


### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-27 16:50:15 +02:00
Aliaksandr Valialkin
edd02cdb5b docs/victoriametrics/goals.md: clarify that bugs, which affect a small number of users at rare edge cases, can be fixed later 2025-11-27 14:29:17 +01:00
Artem Fetishev
4cd727a511 lib/storage: use lrucache for tfss cache (#10072)
The purpose of this PR is the same as #10000, except `lrucache` is used
for implementing tfss cache.

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-27 14:18:03 +01:00
Andrii Chubatiuk
19c0477976 chore(app/vmui): conditionally render accordion children (#10068)
### Describe Your Changes

Revert the change introduced in
483e00ffb9,
since rendering all nested children significantly degrades alerting
tab performance when there are many items.
@Loori-R @arturminchukov, what do you think about additionally using
react-virtuoso for the alerting tab to decrease the DOM size?

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-27 14:31:34 +02:00
Ben Randall
4fdd8f0906 lib/protoparser/opentelemetry: use separate loggers for unsupported delta temporality/metric type logs (#10021)
A throttled logger will continue to log messages occasionally with a
suffix indicating how many similar logs were throttled. Using the same
logger for multiple log messages can result in certain logs being
entirely suppressed and invisible in the logs. This updates most of the
loggers used in `appendFromScopeMetrics` to be their own logger so that
"unsupported delta temporality/metric type" logs will be visible for all
metric types. Additionally, `skippedSampleLogger` is only used by
`appendSamplesFromHistogram` so this was moved closer to that function.

Related to #9447
Related to #9498

- [X] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [X] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
2025-11-27 14:19:43 +02:00
Andrii Chubatiuk
9897872ca9 lib/flagutil: clarify usage of quotes in array flag values 2025-11-27 14:17:07 +02:00
Hui Wang
b8bbb07431 dashboard: tidy vmauth panels (#10088)
before:
<img width="2498" height="1042" alt="image"
src="https://github.com/user-attachments/assets/0bbd7cc2-7062-494f-827b-96d86133537f"
/>
after:
<img width="2497" height="968" alt="image"
src="https://github.com/user-attachments/assets/6256ccc2-2f8f-40ea-a23b-a1a20e242b3c"
/>
which is more consistent with other dashboards.
2025-11-27 14:12:53 +02:00
Max Kotliar
eb1c8dd67d docs: add links to issues in changelog 2025-11-27 14:09:02 +02:00
Aliaksandr Valialkin
50fc48ac47 lib/fs: avoid Go runtime stalls on Linux when all the GOMAXPROCS threads are blocked in major pagefaults while reading the data from memory-mapped files
Go runtime executes all the goroutines on GOMAXPROCS operating system threads.
Go runtime cannot switch the OS thread to another goroutine if the current goroutine
is stuck in the major pagefault while reading the data from memory-mapped file,
because Go runtime doesn't distinguish between reading from regular memory and reading
from a memory-mapped file. So the OS thread becomes stuck until the OS
reads the data from the file at the requested memory address and returns control to the Go application.

In the worst case it is possible that all the GOMAXPROCS threads are stuck in major pagefaults,
so Go runtime pauses executing all the goroutines. This state is possible in environments
with small GOMAXPROCS and high-latency disks such as NFS or small HDD-based disks at AWS.

See https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d for more details.

This commit protects from such stalls by verifying whether the given memory location from memory-mapped file
is already loaded in the OS page cache before reading from that memory.
If the location isn't in the OS page cache, then it falls back to pread() syscall for reading the data from file.
Go runtime allocates extra OS threads for long-running syscalls, so it can continue executing goroutines
across all the GOMAXPROCS threads while reading the data from slow storage via pread() syscall.

This commit uses mincore() syscall for detecting whether the given memory page is available in the OS page cache.
It also caches mincore() results for up to a minute in order to reduce the overhead for the mincore() syscall.

This commit reduces the increase rate for the process_major_pagefaults_total metric by multiple orders of magnitude
on systems with high-latency disks.
2025-11-26 20:52:27 +01:00
Artem Fetishev
3bd9c75acc lib/lrucache: use uint64 for SizeBytes() and SizeMaxBytes() (#10077)
Currently, `lrucache.Cache` `SizeBytes()` and `SizeMaxBytes()` return
type is `int`. The cache `Entry.SizeBytes()` also returns `int` value.
Changing the type to `uint64` will allow using `uint64set.Set` as the
cache entry type (see #10072).

Note that using `uint64` regardless of the CPU architecture is
not entirely correct, because on 32-bit systems the size never
exceeds `2^32`, so `uint64` is wider than needed. However, the current
type (`int`) is not correct either, since it is signed and on 32-bit systems
only allows storing values up to `2^31-1`. Alternatively, all `SizeBytes()`
methods could return `uint`.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-26 11:39:07 +01:00
Max Kotliar
9c0683f8d1 dashboards: run make dashboards-sync 2025-11-25 20:13:06 +02:00
Max Kotliar
bf4660912f .github: Add changelog tip linter 2025-11-25 13:41:48 +02:00
Yury Molodov
bb54b5e661 app/vmui: improve alert styles for better readability (#10012)
### Describe Your Changes

This PR improves vmui alert styles by adding borders between rows,
introducing a hover state for easier row identification, and aligning
badges to the left.

Related issue: #9856

| Before | After |
|--------|--------|
| <img width="1427" height="1310" alt="image"
src="https://github.com/user-attachments/assets/68f3469e-95df-449f-a85d-1c0285520e2d"
/> | <img width="1427" height="1310" alt="Image"
src="https://github.com/user-attachments/assets/89501efb-c66f-402a-9d14-01c86930a5e2"
/> |

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-11-25 13:38:24 +02:00
Andrii Chubatiuk
200a729565 app/vmui: fixed ability to select multiple metrics in explore metrics tab (#10008)
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9995

Changing only the `Select` component leads to an infinite refresh of the
`ExploreMetricsGraphItem` component, since the array gets a new
reference each time.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-25 13:29:45 +02:00
Yury Molodov
7303495ae1 app/vmui: fix rendering of multiple points at the same timestamp (#10010)
### Describe Your Changes

1. Removed the *step* control from the **Raw Query** page, as it didn’t
affect chart rendering and caused confusion.
2. Fixed rendering of multiple points with the same timestamp -
previously, the second point was hidden.
3. Added proper visualization for points with the same timestamp and
identical values: such points are now shown as a square, and the tooltip
displays the number of duplicates.

**Example:**

```json
{
  "values": [1, 22, 10, 10, 5, 6],
  "timestamps": [
    1761955247950,
    1761955247950,
    1761955248960,
    1761955248960,
    1761955251980,
    1761955252990
  ]
}
```

<img width="500" height="1120" alt="image"
src="https://github.com/user-attachments/assets/192aa43e-8008-4f03-8966-00f59e52ec40"
/>
<img width="300" height="676" alt="image"
src="https://github.com/user-attachments/assets/8e361cb3-1286-452a-a687-b6b40ba7807b"
/>

Related issues: #9667 and #9666

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
2025-11-25 11:00:00 +02:00
Andrei Baidarov
98b5288e9c vmselect: do not immediately fail request if vmstorage returns search.maxConcurrentRequests error (#10030)

If `vmstorage` is overloaded, it may return a
maxConcurrentRequests error. Currently, `vmselect` immediately fails the whole
request, even if `replicationFactor` is set and other replicas could
respond without errors.

This PR treats such errors as regular errors, not fatal ones.

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-24 20:37:37 +02:00
Cancai Cai
d7f9cd971d docs/notes: fix syntax errors (#10019)
### Describe Your Changes

I'm not sure if this is a mistake.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

Signed-off-by: cancaicai <2356672992@qq.com>
2025-11-24 20:37:05 +02:00
cancaicai
2cb08095c6 docs/storage: fix typo
Signed-off-by: cancaicai <2356672992@qq.com>
2025-11-24 15:46:40 +02:00
dependabot[bot]
8bc41f4c79 build(deps): bump golang.org/x/crypto from 0.43.0 to 0.45.0 (#10052)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from
0.43.0 to 0.45.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4e0068c009"><code>4e0068c</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="e79546e28b"><code>e79546e</code></a>
ssh: curb GSSAPI DoS risk by limiting number of specified OIDs</li>
<li><a
href="f91f7a7c31"><code>f91f7a7</code></a>
ssh/agent: prevent panic on malformed constraint</li>
<li><a
href="2df4153a03"><code>2df4153</code></a>
acme/autocert: let automatic renewal work with short lifetime certs</li>
<li><a
href="bcf6a849ef"><code>bcf6a84</code></a>
acme: pass context to request</li>
<li><a
href="b4f2b62076"><code>b4f2b62</code></a>
ssh: fix error message on unsupported cipher</li>
<li><a
href="79ec3a51fc"><code>79ec3a5</code></a>
ssh: allow to bind to a hostname in remote forwarding</li>
<li><a
href="122a78f140"><code>122a78f</code></a>
go.mod: update golang.org/x dependencies</li>
<li><a
href="c0531f9c34"><code>c0531f9</code></a>
all: eliminate vet diagnostics</li>
<li><a
href="0997000b45"><code>0997000</code></a>
all: fix some comments</li>
<li>Additional commits viewable in <a
href="https://github.com/golang/crypto/compare/v0.43.0...v0.45.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/crypto&package-manager=go_modules&previous-version=0.43.0&new-version=0.45.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 15:13:50 +02:00
Zhu Jiekun
bb1e0d8f3b opentsdb: Avoid blocking when a connection doesn't send anything (#10045)
### Describe Your Changes

fix #9987 

Avoid blocking when a connection to `-opentsdbListenAddr` doesn't send
any data. This issue blocked other connections from being handled.

> This bug can be tested with:
> 1. Start VictoriaMetrics Single-node with `-opentsdbListenAddr=:4242`.
> 2. Run: `telnet 127.0.0.1 4242` without typing any data after
connection established.
> 3. Run (in another terminal, after step 2): `curl -H 'Content-Type:
application/json' -d
'{"metric":"x.y.z","value":2222222.34,"tags":{"t1":"v1","t2":"v2"}}'
http://localhost:4242/api/put`
> 
> Before the change:
> - Step 3 blocked indefinitely.
> 
> Expected result after the change:
> - Step 3 is executed.
> - The connection established in step 2 is closed after 5 seconds.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-11-24 14:31:19 +02:00
Mathias Palmersheim
70be2e7ea3 Remove threshold from available cpu panel (#10056)
### Describe Your Changes

fixes #9988 by removing the cpu threshold from the Available CPU panel

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-24 14:16:35 +02:00
Kirill Yurkov
61796e355a docs: link faq for large indexdb (#10061)
Clarified the index size note in
docs/guides/understand-your-setup-size/README.md to steer readers toward
the FAQ when indexdb feels oversized, noting typical ratios and
troubleshooting guidance.
2025-11-24 14:04:05 +02:00
dependabot[bot]
2ae3fd47eb build(deps): bump actions/checkout from 5 to 6 (#10060)
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to
6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>v6-beta by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2298">actions/checkout#2298</a></li>
<li>update readme/changelog for v6 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2311">actions/checkout#2311</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5.0.0...v6.0.0">https://github.com/actions/checkout/compare/v5.0.0...v6.0.0</a></p>
<h2>v6-beta</h2>
<h2>What's Changed</h2>
<p>Updated persist-credentials to store the credentials under
<code>$RUNNER_TEMP</code> instead of directly in the local git
config.</p>
<p>This requires a minimum Actions Runner version of <a
href="https://github.com/actions/runner/releases/tag/v2.329.0">v2.329.0</a>
to access the persisted credentials for <a
href="https://docs.github.com/en/actions/tutorials/use-containerized-services/create-a-docker-container-action">Docker
container action</a> scenarios.</p>
<h2>v5.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5...v5.0.1">https://github.com/actions/checkout/compare/v5...v5.0.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>V6.0.0</h2>
<ul>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
</ul>
<h2>V5.0.1</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<h2>V5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>V4.3.1</h2>
<ul>
<li>Port v6 cleanup to v4 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2305">actions/checkout#2305</a></li>
</ul>
<h2>V4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<h2>v4.1.5</h2>
<ul>
<li>Update NPM dependencies by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1703">actions/checkout#1703</a></li>
<li>Bump github/codeql-action from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1694">actions/checkout#1694</a></li>
<li>Bump actions/setup-node from 1 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1696">actions/checkout#1696</a></li>
<li>Bump actions/upload-artifact from 2 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1695">actions/checkout#1695</a></li>
</ul>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1af3b93b68"><code>1af3b93</code></a>
update readme/changelog for v6 (<a
href="https://redirect.github.com/actions/checkout/issues/2311">#2311</a>)</li>
<li><a
href="71cf2267d8"><code>71cf226</code></a>
v6-beta (<a
href="https://redirect.github.com/actions/checkout/issues/2298">#2298</a>)</li>
<li><a
href="069c695914"><code>069c695</code></a>
Persist creds to a separate file (<a
href="https://redirect.github.com/actions/checkout/issues/2286">#2286</a>)</li>
<li><a
href="ff7abcd0c3"><code>ff7abcd</code></a>
Update README to include Node.js 24 support details and requirements (<a
href="https://redirect.github.com/actions/checkout/issues/2248">#2248</a>)</li>
<li>See full diff in <a
href="https://github.com/actions/checkout/compare/v5...v6">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=5&new-version=6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 14:00:22 +02:00
Max Kotliar
ebad7e5496 docs: Describe relation between slow inserts and unsorted labels. 2025-11-24 13:35:23 +02:00
Max Kotliar
e52de06ee5 docs: sync flags in docs with actual binaries 2025-11-24 13:16:22 +02:00
Aliaksandr Valialkin
38dd971f58 docs/victoriametrics/vmalert.md: clarify that templates can be used inside rule labels
Rule labels can contain templates in the same way as annotations.
See aad6ab009e/app/vmalert/rule/alerting_test.go (L1192)
and https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#templating

Document this, since users sometimes ask this question.
2025-11-24 10:50:55 +01:00
Artem Fetishev
aad6ab009e lib/storage: minor metricNameSearch fixes (#10065)
- Fix comment
- Re-use dst instead of introducing a new variable.

This change was requested as a separate PR during the
pt-index (#8134) code review.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-21 20:06:20 +01:00
Artem Fetishev
2c125e14c7 lib/storage: also create parts.json on partition creation (#10051)
Currently, when a partition is created, its corresponding parts.json file
is not created right away (see createNewParition()). Its creation is
delayed until the first part files are created on disk (see
swapSrcWithDstParts()). However, the parts.json file is created for a
possibly empty partition when an existing partition is opened (see
mustOpenPartition()) and when a partition snapshot is created (see
MustCreateSnapshotAt()).

In other words, `parts.json` is an important part of a partition, since it is
the artifact that describes the partition contents, and it should be created
on partition creation even if the partition is empty.

To be honest, this change is mostly a no-op for the current storage
implementation. It only makes the code consistent, i.e. the parts.json
is created along with the partition.

However, creating it together with the partition becomes important in
pt-index (#7599, #8134), because pt-index allows partitions with no
data and therefore without a parts.json file. Still not a big deal, but the
unit tests start failing.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-21 14:19:53 +01:00
Artem Fetishev
13dc60e257 lib/storage: refactoring - move dateMetricIDCache code to a separate file (#10055)
dateMetricIDCache does not belong to storage anymore, since it has been
moved to indexDB. Instead of moving the cache code to index_db.go, move it
to a separate file to make the code easier to navigate.

No changes have been done to the code or tests.

Follow up for: #9983

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
Co-authored-by: Alexander Frolov <9749087+fxrlv@users.noreply.github.com>
2025-11-21 13:52:33 +01:00
Artem Fetishev
ed64c90e7a lib/storage: fix comments related to nextDayMetricIDs
Follow-up for 49b0a4fb16

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-21 13:31:20 +01:00
Artem Fetishev
49b0a4fb16 lib/storage: refactoring - simplify nextDayMetricIDs data structure (#10058)
The data structure used for holding the nextDayMetricIDs is too complex
and can be simplified (flattened).

Follow up for: #9983

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-21 13:02:02 +01:00
Artem Fetishev
5141496c43 lib/storage: add overlapsWith() and contains() methods to TimeRange (#10059)
The change was introduced in pt-index PR (#8134) and is extracted into a
separate PR.

Currently used in partition_search and partition. If you see more places
like this, please let me know.
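A minimal sketch of the two helpers, assuming an inclusive range of millisecond timestamps. The struct below is a simplified stand-in for lib/storage.TimeRange, and whether contains() takes a single timestamp (as here) or another range is an assumption:

```go
package main

// TimeRange is a simplified stand-in for lib/storage.TimeRange:
// an inclusive range [MinTimestamp, MaxTimestamp] of millisecond timestamps.
type TimeRange struct {
	MinTimestamp int64
	MaxTimestamp int64
}

// overlapsWith reports whether tr and other share at least one timestamp.
// Ranges that only touch at a boundary are considered overlapping.
func (tr TimeRange) overlapsWith(other TimeRange) bool {
	return tr.MinTimestamp <= other.MaxTimestamp && other.MinTimestamp <= tr.MaxTimestamp
}

// contains reports whether ts falls inside tr (both ends inclusive).
func (tr TimeRange) contains(ts int64) bool {
	return ts >= tr.MinTimestamp && ts <= tr.MaxTimestamp
}
```

A partition search could use overlapsWith() to skip partitions whose time range cannot match the query.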

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-21 12:24:40 +01:00
Andrii Chubatiuk
24fac64875 docs: add warning blockquote regarding latest backup lifecycle policy (#10054)
Update formatting for warning text.

<img width="732" height="432" alt="image"
src="https://github.com/user-attachments/assets/1549e69a-fc65-445f-b567-9b5e4e1a8617"
/>
2025-11-20 13:46:34 +04:00
Aliaksandr Valialkin
8250f469a7 docs/victoriametrics/Articles.md: add https://medium.com/@kanakaraju896/backing-up-victoriametrics-data-a-complete-guide-24473c74450f 2025-11-20 08:36:59 +01:00
Aliaksandr Valialkin
7fb0f0e015 docs/victoriametrics/Articles.md: add https://blackmetalz.github.io/why-i-switched-to-victoriametrics-scaling-from-small-business-to-enterprise.html 2025-11-20 08:33:19 +01:00
Andrii Chubatiuk
563dbeaea1 app/vmalert: do not increment errors counter on context cancellation
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10027
2025-11-19 13:32:16 +01:00
Nikolay
7e6468c1e3 lib/storage: properly increment missing tsids metric
Bug was introduced at 2380e4829d

Due to a typo, the vm_missing_tsids_for_metric_id_total metric was incremented instead of vm_missing_metric_names_for_metric_id_total when the metricName for a metricID was missing.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10041
2025-11-19 13:29:44 +01:00
Hui Wang
328f33202f chore: clarify vmalert -external.label usage (#10042)
Clarify that vmalert instances running in HA mode don't need to specify `-external.label`.
2025-11-19 13:28:36 +01:00
Fred Navruzov
951331db80 docs/vmanomaly: release v1.28.0 (#10031)
### Describe Your Changes

Upgraded vmanomaly docs & guides to release v1.28.0 (UI v1.2.0)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-18 21:47:03 +02:00
Andrii Chubatiuk
e6139be8ba docs/vmbackupmanager: mention version since which -backupTypeTagName flag is available (#10038)
Mention version since which `backupTypeTagName` flag is available
2025-11-18 18:56:19 +04:00
Andrii Chubatiuk
77e5920014 app/vmbackupmanager: set backup type tag on backup's items
* app/vmbackupmanager: set VMBackupType tag on backup's items

* address review comments
2025-11-18 16:30:13 +04:00
Zakhar Bessarab
78049e991b docs/cluster: remove mention of select for metadata (#10034)
vmselect does not have a flag to enable metadata querying, remove
invalid reference to it from the docs.
2025-11-18 15:32:20 +04:00
Artem Fetishev
c972d70f00 docs: update VictoriaMetrics components version to v1.130.0
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-17 22:03:17 +01:00
Artem Fetishev
b947562f2b deployment/docker: update VM components version to v1.130.0
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-17 21:56:42 +01:00
Artem Fetishev
344a81fa20 docs: bump last LTS versions
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-17 20:14:07 +00:00
Artem Fetishev
4b022ea8a8 docs/CHANGELOG.md: update changelog with LTS release notes
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-17 20:08:16 +00:00
Artem Fetishev
04c24fc831 lib/workingsetcache: Fix bytesSize metric calculation (#10025)
Follow-up for 3e6fc445a9

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-17 13:49:17 +01:00
Artem Fetishev
d2f78e4b2b docs/CHANGELOG.md: cut v1.130.0
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-14 17:45:20 +00:00
Max Kotliar
3995837c58 docs: update latest version in docs to v1.130.0 2025-11-14 19:37:14 +02:00
Artem Fetishev
1d53496f98 make vmui-update
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-14 17:21:49 +00:00
Artem Fetishev
73a1ce2dd6 lib/storage: Move dateMetricIDCache to indexDB (#9983)
Looks like the `dateMetricIDCache` must be per indexDB:

- The use of this cache and `is.hasDateMetricID()` often go in pairs, so it
  makes sense to use this cache in that method.
- The same is true for `createPerDayIndexes()`: every time the index entry
  is created, a corresponding entry is added to the cache.
- As a result, the generation field is also removed from the cache.

Related to #7599 and #8134.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-14 16:03:28 +01:00
Aliaksandr Valialkin
daa88f6a43 docs/victoriametrics: cross-link rebalancing section at VictoriaMetrics cluster docs and the corresponding question at the FAQ page 2025-11-14 15:36:57 +01:00
Aliaksandr Valialkin
7bff73b0f7 docs/victoriametrics/Cluster-VictoriaMetrics.md: add rebalancing chapter, which explains how to rebalance data among vmstorage nodes
This is a very frequent question from new users of VictoriaMetrics who migrate from other solutions
with automatic data rebalancing among storage nodes, so it is a good idea to cover it in the docs.
2025-11-14 15:32:29 +01:00
Max Kotliar
bf3b1cf6b6 lib/storage/metricsmetadata: ensure deterministic sorting for identical metric names across tenants
Metrics metadata is loaded from a per-tenant storage map
(perTenantStorage map[uint64]map[string]*Row), so the order of result rows is
non-deterministic. The existing sortRows implementation only sorts by
metric name and ingestion time, which means rows that differ only by
tenant (account/project ID) are still sorted non-deterministically.

This change updates `sortRows` to include account/project identifiers in
the comparison, ensuring stable and deterministic ordering for metadata
entries that share the same metric name and timestamp.
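A sketch of such a comparator with tenant tie-breakers. The `Row` fields below are hypothetical (the real metricsmetadata.Row fields and the sort direction for ingestion time may differ); `cmp.Or` requires Go 1.22+:

```go
package main

import (
	"cmp"
	"slices"
)

// Row is a hypothetical, trimmed-down metadata row.
type Row struct {
	MetricFamilyName string
	LastSeen         int64 // ingestion time
	AccountID        uint32
	ProjectID        uint32
}

// sortRows orders rows by metric name, then ingestion time, then tenant.
// cmp.Or returns the first non-zero comparison, so the tenant fields only
// break ties, and rows differing only by tenant get a stable order.
func sortRows(rows []*Row) {
	slices.SortFunc(rows, func(a, b *Row) int {
		return cmp.Or(
			cmp.Compare(a.MetricFamilyName, b.MetricFamilyName),
			cmp.Compare(a.LastSeen, b.LastSeen),
			cmp.Compare(a.AccountID, b.AccountID),
			cmp.Compare(a.ProjectID, b.ProjectID),
		)
	})
}
```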

First discovered as flaky test:

--- FAIL: TestStorageRead (0.00s)
    storage_test.go:337: unexpected rows get result (-want, +got):
          []*metricsmetadata.Row{
          	&{
          		... // 2 ignored and 1 identical fields
          		Help:      "uselesshelp1",
          		Unit:      "seconds1",
        - 		AccountID: 1,
        + 		AccountID: 0,
        - 		ProjectID: 1,
        + 		ProjectID: 0,
          		Type:      1,
          	},
          	&{
          		... // 2 ignored and 1 identical fields
          		Help:      "uselesshelp1",
          		Unit:      "seconds1",
        - 		AccountID: 0,
        + 		AccountID: 1,
        - 		ProjectID: 0,
        + 		ProjectID: 1,
          		Type:      1,
          	},
          }
FAIL

https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/actions/runs/19361594138/job/55394642029#step:4:133
2025-11-14 15:22:27 +02:00
Max Kotliar
a10ff67354 docs/changelog: Add links to changelog 2025-11-14 13:41:59 +02:00
Haley Wang
9a8463df42 lib/storage: add a value check for retentionFilter to ensure it does not exceed retentionPeriod 2025-11-14 12:50:46 +02:00
Max Kotliar
7e22b169f1 docs: describe how to use metrics metadata
follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9487
2025-11-14 10:37:15 +01:00
f41gh7
80c1af5af1 apptest: add metrics metadata test for vmsingle
related issue github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
2025-11-14 10:29:28 +01:00
f41gh7
5a587f2006 app/{vmstorage,vmselect,vminsert}: introduce metrics metadata storage
This commit adds the storage part and cluster RPC methods for metrics metadata.

Key concepts:
* vmstorage persists metadata in-memory only.
* vmstorage evicts metadata records older than 1 hour.
* vmstorage stores only the last value of metadata for each time series
  metric name.
* vminsert opens an additional TCP connection to the vmstorage for
  metadata write requests.
* vmselect doesn't support `limit_per_metric_name`.

This feature is optional and must be enabled via the `-enableMetadata` flag provided to vminsert/vmsingle.

Fixes github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
2025-11-14 10:24:38 +01:00
Aliaksandr Valialkin
847cd1e336 docs/guides/understand-your-setup-size/README.md: remove the misleading recommendation for having at least 2vCPU cores per each vmstorage node
vmstorage nodes work perfectly with one CPU core and even with 10% of a single CPU core
if the allocated CPU resources match their workload.

It is better to recommend allocating an integer number of CPU cores to vmstorage
in order to achieve optimal performance, since vmstorage allocates internal resources
according to the available CPU cores. If there is a fractional number of CPU cores,
then the allocation of internal resources may be suboptimal.

A fractional number of CPU cores may also lead to increased latencies and stalls,
because some Ps (logical processors) in the Go runtime won't be able to run goroutines from their ready queues
in a timely manner due to the lack of CPU time. See https://victoriametrics.com/blog/kubernetes-cpu-go-gomaxprocs/
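A sketch of the general pattern of sizing internal resources by the number of usable cores, assuming `runtime.GOMAXPROCS(0)` as the source of that number. `processAll` is a hypothetical helper, not vmstorage code. With a fractional CPU quota, the worker count cannot match the CPU time actually available, so some workers end up waiting for CPU time:

```go
package main

import (
	"runtime"
	"sync"
)

// processAll applies fn to every item using one worker per usable CPU core.
func processAll(items []int, fn func(int) int) []int {
	workers := runtime.GOMAXPROCS(0) // 0 queries the current value without changing it
	results := make([]int, len(items))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				results[i] = fn(items[i]) // each index is written by exactly one worker
			}
		}()
	}
	for i := range items {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return results
}
```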
2025-11-14 09:48:30 +01:00
Aliaksandr Valialkin
c86857b269 docs/victoriametrics/vmagent.md: mention that it isn't recommended increasing the -maxConcurrentRequests command-line flag value in general case
Too big values for the -maxConcurrentRequests command-line flag increase memory usage
and increase CPU overhead for processing incoming requests in most cases.
The only valid reason for increasing the value for -maxConcurrentRequests command-line flag
is when many clients send data to vmagent over very slow network.
2025-11-14 09:40:31 +01:00
Hui Wang
c93937101c Improve vmalert UI tip (#9998) 2025-11-13 21:04:39 +01:00
Aliaksandr Valialkin
cca7380dd3 docs/victoriametrics: fix broken link to /api/v1/rules docs at Prometheus 2025-11-13 19:40:10 +01:00
Aliaksandr Valialkin
ca3b9b18b5 docs/victoriametrics/README.md: add context links to the FAQ entry describing why IndexDB size may be too large 2025-11-13 19:36:18 +01:00
Nikolay
10f7cd2ffc lib/encoding/zstd: properly apply size limits
Previously, the zstd decoder didn't take into account the request size limits
applied by VictoriaMetrics components. For an incorrectly formed zstd block, a VictoriaMetrics
component could allocate extra memory, which may lead to OOM errors.

This commit makes ingest endpoints check the frame content size and window size headers against the max request size limits.
2025-11-13 18:13:23 +01:00
Hui Wang
fa85726a82 vmalert: print the error message as value if templating fails in alerting rule
For users, if an alerting rule has a misconfigured annotation, it's more
important to deliver the alert when the rule triggers rather than skip
it with templating error logs.
Then users can see the faulty annotation in alert message and fix it.
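A minimal sketch of this fallback using Go's standard text/template package. vmalert uses its own template functions and data, so `expandAnnotation` and the data layout are hypothetical. The error is still returned so the caller can mark the rule as unhealthy:

```go
package main

import (
	"strings"
	"text/template"
)

// expandAnnotation renders an annotation template against data. If parsing
// or execution fails, the error text becomes the annotation value, so the
// alert can still be delivered with a visible hint about the problem.
func expandAnnotation(text string, data any) (string, error) {
	tpl, err := template.New("annotation").Option("missingkey=error").Parse(text)
	if err != nil {
		return err.Error(), err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, data); err != nil {
		return err.Error(), err
	}
	return sb.String(), nil
}
```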

Note: the previous behavior is retained in replay mode because errors
there should be noticed immediately; hiding them could waste time,
resources and require a re-replay after fixes.
Also the rule's status in the vmalert UI remains unhealthy if templating
failed.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9853
2025-11-13 17:34:55 +01:00
Hui Wang
567c084d6d vmalert: drop labels with empty values in generated alerts and time series
In prometheus ecosystem, a label with an empty value equals no label,
since a query like `test{something=""}` matches all the series without
label `something`.
So for vmalert, preserving empty-value labels in generated alerts or
time series is unnecessary and can cause alert hash mismatches during
[restore](https://docs.victoriametrics.com/victoriametrics/vmalert/#alerts-state-on-restarts).
Empty-value labels shouldn't come from the datasource response, since datasources
follow the same rule (omit empty-value labels). They may come from
`-external.label` or rule labels, where the empty value could be caused by
occasional templating failures, which are hard to detect there.
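A sketch of why dropping empty-value labels keeps alert hashes stable: after filtering, a label set with `team=""` hashes the same as one without `team`. Both helpers are hypothetical, and FNV-1a over sorted labels is an assumed hashing scheme, not necessarily the one vmalert uses:

```go
package main

import (
	"hash/fnv"
	"sort"
)

// dropEmptyLabels removes labels with empty values, since in the Prometheus
// data model a label with an empty value is equivalent to a missing label.
func dropEmptyLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if v != "" {
			out[k] = v
		}
	}
	return out
}

// labelsHash computes an order-independent FNV-1a hash of a label set.
func labelsHash(labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0xff}) // separator so "a"+"bc" differs from "ab"+"c"
		h.Write([]byte(labels[k]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}
```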

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
2025-11-13 17:24:27 +01:00
Hui Wang
12a1388fbc vmalert: fix a potential race condition in web api during rule hot reload
Group rules are not protected by
[m.groupsMu](03c784e3e3/app/vmalert/manager.go (L25)),
they could be updated (by a config hot reload) during `/api/v1/rule`,
`/api/v1/alert` and `/api/v1/alerts` API calls. This fix takes a
snapshot by calling `group.ToAPI()` first, making all reads safe.
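A minimal sketch of the snapshot-under-lock pattern, with hypothetical, simplified `Group` and `Rule` types. `Update` deliberately reuses the backing array, as an in-place reload might, which is exactly the case where handing out the internal slice would be unsafe:

```go
package main

import "sync"

// Rule and Group are hypothetical stand-ins for vmalert types.
type Rule struct{ Name string }

type Group struct {
	mu    sync.RWMutex
	rules []Rule
}

// Update replaces the rules in place (reusing the backing array), as a
// config hot reload might.
func (g *Group) Update(rules []Rule) {
	g.mu.Lock()
	g.rules = append(g.rules[:0], rules...)
	g.mu.Unlock()
}

// ToAPI returns a copy of the rules taken under the read lock, so API
// handlers can read the result without racing with Update.
func (g *Group) ToAPI() []Rule {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return append([]Rule(nil), g.rules...)
}
```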

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9551
2025-11-13 17:22:25 +01:00
JAYICE
62c19b386a lib/httputil: fix failing to access http2 sd service by the shadow copy of http.DefaultTransport
Cloning `http.DefaultTransport` and disabling HTTP/2 without resetting
`TLSClientConfig.NextProtos` in the shadow copy of
`http.DefaultTransport` causes requests to HTTP/2 servers to fail.
See https://github.com/golang/go/issues/39302.

To reproduce it, use a scrape config like:
```
scrape_configs:
  - job_name: test
    yandexcloud_sd_configs:
      - service: compute
        api_endpoint: https://api.cloud.yandex.net
```
Before the fix, access to the SD service would fail.

A solution is to specify `http/1.1` in  `TLSClientConfig.NextProtos`.

Related golang issue: https://github.com/golang/go/issues/39302

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9981
2025-11-13 17:19:15 +01:00
Andrii Chubatiuk
a84586a246 docs: update grafana plugin links, move root file to plugins repo (#10001)
### Describe Your Changes

update victorialogs grafana plugin links, moved root plugin file to
plugin repo

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-13 15:00:14 +02:00
Zhu Jiekun
24867a042b docs: mention VictoriaTraces playground in doc (#9999)
### Describe Your Changes

Add VictoriaTraces playground in doc.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-13 12:50:54 +02:00
Andrii Chubatiuk
9baade2898 docs: added perses section, move grafana datasource to integrations (#9994)
### Describe Your Changes

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9888

additionally adds grafana datasource into integrations section and
excludes previous location from menu and search

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-13 12:49:16 +02:00
Zhu Jiekun
6117b2ead9 VMUI/relabel debug: Allow labels textarea input without curly braces (#9950)
### Describe Your Changes

Fixed #9900

relax the validation for the labels text area. It now accepts input
labels without being enclosed in curly braces.

The following input format should be supported now:


```
	metric_name
	metric_name{label1="value1"}
	{__name__="metric_name", label1="value1"}
	__name__="metric_name", label1="value1"
	label1="value1"
```
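A sketch of how these inputs could be normalized into a canonical series selector. VMUI's actual parser is written in TypeScript and may differ; `normalizeSeriesInput` is a hypothetical Go helper. Input is treated as already braced when `{` appears before the first `=`, so a `{` inside a label value does not confuse it:

```go
package main

import "strings"

// normalizeSeriesInput wraps a bare label list in curly braces and leaves
// metric names and already-braced selectors unchanged.
func normalizeSeriesInput(s string) string {
	s = strings.TrimSpace(s)
	brace := strings.IndexByte(s, '{')
	eq := strings.IndexByte(s, '=')
	switch {
	case brace >= 0 && (eq < 0 || brace < eq):
		// metric_name{...} or {...}: already braced.
		return s
	case eq >= 0:
		// Bare label list such as label1="value1".
		return "{" + s + "}"
	default:
		// A bare metric name.
		return s
	}
}
```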

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-11-13 12:41:38 +02:00
Aliaksandr Valialkin
0df2993cf4 vendor: update github.com/valyala/gozstd from v1.23.2 to v1.24.0
This is needed for being able to use the DecompressLimited() function for limiting
the size of decompressed data.

See https://github.com/valyala/gozstd/pull/75
Updates https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/issues/958
2025-11-12 21:10:51 +01:00
Aliaksandr Valialkin
14f1bda8fc docs/victoriametrics/vmalert.md: remove "👉" char from the Common mistakes chapter, since this looks like AI-generated content
While at it, fix a typo `&step` -> `step`.

This is a follow-up for the commit 40ab285fb9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9343
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9373
2025-11-12 18:51:21 +01:00
Max Kotliar
f96f4709f6 docs/changelog: move changelog line to tip. polish it a bit 2025-11-12 19:40:18 +02:00
Samarth Bagga
96c1392b45 Add log which will report dropped log count (#9752)
### Describe Your Changes

I added a counter for throttled log messages, which is logged every
minute.
Fixes #9498

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Co-authored-by: Hui Wang <haley@victoriametrics.com>
2025-11-12 19:32:33 +02:00
Max Kotliar
8dd905c7a9 lib/envflag: apply -secret.flags inside envflag.Parse function (2nd attempt) (#9963)
### Describe Your Changes

The PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9942 was
reverted in
c90c7c3123
because of the import cycle in the enterprise VM. Needs more work.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-12 19:23:51 +02:00
Hui Wang
1c7abd3137 docs: fix flag type in descriptions (#9979)
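Do not use backticks in command-line flag descriptions: Go's flag package treats
the first back-quoted word in a usage string as the flag's value name, which pollutes
the flag type in descriptions.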
2025-11-12 13:51:48 +02:00
Aliaksandr Valialkin
e8114806aa docs/victoriametrics: add a case study from Spotify based on the https://www.youtube.com/watch?v=87koDlpKDR4 2025-11-12 10:49:36 +01:00
Aliaksandr Valialkin
8f1c1cc7c9 docs/victoriametrics/FAQ.md: mention that disabling per-day index may reduce the growth rate of indexdb for static time series over time 2025-11-11 16:58:10 +01:00
Aliaksandr Valialkin
68f670cbc5 docs/victoriametrics/FAQ.md: add Why IndexDB is so large? chapter, since this is quite frequent question from VictoriaMetrics users 2025-11-11 16:48:03 +01:00
Aliaksandr Valialkin
dac7e8d554 docs/victoriametrics/FAQ.md: add trailing slashes to links to posts about VictoriaMetrics components
Trailing slashes are needed to make the URLs canonical and avoid redirects.

This is a follow-up for d4aefcecc4
2025-11-11 15:59:20 +01:00
Aliaksandr Valialkin
3db6c40b70 deployment: update Go builder from v1.25.3 to v1.25.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.25.4%20label%3ACherryPickApproved
2025-11-11 12:15:12 +01:00
Aliaksandr Valialkin
90c69a07a9 docs/victoriametrics/stream-aggregation: fix broken links to https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#stream-aggregation-config ( was https://docs.victoriametrics.com/victoriametrics/stream-aggregation/configuration/#aggregation-config )
This is a follow-up after the commit f385e36b96
2025-11-11 02:01:04 +01:00
Aliaksandr Valialkin
f5b1092e07 lib/workingsetcache: prevent duplicate misleading log messages when reading the cache from file
While at it, improve logging when reading workingsetcache from file and saving it to file.
This should simplify troubleshooting various issues related to the workingsetcache.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9750
2025-11-11 00:02:09 +01:00
Aliaksandr Valialkin
3e6fc445a9 lib/workingsetcache: properly update cache stats
This is a follow-up for the commit 1130adebad .

The EntriesCount, BytesSize and MaxBytesSize metrics must take into account the data
stored in both prev and curr caches, since this data occupies memory and it is expected
that the exposed metrics - vm_cache_entries, vm_cache_size_bytes and vm_cache_size_max_bytes -
take into account all the memory occupied by the corresponding caches.

The GetCalls, SetCalls, Collisions and Corruptions metrics must take into account stats
from the curr cache only, since the corresponding stats for the prev cache are already accounted for
during the rotation (when moving curr to prev and resetting the previous prev).

The Misses metric must take into account only misses in the prev cache, since these misses
mean that the given entry is missing from both the curr and the prev caches.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657

While at it, make sure that the cache mode and cache stats are always read and updated under the c.mu lock.
This may help resolve races similar to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9921
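A simplified sketch of this stats attribution in a two-generation (curr/prev) cache. Every lookup passes through curr, so GetCalls is taken from curr; a lookup reaches a miss in prev only if curr also missed, so Misses is taken from prev. The `twoGenCache` type is hypothetical: it uses plain maps and a single mutex for clarity, whereas lib/workingsetcache builds on fastcache and avoids shared counters in the fast path:

```go
package main

import "sync"

// genStats holds per-generation counters.
type genStats struct {
	getCalls uint64
	misses   uint64
}

type generation struct {
	m     map[string]string
	stats genStats
}

func newGeneration() *generation { return &generation{m: make(map[string]string)} }

// twoGenCache is a hypothetical curr/prev cache.
type twoGenCache struct {
	mu         sync.Mutex
	curr, prev *generation
	rotated    genStats // totals carried over from past rotations
}

func newTwoGenCache() *twoGenCache {
	return &twoGenCache{curr: newGeneration(), prev: newGeneration()}
}

func (c *twoGenCache) Set(k, v string) {
	c.mu.Lock()
	c.curr.m[k] = v
	c.mu.Unlock()
}

// Get looks up k in curr, then in prev; a hit in prev is promoted to curr.
func (c *twoGenCache) Get(k string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.curr.stats.getCalls++
	if v, ok := c.curr.m[k]; ok {
		return v, true
	}
	c.curr.stats.misses++
	c.prev.stats.getCalls++
	if v, ok := c.prev.m[k]; ok {
		c.curr.m[k] = v
		return v, true
	}
	c.prev.stats.misses++
	return "", false
}

// Rotate folds the reported counters into the totals, moves curr to prev
// (with fresh counters) and starts an empty curr.
func (c *twoGenCache) Rotate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.rotated.getCalls += c.curr.stats.getCalls
	c.rotated.misses += c.prev.stats.misses
	c.prev = c.curr
	c.prev.stats = genStats{}
	c.curr = newGeneration()
}

// Stats reports GetCalls from curr only and Misses from prev only.
func (c *twoGenCache) Stats() genStats {
	c.mu.Lock()
	defer c.mu.Unlock()
	return genStats{
		getCalls: c.rotated.getCalls + c.curr.stats.getCalls,
		misses:   c.rotated.misses + c.prev.stats.misses,
	}
}
```

In the test scenario (a hit in curr, a rotation, a hit in prev, then a key missing everywhere), this reports 3 lookups and 1 miss, while summing the misses of both generations would count 3 misses.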
2025-11-10 22:15:50 +01:00
Aliaksandr Valialkin
1130adebad Revert "lib/workingsetcache: properly count workingsetcache metrics "
This reverts commit 89fd27c922.

Reason for revert: this commit adds a scalability bottleneck in the fast path - Cache.Get() -
in the form of c.getCalls.Add(). This call doesn't scale on systems with a large number of CPU cores,
since it must atomically update shared memory from many CPU cores.

Cache.Get() is called for every ingested sample when obtaining TSID by MetricName from the cache
at lib/storage.Storage.get(), so this can be a major bottleneck on systems with many CPU cores.

The solution for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9553
is to properly track cache requests and misses: cache requests must be taken into account
only at the curr cache, while cache misses must be taken into account only at the prev cache.
This will be implemented in the follow-up commit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9657
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9715
2025-11-10 21:43:33 +01:00
Aliaksandr Valialkin
1fb3f105c3 Revert "lib/storage: Introduce vm_cache_eviction_bytes_total metric"
This reverts commit 994dadb4d5.

Reason for revert: the introduced metrics have zero practical applicability.

The lib/workingsetcache doesn't need manual tuning in most cases - its size
is automatically adjusted to the given working set, if the working set is smaller
than the cache size limit set at the cache creation time. The limit just prevents
unbounded cache growth for large working sets.

If the working set exceeds the given limit, then the cache may become inefficient
because of the increased cache miss rate. The introduced metrics do not help determine
the needed cache size.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
2025-11-10 16:49:51 +01:00
Aliaksandr Valialkin
38d3033e66 go.mod: update github.com/VictoriaMetrics/fastcache from v1.13.1 to v1.13.2
This is needed for removing the EvictedBytes metric from the fastcache.

See the description of f6080737bb for details.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9293
Updates https://github.com/VictoriaMetrics/fastcache/pull/93
2025-11-10 16:43:33 +01:00
Aliaksandr Valialkin
cfba80ed4d lib/workingsetcache: replace Cache.Save() with Cache.MustSave()
If the cache cannot be saved to the given file, this is a fatal error.
It is better to log this fatal error inside Cache.MustSave() and then exit
instead of returning it to the caller. This makes the code more clear at the caller side.
2025-11-10 16:00:59 +01:00
Aliaksandr Valialkin
ee0eff0ca2 lib/workingsetcache: improve log messages for various expected cases when reading the cache from files
The improved log messages should help users understand the logged cases
without asking VictoriaMetrics developers about these cases.
2025-11-10 14:54:22 +01:00
Aliaksandr Valialkin
181f95aaf6 deployment/docker/rules/alerts-health.yml: clarify the description of the TooManyTSIDMisses alert after the commit 30641b201b
It is expected that the number of TSID misses over the last 5 minutes is zero in steady state.
If it is non-zero, then something is wrong. That's why it is better to use the increase() function instead of rate()
for this alert.
2025-11-10 14:35:36 +01:00
Aliaksandr Valialkin
58485df033 lib/storage: consistently rename searchMetricNameWithCache() to SearchTSIDs() across comments after the commit 90d23d7c9f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9765
2025-11-10 14:30:55 +01:00
Aliaksandr Valialkin
30641b201b deployment/docker/rules/alerts-health.yml: clarify the description for the TooManyTSIDMisses alert
This alert is expected after an unclean shutdown (OOM, power off, kill -9) of VictoriaMetrics.
It should go away within a few minutes after the restart, as VictoriaMetrics deletes metricIDs
for the missing MetricID->TSID entries which were created for the newly registered time series
just before the unclean shutdown. It is OK to delete such metricIDs, since the corresponding time series
will be re-registered. See the commit 20812008a7 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502
2025-11-10 14:25:22 +01:00
Aliaksandr Valialkin
b2cd3bf1f2 lib/workingsetcache: properly initialize new cache when the stored cache has unexpected size
This is a follow-up for 9bc541587b
2025-11-10 12:49:04 +01:00
Artem Fetishev
5336091785 lib/storage: Fix data race in containsTimeRange() (#9965)
When one goroutine attempts to update the min timestamp under the lock, the
timestamp could have already been updated by another goroutine with a smaller
timestamp. As a result, the goroutine overwrites it with a
bigger value.

A simple unit test (included in this commit) demonstrates that.

Additionally, use a simple Mutex instead of RWMutex. RWMutex only
introduces unnecessary overhead for operations as simple as retrieving
a value from a map, so a regular Mutex should be preferred.

Thanks to @valyala for spotting a bug and the advice on RWMutexes.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-11-07 15:20:29 +01:00
Yury Molodov
5b11f6f384 app/vmui: improve chart performance and fix median calculation
This commit improves overall performance and stability of chart rendering,
refines time series generation, and fixes incorrect median calculation
in metric series.
JavaScript execution time improved by up to ×6 on large datasets.

**Changes:**

* Reworked `getTimeSeries` - one point per pixel.
* Added legend auto-collapse when >20 items.
* Switched median algorithm to Quickselect (Floyd–Rivest).
* Unified array stats functions (`min`, `max`, `avg`, `median`) into a
single pass.
* Removed unused `last` value from series.
* Renamed `roundToMilliseconds` to `roundToThousandths` and moved to
`utils/math`.
* Replaced `isSupportedDuration` with `parseSupportedDuration`, added
fractional duration support.

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9699
Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9926
2025-11-07 16:19:56 +03:00
Zhu Jiekun
e8975e560d lib/promscrape: prevent early exit when one of multiple service discovery configs fails
When multiple service discovery configs of the same type exist (e.g.,
`hetzner_sd_config`), vmagent currently behaves as follows:
1. Attempts to request each config.
2. Exits immediately if any config returns an error.
3. Skips the remaining configs and falls back to the previous service
discovery result.

The correct behavior, which is more compatible with Prometheus, should be:
1. Attempt to request each config.
2. Collect all valid results.
3. Use the valid results if there is at least one. Otherwise (all
failed), fall back to the previous SD result.

Scrape example:

```yaml
scrape_configs:
  - job_name: hetzner-default
    hetzner_sd_configs:
      - role: "hcloud"
        authorization:
          credentials: "some_valid_value"
      - role: "hcloud"
        authorization:
          credentials: "some_wrong_value"
```

Expected outcome:
- At least targets from `credentials: "some_valid_value"` should appear
in the service discovery result.

Current outcome:
- The error from `credentials: "some_wrong_value"` leads to an **empty**
result.

This issue affects service discovery mechanisms that use the
`getScrapeWorkGeneric` function:

- `azure_sd_config`
- `consul_sd_config`
- `consulagent_sd_config`
- `digitalocean_sd_config`
- `dns_sd_config`
- `docker_sd_config`
- `dockerswarm_sd_config`
- `ec2_sd_config`
- `eureka_sd_config`
- `gce_sd_config`
- `hetzner_sd_config`
- `http_sd_config`
- `kuma_sd_config`
- `marathon_sd_config`
- `nomad_sd_config`
- `openstack_sd_config`
- `ovhcloud_sd_config`
- `puppetdb_sd_config`
- `vultr_sd_config`
- `yandexcloud_sd_config`

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9375
2025-11-07 16:18:11 +03:00
Yury Molodov
3b5822398c app/vmui: fix points display; add option to show all points
* Fix rendering of isolated points at gaps.
* Add toggle to always show all points (even when connected by a line).

Related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9666
2025-11-07 16:09:28 +03:00
Yury Molodov
742a3384dc app/vmui: fix incorrect median value calculation in series (#9926)
Signed-off-by: Yury Molodov <yurymolodov@gmail.com>
2025-11-07 13:42:22 +02:00
Clément Nussbaumer
8a1beef46d lib/promscrape/kubernetes: add namespace metadata discovery
Permit attaching namespace metadata to pods, services, ingresses,
endpoints and endpointslices in Kubernetes service discovery.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7486
2025-11-07 11:49:08 +03:00
Fred Navruzov
22158b7272 docs/vmanomaly: patch release v1.27.1 (#9964)
### Describe Your Changes

Patch release doc updates (v1.27.1)

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-06 14:17:18 +02:00
Aliaksandr Valialkin
4a18f4e49d docs/victoriametrics/Articles.md: add https://www.tigrisdata.com/blog/billing-prometheus/ 2025-11-06 11:44:16 +01:00
Aliaksandr Valialkin
85cbbfe0bc lib/logstorage: verify that nobody holds references to parts when closing the partition
This is needed in order to detect and prevent cases of improper usage of partitions
while they are closed.

This is a follow-up for the commit 9725ee50ec .
2025-11-06 11:37:44 +01:00
Max Kotliar
d5705a9647 docs/guides: use canonical link 2025-11-05 21:09:19 +02:00
Max Kotliar
c90c7c3123 Revert "lib/envflag: apply -secret.flags inside envflag.Parse function (#9942)"
This reverts commit 1b11031ec8.

The change introduces an import cycle in the enterprise version of VictoriaMetrics.
2025-11-05 21:01:48 +02:00
Max Kotliar
1b11031ec8 lib/envflag: apply -secret.flags inside envflag.Parse function (#9942)
### Describe Your Changes

Follow up on PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839, which
addresses the review comment

https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9839#discussion_r2477729886

Alex: 
```
this design decision isn't good, since it will lead to potential security issues over time when we'll forget adding ApplySecretFlags() call after the flag.Parse() call or add it at the wrong place. BTW, we do not call flag.Parse() explicitly - instead envflag.Parse() is called. So it is natural to call ApplySecretFlags() inside this call. Are there restrictions which prevent from doing this? If there are no restrictions, then there is no need in making this function public - it will be called explicitly inside envflag.Parse().
```

There is no changelog entry as there is no change in user-visible
behavior.

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-05 20:53:43 +02:00
Max Kotliar
e7b7015eb1 docs: clarify why we advise 50% free RAM. Add link to discussion (#9943)
### Describe Your Changes

Based on the answer
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9895#issuecomment-3442491150

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-05 20:51:17 +02:00
Max Kotliar
6ee1edeb4d docs: fix links in VictoriaMetrics topologies guide 2025-11-05 20:45:32 +02:00
Aliaksandr Valialkin
9725ee50ec lib/mergeset: verify that Table parts are no longer used at Table.MustClose()
This should catch possible errors related to improper release of Table parts.
Fix such an error in TestTableCreateSnapshotAt by properly closing all the initialized
TableSearch instances.
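The check can be sketched as reference counting that is verified at close time. The `part` and `table` types below are hypothetical and do not mirror the lib/mergeset structures; the point is that a leftover reference turns into an explicit panic instead of a silent use-after-close.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// part is a hypothetical reference-counted data part.
type part struct {
	refCount atomic.Int32
}

func (p *part) incRef() { p.refCount.Add(1) }

func (p *part) decRef() {
	if p.refCount.Add(-1) < 0 {
		panic("BUG: negative refCount")
	}
}

// table is a hypothetical owner of parts.
type table struct {
	parts []*part
}

// mustClose panics if any part is still referenced by a reader.
func (t *table) mustClose() {
	for i, p := range t.parts {
		if n := p.refCount.Load(); n != 0 {
			panic(fmt.Sprintf("BUG: part #%d still has %d references at close", i, n))
		}
	}
	t.parts = nil
}

// tryClose reports whether mustClose panicked.
func tryClose(t *table) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	t.mustClose()
	return false
}

func main() {
	p := &part{}
	t := &table{parts: []*part{p}}
	// A search holds the part and forgets to release it.
	p.incRef()
	fmt.Println(tryClose(t)) // prints true
	// The search releases the part.
	p.decRef()
	fmt.Println(tryClose(t)) // prints false
}
```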

Thanks to @rtm0 for pointing to this issue.
2025-11-05 13:25:16 +01:00
f41gh7
780cb1bf05 docs: mention latest v1.129.1 release 2025-11-04 18:07:57 +03:00
f41gh7
a34d0d6056 docs: mention latest LTS releases
v1.110.23 and v1.122.8
2025-11-04 18:07:57 +03:00
f41gh7
5e98e0cff5 CHANGELOG.md: cut v1.129.1 release 2025-11-04 13:15:37 +03:00
Nikolay
51b44afd34 lib: properly apply snappy Decode limits
Previously, the snappy decoder didn't take into account the request size limits
applied by VictoriaMetrics components. In case of an incorrectly formed snappy block, a VictoriaMetrics
component could allocate excessive memory, which could lead to OOM errors.

This commit makes ingestion endpoints check the block size header against the max request size limits.
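A snappy block starts with its uncompressed length encoded as a varint, so the limit can be enforced before any decoding buffer is allocated. The sketch below uses only `encoding/binary`; `checkDecodedLen` is a hypothetical helper, not the VictoriaMetrics code, and snappy packages such as github.com/golang/snappy expose a similar `DecodedLen` function.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// checkDecodedLen parses the varint preamble of a snappy block, which holds
// the uncompressed length, and rejects blocks whose declared size exceeds
// maxSize. It runs before any buffer for the decoded data is allocated.
func checkDecodedLen(block []byte, maxSize uint64) (uint64, error) {
	n, headerLen := binary.Uvarint(block)
	if headerLen <= 0 {
		return 0, fmt.Errorf("cannot parse snappy block header")
	}
	if n > maxSize {
		return 0, fmt.Errorf("decoded block size %d exceeds the limit of %d bytes", n, maxSize)
	}
	return n, nil
}

func main() {
	// 0xAC 0x02 is the varint encoding of 300.
	block := []byte{0xAC, 0x02}
	fmt.Println(checkDecodedLen(block, 1000))
	fmt.Println(checkDecodedLen(block, 100))
}
```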
2025-11-04 13:04:27 +03:00
Fred Navruzov
5a8d7984ca docs/vmanomaly: release v1.27.0 (#9954)
### Describe Your Changes

Docs update to follow vmanomaly's release v1.27.0, including:
- UI page update (changelogs, auth, new screenshots)
- Migration guide addition
- Cross-references of the above and version bumps

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2025-11-04 10:22:20 +04:00
Zakhar Bessarab
57752ca2c0 docs: update VM version to v1.129.0
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-11-03 16:55:32 +04:00
Zakhar Bessarab
171cdf0614 deployment/docker: update VM version to v1.129.0
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-11-03 16:53:21 +04:00
Artem Fetishev
7d19ec2e4d lib/storage: extract storage file names into constants (#9944)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-10-31 20:32:16 +01:00
Zakhar Bessarab
a9b5033d50 docs/changelog: backport LTS changelog
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-10-31 22:41:17 +04:00
Zakhar Bessarab
a783b2048f docs/changelog: cut v1.129.0
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-10-31 20:04:01 +04:00
Zakhar Bessarab
fd48c72a83 docs: update version tooltips
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-10-31 19:59:27 +04:00
Zakhar Bessarab
05e52fa05a app/vmselect: run make vmui-update
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-10-31 19:47:16 +04:00
Zakhar Bessarab
b49e8d032f make: add vmutils target for linux/s390x for CI builds
Currently, CI builds ignore linux/s390x due to a missing target. See an example: https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/actions/runs/18971198439/job/54179142369

Follow-up for 255a0cf6, 73b10d76

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-10-31 15:45:33 +04:00
Zakhar Bessarab
73b10d7621 make: include s390x binaries into release artifacts (#9941)
Previously, it was possible to build s390x binaries with make targets, but
those builds were not included in the release artifacts. Update the release
targets to include s390x binaries in the release artifacts.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9697

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2025-10-31 15:16:07 +04:00
hagen1778
18268c3d13 docs: re-qualify load-balancing optimization to feature
Motivation: the change updates load-balancing logic, enhancing it rather than fixing
a critical bug. Such enhancements should not be ported to LTS versions.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9712

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-31 09:52:34 +01:00
hagen1778
bfb49c55af docs: add changelog for vmalert UI fix
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9892
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9909
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-31 09:34:29 +01:00
Kirill Yurkov
bd7fed9b41 docs: address review comments from PR #9919 (#9940)
- Standardize all sections to use 'Recommended for:' instead of mixed
'For whom:' and 'Target audience:'
- Fix wording: 'Query evaluation is always local' 

Addresses comments in #9919
2025-10-31 09:21:52 +01:00
Roman Khavronenko
a85c5830c1 app/vmalert: limit delayBeforeStart up to 5min (#9930)
vmalert spreads the start of group evaluations
over the `[0..group.interval]` range. This helps avoid the
thundering herd problem when all groups execute their
rules simultaneously on vmalert start. It was introduced in
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/724

While this works well for most configs, for groups with big evaluation
intervals (30min, 60min) the first evaluation can be delayed
significantly.
This change introduces a start delay limit via the new flag
`--group.maxStartDelay` (5m by default).
It limits the `[0..group.interval]` start delay to
`[0..math.min(--group.maxStartDelay, group.interval)]`,
so all groups start within the first 5m or earlier.

The --group.maxStartDelay is ignored if the user sets `eval_offset`.

The 5m default was chosen high enough not to affect users with
relatively short evaluation intervals.

-----------

Based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9929

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-31 09:20:46 +01:00
Andrii Chubatiuk
009ddb9ce1 vmui: wrap annotations in alerting (#9909)
similar to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9892
but for alerting tab in vmui
2025-10-31 09:13:32 +01:00
Zakhar Bessarab
91bce8d3b4 app/vmbackupmanager: enforce newline at the end of CLI result (#956)
* app/vmbackupmanager: enforce newline at the end of CLI result

Previously, vmbackupmanager only printed the response from the API, which did not end with a newline character. This broke the rendering of the next shell prompt when using a shell.

Always append a newline character to avoid breaking shell formatting when using CLI mode.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/victoriametrics/changelog/CHANGELOG.md

Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-10-31 11:52:53 +04:00
Andrii Chubatiuk
a7d69cc51e app/vmbackupmanager: create vm_backup_last_created_at metric for latest backup (#954) 2025-10-31 11:52:47 +04:00
Aliaksandr Valialkin
2e6f42bff8 lib/promauth/config.go: typo fix: It -> If
The typo is spotted in https://github.com/VictoriaMetrics/VictoriaLogs/pull/765/files#r2434359693
2025-10-30 18:20:00 +01:00
Aliaksandr Valialkin
7ae1fd9614 docs/victoriametrics/Articles.md: add a link to https://medium.com/@vijayrauniyar1818/how-we-eliminated-10k-year-in-aws-cross-zone-data-transfer-costs-with-zone-aware-kubernetes-09fff0c2435b 2025-10-30 17:24:17 +01:00
Aliaksandr Valialkin
51d1c16230 app/vmauth: make load distribution more even among backends which execute queries with varying durations
The load distribution could be uneven when short queries arrive at vmauth while some backends are busy
with long-running queries. In this case, most of the load goes to the first free backend after a run of busy backends.

Suppose we have four backends - b1, b2, b3 and b4. The first two backends are busy with a bigger number
of long-running queries than b3 and b4. Then 75% of short queries will go to b3, while only 25%
of short queries will go to b4, since every query whose backend search starts at b1, b2 or b3 ends up at b3.

The new algorithm makes the distribution more even in these cases by storing the next backend
after the chosen backend as the candidate for the next query (its index is stored in the atomicCounter).
Avoid races when updating atomicCounter from concurrently executed queries by using CompareAndSwap() -
if a concurrent query updates it first, then the current query won't overwrite it with an outdated value.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9712
2025-10-30 14:37:48 +01:00
Max Kotliar
874f8b31a3 docs/changelog: fix pr link in v1.122.5 tag 2025-10-30 15:07:39 +02:00
Artem Fetishev
d3ac2473c0 lib/storage: Add data ingestion benchmarks for various data patterns
Data patterns considered:

- Same series, same date
- Same series, different dates
- Different series, same date
- Different series, different dates

To make sure that the pattern condition holds, a new storage instance is
started every benchmark iteration.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9912
2025-10-30 14:39:28 +03:00
Zakhar Bessarab
3f45690342 app/vmctl/remote-read: allow providing multiple label filters
Previously, vmctl only accepted one label for filtering. Extend this to
allow providing multiple-filters at once. This is useful when migrating
large volumes of data as it allows narrowing down migration scope of
migration for one run so that the source side is not overwhelmed with
migration.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9917/
2025-10-30 14:38:30 +03:00
Roman Khavronenko
3abd442742 app/vmalert: properly show last evaluation with 0 value
Before, rules that hadn't been evaluated yet showed weird values in
vmalert's UI. This happened because of the
`time.Since(r.LastEvaluation).Seconds()` expression when
`r.LastEvaluation` had the zero value.

With this change, rules that haven't been evaluated yet show `Never` in
the Updated column instead.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9924
2025-10-30 14:37:34 +03:00
Zhu Jiekun
5cec04d8e2 lib/httpserver: revert HTTP/2 support
This commit reverts the commit
d6bbfaf164 for the following reasons:

1. HTTP/2 carries security risks.
2. Most components in the VictoriaMetrics stack do not require HTTP/2
support.
3. While HTTP/2 support was available only as an option in the previous
commit, there remains a potential risk of misusing this option and
enabling HTTP/2 inadvertently.

For components (e.g., VictoriaTraces) that require HTTP/2 support, they
should currently build an HTTP server manually with built-in packages,
instead of using `lib/httpserver` in VictoriaMetrics. If the mentioned
issue is resolved in the future and more components need HTTP/2, this
support can be reintroduced into `lib/httpserver`.
 
Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9927
2025-10-30 14:36:48 +03:00
Aliaksandr Valialkin
60f777620f lib/fs/fsutil: set the default value for -fs.maxConcurrency depending on the number of available CPU cores
This should reduce the need to tune this flag on systems with different number of CPU cores.
16 concurrent file operations per CPU should give quite low Go scheduling latency (~10ms)
according to https://github.com/VictoriaMetrics/VictoriaLogs/issues/774#issuecomment-3456814064

This is a follow-up for the commit 8a9a40dbdd
2025-10-30 11:57:10 +01:00
Aliaksandr Valialkin
8a9a40dbdd lib/fs/fsutil: add -fs.maxConcurrency command-line flag for tuning concurrent operations with files
This flag can help balance Go scheduling latency on systems with a small number of CPU cores
against data ingestion performance on systems with high-latency storage such as NFS or Ceph.

Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/774
Updates https://github.com/VictoriaMetrics/VictoriaLogs/issues/517
2025-10-30 11:46:27 +01:00
hagen1778
fde4b4013a docs: rm extra line
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 11:02:16 +01:00
hagen1778
bf69b0d686 docs: order recent changes by components
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 11:01:43 +01:00
hagen1778
a866474918 dashboards: run make dashboards-sync
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 10:58:58 +01:00
hagen1778
22f6cb6339 docs: mention PR author for dashboard change
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 10:56:08 +01:00
Samarth Bagga
0d7b7649bf dashboards: enable search for non default flags panel (#9928)
### Describe Your Changes

Added search for the non-default flags panel by editing the Grafana configs.
Resolves #9910

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 10:53:36 +01:00
nemobis
74611ce6f2 docs: Update RELEX figures (#9931)
### Describe Your Changes

Please provide a brief description of the changes you made. Be as
specific as possible to help others understand the purpose and impact of
your modifications.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2025-10-30 10:20:27 +01:00
Roman Khavronenko
a5dd0324a9 app/vmalert: simplify delayBeforeStart func (#9929)
It is a cosmetic change: it simplifies function signature by making it a
method of the Group struct.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 10:19:05 +01:00
hagen1778
45c0d40127 docs: fix typo after 3e0aa46
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 10:18:28 +01:00
Andrii Chubatiuk
fc978c95af app/vmalert: use search expression to match group and file names (#9920)
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9886

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2025-10-30 10:01:25 +01:00
Kirill Yurkov
8e99efe0fa docs: add VictoriaMetrics architectures guide from startups to hypers… (#9919)
The new guide section about architecture from scratch to hyperscale!
2025-10-30 09:55:48 +01:00
hagen1778
3e0aa46fdb docs: reorder template functions alphabetically
follow-up after ea41fea453

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 09:54:40 +01:00
Anh-Dung Nguyen
ea41fea453 app/vmalert: add now template (#9913)
### Describe Your Changes

Related to #9864, add "now" as a template function in vmalert rules templating and
update the docs. I haven't been able to test the docs change as I can't
run make docs-debug locally, so if anyone knows how to do it locally,
please let me know!

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Co-authored-by: Hui Wang <haley@victoriametrics.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
2025-10-30 09:52:48 +01:00
hagen1778
0165108a8f docs: move change line to the right place
Follow-up after 2652a7c762

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-30 09:52:17 +01:00
Hui Wang
2652a7c762 vmalert: support alert_relabel_configs per each notifier in -notifier.config file (#9736)
fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5980
2025-10-30 09:47:40 +01:00
Stephan Burns
82aacc5b75 vmagent/docs: grammar changes (#9863)
### Describe Your Changes

Lots of small changes to grammar to make the docs flow nicer.

I'm sorry that this ended up being such a large PR. I will split these
up in the future.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

Co-authored-by: Phuong Le <39565248+func25@users.noreply.github.com>
2025-10-28 16:56:03 +02:00
dependabot[bot]
0544bb12e0 build(deps): bump vite from 7.1.5 to 7.1.11 in /app/vmui/packages/vmui (#9885)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite)
from 7.1.5 to 7.1.11.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vitejs/vite/releases">vite's
releases</a>.</em></p>
<blockquote>
<h2>v7.1.11</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v7.1.11/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v7.1.10</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v7.1.10/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v7.1.9</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v7.1.9/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v7.1.8</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v7.1.8/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v7.1.7</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v7.1.7/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v7.1.6</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v7.1.6/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md">vite's
changelog</a>.</em></p>
<blockquote>
<h2><!-- raw HTML omitted --><a
href="https://github.com/vitejs/vite/compare/v7.1.10...v7.1.11">7.1.11</a>
(2025-10-20)<!-- raw HTML omitted --></h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>dev:</strong> trim trailing slash before
<code>server.fs.deny</code> check (<a
href="https://redirect.github.com/vitejs/vite/issues/20968">#20968</a>)
(<a
href="f479cc57c4">f479cc5</a>)</li>
</ul>
<h3>Miscellaneous Chores</h3>
<ul>
<li><strong>deps:</strong> update all non-major dependencies (<a
href="https://redirect.github.com/vitejs/vite/issues/20966">#20966</a>)
(<a
href="6fb41a260b">6fb41a2</a>)</li>
</ul>
<h3>Code Refactoring</h3>
<ul>
<li>use subpath imports for types module reference (<a
href="https://redirect.github.com/vitejs/vite/issues/20921">#20921</a>)
(<a
href="d0094af639">d0094af</a>)</li>
</ul>
<h3>Build System</h3>
<ul>
<li>remove cjs reference in files field (<a
href="https://redirect.github.com/vitejs/vite/issues/20945">#20945</a>)
(<a
href="ef411cee26">ef411ce</a>)</li>
<li>remove hash from built filenames (<a
href="https://redirect.github.com/vitejs/vite/issues/20946">#20946</a>)
(<a
href="a81730754d">a817307</a>)</li>
</ul>
<h2><!-- raw HTML omitted --><a
href="https://github.com/vitejs/vite/compare/v7.1.9...v7.1.10">7.1.10</a>
(2025-10-14)<!-- raw HTML omitted --></h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>css:</strong> avoid duplicate style for server rendered
stylesheet link and client inline style during dev (<a
href="https://redirect.github.com/vitejs/vite/issues/20767">#20767</a>)
(<a
href="3a92bc79b3">3a92bc7</a>)</li>
<li><strong>css:</strong> respect emitAssets when cssCodeSplit=false (<a
href="https://redirect.github.com/vitejs/vite/issues/20883">#20883</a>)
(<a
href="d3e7eeefa9">d3e7eee</a>)</li>
<li><strong>deps:</strong> update all non-major dependencies (<a
href="879de86935">879de86</a>)</li>
<li><strong>deps:</strong> update all non-major dependencies (<a
href="https://redirect.github.com/vitejs/vite/issues/20894">#20894</a>)
(<a
href="3213f90ff0">3213f90</a>)</li>
<li><strong>dev:</strong> allow aliases starting with <code>//</code>
(<a
href="https://redirect.github.com/vitejs/vite/issues/20760">#20760</a>)
(<a
href="b95fa2aa75">b95fa2a</a>)</li>
<li><strong>dev:</strong> remove timestamp query consistently (<a
href="https://redirect.github.com/vitejs/vite/issues/20887">#20887</a>)
(<a
href="6537d15591">6537d15</a>)</li>
<li><strong>esbuild:</strong> inject esbuild helpers correctly for
esbuild 0.25.9+ (<a
href="https://redirect.github.com/vitejs/vite/issues/20906">#20906</a>)
(<a
href="446eb38632">446eb38</a>)</li>
<li>normalize path before calling <code>fileToBuiltUrl</code> (<a
href="https://redirect.github.com/vitejs/vite/issues/20898">#20898</a>)
(<a
href="73b6d243e0">73b6d24</a>)</li>
<li>preserve original sourcemap file field when combining sourcemaps (<a
href="https://redirect.github.com/vitejs/vite/issues/20926">#20926</a>)
(<a
href="c714776aa1">c714776</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>correct <code>WebSocket</code> spelling (<a
href="https://redirect.github.com/vitejs/vite/issues/20890">#20890</a>)
(<a
href="29e98dc3ef">29e98dc</a>)</li>
</ul>
<h3>Miscellaneous Chores</h3>
<ul>
<li><strong>deps:</strong> update rolldown-related dependencies (<a
href="https://redirect.github.com/vitejs/vite/issues/20923">#20923</a>)
(<a
href="a5e3b064fa">a5e3b06</a>)</li>
</ul>
<h2><!-- raw HTML omitted --><a
href="https://github.com/vitejs/vite/compare/v7.1.8...v7.1.9">7.1.9</a>
(2025-10-03)<!-- raw HTML omitted --></h2>
<h3>Reverts</h3>
<ul>
<li><strong>server:</strong> drain stdin when not interactive (<a
href="https://redirect.github.com/vitejs/vite/issues/20885">#20885</a>)
(<a
href="12d72b0538">12d72b0</a>)</li>
</ul>
<h2><!-- raw HTML omitted --><a
href="https://github.com/vitejs/vite/compare/v7.1.7...v7.1.8">7.1.8</a>
(2025-10-02)<!-- raw HTML omitted --></h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>css:</strong> improve url escape characters handling (<a
href="https://redirect.github.com/vitejs/vite/issues/20847">#20847</a>)
(<a
href="24a61a3f54">24a61a3</a>)</li>
<li><strong>deps:</strong> update all non-major dependencies (<a
href="https://redirect.github.com/vitejs/vite/issues/20855">#20855</a>)
(<a
href="788a183afc">788a183</a>)</li>
<li><strong>deps:</strong> update artichokie to 0.4.2 (<a
href="https://redirect.github.com/vitejs/vite/issues/20864">#20864</a>)
(<a
href="e670799e12">e670799</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8b69c9e32c"><code>8b69c9e</code></a>
release: v7.1.11</li>
<li><a
href="f479cc57c4"><code>f479cc5</code></a>
fix(dev): trim trailing slash before <code>server.fs.deny</code> check
(<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/20968">#20968</a>)</li>
<li><a
href="6fb41a260b"><code>6fb41a2</code></a>
chore(deps): update all non-major dependencies (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/20966">#20966</a>)</li>
<li><a
href="a81730754d"><code>a817307</code></a>
build: remove hash from built filenames (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/20946">#20946</a>)</li>
<li><a
href="ef411cee26"><code>ef411ce</code></a>
build: remove cjs reference in files field (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/20945">#20945</a>)</li>
<li><a
href="d0094af639"><code>d0094af</code></a>
refactor: use subpath imports for types module reference (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/20921">#20921</a>)</li>
<li><a
href="ed4a0dc913"><code>ed4a0dc</code></a>
release: v7.1.10</li>
<li><a
href="c714776aa1"><code>c714776</code></a>
fix: preserve original sourcemap file field when combining sourcemaps
(<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/20926">#20926</a>)</li>
<li><a
href="446eb38632"><code>446eb38</code></a>
fix(esbuild): inject esbuild helpers correctly for esbuild 0.25.9+ (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/20906">#20906</a>)</li>
<li><a
href="879de86935"><code>879de86</code></a>
fix(deps): update all non-major dependencies</li>
<li>Additional commits viewable in <a
href="https://github.com/vitejs/vite/commits/v7.1.11/packages/vite">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=vite&package-manager=npm_and_yarn&previous-version=7.1.5&new-version=7.1.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/VictoriaMetrics/VictoriaMetrics/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-28 16:45:07 +02:00
Max Kotliar
70c293467a app/vmselect: Enable log slow query stats with 5s default
Rationale: Having query stats logging enabled by default can greatly
help in investigating incidents.

Currently, it is disabled by default, so many users don’t enable it, and
when issues occur there are no stats available.

After discussion with the team, a 5s threshold was agreed upon as a
reasonable default to capture meaningful slow query data without
excessive logging.
2025-10-28 16:43:04 +02:00
Max Kotliar
90b4c84ad5 docs/changelog: put misplaced changelog entries to tip 2025-10-28 16:37:15 +02:00
Hui Wang
0a194d067a stream aggregation: change the behavior when both `streamAggr.dropInp… (#9877)
stream aggregation: change the behavior when both `streamAggr.dropInput`
and `streamAggr.keepInput` are set to true

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9724 by
making `dropInput` and `keepInput` work independently.

<img width="744" height="366" alt="image"
src="https://github.com/user-attachments/assets/7ebb3d1e-872f-4789-8dd1-c4e3f80a84de"
/>

Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-10-28 16:19:33 +02:00
Hui Wang
9ffe965063 vmagent: add `/remotewrite-relabel-config` and `/remotewrite-url-relabel-config` APIs (#9722)
These APIs return the `-remoteWrite.relabelConfig` and
`-remoteWrite.urlRelabelConfig` flag values.

part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9504

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2025-10-27 13:52:46 +01:00
f41gh7
775ee71fad app/{vminsert,vmstorage}: implement RPC protocol for vmstorage-vminsert communication
This commit adds a new RPC protocol for vminsert-vmstorage communication.
It works in the same way as the vmselect-vmstorage RPC.

It is implemented with new handshake hello methods in a backward-compatible
way. The server attempts to parse RPC only if the client sends the new
Hello message, while the client falls back to the old Hello message if the
server closes the connection.

This change is needed for forwarding the new metrics metadata from vminsert
to vmstorage.

Related issue:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974
Changes extracted from PR:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9487
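The backward-compatible handshake described above can be sketched as follows. This is illustrative only: the hello strings, `handshake`, `connect` and `fakeLegacyDial` are hypothetical, and the real vminsert/vmstorage wire format is different. The sketch assumes that an old server simply closes the connection when it sees an unknown hello, which the client observes as `io.EOF`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net"
)

// Hypothetical handshake messages; the real vminsert/vmstorage wire format differs.
const (
	helloRPC    = "vminsert.rpc.hello\n"
	helloLegacy = "vminsert.hello\n"
	ackOK       = "ok\n"
)

// handshake sends hello and waits for an "ok" acknowledgement.
func handshake(conn net.Conn, hello string) error {
	if _, err := io.WriteString(conn, hello); err != nil {
		return err
	}
	resp, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return err
	}
	if resp != ackOK {
		return fmt.Errorf("unexpected handshake response %q", resp)
	}
	return nil
}

// connect tries the new RPC hello first. An old server closes the connection
// on an unknown hello, so io.EOF triggers a reconnect with the legacy hello.
func connect(dial func() (net.Conn, error)) (conn net.Conn, rpc bool, err error) {
	if conn, err = dial(); err != nil {
		return nil, false, err
	}
	if err = handshake(conn, helloRPC); err == nil {
		return conn, true, nil
	}
	conn.Close()
	if !errors.Is(err, io.EOF) {
		return nil, false, err
	}
	if conn, err = dial(); err != nil {
		return nil, false, err
	}
	if err = handshake(conn, helloLegacy); err != nil {
		conn.Close()
		return nil, false, err
	}
	return conn, false, nil
}

// fakeLegacyDial simulates an old server over net.Pipe: it acknowledges only
// the legacy hello and closes the connection on anything else.
func fakeLegacyDial() (net.Conn, error) {
	client, server := net.Pipe()
	go func() {
		defer server.Close()
		line, err := bufio.NewReader(server).ReadString('\n')
		if err != nil || line != helloLegacy {
			return
		}
		_, _ = io.WriteString(server, ackOK)
	}()
	return client, nil
}

func main() {
	conn, rpc, err := connect(fakeLegacyDial)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	fmt.Println("using RPC protocol:", rpc) // using RPC protocol: false
}
```

The fallback is triggered only by a closed connection, so other handshake errors still surface to the caller instead of silently downgrading the protocol.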
2025-10-27 09:56:07 +01:00
Artem Fetishev
75b597e727 lib/storage: fix loading nextDayMetricIDs cache from a file for different indexDB generation (#9911)
Previously, if a storage started with a current indexDB generation different
from the one stored in the nextDayMetricIDs cache file, the cache would still
be loaded into memory, possibly affecting the next day prefill.

This is an unlikely case but it is still possible when:

- A programmer makes a mistake in the code and uses something else
instead of idbCurr.generation.
- Downgrading from pt-index to previous version

Related to #7599 and #8134.
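The fix amounts to validating the cache against the current indexDB generation before using it. The sketch below uses a hypothetical on-disk layout (an 8-byte generation followed by 8-byte metricIDs) and invented helper names. The real `lib/storage` cache format and code differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// marshalNextDayMetricIDs encodes a hypothetical cache layout:
// 8-byte indexDB generation followed by 8-byte metricIDs.
func marshalNextDayMetricIDs(generation uint64, metricIDs []uint64) []byte {
	buf := binary.BigEndian.AppendUint64(nil, generation)
	for _, id := range metricIDs {
		buf = binary.BigEndian.AppendUint64(buf, id)
	}
	return buf
}

// loadNextDayMetricIDs returns the cached metricIDs only if they were saved
// for the current indexDB generation; otherwise it returns an empty set, so a
// stale cache cannot affect the next day prefill.
func loadNextDayMetricIDs(data []byte, currGeneration uint64) ([]uint64, error) {
	if len(data) < 8 || len(data)%8 != 0 {
		return nil, errors.New("malformed nextDayMetricIDs cache")
	}
	if gen := binary.BigEndian.Uint64(data); gen != currGeneration {
		return nil, nil // cache belongs to another indexDB generation; ignore it
	}
	var ids []uint64
	for b := data[8:]; len(b) > 0; b = b[8:] {
		ids = append(ids, binary.BigEndian.Uint64(b))
	}
	return ids, nil
}

func main() {
	data := marshalNextDayMetricIDs(42, []uint64{1, 2, 3})
	ids, _ := loadNextDayMetricIDs(data, 42)
	fmt.Println(ids) // [1 2 3]
	ids, _ = loadNextDayMetricIDs(data, 43)
	fmt.Println(len(ids)) // 0
}
```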

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-10-26 19:43:59 +01:00
Artem Fetishev
f3b0f4292d lib/storage: add a unit test for next day idb prefill (#9906)
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2025-10-25 18:16:32 +02:00
Roman Khavronenko
5fa87af6be app/vmalert/datasource: explicitly check response type during replay (#9868)
This change validates that the QueryRange() method of the Prometheus
datasource receives a response with the `matrix` data type. It returns an
error otherwise.

The change is needed to avoid confusion like in
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9779.

The fix is not elegant, but it should be simple to support from the code
perspective: each API has its own parsing function, even if some
processing code is repeated.

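A minimal sketch of such a per-API parsing function, assuming the standard Prometheus HTTP API response envelope (`status`, `data.resultType`, `data.result`, `error`). `parseQueryRangeResponse` is a hypothetical name, not the vmalert datasource code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promResponse mirrors the envelope of Prometheus HTTP API query responses.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string          `json:"resultType"`
		Result     json.RawMessage `json:"result"`
	} `json:"data"`
	Error string `json:"error"`
}

// parseQueryRangeResponse rejects any /api/v1/query_range response
// whose data is not of type "matrix".
func parseQueryRangeResponse(body []byte) (json.RawMessage, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("cannot parse response: %w", err)
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("query failed: %s", r.Error)
	}
	if r.Data.ResultType != "matrix" {
		return nil, fmt.Errorf("unexpected response type %q; want %q", r.Data.ResultType, "matrix")
	}
	return r.Data.Result, nil
}

func main() {
	_, err := parseQueryRangeResponse([]byte(`{"status":"success","data":{"resultType":"vector","result":[]}}`))
	fmt.Println(err) // unexpected response type "vector"; want "matrix"
}
```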
---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Andrii Chubatiuk <achubatiuk@victoriametrics.com>
2025-10-24 16:53:01 +02:00
Roman Khavronenko
b9a3369254 app/vmalert: preserve html formatting in annotations (#9892)
The change is purely visual. It preserves html formatting in annotations
when rendering them on rule or alert details page. The css change is
clumsy, but demonstrates the point.

--------

Before:
<img width="1179" height="297" alt="image"
src="https://github.com/user-attachments/assets/c30e2222-7b0f-4f28-bf6e-c546cc5bb2fc"
/>


After:
<img width="1196" height="321" alt="image"
src="https://github.com/user-attachments/assets/2c6d9530-7ae9-47fd-b4ba-87fe6f44c625"
/>


-------

p.s. @AndrewChubatiuk I'm sure you could make this change in a
more elegant or stylish way than I did. Please do so, if you want.
Please port this change to vmui too. Thanks!

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2025-10-24 16:39:09 +02:00
Max Kotliar
159f990c8e docs/changelog: fix typo in update note 2025-10-24 13:27:49 +03:00
Max Kotliar
b5c3e93f7e docs/changelog: fix typo in LTS link 2025-10-24 13:17:11 +03:00
Zakhar Bessarab
7a6139416e lib/backup/s3remote: properly extend http client if it is present
Commit fb1344b5 unconditionally replaced the HTTP client, overriding the
configuration loaded by the AWS SDK. As a result, settings derived from AWS
environment variables were overwritten.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9858

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9898
2025-10-24 10:30:59 +02:00
Zhu Jiekun
d6bbfaf164 httpserver: add http2 option
Currently, the `httpserver` disables HTTP/2 support by design, because:
```
// Disable http/2, since it doesn't give any advantages for VictoriaMetrics services.
```

VictoriaLogs and VictoriaTraces rely on `httpserver`, so an option to enable
HTTP/2 is required to support gRPC over HTTP/2.


Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/9881
2025-10-24 10:30:27 +02:00
Nikolay
11f488d8ff lib/streamaggr: concurrently push timeseries to aggregators
Previously, all time series pushed into aggregators were added
sequentially. This could delay data ingestion and made it impossible to use
all available CPU cores.

This commit adds concurrency based on the number of available CPU cores.

Also, it adds new generic Buffer and BufferPool types to slicesutil.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9878
2025-10-24 10:29:48 +02:00
Aliaksandr Valialkin
d0f8773f4b app/vmauth: log the real cause for timed out requests to vmauth
Previously a misleading random error could be logged for canceled and/or timed out requests to vmauth.
Consistently log the request timeout error for timed out requests.

While at it, do not log errors for requests canceled by the remote client, since such logs aren't actionable
and just pollute error logs generated by vmauth.
2025-10-21 15:59:05 +02:00
Max Kotliar
7ec6f28a7c docs/changelog: add links to related PRs. 2025-10-21 16:11:21 +03:00
Max Kotliar
46ef5460a9 docs: run make docs-update-flags
sync flags with actual values in binaries
2025-10-21 16:04:20 +03:00
Max Kotliar
1a68d4ac8a docs/CHANGELOG.md: update changelog with LTS release notes; bump LTS versions 2025-10-21 11:23:01 +03:00
Max Kotliar
be7039429d docs: bump latest version in docs 2025-10-21 10:58:47 +03:00
Max Kotliar
76e5cd2cd4 deployment/docker: bump version 2025-10-21 10:55:35 +03:00
475 changed files with 49674 additions and 16209 deletions


@@ -5,7 +5,7 @@ body:
- type: textarea
id: describe-the-component
attributes:
label: Is your question request related to a specific component?
label: Is your question related to a specific component?
placeholder: |
VictoriaMetrics, vmagent, vmalert, vmui, etc...
validations:

48
.github/scripts/lint-changelog-tip.sh vendored Executable file

@@ -0,0 +1,48 @@
#!/usr/bin/env sh
set -e
CHANGELOG_FILE="docs/victoriametrics/changelog/CHANGELOG.md"
GITHUB_BASE_REF=${GITHUB_BASE_REF:-"master"}
GIT_REMOTE=${GIT_REMOTE:-"origin"}
git diff "${GIT_REMOTE}/${GITHUB_BASE_REF}"...HEAD -- $CHANGELOG_FILE > diff.txt
if ! grep -q "^+" diff.txt; then
echo "No additions in CHANGELOG.md"
exit 0
fi
ADDED_LINES=$(grep "^+\S" diff.txt | sed 's/^+//')
START_TIP=$(grep -n "^## tip" "$CHANGELOG_FILE" | head -1 | cut -d: -f1)
if [ -z "$START_TIP" ]; then
echo "ERROR: ${CHANGELOG_FILE} does not contain a ## tip section"
exit 1
fi
END_TIP=$(awk "NR>$START_TIP && /^## / {print NR; exit}" "${CHANGELOG_FILE}")
if [ -z "$END_TIP" ]; then
END_TIP=$(wc -l < "$CHANGELOG_FILE")
fi
BAD=0
while IFS= read -r line; do
# Grep exact line inside the file and get line numbers
MATCHES=$(grep -n -F "$line" "$CHANGELOG_FILE" | cut -d: -f1)
for m in $MATCHES; do
if [ "$m" -lt "$START_TIP" ] || [ "$m" -gt "$END_TIP" ]; then
echo "'$line' on line ${m} is outside ## tip section (lines ${START_TIP}-${END_TIP})"
BAD=1
fi
done
done << EOF
$ADDED_LINES
EOF
if [ "$BAD" -ne 0 ]; then
echo "CHANGELOG modifications must be placed inside the ## tip section."
exit 1
fi
echo "CHANGELOG modifications are valid."


@@ -47,6 +47,8 @@ jobs:
arch: arm
- os: linux
arch: ppc64le
- os: linux
arch: s390x
- os: darwin
arch: amd64
- os: darwin
@@ -59,7 +61,7 @@ jobs:
arch: amd64
steps:
- name: Code checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Setup Go
id: go

19
.github/workflows/changelog-linter.yml vendored Normal file

@@ -0,0 +1,19 @@
name: 'changelog-linter'
on:
pull_request:
paths:
- "docs/victoriametrics/changelog/CHANGELOG.md"
jobs:
tip-lint:
runs-on: 'ubuntu-latest'
steps:
- uses: 'actions/checkout@v6'
with:
# needed for proper diff
fetch-depth: 0
- name: 'Validate that changelog changes are under ## tip'
run: |
GITHUB_BASE_REF=${{ github.base_ref }} ./.github/scripts/lint-changelog-tip.sh


@@ -8,7 +8,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0 # we need full history for commit verification


@@ -29,7 +29,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Set up Go
id: go


@@ -16,12 +16,12 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
path: __vm
- name: Checkout private code
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
repository: VictoriaMetrics/vmdocs
token: ${{ secrets.VM_BOT_GH_TOKEN }}


@@ -32,7 +32,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Setup Go
id: go
@@ -71,7 +71,7 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Setup Go
id: go
@@ -97,7 +97,7 @@ jobs:
steps:
- name: Code checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Setup Go
id: go


@@ -32,7 +32,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Setup Node
uses: actions/setup-node@v6


@@ -17,7 +17,7 @@ EXTRA_GO_BUILD_TAGS ?=
GO_BUILDINFO = -X '$(PKG_PREFIX)/lib/buildinfo.Version=$(APP_NAME)-$(DATEINFO_TAG)-$(BUILDINFO_TAG)'
TAR_OWNERSHIP ?= --owner=1000 --group=1000
GOLANGCI_LINT_VERSION := 2.4.0
GOLANGCI_LINT_VERSION := 2.7.2
.PHONY: $(MAKECMDGOALS)
@@ -125,6 +125,15 @@ vmutils-linux-ppc64le: \
vmrestore-linux-ppc64le \
vmctl-linux-ppc64le
vmutils-linux-s390x: \
vmagent-linux-s390x \
vmalert-linux-s390x \
vmalert-tool-linux-s390x \
vmauth-linux-s390x \
vmbackup-linux-s390x \
vmrestore-linux-s390x \
vmctl-linux-s390x
vmutils-darwin-amd64: \
vmagent-darwin-amd64 \
vmalert-darwin-amd64 \
@@ -257,6 +266,7 @@ release-victoria-metrics: \
release-victoria-metrics-linux-amd64 \
release-victoria-metrics-linux-arm \
release-victoria-metrics-linux-arm64 \
release-victoria-metrics-linux-s390x \
release-victoria-metrics-darwin-amd64 \
release-victoria-metrics-darwin-arm64 \
release-victoria-metrics-freebsd-amd64 \
@@ -275,6 +285,9 @@ release-victoria-metrics-linux-arm:
release-victoria-metrics-linux-arm64:
GOOS=linux GOARCH=arm64 $(MAKE) release-victoria-metrics-goos-goarch
release-victoria-metrics-linux-s390x:
GOOS=linux GOARCH=s390x $(MAKE) release-victoria-metrics-goos-goarch
release-victoria-metrics-darwin-amd64:
GOOS=darwin GOARCH=amd64 $(MAKE) release-victoria-metrics-goos-goarch
@@ -314,6 +327,7 @@ release-vmutils: \
release-vmutils-linux-amd64 \
release-vmutils-linux-arm64 \
release-vmutils-linux-arm \
release-vmutils-linux-s390x \
release-vmutils-darwin-amd64 \
release-vmutils-darwin-arm64 \
release-vmutils-freebsd-amd64 \
@@ -332,6 +346,9 @@ release-vmutils-linux-arm64:
release-vmutils-linux-arm:
GOOS=linux GOARCH=arm $(MAKE) release-vmutils-goos-goarch
release-vmutils-linux-s390x:
GOOS=linux GOARCH=s390x $(MAKE) release-vmutils-goos-goarch
release-vmutils-darwin-amd64:
GOOS=darwin GOARCH=amd64 $(MAKE) release-vmutils-goos-goarch
@@ -418,7 +435,7 @@ release-vmutils-windows-goarch: \
vmctl-windows-$(GOARCH)-prod.exe
pprof-cpu:
go tool pprof -trim_path=github.com/VictoriaMetrics/VictoriaMetrics@ $(PPROF_FILE)
go tool pprof -trim_path=github.com/VictoriaMetrics/VictoriaMetrics $(PPROF_FILE)
fmt:
gofmt -l -w -s ./lib
@@ -454,7 +471,23 @@ integration-test:
apptest:
$(MAKE) victoria-metrics vmagent vmalert vmauth vmctl vmbackup vmrestore
go test ./apptest/... -skip="^TestCluster.*"
go test ./apptest/... -skip="^Test(Cluster|Legacy).*"
integration-test-legacy: victoria-metrics vmbackup vmrestore
OS=$$(uname | tr '[:upper:]' '[:lower:]'); \
ARCH=$$(uname -m | tr '[:upper:]' '[:lower:]' | sed 's/x86_64/amd64/'); \
VERSION=v1.132.0; \
VMSINGLE=victoria-metrics-$${OS}-$${ARCH}-$${VERSION}.tar.gz; \
VMCLUSTER=victoria-metrics-$${OS}-$${ARCH}-$${VERSION}-cluster.tar.gz; \
URL=https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/$${VERSION}; \
DIR=/tmp/$${VERSION}; \
test -d $${DIR} || (mkdir $${DIR} && \
curl --output-dir /tmp -LO $${URL}/$${VMSINGLE} && tar xzf /tmp/$${VMSINGLE} -C $${DIR} && \
curl --output-dir /tmp -LO $${URL}/$${VMCLUSTER} && tar xzf /tmp/$${VMCLUSTER} -C $${DIR} \
); \
VM_LEGACY_VMSINGLE_PATH=$${DIR}/victoria-metrics-prod \
VM_LEGACY_VMSTORAGE_PATH=$${DIR}/vmstorage-prod \
go test ./apptest/tests -run="^TestLegacySingle.*"
benchmark:
GOEXPERIMENT=synctest go test -bench=. ./lib/...
@@ -483,7 +516,8 @@ app-local-windows-goarch:
CGO_ENABLED=0 GOOS=windows GOARCH=$(GOARCH) go build $(RACE) -ldflags "$(GO_BUILDINFO)" -tags "$(EXTRA_GO_BUILD_TAGS)" -o bin/$(APP_NAME)-windows-$(GOARCH)$(RACE).exe $(PKG_PREFIX)/app/$(APP_NAME)
quicktemplate-gen: install-qtc
qtc
qtc -dir=lib
qtc -dir=app
install-qtc:
which qtc || go install github.com/valyala/quicktemplate/qtc@latest


@@ -27,6 +27,9 @@ victoria-metrics-linux-ppc64le-prod:
victoria-metrics-linux-386-prod:
APP_NAME=victoria-metrics $(MAKE) app-via-docker-linux-386
victoria-metrics-linux-s390x-prod:
APP_NAME=victoria-metrics $(MAKE) app-via-docker-linux-s390x
victoria-metrics-darwin-amd64-prod:
APP_NAME=victoria-metrics $(MAKE) app-via-docker-darwin-amd64


@@ -10,9 +10,11 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prommetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeserieslimits"
)
@@ -48,6 +50,7 @@ func selfScraper(scrapeInterval time.Duration) {
var bb bytesutil.ByteBuffer
var rows prometheus.Rows
var metadataRows prometheus.MetadataRows
var mrs []storage.MetricRow
var labels []prompb.Label
t := time.NewTicker(scrapeInterval)
@@ -57,8 +60,12 @@ func selfScraper(scrapeInterval time.Duration) {
appmetrics.WritePrometheusMetrics(&bb)
s := bytesutil.ToUnsafeString(bb.B)
rows.Reset()
// VictoriaMetrics components don't expose metadata yet, only need to parse samples
rows.UnmarshalWithErrLogger(s, nil)
// Parse metrics and optionally metadata when enabled
if prommetadata.IsEnabled() {
rows, metadataRows = prometheus.UnmarshalWithMetadata(rows, metadataRows, s, nil)
} else {
rows.UnmarshalWithErrLogger(s, nil)
}
mrs = mrs[:0]
for i := range rows.Rows {
r := &rows.Rows[i]
@@ -91,6 +98,19 @@ func selfScraper(scrapeInterval time.Duration) {
if err := vmstorage.AddRows(mrs); err != nil {
logger.Errorf("cannot store self-scraped metrics: %s", err)
}
if len(metadataRows.Rows) > 0 {
mms := make([]metricsmetadata.Row, 0, len(metadataRows.Rows))
for _, mm := range metadataRows.Rows {
mms = append(mms, metricsmetadata.Row{
MetricFamilyName: bytesutil.ToUnsafeBytes(mm.Metric),
Help: bytesutil.ToUnsafeBytes(mm.Help),
Type: mm.Type,
})
}
if err := vmstorage.AddMetadataRows(mms); err != nil {
logger.Errorf("cannot store self-scraped metrics metadata: %s", err)
}
}
}
for {
select {


@@ -27,6 +27,9 @@ vmagent-linux-ppc64le-prod:
vmagent-linux-386-prod:
APP_NAME=vmagent $(MAKE) app-via-docker-linux-386
vmagent-linux-s390x-prod:
APP_NAME=vmagent $(MAKE) app-via-docker-linux-s390x
vmagent-darwin-amd64-prod:
APP_NAME=vmagent $(MAKE) app-via-docker-darwin-amd64


@@ -27,6 +27,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/zabbixconnector"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
@@ -74,7 +75,7 @@ var (
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol = flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey = flagutil.NewPassword("configAuthKey", "Authorization key for accessing /config page. It must be passed via authKey query arg. It overrides -httpAuth.*")
configAuthKey = flagutil.NewPassword("configAuthKey", "Authorization key for accessing /config and /remotewrite-.*-config pages. It must be passed via authKey query arg. It overrides -httpAuth.*")
reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed via authKey query arg. It overrides -httpAuth.*")
dryRun = flag.Bool("dryRun", false, "Whether to check config files without running vmagent. The following files are checked: "+
"-promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig, -remoteWrite.streamAggr.config . "+
@@ -111,7 +112,6 @@ func main() {
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
envflag.Parse()
flagutil.ApplySecretFlags()
remotewrite.InitSecretFlags()
buildinfo.Init()
logger.Init()
@@ -253,6 +253,8 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
{"metric-relabel-debug", "debug metric relabeling"},
{"api/v1/targets", "advanced information about discovered targets in JSON format"},
{"config", "-promscrape.config contents"},
{"remotewrite-relabel-config", "-remoteWrite.relabelConfig contents"},
{"remotewrite-url-relabel-config", "-remoteWrite.urlRelabelConfig contents"},
{"metrics", "available service metrics"},
{"flags", "command-line flags"},
{"-/reload", "reload configuration"},
@@ -349,6 +351,17 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
firehose.WriteSuccessResponse(w, r)
return true
case "/zabbixconnector/api/v1/history":
zabbixconnectorHistoryRequests.Inc()
if err := zabbixconnector.InsertHandlerForHTTP(nil, r); err != nil {
zabbixconnectorHistoryErrors.Inc()
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusBadRequest)
fmt.Fprintf(w, `{"error":%q}`, err.Error())
return true
}
w.WriteHeader(http.StatusOK)
return true
case "/newrelic":
newrelicCheckRequest.Inc()
w.Header().Set("Content-Type", "application/json")
@@ -478,6 +491,42 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
promscrape.WriteConfigData(&bb)
fmt.Fprintf(w, `{"status":"success","data":{"yaml":%s}}`, stringsutil.JSONString(string(bb.B)))
return true
case "/remotewrite-relabel-config":
if !httpserver.CheckAuthFlag(w, r, configAuthKey) {
return true
}
remoteWriteRelabelConfigRequests.Inc()
w.Header().Set("Content-Type", "text/plain; charset=utf-8")
remotewrite.WriteRelabelConfigData(w)
return true
case "/api/v1/status/remotewrite-relabel-config":
if !httpserver.CheckAuthFlag(w, r, configAuthKey) {
return true
}
remoteWriteStatusRelabelConfigRequests.Inc()
w.Header().Set("Content-Type", "application/json")
var bb bytesutil.ByteBuffer
remotewrite.WriteRelabelConfigData(&bb)
fmt.Fprintf(w, `{"status":"success","data":{"yaml":%s}}`, stringsutil.JSONString(string(bb.B)))
return true
case "/remotewrite-url-relabel-config":
if !httpserver.CheckAuthFlag(w, r, configAuthKey) {
return true
}
remoteWriteURLRelabelConfigRequests.Inc()
w.Header().Set("Content-Type", "text/plain; charset=utf-8")
remotewrite.WriteURLRelabelConfigData(w)
return true
case "/api/v1/status/remotewrite-url-relabel-config":
if !httpserver.CheckAuthFlag(w, r, configAuthKey) {
return true
}
remoteWriteStatusURLRelabelConfigRequests.Inc()
w.Header().Set("Content-Type", "application/json")
var bb bytesutil.ByteBuffer
remotewrite.WriteURLRelabelConfigData(&bb)
fmt.Fprintf(w, `{"status":"success","data":{"yaml":%s}}`, stringsutil.JSONString(string(bb.B)))
return true
case "/prometheus/-/reload", "/-/reload":
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey) {
return true
@@ -607,6 +656,17 @@ func processMultitenantRequest(w http.ResponseWriter, r *http.Request, path stri
}
firehose.WriteSuccessResponse(w, r)
return true
case "zabbixconnector/api/v1/history":
zabbixconnectorHistoryRequests.Inc()
if err := zabbixconnector.InsertHandlerForHTTP(at, r); err != nil {
zabbixconnectorHistoryErrors.Inc()
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusBadRequest)
fmt.Fprintf(w, `{"error":%q}`, err.Error())
return true
}
w.WriteHeader(http.StatusOK)
return true
case "newrelic":
newrelicCheckRequest.Inc()
w.Header().Set("Content-Type", "application/json")
@@ -728,6 +788,9 @@ var (
opentelemetryPushRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/opentelemetry/v1/metrics", protocol="opentelemetry"}`)
opentelemetryPushErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/opentelemetry/v1/metrics", protocol="opentelemetry"}`)
zabbixconnectorHistoryRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/zabbixconnector/api/v1/history", protocol="zabbixconnector"}`)
zabbixconnectorHistoryErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/zabbixconnector/api/v1/history", protocol="zabbixconnector"}`)
newrelicWriteRequests = metrics.NewCounter(`vm_http_requests_total{path="/newrelic/infra/v2/metrics/events/bulk", protocol="newrelic"}`)
newrelicWriteErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/newrelic/infra/v2/metrics/events/bulk", protocol="newrelic"}`)
@@ -748,6 +811,12 @@ var (
promscrapeConfigRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/config"}`)
promscrapeStatusConfigRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/api/v1/status/config"}`)
remoteWriteRelabelConfigRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/remotewrite-relabel-config"}`)
remoteWriteStatusRelabelConfigRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/api/v1/status/remotewrite-relabel-config"}`)
remoteWriteURLRelabelConfigRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/remotewrite-url-relabel-config"}`)
remoteWriteStatusURLRelabelConfigRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/api/v1/status/remotewrite-url-relabel-config"}`)
promscrapeConfigReloadRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/-/reload"}`)
)


@@ -78,7 +78,7 @@ func insertRows(at *auth.Token, rows []newrelic.Row, extraLabels []prompb.Label)
if !remotewrite.TryPush(at, &ctx.WriteRequest) {
return remotewrite.ErrQueueFullHTTPRetry
}
rowsInserted.Add(len(rows))
rowsInserted.Add(samplesCount)
if at != nil {
rowsTenantInserted.Get(at).Add(samplesCount)
}


@@ -25,7 +25,7 @@ var (
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="opentelemetry"}`)
)
// InsertHandler processes metrics from given reader.
// InsertHandlerForReader processes metrics from given reader.
func InsertHandlerForReader(at *auth.Token, r io.Reader, encoding string) error {
return stream.ParseStream(r, encoding, nil, func(tss []prompb.TimeSeries, mms []prompb.MetricMetadata) error {
return insertRows(at, tss, mms, nil)


@@ -15,7 +15,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/awsapi"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -554,9 +553,9 @@ func getRetryDuration(retryAfterDuration, retryDuration, maxRetryDuration time.D
// For more details, see: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9417
func repackBlockFromZstdToSnappy(zstdBlock []byte) ([]byte, error) {
plainBlock := make([]byte, 0, len(zstdBlock)*2)
plainBlock, err := zstd.Decompress(plainBlock, zstdBlock)
plainBlock, err := encoding.DecompressZSTD(plainBlock, zstdBlock)
if err != nil {
return nil, fmt.Errorf("zstd: decompress: %s", err)
return nil, err
}
return snappy.Encode(nil, plainBlock), nil


@@ -3,15 +3,18 @@ package remotewrite
import (
"flag"
"fmt"
"io"
"strconv"
"strings"
"sync"
"sync/atomic"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"go.yaml.in/yaml/v3"
"github.com/VictoriaMetrics/metrics"
)
@@ -32,9 +35,12 @@ var (
"See https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels")
)
var labelsGlobal []prompb.Label
var (
labelsGlobal []prompb.Label
remoteWriteRelabelConfigData atomic.Pointer[[]byte]
remoteWriteURLRelabelConfigData atomic.Pointer[[]interface{}]
relabelConfigReloads *metrics.Counter
relabelConfigReloadErrors *metrics.Counter
relabelConfigSuccess *metrics.Gauge
@@ -67,6 +73,42 @@ func initRelabelConfigs() {
}
}
// WriteRelabelConfigData writes -remoteWrite.relabelConfig contents to w
func WriteRelabelConfigData(w io.Writer) {
p := remoteWriteRelabelConfigData.Load()
if p == nil {
// Nothing to write to w
return
}
_, _ = w.Write(*p)
}
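WriteRelabelConfigData reads the raw config through an `atomic.Pointer[[]byte]`. A reload can swap in new bytes without a mutex, and a nil pointer means no config has been loaded yet. The following is a minimal, self-contained sketch of that pattern. `configHolder`, `store`, `writeTo` and `snapshot` are hypothetical names, not part of the VictoriaMetrics code.

```go
package main

import (
	"io"
	"strings"
	"sync/atomic"
)

// configHolder is a hypothetical wrapper around the atomic.Pointer[[]byte]
// pattern used by remoteWriteRelabelConfigData.
type configHolder struct {
	data atomic.Pointer[[]byte]
}

// store publishes a new raw config; concurrent readers see either the old or the new slice.
func (h *configHolder) store(b []byte) {
	h.data.Store(&b)
}

// writeTo copies the current raw config to w and writes nothing if no config was stored yet.
func (h *configHolder) writeTo(w io.Writer) {
	p := h.data.Load()
	if p == nil {
		return
	}
	_, _ = w.Write(*p)
}

// snapshot returns the current raw config as a string ("" if none was stored).
func (h *configHolder) snapshot() string {
	var sb strings.Builder
	h.writeTo(&sb)
	return sb.String()
}
```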
// WriteURLRelabelConfigData writes -remoteWrite.urlRelabelConfig contents to w
func WriteURLRelabelConfigData(w io.Writer) {
p := remoteWriteURLRelabelConfigData.Load()
if p == nil {
// Nothing to write to w
return
}
type urlRelabelCfg struct {
Url string `yaml:"url"`
RelabelConfig interface{} `yaml:"relabel_config"`
}
var cs []urlRelabelCfg
for i, url := range *remoteWriteURLs {
cfgData := (*p)[i]
if !*showRemoteWriteURL {
url = fmt.Sprintf("%d:secret-url", i+1)
}
cs = append(cs, urlRelabelCfg{
Url: url,
RelabelConfig: cfgData,
})
}
d, _ := yaml.Marshal(cs)
_, _ = w.Write(d)
}
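WriteURLRelabelConfigData pairs each `-remoteWrite.url` with its per-URL relabel config by index. Unless `-remoteWrite.showURL` is set, it replaces the URL with a 1-based `<n>:secret-url` placeholder. The following sketch shows only that pairing and masking step, using the standard library and no YAML output. `urlRelabelEntry` and `buildEntries` are hypothetical names.

```go
package main

import "fmt"

// urlRelabelEntry is a hypothetical stand-in for one entry written by
// WriteURLRelabelConfigData: a (possibly masked) URL and its parsed relabel config.
type urlRelabelEntry struct {
	URL           string
	RelabelConfig any
}

// buildEntries pairs each remote write URL with its relabel config by index.
// When showURL is false, the URL is replaced with "<n>:secret-url" (1-based).
func buildEntries(urls []string, cfgs []any, showURL bool) []urlRelabelEntry {
	entries := make([]urlRelabelEntry, 0, len(urls))
	for i, u := range urls {
		var cfg any
		if i < len(cfgs) {
			// URLs without a config keep a nil entry, like the padding in loadRelabelConfigs.
			cfg = cfgs[i]
		}
		if !showURL {
			u = fmt.Sprintf("%d:secret-url", i+1)
		}
		entries = append(entries, urlRelabelEntry{URL: u, RelabelConfig: cfg})
	}
	return entries
}
```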
func reloadRelabelConfigs() {
rcs := allRelabelConfigs.Load()
if !rcs.isSet() {
@@ -90,28 +132,42 @@ func reloadRelabelConfigs() {
func loadRelabelConfigs() (*relabelConfigs, error) {
var rcs relabelConfigs
if *relabelConfigPathGlobal != "" {
global, err := promrelabel.LoadRelabelConfigs(*relabelConfigPathGlobal)
global, rawCfg, err := promrelabel.LoadRelabelConfigs(*relabelConfigPathGlobal)
if err != nil {
return nil, fmt.Errorf("cannot load -remoteWrite.relabelConfig=%q: %w", *relabelConfigPathGlobal, err)
}
remoteWriteRelabelConfigData.Store(&rawCfg)
rcs.global = global
}
if len(*relabelConfigPaths) > len(*remoteWriteURLs) {
return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url args: %d",
len(*relabelConfigPaths), (len(*remoteWriteURLs)))
}
var urlRelabelCfgs []interface{}
rcs.perURL = make([]*promrelabel.ParsedConfigs, len(*remoteWriteURLs))
for i, path := range *relabelConfigPaths {
if len(path) == 0 {
// Skip empty relabel config.
urlRelabelCfgs = append(urlRelabelCfgs, nil)
continue
}
prc, err := promrelabel.LoadRelabelConfigs(path)
prc, rawCfg, err := promrelabel.LoadRelabelConfigs(path)
if err != nil {
return nil, fmt.Errorf("cannot load relabel configs from -remoteWrite.urlRelabelConfig=%q: %w", path, err)
}
rcs.perURL[i] = prc
var parsedCfg interface{}
_ = yaml.Unmarshal(rawCfg, &parsedCfg)
urlRelabelCfgs = append(urlRelabelCfgs, parsedCfg)
}
if len(*remoteWriteURLs) > len(*relabelConfigPaths) {
// fill the urlRelabelCfgs with empty relabel configs if not set
for i := len(*relabelConfigPaths); i < len(*remoteWriteURLs); i++ {
urlRelabelCfgs = append(urlRelabelCfgs, nil)
}
}
remoteWriteURLRelabelConfigData.Store(&urlRelabelCfgs)
return &rcs, nil
}

@@ -27,6 +27,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/ratelimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/slicesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeserieslimits"
"github.com/VictoriaMetrics/metrics"
@@ -485,6 +486,9 @@ func tryPush(at *auth.Token, wr *prompb.WriteRequest, forceDropSamplesOnFailure
matchIdxs.B = sas.Push(tssBlock, matchIdxs.B)
if !*streamAggrGlobalKeepInput {
tssBlock = dropAggregatedSeries(tssBlock, matchIdxs.B, *streamAggrGlobalDropInput)
} else if *streamAggrGlobalDropInput {
// if both keep_input and drop_input are true, we keep only the aggregated series
tssBlock = dropUnaggregatedSeries(tssBlock, matchIdxs.B)
}
matchIdxsPool.Put(matchIdxs)
}
@@ -988,7 +992,17 @@ func (rwctx *remoteWriteCtx) TryPushTimeSeries(tss []prompb.TimeSeries, forceDro
tss = append(*v, tss...)
}
tss = dropAggregatedSeries(tss, matchIdxs.B, rwctx.streamAggrDropInput)
} else if rwctx.streamAggrDropInput {
// if both keep_input and drop_input are true, we keep only the aggregated series
if rctx == nil {
rctx = getRelabelCtx()
// Make a copy of tss before dropping unaggregated series
v = tssPool.Get().(*[]prompb.TimeSeries)
tss = append(*v, tss...)
}
tss = dropUnaggregatedSeries(tss, matchIdxs.B)
}
matchIdxsPool.Put(matchIdxs)
}
if rwctx.deduplicator != nil {
@@ -1011,9 +1025,10 @@ func (rwctx *remoteWriteCtx) TryPushTimeSeries(tss []prompb.TimeSeries, forceDro
return false
}
var matchIdxsPool bytesutil.ByteBufferPool
var matchIdxsPool slicesutil.BufferPool[uint32]
func dropAggregatedSeries(src []prompb.TimeSeries, matchIdxs []byte, dropInput bool) []prompb.TimeSeries {
// dropAggregatedSeries drops matched series, and also drops unmatched series if dropInput is true.
func dropAggregatedSeries(src []prompb.TimeSeries, matchIdxs []uint32, dropInput bool) []prompb.TimeSeries {
dst := src[:0]
if !dropInput {
for i, match := range matchIdxs {
@@ -1028,6 +1043,20 @@ func dropAggregatedSeries(src []prompb.TimeSeries, matchIdxs []byte, dropInput b
return dst
}
// dropUnaggregatedSeries drops unmatched series.
func dropUnaggregatedSeries(src []prompb.TimeSeries, matchIdxs []uint32) []prompb.TimeSeries {
dst := src[:0]
for i, match := range matchIdxs {
if match == 0 {
continue
}
dst = append(dst, src[i])
}
tail := src[len(dst):]
clear(tail)
return dst
}
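`dropAggregatedSeries` and `dropUnaggregatedSeries` decide which raw input series are still forwarded for each `keepInput`/`dropInput` combination. The test cases in this diff exercise all four combinations. The following sketch models that decision. `forwardedInputs` is a hypothetical helper that works on per-series match flags instead of the `matchIdxs` buffer. It ignores aggregated outputs, which are emitted separately at each aggregation interval.

```go
package main

// forwardedInputs returns the indexes of input series that are still forwarded
// to the remote storage after stream aggregation. It is a hypothetical model of
// the keepInput/dropInput combinations, not the VictoriaMetrics implementation:
//
//	keepInput=false, dropInput=false: forward only unmatched series
//	keepInput=false, dropInput=true:  forward nothing
//	keepInput=true,  dropInput=false: forward everything
//	keepInput=true,  dropInput=true:  forward only matched series
func forwardedInputs(matched []bool, keepInput, dropInput bool) []int {
	var out []int
	for i, m := range matched {
		var keep bool
		switch {
		case keepInput && dropInput:
			keep = m
		case keepInput:
			keep = true
		case dropInput:
			keep = false
		default:
			keep = !m
		}
		if keep {
			out = append(out, i)
		}
	}
	return out
}
```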
func (rwctx *remoteWriteCtx) pushInternalTrackDropped(tss []prompb.TimeSeries) {
if rwctx.tryPushTimeSeriesInternal(tss) {
return

@@ -10,6 +10,8 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/consistenthash"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
@@ -57,8 +59,8 @@ func TestGetLabelsHash_Distribution(t *testing.T) {
f(10)
}
func TestRemoteWriteContext_TryPush_ImmutableTimeseries(t *testing.T) {
f := func(streamAggrConfig, relabelConfig string, enableWindows bool, dedupInterval time.Duration, keepInput, dropInput bool, input string) {
func TestRemoteWriteContext_TryPushTimeSeries(t *testing.T) {
f := func(streamAggrConfig, relabelConfig string, enableWindows bool, dedupInterval time.Duration, keepInput, dropInput bool, input string, expectedRowsPushedAfterRelabel, expectedPushedSample int) {
t.Helper()
perURLRelabel, err := promrelabel.ParseRelabelConfigsData([]byte(relabelConfig))
if err != nil {
@@ -71,10 +73,16 @@ func TestRemoteWriteContext_TryPush_ImmutableTimeseries(t *testing.T) {
}
allRelabelConfigs.Store(rcs)
path := "fast-queue-write-test"
fs.MustRemoveDir(path)
fq := persistentqueue.MustOpenFastQueue(path, "test", 100, 0, false)
defer fs.MustRemoveDir(path)
defer fq.MustClose()
pss := make([]*pendingSeries, 1)
isVMProto := &atomic.Bool{}
isVMProto.Store(true)
pss[0] = newPendingSeries(nil, isVMProto, 0, 100)
pss[0] = newPendingSeries(fq, isVMProto, 0, 100)
rwctx := &remoteWriteCtx{
idx: 0,
streamAggrKeepInput: keepInput,
@@ -83,6 +91,8 @@ func TestRemoteWriteContext_TryPush_ImmutableTimeseries(t *testing.T) {
rowsPushedAfterRelabel: metrics.GetOrCreateCounter(`foo`),
rowsDroppedByRelabel: metrics.GetOrCreateCounter(`bar`),
}
defer metrics.UnregisterAllMetrics()
if dedupInterval > 0 {
rwctx.deduplicator = streamaggr.NewDeduplicator(nil, enableWindows, dedupInterval, nil, "dedup-global")
}
@@ -104,23 +114,27 @@ func TestRemoteWriteContext_TryPush_ImmutableTimeseries(t *testing.T) {
inputTss := prometheus.MustParsePromMetrics(input, offsetMsecs)
expectedTss := make([]prompb.TimeSeries, len(inputTss))
// copy inputTss to make sure it is not mutated during TryPush call
// check inputTss is not modified after TryPushTimeSeries
copy(expectedTss, inputTss)
if !rwctx.TryPushTimeSeries(inputTss, false) {
t.Fatalf("cannot push samples to rwctx")
}
if int(rwctx.rowsPushedAfterRelabel.Get()) != expectedRowsPushedAfterRelabel {
t.Fatalf("unexpected number of rows after relabel; got %d; want %d", rwctx.rowsPushedAfterRelabel.Get(), expectedRowsPushedAfterRelabel)
}
if len(pss[0].wr.tss) != expectedPushedSample {
t.Fatalf("unexpected number of pushed samples; got %d; want %d", len(pss[0].wr.tss), expectedPushedSample)
}
if !reflect.DeepEqual(expectedTss, inputTss) {
t.Fatalf("unexpected samples;\ngot\n%v\nwant\n%v", inputTss, expectedTss)
}
}
f(`
- interval: 1m
outputs: [sum_samples]
- interval: 2m
outputs: [count_series]
`, `
// relabeling
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
@@ -129,53 +143,66 @@ metric{env="dev"} 10
metric{env="bar"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`)
`, 2, 2)
// relabeling + aggregation
f(`
- match: '{env="dev"}'
interval: 1m
outputs: [sum_samples]
`, `
- action: keep
source_labels: [env]
regex: ".*"
`, false, 0, false, false, `
metric{env="dev"} 10
metric{env="bar"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`, 4, 2)
// aggregation + keepInput
f(`
- match: '{env="dev"}'
interval: 1m
outputs: [sum_samples]
`, ``, false, 0, true, false, `
metric{env="dev"} 10
metric{env="bar"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`, 4, 4)
// aggregation + dropInput
f(`
- match: '{env="dev"}'
interval: 1m
outputs: [sum_samples]
`, ``, false, 0, false, true, `
metric{env="dev"} 10
metric{env="bar"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`, 4, 0)
// aggregation + keepInput + dropInput
f(`
- match: '{env="dev"}'
interval: 1m
outputs: [sum_samples]
`, ``, false, 0, true, true, `
metric{env="dev"} 10
metric{env="bar"} 20
metric{env="bar"} 25
`, 3, 1)
// aggregation + deduplication
f(``, ``, true, time.Hour, false, false, `
metric{env="dev"} 10
metric{env="foo"} 20
metric{env="dev"} 15
metric{env="foo"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, true, time.Hour, false, false, `
metric{env="dev"} 10
metric{env="bar"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, true, time.Hour, true, false, `
metric{env="test"} 10
metric{env="dev"} 20
metric{env="foo"} 15
metric{env="dev"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, true, time.Hour, false, true, `
metric{env="foo"} 10
metric{env="dev"} 20
metric{env="foo"} 15
metric{env="dev"} 25
`)
f(``, `
- action: keep
source_labels: [env]
regex: "dev"
`, true, time.Hour, true, true, `
metric{env="dev"} 10
metric{env="test"} 20
metric{env="dev"} 15
metric{env="bar"} 25
`)
`, 4, 0)
}
func TestShardAmountRemoteWriteCtx(t *testing.T) {

@@ -18,12 +18,12 @@ var (
streamAggrGlobalConfig = flag.String("streamAggr.config", "", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -streamAggr.keepInput, -streamAggr.dropInput and -streamAggr.dedupInterval")
streamAggrGlobalKeepInput = flag.Bool("streamAggr.keepInput", false, "Whether to keep all the input samples after the aggregation "+
"with -streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to remote storages write. See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDropInput = flag.Bool("streamAggr.dropInput", false, "Whether to drop all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config. By default, only aggregates samples are dropped, while the remaining samples "+
"are written to remote storages write. See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalKeepInput = flag.Bool("streamAggr.keepInput", false, "Whether to keep input samples that match any rule in "+
"-streamAggr.config. By default, matched raw samples are aggregated and dropped, while unmatched samples "+
"are written to the remote storage. See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDropInput = flag.Bool("streamAggr.dropInput", false, "Whether to drop input samples that do not match any rule in "+
"-streamAggr.config. By default, only matched raw samples are dropped, while unmatched samples "+
"are written to the remote storage. See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrGlobalDedupInterval = flag.Duration("streamAggr.dedupInterval", 0, "Input samples are de-duplicated with this interval on "+
"aggregator before optional aggregation with -streamAggr.config . "+
"See also -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")
@@ -43,11 +43,11 @@ var (
streamAggrConfig = flagutil.NewArrayString("remoteWrite.streamAggr.config", "Optional path to file with stream aggregation config for the corresponding -remoteWrite.url. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -remoteWrite.streamAggr.keepInput, -remoteWrite.streamAggr.dropInput and -remoteWrite.streamAggr.dedupInterval")
streamAggrDropInput = flagutil.NewArrayBool("remoteWrite.streamAggr.dropInput", "Whether to drop all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. By default, only aggregates samples are dropped, while the remaining samples "+
streamAggrDropInput = flagutil.NewArrayBool("remoteWrite.streamAggr.dropInput", "Whether to drop input samples that do not match any rule in "+
"the corresponding -remoteWrite.streamAggr.config. By default, only matched raw samples are dropped, while unmatched samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrKeepInput = flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput", "Whether to keep all the input samples after the aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. By default, only aggregates samples are dropped, while the remaining samples "+
streamAggrKeepInput = flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput", "Whether to keep input samples that match any rule in "+
"the corresponding -remoteWrite.streamAggr.config. By default, matched raw samples are aggregated and dropped, while unmatched samples "+
"are written to the corresponding -remoteWrite.url . See also -remoteWrite.streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrDedupInterval = flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval", 0, "Input samples are de-duplicated with this interval before optional aggregation "+
"with -remoteWrite.streamAggr.config at the corresponding -remoteWrite.url. See also -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")

@@ -0,0 +1,80 @@
package zabbixconnector
import (
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/protoparserutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/zabbixconnector"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/zabbixconnector/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="zabbixconnector"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="zabbixconnector"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="zabbixconnector"}`)
)
// InsertHandlerForHTTP processes a ZabbixConnector POST /zabbixconnector/v1/history request.
func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
extraLabels, err := protoparserutil.GetExtraLabels(req)
if err != nil {
return err
}
encoding := req.Header.Get("Content-Encoding")
return stream.Parse(req.Body, encoding, func(rows []zabbixconnector.Row) error {
return insertRows(at, rows, extraLabels)
})
}
func insertRows(at *auth.Token, rows []zabbixconnector.Row, extraLabels []prompb.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
rowsTotal := len(rows)
tssDst := ctx.WriteRequest.Timeseries[:0]
labels := ctx.Labels[:0]
samples := ctx.Samples[:0]
for i := range rows {
r := &rows[i]
labelsLen := len(labels)
for j := range r.Tags {
tag := &r.Tags[j]
labels = append(labels, prompb.Label{
Name: bytesutil.ToUnsafeString(tag.Key),
Value: bytesutil.ToUnsafeString(tag.Value),
})
}
labels = append(labels, extraLabels...)
samplesLen := len(samples)
samples = append(samples, prompb.Sample{
Value: r.Value,
Timestamp: r.Timestamp,
})
tssDst = append(tssDst, prompb.TimeSeries{
Labels: labels[labelsLen:],
Samples: samples[samplesLen:],
})
}
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
if !remotewrite.TryPush(at, &ctx.WriteRequest) {
return remotewrite.ErrQueueFullHTTPRetry
}
rowsInserted.Add(rowsTotal)
if at != nil {
rowsTenantInserted.Get(at).Add(rowsTotal)
}
rowsPerInsert.Update(float64(rowsTotal))
return nil
}
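insertRows avoids per-row allocations. It appends every row's labels and samples to two shared buffers (`ctx.Labels`, `ctx.Samples`), and each time series keeps a sub-slice of them. The following sketch reproduces that pattern with hypothetical stand-in types (`row`, `tag`, `label`, `sample`, `series`) rather than the real `zabbixconnector.Row` and `prompb` types.

```go
package main

// Hypothetical simplified stand-ins for zabbixconnector.Row and prompb types.
type tag struct{ Key, Value string }

type row struct {
	Tags      []tag
	Value     float64
	Timestamp int64
}

type label struct{ Name, Value string }

type sample struct {
	Value     float64
	Timestamp int64
}

type series struct {
	Labels  []label
	Samples []sample
}

// rowsToSeries converts rows into series while storing all labels and samples
// in two shared buffers; each series keeps a sub-slice of those buffers.
// If an append reallocates a buffer, earlier sub-slices keep pointing at the old
// backing array, which still holds their data, so they remain valid.
func rowsToSeries(rows []row, extra []label, labels []label, samples []sample) ([]series, []label, []sample) {
	tss := make([]series, 0, len(rows))
	for i := range rows {
		r := &rows[i]
		labelsLen := len(labels)
		for _, t := range r.Tags {
			labels = append(labels, label{Name: t.Key, Value: t.Value})
		}
		labels = append(labels, extra...)
		samplesLen := len(samples)
		samples = append(samples, sample{Value: r.Value, Timestamp: r.Timestamp})
		tss = append(tss, series{Labels: labels[labelsLen:], Samples: samples[samplesLen:]})
	}
	return tss, labels, samples
}
```

The returned buffers can be passed back on the next call (truncated to zero length) once the previous series are no longer in use. This mirrors how insertRows reuses the pooled push context.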

@@ -27,6 +27,9 @@ vmalert-tool-linux-ppc64le-prod:
vmalert-tool-linux-386-prod:
APP_NAME=vmalert-tool $(MAKE) app-via-docker-linux-386
vmalert-tool-linux-s390x-prod:
APP_NAME=vmalert-tool $(MAKE) app-via-docker-linux-s390x
vmalert-tool-darwin-amd64-prod:
APP_NAME=vmalert-tool $(MAKE) app-via-docker-darwin-amd64

@@ -132,7 +132,7 @@ func UnitTest(files []string, disableGroupLabel bool, externalLabels []string, e
}
labels[s[:n]] = s[n+1:]
}
_, err = notifier.Init(labels, externalURL)
err = notifier.Init(labels, externalURL)
if err != nil {
logger.Fatalf("failed to init notifier: %v", err)
}
@@ -379,7 +379,7 @@ func (tg *testGroup) test(evalInterval time.Duration, groupOrderMap map[string]i
if len(g.Rules) == 0 {
continue
}
errs := g.ExecOnce(context.Background(), func() []notifier.Notifier { return nil }, rw, ts)
errs := g.ExecOnce(context.Background(), rw, ts)
for err := range errs {
if err != nil {
checkErrs = append(checkErrs, fmt.Errorf("\nfailed to exec group: %q, time: %s, err: %w", g.Name,

@@ -27,6 +27,9 @@ vmalert-linux-ppc64le-prod:
vmalert-linux-386-prod:
APP_NAME=vmalert $(MAKE) app-via-docker-linux-386
vmalert-linux-s390x-prod:
APP_NAME=vmalert $(MAKE) app-via-docker-linux-s390x
vmalert-darwin-amd64-prod:
APP_NAME=vmalert $(MAKE) app-via-docker-darwin-amd64

@@ -116,7 +116,7 @@ func TestParse_Failure(t *testing.T) {
f([]string{"testdata/rules/rules_interval_bad.rules"}, "eval_offset should be smaller than interval")
f([]string{"testdata/rules/rules0-bad.rules"}, "unexpected token")
f([]string{"testdata/dir/rules0-bad.rules"}, "error parsing annotation")
f([]string{"testdata/dir/rules0-bad.rules"}, "invalid annotations")
f([]string{"testdata/dir/rules1-bad.rules"}, "duplicate in file")
f([]string{"testdata/dir/rules2-bad.rules"}, "function \"unknown\" not defined")
f([]string{"testdata/dir/rules3-bad.rules"}, "either `record` or `alert` must be set")
@@ -343,7 +343,6 @@ func TestGroupValidate_Failure(t *testing.T) {
},
},
}, true, "bad prometheus expr")
}
func TestGroupValidate_Success(t *testing.T) {

@@ -179,11 +179,11 @@ func (c *Client) Query(ctx context.Context, query string, ts time.Time) (Result,
var parseFn func(resp *http.Response) (Result, error)
switch c.dataSourceType {
case datasourcePrometheus:
parseFn = parsePrometheusResponse
parseFn = parsePrometheusInstantResponse
case datasourceGraphite:
parseFn = parseGraphiteResponse
case datasourceVLogs:
parseFn = parseVLogsResponse
parseFn = parseVLogsInstantResponse
default:
logger.Panicf("BUG: unsupported datasource type %q to parse query response", c.dataSourceType)
}
@@ -239,9 +239,9 @@ func (c *Client) QueryRange(ctx context.Context, query string, start, end time.T
var parseFn func(resp *http.Response) (Result, error)
switch c.dataSourceType {
case datasourcePrometheus:
parseFn = parsePrometheusResponse
parseFn = parsePrometheusRangeResponse
case datasourceVLogs:
parseFn = parseVLogsResponse
parseFn = parseVLogsRangeResponse
default:
logger.Panicf("BUG: unsupported datasource type %q to parse query range response", c.dataSourceType)
}

@@ -172,17 +172,26 @@ const (
rtVector, rtMatrix, rScalar = "vector", "matrix", "scalar"
)
func parsePrometheusResponse(resp *http.Response) (res Result, err error) {
func parsePromResponse(resp *http.Response) (*promResponse, error) {
r := &promResponse{}
if err = json.NewDecoder(resp.Body).Decode(r); err != nil {
return res, fmt.Errorf("failed to decode response: %w", err)
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("failed to decode response: %w", err)
}
if r.Status == statusError {
return res, fmt.Errorf("response error %q: %s", r.ErrorType, r.Error)
return nil, fmt.Errorf("response error %q: %s", r.ErrorType, r.Error)
}
if r.Status != statusSuccess {
return res, fmt.Errorf("unknown response status %q", r.Status)
return nil, fmt.Errorf("unknown response status %q", r.Status)
}
return r, nil
}
func parsePrometheusInstantResponse(resp *http.Response) (res Result, err error) {
r, err := parsePromResponse(resp)
if err != nil {
return res, fmt.Errorf("failed to parse response: %w", err)
}
var parseFn func() ([]Metric, error)
switch r.Data.ResultType {
case rtVector:
@@ -191,12 +200,6 @@ func parsePrometheusResponse(resp *http.Response) (res Result, err error) {
return res, fmt.Errorf("unmarshal err %w; \n %#v", err, string(r.Data.Result))
}
parseFn = pi.metrics
case rtMatrix:
var pr promRange
if err := json.Unmarshal(r.Data.Result, &pr.Result); err != nil {
return res, err
}
parseFn = pr.metrics
case rScalar:
var ps promScalar
if err := json.Unmarshal(r.Data.Result, &ps); err != nil {
@@ -206,7 +209,6 @@ func parsePrometheusResponse(resp *http.Response) (res Result, err error) {
default:
return res, fmt.Errorf("unknown result type %q", r.Data.ResultType)
}
ms, err := parseFn()
if err != nil {
return res, err
@@ -222,6 +224,34 @@ func parsePrometheusResponse(resp *http.Response) (res Result, err error) {
return res, nil
}
func parsePrometheusRangeResponse(resp *http.Response) (res Result, err error) {
r, err := parsePromResponse(resp)
if err != nil {
return res, fmt.Errorf("failed to parse response: %w", err)
}
if r.Data.ResultType != rtMatrix {
return res, fmt.Errorf("unexpected result type %q; expected result type %q", r.Data.ResultType, rtMatrix)
}
var pr promRange
if err := json.Unmarshal(r.Data.Result, &pr.Result); err != nil {
return res, err
}
ms, err := pr.metrics()
if err != nil {
return res, err
}
res = Result{Data: ms, IsPartial: r.IsPartial}
if r.Stats.SeriesFetched != nil {
intV, err := strconv.Atoi(*r.Stats.SeriesFetched)
if err != nil {
return res, fmt.Errorf("failed to convert stats.seriesFetched to int: %w", err)
}
res.SeriesFetched = &intV
}
return res, nil
}
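parsePrometheusRangeResponse now rejects any range-query response whose `resultType` is not `matrix`, instead of accepting vector or scalar payloads. The following sketch shows that validation step with `encoding/json`. `promEnvelope` and `checkRangeResult` are hypothetical names, and the envelope is trimmed to the fields the check needs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promEnvelope is a hypothetical, trimmed-down Prometheus API response envelope.
type promEnvelope struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string          `json:"resultType"`
		Result     json.RawMessage `json:"result"`
	} `json:"data"`
}

// checkRangeResult decodes body and returns the raw "result" payload only if the
// response succeeded and its resultType is "matrix".
func checkRangeResult(body []byte) (json.RawMessage, error) {
	var r promEnvelope
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("failed to decode response: %w", err)
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("unexpected response status %q", r.Status)
	}
	if r.Data.ResultType != "matrix" {
		return nil, fmt.Errorf("unexpected result type %q; expected result type %q", r.Data.ResultType, "matrix")
	}
	return r.Data.Result, nil
}
```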
func (c *Client) setPrometheusInstantReqParams(r *http.Request, query string, timestamp time.Time) {
if c.appendTypePrefix {
r.URL.Path += "/prometheus"

@@ -65,21 +65,23 @@ func TestVMInstantQuery(t *testing.T) {
case 3:
w.Write([]byte(`{"status":"unknown"}`))
case 4:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix"}}`))
w.Write([]byte(`{"status":"success","data":{"resultType":"vector"}}`))
case 5:
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"vm_rows","foo":"bar"},"value":[1583786142,"13763"]},{"metric":{"__name__":"vm_requests","foo":"baz"},"value":[1583786140,"2000"]}]}}`))
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"vm_rows"},"values":[[1583786142,"13763"]]}]}}`))
case 6:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]}}`))
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"vm_rows","foo":"bar"},"value":[1583786142,"13763"]},{"metric":{"__name__":"vm_requests","foo":"baz"},"value":[1583786140,"2000"]}]}}`))
case 7:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]},"stats":{"seriesFetched": "42"}}`))
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]}}`))
case 8:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]},"stats":{"seriesFetched": "42"}}`))
case 9:
w.Write([]byte(`{"status":"success", "isPartial":true, "data":{"resultType":"scalar","result":[1583786142, "1"]}}`))
}
})
mux.HandleFunc("/render", func(w http.ResponseWriter, _ *http.Request) {
c++
switch c {
case 9:
case 10:
w.Write([]byte(`[{"target":"constantLine(10)","tags":{"name":"constantLine(10)"},"datapoints":[[10,1611758343],[10,1611758373],[10,1611758403]]}]`))
}
})
@@ -102,9 +104,9 @@ func TestVMInstantQuery(t *testing.T) {
t.Fatalf("failed to parse 'time' query param %q: %s", timeParam, err)
}
switch c {
case 10:
w.Write([]byte("[]"))
case 11:
w.Write([]byte("[]"))
case 12:
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"total","foo":"bar"},"value":[1583786142,"13763"]},{"metric":{"__name__":"total","foo":"baz"},"value":[1583786140,"2000"]}]}}`))
}
})
@@ -123,6 +125,7 @@ func TestVMInstantQuery(t *testing.T) {
ts := time.Now()
expErr := func(query, err string) {
t.Helper()
_, _, gotErr := pq.Query(ctx, query, ts)
if gotErr == nil {
t.Fatalf("expected %q got nil", err)
@@ -137,8 +140,9 @@ func TestVMInstantQuery(t *testing.T) {
expErr(vmQuery, "response error") // 2
expErr(vmQuery, "unknown response status") // 3
expErr(vmQuery, "unexpected end of JSON input") // 4
expErr(vmQuery, "unknown result type") // 5
res, _, err := pq.Query(ctx, vmQuery, ts) // 5 - vector
res, _, err := pq.Query(ctx, vmQuery, ts) // 6 - vector
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -159,7 +163,7 @@ func TestVMInstantQuery(t *testing.T) {
}
metricsEqual(t, res.Data, expected)
res, req, err := pq.Query(ctx, vmQuery, ts) // 6 - scalar
res, req, err := pq.Query(ctx, vmQuery, ts) // 7 - scalar
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -184,7 +188,7 @@ func TestVMInstantQuery(t *testing.T) {
res.SeriesFetched)
}
res, _, err = pq.Query(ctx, vmQuery, ts) // 7 - scalar with stats
res, _, err = pq.Query(ctx, vmQuery, ts) // 8 - scalar with stats
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -205,7 +209,7 @@ func TestVMInstantQuery(t *testing.T) {
*res.SeriesFetched)
}
res, _, err = pq.Query(ctx, vmQuery, ts) // 8
res, _, err = pq.Query(ctx, vmQuery, ts) // 9
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -216,7 +220,7 @@ func TestVMInstantQuery(t *testing.T) {
// test graphite
gq := s.BuildWithParams(QuerierParams{DataSourceType: string(datasourceGraphite)})
res, _, err = gq.Query(ctx, queryRender, ts) // 9 - graphite
res, _, err = gq.Query(ctx, queryRender, ts) // 10 - graphite
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -236,9 +240,9 @@ func TestVMInstantQuery(t *testing.T) {
vlogs := datasourceVLogs
pq = s.BuildWithParams(QuerierParams{DataSourceType: string(vlogs), EvaluationInterval: 15 * time.Second})
expErr(vlogsQuery, "error parsing response") // 10
expErr(vlogsQuery, "error parsing response") // 11
res, _, err = pq.Query(ctx, vlogsQuery, ts) // 11
res, _, err = pq.Query(ctx, vlogsQuery, ts) // 12
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -390,6 +394,8 @@ func TestVMRangeQuery(t *testing.T) {
switch c {
case 0:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"vm_rows"},"values":[[1583786142,"13763"]]}]}}`))
case 1:
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[1583786142, "1"]}}`))
}
})
mux.HandleFunc("/select/logsql/stats_query_range", func(w http.ResponseWriter, r *http.Request) {
@@ -422,7 +428,7 @@ func TestVMRangeQuery(t *testing.T) {
t.Fatalf("expected 'step' query param to be 60s; got %q instead", step)
}
switch c {
case 1:
case 2:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"total"},"values":[[1583786142,"10"]]}]}}`))
}
})
@@ -446,13 +452,13 @@ func TestVMRangeQuery(t *testing.T) {
start, end := time.Now().Add(-time.Minute), time.Now()
res, err := pq.QueryRange(ctx, vmQuery, start, end)
res, err := pq.QueryRange(ctx, vmQuery, start, end) // case 0
if err != nil {
t.Fatalf("unexpected %s", err)
}
m := res.Data
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
}
expected := Metric{
Labels: []prompb.Label{{Value: "vm_rows", Name: "__name__"}},
@@ -463,6 +469,9 @@ func TestVMRangeQuery(t *testing.T) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
}
_, err = pq.QueryRange(ctx, vmQuery, start, end) // case 1
expectError(t, err, "unexpected result type")
// test unsupported graphite
gq := s.BuildWithParams(QuerierParams{DataSourceType: string(datasourceGraphite)})

@@ -40,8 +40,28 @@ func (c *Client) setVLogsRangeReqParams(r *http.Request, query string, start, en
c.setReqParams(r, query)
}
func parseVLogsResponse(resp *http.Response) (res Result, err error) {
res, err = parsePrometheusResponse(resp)
func parseVLogsInstantResponse(resp *http.Response) (res Result, err error) {
res, err = parsePrometheusInstantResponse(resp)
if err != nil {
return Result{}, err
}
for i := range res.Data {
m := &res.Data[i]
for j := range m.Labels {
// preserve the stats function result name in a new label `stats_result` instead of dropping it,
// since there could be multiple stats results in a single query, for instance:
// _time:5m | stats quantile(0.5, request_duration_seconds) p50, quantile(0.9, request_duration_seconds) p90
if m.Labels[j].Name == "__name__" {
m.Labels[j].Name = "stats_result"
break
}
}
}
return
}
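parseVLogsInstantResponse renames the first `__name__` label of each series to `stats_result` and keeps its value. Without this, results such as `p50` and `p90` from `_time:5m | stats quantile(0.5, request_duration_seconds) p50, quantile(0.9, request_duration_seconds) p90` could not be told apart. The following sketch shows that loop on a hypothetical `promLabel` type standing in for `prompb.Label`.

```go
package main

// promLabel is a hypothetical stand-in for prompb.Label.
type promLabel struct{ Name, Value string }

// renameMetricName renames the first "__name__" label of each series to
// "stats_result", keeping its value (the stats function result name).
func renameMetricName(series [][]promLabel) {
	for i := range series {
		for j := range series[i] {
			if series[i][j].Name == "__name__" {
				series[i][j].Name = "stats_result"
				break
			}
		}
	}
}
```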
func parseVLogsRangeResponse(resp *http.Response) (res Result, err error) {
res, err = parsePrometheusRangeResponse(resp)
if err != nil {
return Result{}, err
}

@@ -76,7 +76,7 @@ absolute path to all .tpl files in root.
`Link to VMUI: -external.alert.source='vmui/#/?g0.expr={{.Expr|queryEscape}}'. `+
`If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.`)
externalLabels = flagutil.NewArrayString("external.label", "Optional label in the form 'Name=value' to add to all generated recording rules and alerts. "+
"In case of conflicts, original labels are kept with prefix `exported_`.")
"In case of conflicts, original labels are kept with prefix 'exported_'.")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmalert. The rules files are validated. The -rule flag must be specified.")
)
@@ -90,7 +90,6 @@ func main() {
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
envflag.Parse()
flagutil.ApplySecretFlags()
remoteread.InitSecretFlags()
remotewrite.InitSecretFlags()
datasource.InitSecretFlags()
@@ -227,14 +226,13 @@ func newManager(ctx context.Context) (*manager, error) {
labels[s[:n]] = s[n+1:]
}
nts, err := notifier.Init(labels, *externalURL)
err = notifier.Init(labels, *externalURL)
if err != nil {
return nil, fmt.Errorf("failed to init notifier: %w", err)
}
manager := &manager{
groups: make(map[uint64]*rule.Group),
querierBuilder: q,
notifiers: nts,
labels: labels,
}
rw, err := remotewrite.Init(ctx)

@@ -96,9 +96,10 @@ groups:
querierBuilder: &datasource.FakeQuerier{},
groups: make(map[uint64]*rule.Group),
labels: map[string]string{},
notifiers: func() []notifier.Notifier { return []notifier.Notifier{&notifier.FakeNotifier{}} },
rw: &remotewrite.Client{},
}
_, cleanup := notifier.InitFakeNotifier()
defer cleanup()
syncCh := make(chan struct{})
sighupCh := procutil.NewSighupChan()

@@ -3,6 +3,7 @@ package main
import (
"context"
"fmt"
"strconv"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
@@ -16,7 +17,6 @@ import (
// manager controls group states
type manager struct {
querierBuilder datasource.QuerierBuilder
notifiers func() []notifier.Notifier
rw remotewrite.RWClient
// remote read builder.
@@ -46,13 +46,15 @@ func (m *manager) ruleAPI(gID, rID uint64) (rule.ApiRule, error) {
m.groupsMu.RLock()
defer m.groupsMu.RUnlock()
g, ok := m.groups[gID]
group, ok := m.groups[gID]
if !ok {
return rule.ApiRule{}, fmt.Errorf("can't find group with id %d", gID)
}
g := group.ToAPI()
ruleID := strconv.FormatUint(rID, 10)
for _, r := range g.Rules {
if r.ID() == rID {
return r.ToAPI(), nil
if r.ID == ruleID {
return r, nil
}
}
return rule.ApiRule{}, fmt.Errorf("can't find rule with id %d in group %q", rID, g.Name)
@@ -63,17 +65,20 @@ func (m *manager) alertAPI(gID, aID uint64) (*rule.ApiAlert, error) {
m.groupsMu.RLock()
defer m.groupsMu.RUnlock()
g, ok := m.groups[gID]
group, ok := m.groups[gID]
if !ok {
return nil, fmt.Errorf("can't find group with id %d", gID)
}
g := group.ToAPI()
for _, r := range g.Rules {
ar, ok := r.(*rule.AlertingRule)
if !ok {
if r.Type != rule.TypeAlerting {
continue
}
if apiAlert := ar.AlertToAPI(aID); apiAlert != nil {
return apiAlert, nil
alertID := strconv.FormatUint(aID, 10)
for _, a := range r.Alerts {
if a.ID == alertID {
return a, nil
}
}
}
return nil, fmt.Errorf("can't find alert with id %d in group %q", aID, g.Name)
@@ -94,17 +99,16 @@ func (m *manager) close() {
}
func (m *manager) startGroup(ctx context.Context, g *rule.Group, restore bool) error {
m.wg.Add(1)
id := g.GetID()
g.Init()
go func() {
defer m.wg.Done()
m.wg.Go(func() {
if restore {
g.Start(ctx, m.notifiers, m.rw, m.rr)
g.Start(ctx, m.rw, m.rr)
} else {
g.Start(ctx, m.notifiers, m.rw, nil)
g.Start(ctx, m.rw, nil)
}
}()
})
m.groups[id] = g
return nil
}
@@ -131,7 +135,7 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
if rrPresent && m.rw == nil {
return fmt.Errorf("config contains recording rules but `-remoteWrite.url` isn't set")
}
if arPresent && m.notifiers == nil {
if arPresent && notifier.GetTargets() == nil {
return fmt.Errorf("config contains alerting rules but none of `-notifier.url`, `-notifier.config` or `-notifier.blackhole` is set")
}
@@ -168,15 +172,15 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
if len(toUpdate) > 0 {
var wg sync.WaitGroup
for _, item := range toUpdate {
wg.Add(1)
// cancel evaluation so the Update will be applied as fast as possible.
// it is important to call InterruptEval before the update, because cancel fn
// can be re-assigned during the update.
item.old.InterruptEval()
go func(oldGroup *rule.Group, newGroup *rule.Group) {
oldGroup.UpdateWith(newGroup)
wg.Done()
}(item.old, item.new)
oldG := item.old
newG := item.new
wg.Go(func() {
// cancel evaluation so the Update will be applied as fast as possible.
// it is important to call InterruptEval before the update, because cancel fn
// can be re-assigned during the update.
oldG.InterruptEval()
oldG.UpdateWith(newG)
})
}
wg.Wait()
}

@@ -40,10 +40,11 @@ func TestManagerEmptyRulesDir(t *testing.T) {
// execution of configuration update.
// Should be executed with -race flag
func TestManagerUpdateConcurrent(t *testing.T) {
_, cleanup := notifier.InitFakeNotifier()
defer cleanup()
m := &manager{
groups: make(map[uint64]*rule.Group),
querierBuilder: &datasource.FakeQuerier{},
notifiers: func() []notifier.Notifier { return []notifier.Notifier{&notifier.FakeNotifier{}} },
}
paths := []string{
"config/testdata/dir/rules0-good.rules",
@@ -127,8 +128,9 @@ func TestManagerUpdate_Success(t *testing.T) {
m := &manager{
groups: make(map[uint64]*rule.Group),
querierBuilder: &datasource.FakeQuerier{},
notifiers: func() []notifier.Notifier { return []notifier.Notifier{&notifier.FakeNotifier{}} },
}
_, cleanup := notifier.InitFakeNotifier()
defer cleanup()
cfgInit := loadCfg(t, []string{initPath}, true, true)
if err := m.update(ctx, cfgInit, false); err != nil {
@@ -277,7 +279,8 @@ func TestManagerUpdate_Failure(t *testing.T) {
rw: rw,
}
if notifiers != nil {
m.notifiers = func() []notifier.Notifier { return notifiers }
_, cleanup := notifier.InitFakeNotifier()
defer cleanup()
}
err := m.update(context.Background(), []config.Group{cfg}, false)
if err == nil {

@@ -166,8 +166,8 @@ func templateAnnotations(annotations map[string]string, data AlertTplData, tmpl
ctmpl, _ := tmpl.Clone()
ctmpl = ctmpl.Option("missingkey=zero")
if err := templateAnnotation(&buf, builder.String(), tData, ctmpl, execute); err != nil {
r[key] = text
eg.Add(fmt.Errorf("key %q, template %q: %w", key, text, err))
r[key] = err.Error()
eg.Add(fmt.Errorf("(key: %q, value: %q): %w", key, text, err))
continue
}
r[key] = buf.String()
@@ -184,13 +184,13 @@ type tplData struct {
func templateAnnotation(dst io.Writer, text string, data tplData, tpl *textTpl.Template, execute bool) error {
tpl, err := tpl.Parse(text)
if err != nil {
return fmt.Errorf("error parsing annotation template: %w", err)
return fmt.Errorf("error parsing template: %w", err)
}
if !execute {
return nil
}
if err = tpl.Execute(dst, data); err != nil {
return fmt.Errorf("error evaluating annotation template: %w", err)
return fmt.Errorf("error evaluating template: %w", err)
}
return nil
}

@@ -20,7 +20,7 @@ func TestAlertExecTemplate(t *testing.T) {
)
extLabels["cluster"] = extCluster
extLabels["dc"] = extDC
_, err := Init(extLabels, extURL)
err := Init(extLabels, extURL)
checkErr(t, err)
f := func(alert *Alert, annotations map[string]string, tplExpected map[string]string) {

@@ -3,6 +3,7 @@ package notifier
import (
"bytes"
"context"
"errors"
"fmt"
"io"
"net/http"
@@ -77,12 +78,20 @@ func (am *AlertManager) LastError() string {
}
// Send an alert or resolve message
func (am *AlertManager) Send(ctx context.Context, alerts []Alert, headers map[string]string) error {
func (am *AlertManager) Send(ctx context.Context, alerts []Alert, alertLabels [][]prompb.Label, headers map[string]string) error {
if len(alerts) != len(alertLabels) {
return fmt.Errorf("mismatched number of alerts and label sets after global alert relabeling")
}
am.metrics.alertsSent.Add(len(alerts))
startTime := time.Now()
err := am.send(ctx, alerts, headers)
err := am.send(ctx, alerts, alertLabels, headers)
am.metrics.alertsSendDuration.UpdateDuration(startTime)
if err != nil {
// the context can be cancelled on graceful shutdown
// or on group update. So no need to handle the error as usual.
if errors.Is(err, context.Canceled) {
return nil
}
am.metrics.alertsSendErrors.Add(len(alerts))
am.lastError = err.Error()
} else {
@@ -91,12 +100,15 @@ func (am *AlertManager) Send(ctx context.Context, alerts []Alert, headers map[st
return err
}
func (am *AlertManager) send(ctx context.Context, alerts []Alert, headers map[string]string) error {
func (am *AlertManager) send(ctx context.Context, alerts []Alert, alertLabels [][]prompb.Label, headers map[string]string) error {
b := &bytes.Buffer{}
alertsToSend := make([]Alert, 0, len(alerts))
lblss := make([][]prompb.Label, 0, len(alerts))
for _, a := range alerts {
lbls := a.applyRelabelingIfNeeded(am.relabelConfigs)
for i, a := range alerts {
lbls := alertLabels[i]
if am.relabelConfigs != nil {
lbls = am.relabelConfigs.Apply(lbls, 0)
}
if len(lbls) == 0 {
continue
}

@@ -11,6 +11,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
)
@@ -145,11 +146,11 @@ func TestAlertManager_Send(t *testing.T) {
t.Fatalf("unexpected error: %s", err)
}
if err := am.Send(context.Background(), []Alert{{Labels: map[string]string{"a": "b"}}}, nil); err == nil {
if err := am.Send(context.Background(), []Alert{{Labels: map[string]string{"a": "b"}}}, [][]prompb.Label{{{Name: "a", Value: "b"}}}, nil); err == nil {
t.Fatalf("expected connection error got nil")
}
if err := am.Send(context.Background(), []Alert{{Labels: map[string]string{"a": "b"}}}, nil); err == nil {
if err := am.Send(context.Background(), []Alert{{Labels: map[string]string{"a": "b"}}}, [][]prompb.Label{{{Name: "a", Value: "b"}}}, nil); err == nil {
t.Fatalf("expected wrong http code error got nil")
}
@@ -160,7 +161,7 @@ func TestAlertManager_Send(t *testing.T) {
End: time.Now().UTC(),
Labels: map[string]string{"alertname": "alert0"},
Annotations: map[string]string{"a": "b", "c": "d"},
}}, map[string]string{headerKey: "bar"}); err != nil {
}}, [][]prompb.Label{{{Name: "alertname", Value: "alert0"}}}, map[string]string{headerKey: "bar"}); err != nil {
t.Fatalf("unexpected error %s", err)
}
@@ -174,7 +175,7 @@ func TestAlertManager_Send(t *testing.T) {
Name: "alert2",
Labels: map[string]string{"rule": "test", "tenant": "1"},
},
}, map[string]string{headerKey: "bar"}); err != nil {
}, [][]prompb.Label{{{Name: "rule", Value: "test"}, {Name: "tenant", Value: "0"}}, {{Name: "rule", Value: "test"}, {Name: "tenant", Value: "1"}}}, map[string]string{headerKey: "bar"}); err != nil {
t.Fatalf("unexpected error %s", err)
}
@@ -187,7 +188,7 @@ func TestAlertManager_Send(t *testing.T) {
Name: "alert2",
Labels: map[string]string{},
},
}, map[string]string{}); err != nil {
}, [][]prompb.Label{{{Name: "rule", Value: "test"}}, {{}}}, map[string]string{}); err != nil {
t.Fatalf("unexpected error %s", err)
}

@@ -27,15 +27,9 @@ type Config struct {
// PathPrefix is added to URL path before adding alertManagerPath value
PathPrefix string `yaml:"path_prefix,omitempty"`
// ConsulSDConfigs contains list of settings for service discovery via Consul
// see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config
ConsulSDConfigs []consul.SDConfig `yaml:"consul_sd_configs,omitempty"`
// DNSSDConfigs contains list of settings for service discovery via DNS.
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
DNSSDConfigs []dns.SDConfig `yaml:"dns_sd_configs,omitempty"`
// StaticConfigs contains list of static targets
StaticConfigs []StaticConfig `yaml:"static_configs,omitempty"`
ConsulSDConfigs []ConsulSDConfigs `yaml:"consul_sd_configs,omitempty"`
DNSSDConfigs []DNSSDConfigs `yaml:"dns_sd_configs,omitempty"`
StaticConfigs []StaticConfig `yaml:"static_configs,omitempty"`
// HTTPClientConfig contains HTTP configuration for Notifier clients
HTTPClientConfig promauth.HTTPClientConfig `yaml:",inline"`
@@ -62,14 +56,29 @@ type Config struct {
parsedAlertRelabelConfigs *promrelabel.ParsedConfigs
}
// StaticConfig contains list of static targets in the following form:
// staticConfig contains list of static targets in the following form:
//
// targets:
// [ - '<host>' ]
type StaticConfig struct {
Targets []string `yaml:"targets"`
// HTTPClientConfig contains HTTP configuration for the Targets
HTTPClientConfig promauth.HTTPClientConfig `yaml:",inline"`
HTTPClientConfig promauth.HTTPClientConfig `yaml:",inline"`
AlertRelabelConfigs []promrelabel.RelabelConfig `yaml:"alert_relabel_configs,omitempty"`
}
// ConsulSDConfigs contains list of settings for service discovery via Consul,
// see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config
type ConsulSDConfigs struct {
consul.SDConfig `yaml:",inline"`
AlertRelabelConfigs []promrelabel.RelabelConfig `yaml:"alert_relabel_configs,omitempty"`
}
// DNSSDConfigs contains list of settings for service discovery via DNS,
// see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
type DNSSDConfigs struct {
dns.SDConfig `yaml:",inline"`
AlertRelabelConfigs []promrelabel.RelabelConfig `yaml:"alert_relabel_configs,omitempty"`
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
@@ -95,6 +104,31 @@ func (cfg *Config) UnmarshalYAML(unmarshal func(any) error) error {
}
cfg.parsedAlertRelabelConfigs = arCfg
for _, s := range cfg.StaticConfigs {
if len(s.AlertRelabelConfigs) > 0 {
_, err := promrelabel.ParseRelabelConfigs(s.AlertRelabelConfigs)
if err != nil {
return fmt.Errorf("failed to parse alert_relabel_configs in static_config: %w", err)
}
}
}
for _, s := range cfg.ConsulSDConfigs {
if len(s.AlertRelabelConfigs) > 0 {
_, err := promrelabel.ParseRelabelConfigs(s.AlertRelabelConfigs)
if err != nil {
return fmt.Errorf("failed to parse alert_relabel_configs in consul_sd_config: %w", err)
}
}
}
for _, s := range cfg.DNSSDConfigs {
if len(s.AlertRelabelConfigs) > 0 {
_, err := promrelabel.ParseRelabelConfigs(s.AlertRelabelConfigs)
if err != nil {
return fmt.Errorf("failed to parse alert_relabel_configs in dns_sd_config: %w", err)
}
}
}
b, err := yaml.Marshal(cfg)
if err != nil {
return fmt.Errorf("failed to marshal configuration for checksum: %w", err)

@@ -35,4 +35,6 @@ func TestParseConfig_Failure(t *testing.T) {
f("testdata/unknownFields.bad.yaml", "unknown field")
f("non-existing-file", "error reading")
f("testdata/consul.bad.yaml", "failed to parse alert_relabel_configs in consul_sd_config")
f("testdata/dns.bad.yaml", "failed to parse alert relabeling config")
}

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discovery/consul"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discovery/dns"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
@@ -28,11 +29,7 @@ type configWatcher struct {
targets map[TargetType][]Target
}
func newWatcher(path string, gen AlertURLGenerator) (*configWatcher, error) {
cfg, err := parseConfig(path)
if err != nil {
return nil, err
}
func newWatcher(cfg *Config, gen AlertURLGenerator) (*configWatcher, error) {
cw := &configWatcher{
cfg: cfg,
wg: sync.WaitGroup{},
@@ -88,18 +85,15 @@ func (cw *configWatcher) reload(path string) error {
return cw.start()
}
func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn getLabels) error {
targetMetadata, errors := getTargetMetadata(labelsFn, cw.cfg)
func (cw *configWatcher) add(typeK TargetType, interval time.Duration, targetsFn getTargets) error {
targetMetadata, errors := getTargetMetadata(targetsFn, cw.cfg)
for _, err := range errors {
return fmt.Errorf("failed to init notifier for %q: %w", typeK, err)
}
cw.updateTargets(typeK, targetMetadata, cw.cfg, cw.genFn)
cw.wg.Add(1)
go func() {
defer cw.wg.Done()
cw.wg.Go(func() {
ticker := time.NewTicker(interval)
defer ticker.Stop()
@@ -109,62 +103,77 @@ func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn
return
case <-ticker.C:
}
targetMetadata, errors := getTargetMetadata(labelsFn, cw.cfg)
targetMetadata, errors := getTargetMetadata(targetsFn, cw.cfg)
for _, err := range errors {
logger.Errorf("failed to init notifier for %q: %s", typeK, err)
}
cw.updateTargets(typeK, targetMetadata, cw.cfg, cw.genFn)
}
}()
})
return nil
}
func getTargetMetadata(labelsFn getLabels, cfg *Config) (map[string]*promutil.Labels, []error) {
metaLabels, err := labelsFn()
type targetMetadata struct {
*promutil.Labels
alertRelabelConfigs *promrelabel.ParsedConfigs
}
func getTargetMetadata(targetsFn getTargets, cfg *Config) (map[string]targetMetadata, []error) {
metaLabelsList, alertRelabelCfgs, err := targetsFn()
if err != nil {
return nil, []error{fmt.Errorf("failed to get labels: %w", err)}
}
targetMetadata := make(map[string]*promutil.Labels, len(metaLabels))
targetMts := make(map[string]targetMetadata, len(metaLabelsList))
var errors []error
duplicates := make(map[string]struct{})
for _, labels := range metaLabels {
target := labels.Get("__address__")
u, processedLabels, err := parseLabels(target, labels, cfg)
if err != nil {
errors = append(errors, err)
continue
}
if len(u) == 0 {
continue
}
if _, ok := duplicates[u]; ok { // check for duplicates
if !*suppressDuplicateTargetErrors {
logger.Errorf("skipping duplicate target with identical address %q; "+
"make sure service discovery and relabeling is set up properly; "+
"original labels: %s; resulting labels: %s",
u, labels, processedLabels)
for i := range metaLabelsList {
metaLabels := metaLabelsList[i]
alertRelabelCfg := alertRelabelCfgs[i]
for _, labels := range metaLabels {
target := labels.Get("__address__")
u, processedLabels, err := parseLabels(target, labels, cfg)
if err != nil {
errors = append(errors, err)
continue
}
if len(u) == 0 {
continue
}
// check for duplicated targets
// targets with the same address but different alert_relabel_configs are still considered duplicates, since this is most likely a misconfiguration and could cause duplicate notifications.
if _, ok := duplicates[u]; ok {
if !*suppressDuplicateTargetErrors {
logger.Errorf("skipping duplicate target with identical address %q; "+
"make sure service discovery and relabeling is set up properly; "+
"original labels: %s; resulting labels: %s",
u, labels, processedLabels)
}
continue
}
duplicates[u] = struct{}{}
targetMts[u] = targetMetadata{
Labels: processedLabels,
alertRelabelConfigs: alertRelabelCfg,
}
continue
}
duplicates[u] = struct{}{}
targetMetadata[u] = processedLabels
}
return targetMetadata, errors
return targetMts, errors
}
type getLabels func() ([]*promutil.Labels, error)
type getTargets func() ([][]*promutil.Labels, []*promrelabel.ParsedConfigs, error)
func (cw *configWatcher) start() error {
if len(cw.cfg.StaticConfigs) > 0 {
var targets []Target
for _, cfg := range cw.cfg.StaticConfigs {
for i, cfg := range cw.cfg.StaticConfigs {
alertRelabelConfig, _ := promrelabel.ParseRelabelConfigs(cw.cfg.StaticConfigs[i].AlertRelabelConfigs)
httpCfg := mergeHTTPClientConfigs(cw.cfg.HTTPClientConfig, cfg.HTTPClientConfig)
for _, target := range cfg.Targets {
address, labels, err := parseLabels(target, nil, cw.cfg)
if err != nil {
return fmt.Errorf("failed to parse labels for target %q: %w", target, err)
}
notifier, err := NewAlertManager(address, cw.genFn, httpCfg, cw.cfg.parsedAlertRelabelConfigs, cw.cfg.Timeout.Duration())
notifier, err := NewAlertManager(address, cw.genFn, httpCfg, alertRelabelConfig, cw.cfg.Timeout.Duration())
if err != nil {
return fmt.Errorf("failed to init alertmanager for addr %q: %w", address, err)
}
@@ -178,17 +187,20 @@ func (cw *configWatcher) start() error {
}
if len(cw.cfg.ConsulSDConfigs) > 0 {
err := cw.add(TargetConsul, *consul.SDCheckInterval, func() ([]*promutil.Labels, error) {
var labels []*promutil.Labels
err := cw.add(TargetConsul, *consul.SDCheckInterval, func() ([][]*promutil.Labels, []*promrelabel.ParsedConfigs, error) {
var labels [][]*promutil.Labels
var alertRelabelConfigs []*promrelabel.ParsedConfigs
for i := range cw.cfg.ConsulSDConfigs {
alertRelabelConfig, _ := promrelabel.ParseRelabelConfigs(cw.cfg.ConsulSDConfigs[i].AlertRelabelConfigs)
sdc := &cw.cfg.ConsulSDConfigs[i]
targetLabels, err := sdc.GetLabels(cw.cfg.baseDir)
if err != nil {
return nil, fmt.Errorf("got labels err: %w", err)
return nil, nil, fmt.Errorf("got labels err: %w", err)
}
labels = append(labels, targetLabels...)
labels = append(labels, targetLabels)
alertRelabelConfigs = append(alertRelabelConfigs, alertRelabelConfig)
}
return labels, nil
return labels, alertRelabelConfigs, nil
})
if err != nil {
return fmt.Errorf("failed to start consulSD discovery: %w", err)
@@ -196,17 +208,21 @@ func (cw *configWatcher) start() error {
}
if len(cw.cfg.DNSSDConfigs) > 0 {
err := cw.add(TargetDNS, *dns.SDCheckInterval, func() ([]*promutil.Labels, error) {
var labels []*promutil.Labels
err := cw.add(TargetDNS, *dns.SDCheckInterval, func() ([][]*promutil.Labels, []*promrelabel.ParsedConfigs, error) {
var labels [][]*promutil.Labels
var alertRelabelConfigs []*promrelabel.ParsedConfigs
for i := range cw.cfg.DNSSDConfigs {
alertRelabelConfig, _ := promrelabel.ParseRelabelConfigs(cw.cfg.DNSSDConfigs[i].AlertRelabelConfigs)
sdc := &cw.cfg.DNSSDConfigs[i]
targetLabels, err := sdc.GetLabels(cw.cfg.baseDir)
if err != nil {
return nil, fmt.Errorf("got labels err: %w", err)
return nil, nil, fmt.Errorf("got labels err: %w", err)
}
labels = append(labels, targetLabels...)
labels = append(labels, targetLabels)
alertRelabelConfigs = append(alertRelabelConfigs, alertRelabelConfig)
}
return labels, nil
return labels, alertRelabelConfigs, nil
})
if err != nil {
return fmt.Errorf("failed to start DNSSD discovery: %w", err)
@@ -240,30 +256,30 @@ func (cw *configWatcher) setTargets(key TargetType, targets []Target) {
cw.targetsMu.Unlock()
}
func (cw *configWatcher) updateTargets(key TargetType, targetMetadata map[string]*promutil.Labels, cfg *Config, genFn AlertURLGenerator) {
func (cw *configWatcher) updateTargets(key TargetType, targetMts map[string]targetMetadata, cfg *Config, genFn AlertURLGenerator) {
cw.targetsMu.Lock()
defer cw.targetsMu.Unlock()
oldTargets := cw.targets[key]
var updatedTargets []Target
for _, ot := range oldTargets {
if _, ok := targetMetadata[ot.Addr()]; !ok {
if _, ok := targetMts[ot.Addr()]; !ok {
// close the target if it no longer exists among the discovered targets
ot.Close()
} else {
updatedTargets = append(updatedTargets, ot)
delete(targetMetadata, ot.Addr())
delete(targetMts, ot.Addr())
}
}
// create new resources for the new targets
for addr, labels := range targetMetadata {
am, err := NewAlertManager(addr, genFn, cfg.HTTPClientConfig, cfg.parsedAlertRelabelConfigs, cfg.Timeout.Duration())
for addr, metadata := range targetMts {
am, err := NewAlertManager(addr, genFn, cfg.HTTPClientConfig, metadata.alertRelabelConfigs, cfg.Timeout.Duration())
if err != nil {
logger.Errorf("failed to init %s notifier with addr %q: %s", key, addr, err)
continue
}
updatedTargets = append(updatedTargets, Target{
Notifier: am,
Labels: labels,
Labels: metadata.Labels,
})
}

@@ -7,6 +7,7 @@ import (
"net/http/httptest"
"os"
"sync"
"sync/atomic"
"testing"
"time"
@@ -28,7 +29,11 @@ static_configs:
- localhost:9093
- localhost:9094
`)
cw, err := newWatcher(f.Name(), nil)
cfg, err := parseConfig(f.Name())
if err != nil {
t.Fatalf("failed to parse config: %s", err)
}
cw, err := newWatcher(cfg, nil)
if err != nil {
t.Fatalf("failed to start config watcher: %s", err)
}
@@ -83,33 +88,64 @@ consul_sd_configs:
- server: %s
services:
- alertmanager
`, consulSDServer.URL))
- server: %s
services:
- alertmanager
alert_relabel_configs:
- target_label: "foo"
replacement: "tar"
`, consulSDServer.URL, consulSDServer.URL))
cw, err := newWatcher(consulSDFile.Name(), nil)
cfg, err := parseConfig(consulSDFile.Name())
if err != nil {
t.Fatalf("failed to parse config: %s", err)
}
cw, err := newWatcher(cfg, nil)
if err != nil {
t.Fatalf("failed to start config watcher: %s", err)
}
defer cw.mustStop()
if len(cw.notifiers()) != 2 {
t.Fatalf("expected to get 2 notifiers; got %d", len(cw.notifiers()))
if len(cw.notifiers()) != 3 {
t.Fatalf("expected to get 3 notifiers; got %d", len(cw.notifiers()))
}
expAddr1 := fmt.Sprintf("https://%s/proxy/api/v2/alerts", fakeConsulService1)
expAddr2 := fmt.Sprintf("https://%s/proxy/api/v2/alerts", fakeConsulService2)
expAddr3 := fmt.Sprintf("https://%s/proxy/api/v2/alerts", fakeConsulService3)
n1, n2 := cw.notifiers()[0], cw.notifiers()[1]
n1, n2, n3 := cw.notifiers()[0], cw.notifiers()[1], cw.notifiers()[2]
if n1.Addr() != expAddr1 {
t.Fatalf("exp address %q; got %q", expAddr1, n1.Addr())
}
if n2.Addr() != expAddr2 {
t.Fatalf("exp address %q; got %q", expAddr2, n2.Addr())
}
if n3.Addr() != expAddr3 {
t.Fatalf("exp address %q; got %q", expAddr3, n3.Addr())
}
if n1.(*AlertManager).relabelConfigs.String() != "" {
t.Fatalf("unexpected relabel configs: %q", n1.(*AlertManager).relabelConfigs.String())
}
if n2.(*AlertManager).relabelConfigs.String() != "" {
t.Fatalf("unexpected relabel configs: %q", n2.(*AlertManager).relabelConfigs.String())
}
if n3.(*AlertManager).relabelConfigs.String() != "- target_label: foo\n replacement: tar\n" {
t.Fatalf("unexpected relabel configs: %q", n3.(*AlertManager).relabelConfigs.String())
}
f := func() bool { return len(cw.notifiers()) == 1 }
if !waitFor(f, time.Second) {
t.Fatalf("expected to get 1 notifiers; got %d", len(cw.notifiers()))
}
n3 = cw.notifiers()[0]
if n3.Addr() != expAddr3 {
t.Fatalf("exp address %q; got %q", expAddr3, n3.Addr())
}
if n3.(*AlertManager).relabelConfigs.String() != "- target_label: foo\n replacement: tar\n" {
t.Fatalf("unexpected relabel configs: %q", n3.(*AlertManager).relabelConfigs.String())
}
}
// TestConfigWatcherReloadConcurrent supposed to test concurrent
@@ -164,7 +200,11 @@ consul_sd_configs:
"unknownFields.bad.yaml",
}
cw, err := newWatcher(paths[0], nil)
cfg, err := parseConfig(paths[0])
if err != nil {
t.Fatalf("failed to parse config: %s", err)
}
cw, err := newWatcher(cfg, nil)
if err != nil {
t.Fatalf("failed to start config watcher: %s", err)
}
@@ -202,10 +242,11 @@ func checkErr(t *testing.T, err error) {
const (
fakeConsulService1 = "127.0.0.1:9093"
fakeConsulService2 = "127.0.0.1:9095"
fakeConsulService3 = "127.0.0.1:9097"
)
func newFakeConsulServer() *httptest.Server {
requestCount := 0
var requestCount atomic.Int32
mux := http.NewServeMux()
mux.HandleFunc("/v1/agent/self", func(rw http.ResponseWriter, _ *http.Request) {
rw.Write([]byte(`{"Config": {"Datacenter": "dc1"}}`))
@@ -220,7 +261,7 @@ func newFakeConsulServer() *httptest.Server {
}`))
})
mux.HandleFunc("/v1/health/service/alertmanager", func(rw http.ResponseWriter, _ *http.Request) {
if requestCount == 0 {
if requestCount.Load() == 0 {
rw.Header().Set("X-Consul-Index", "1")
rw.Write([]byte(`
[
@@ -360,7 +401,7 @@ func newFakeConsulServer() *httptest.Server {
}
]`))
}
requestCount++
requestCount.Add(1)
})
return httptest.NewServer(mux)

@@ -5,6 +5,8 @@ import (
"fmt"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
)
// FakeNotifier is a mock notifier
@@ -15,6 +17,19 @@ type FakeNotifier struct {
counter int
}
// InitFakeNotifier initializes global notifier to FakeNotifier,
// and returns a cleanup function to restore the original getActiveNotifiers.
func InitFakeNotifier() (*FakeNotifier, func()) {
originalGetActiveNotifiers := getActiveNotifiers
fn := &FakeNotifier{}
getActiveNotifiers = func() []Notifier {
return []Notifier{fn}
}
return fn, func() {
getActiveNotifiers = originalGetActiveNotifiers
}
}
// Close does nothing
func (*FakeNotifier) Close() {}
@@ -27,7 +42,7 @@ func (*FakeNotifier) LastError() string {
func (*FakeNotifier) Addr() string { return "" }
// Send sets alerts and increases counter
func (fn *FakeNotifier) Send(_ context.Context, alerts []Alert, _ map[string]string) error {
func (fn *FakeNotifier) Send(_ context.Context, alerts []Alert, _ [][]prompb.Label, _ map[string]string) error {
fn.Lock()
defer fn.Unlock()
fn.counter += len(alerts)

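`InitFakeNotifier` relies on `getActiveNotifiers` being a package-level function variable. A test swaps it and restores it through the returned cleanup function. A reduced sketch of the pattern follows, with a hypothetical one-method `Notifier` interface and a hypothetical `initFake` helper.

```go
package main

import "fmt"

// Notifier is a trimmed-down, hypothetical version of the notifier interface.
type Notifier interface {
	Addr() string
}

type fakeNotifier struct{}

func (fakeNotifier) Addr() string { return "fake" }

type realNotifier struct{ addr string }

func (r realNotifier) Addr() string { return r.addr }

// getActiveNotifiers is a package-level function variable, so tests can swap it.
var getActiveNotifiers = func() []Notifier {
	return []Notifier{realNotifier{addr: "http://localhost:9093"}}
}

// initFake makes getActiveNotifiers return a single fake notifier and returns
// a cleanup function that restores the original implementation.
func initFake() (Notifier, func()) {
	original := getActiveNotifiers
	fn := fakeNotifier{}
	getActiveNotifiers = func() []Notifier { return []Notifier{fn} }
	return fn, func() { getActiveNotifiers = original }
}

func main() {
	_, cleanup := initFake()
	fmt.Println(getActiveNotifiers()[0].Addr())
	cleanup()
	fmt.Println(getActiveNotifiers()[0].Addr())
}
```

Because the variable is global, tests using this pattern should not run in parallel with each other.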
@@ -1,17 +1,22 @@
package notifier
import (
"context"
"flag"
"fmt"
"net/url"
"strconv"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/vmalertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutil"
)
@@ -96,11 +101,25 @@ func InitAlertURLGeneratorFn(externalURL *url.URL, externalAlertSource string, v
return nil
}
// cw holds a configWatcher for configPath configuration file
// configWatcher provides a list of Notifier objects discovered
// from static config or via service discovery.
// cw is not nil only if configPath is provided.
var cw *configWatcher
var (
// getActiveNotifiers returns the current list of Notifier objects.
getActiveNotifiers func() []Notifier
// globalRelabelCfg stores the parsed global alert relabeling config from the config file, if any
globalRelabelCfg *promrelabel.ParsedConfigs
// cw holds a configWatcher for configPath configuration file
// configWatcher provides a list of Notifier objects discovered
// from static config or via service discovery.
// cw is not nil only if configPath is provided.
cw *configWatcher
// externalLabels is a global variable for holding external labels configured via flags
// It is supposed to be inited via Init function only.
externalLabels map[string]string
// externalURL is a global variable for holding external URL value configured via flag
// It is supposed to be inited via Init function only.
externalURL string
)
// Reload checks the changes in configPath configuration file
// and applies changes if any.
@@ -111,66 +130,62 @@ func Reload() error {
return cw.reload(*configPath)
}
var staticNotifiersFn func() []Notifier
var (
// externalLabels is a global variable for holding external labels configured via flags
// It is supposed to be inited via Init function only.
externalLabels map[string]string
// externalURL is a global variable for holding external URL value configured via flag
// It is supposed to be inited via Init function only.
externalURL string
)
// Init initializes the list of active Notifier objects.
// Init works in two modes:
// - configuration via flags (for backward compatibility). It is always static
// and doesn't support live reloads.
// - configuration via file. Supports live reloads and service discovery.
//
// Init returns an error if both modes are used.
func Init(extLabels map[string]string, extURL string) (func() []Notifier, error) {
func Init(extLabels map[string]string, extURL string) error {
externalURL = extURL
externalLabels = extLabels
_, err := url.Parse(externalURL)
if err != nil {
return nil, fmt.Errorf("failed to parse external URL: %w", err)
return fmt.Errorf("failed to parse external URL: %w", err)
}
if *blackHole {
if len(*addrs) > 0 || *configPath != "" {
return nil, fmt.Errorf("only one of -notifier.blackhole, -notifier.url and -notifier.config flags must be specified")
return fmt.Errorf("only one of -notifier.blackhole, -notifier.url and -notifier.config flags must be specified")
}
notifier := newBlackHoleNotifier()
staticNotifiersFn = func() []Notifier {
getActiveNotifiers = func() []Notifier {
return []Notifier{notifier}
}
return staticNotifiersFn, nil
return nil
}
if *configPath == "" && len(*addrs) == 0 {
return nil, nil
return nil
}
if *configPath != "" && len(*addrs) > 0 {
return nil, fmt.Errorf("only one of -notifier.config or -notifier.url flags must be specified")
return fmt.Errorf("only one of -notifier.config or -notifier.url flags must be specified")
}
if len(*addrs) > 0 {
notifiers, err := notifiersFromFlags(AlertURLGeneratorFn)
if err != nil {
return nil, fmt.Errorf("failed to create notifier from flag values: %w", err)
return fmt.Errorf("failed to create notifier from flag values: %w", err)
}
staticNotifiersFn = func() []Notifier {
getActiveNotifiers = func() []Notifier {
return notifiers
}
return staticNotifiersFn, nil
return nil
}
cw, err = newWatcher(*configPath, AlertURLGeneratorFn)
cfg, err := parseConfig(*configPath)
if err != nil {
return nil, fmt.Errorf("failed to init config watcher: %w", err)
return err
}
return cw.notifiers, nil
if cfg.AlertRelabelConfigs != nil {
globalRelabelCfg = cfg.parsedAlertRelabelConfigs
}
cw, err = newWatcher(cfg, AlertURLGeneratorFn)
if err != nil {
return fmt.Errorf("failed to init config watcher: %w", err)
}
getActiveNotifiers = cw.notifiers
return nil
}
// InitSecretFlags must be called after flag.Parse and before any logging
@@ -245,23 +260,57 @@ const (
// GetTargets returns list of static or discovered targets
// via notifier configuration.
//
// Must be called after Init.
func GetTargets() map[TargetType][]Target {
var targets = make(map[TargetType][]Target)
if staticNotifiersFn != nil {
for _, ns := range staticNotifiersFn() {
targets[TargetStatic] = append(targets[TargetStatic], Target{
Notifier: ns,
})
}
if getActiveNotifiers == nil {
return nil
}
var targets = make(map[TargetType][]Target)
// use cached targets from configWatcher instead of getActiveNotifiers for the extra target labels
if cw != nil {
cw.targetsMu.RLock()
for key, ns := range cw.targets {
targets[key] = append(targets[key], ns...)
}
cw.targetsMu.RUnlock()
return targets
}
// static notifiers don't have labels
for _, ns := range getActiveNotifiers() {
targets[TargetStatic] = append(targets[TargetStatic], Target{
Notifier: ns,
})
}
return targets
}
// Send sends alerts to all active notifiers
func Send(ctx context.Context, alerts []Alert, notifierHeaders map[string]string) *vmalertutil.ErrGroup {
alertsToSend := make([]Alert, 0, len(alerts))
lblss := make([][]prompb.Label, 0, len(alerts))
// apply the global relabel config first, without modifying the original alerts
for _, a := range alerts {
lbls := a.applyRelabelingIfNeeded(globalRelabelCfg)
if len(lbls) == 0 {
continue
}
alertsToSend = append(alertsToSend, a)
lblss = append(lblss, lbls)
}
errGr := new(vmalertutil.ErrGroup)
wg := sync.WaitGroup{}
activeNotifiers := getActiveNotifiers()
for i := range activeNotifiers {
nt := activeNotifiers[i]
wg.Go(func() {
if err := nt.Send(ctx, alertsToSend, lblss, notifierHeaders); err != nil {
errGr.Add(fmt.Errorf("failed to send alerts to addr %q: %w", nt.Addr(), err))
}
})
}
wg.Wait()
return errGr
}


@@ -1,11 +1,17 @@
package notifier
import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/http/httptest"
"net/url"
"os"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
)
func TestInit(t *testing.T) {
@@ -14,14 +20,13 @@ func TestInit(t *testing.T) {
*addrs = flagutil.ArrayString{"127.0.0.1", "127.0.0.2"}
fn, err := Init(nil, "")
err := Init(nil, "")
if err != nil {
t.Fatalf("%s", err)
}
nfs := fn()
if len(nfs) != 2 {
t.Fatalf("expected to get 2 notifiers; got %d", len(nfs))
if len(getActiveNotifiers()) != 2 {
t.Fatalf("expected to get 2 notifiers; got %d", len(getActiveNotifiers()))
}
targets := GetTargets()
@@ -54,7 +59,7 @@ func TestInitNegative(t *testing.T) {
*configPath = path
*addrs = flagutil.ArrayString{addr}
*blackHole = bh
if _, err := Init(nil, ""); err == nil {
if err := Init(nil, ""); err == nil {
t.Fatalf("expected to get error; got nil instead")
}
}
@@ -71,14 +76,13 @@ func TestBlackHole(t *testing.T) {
*blackHole = true
fn, err := Init(nil, "")
err := Init(nil, "")
if err != nil {
t.Fatalf("%s", err)
}
nfs := fn()
if len(nfs) != 1 {
t.Fatalf("expected to get 1 notifier; got %d", len(nfs))
if len(getActiveNotifiers()) != 1 {
t.Fatalf("expected to get 1 notifier; got %d", len(getActiveNotifiers()))
}
targets := GetTargets()
@@ -120,3 +124,85 @@ func TestGetAlertURLGenerator(t *testing.T) {
t.Fatalf("unexpected url want %s, got %s", exp, AlertURLGeneratorFn(testAlert))
}
}
func TestSendAlerts(t *testing.T) {
oldAlertURLGeneratorFn := AlertURLGeneratorFn
defer func() { AlertURLGeneratorFn = oldAlertURLGeneratorFn }()
AlertURLGeneratorFn = func(alert Alert) string {
return ""
}
mux := http.NewServeMux()
mux.HandleFunc("/", func(_ http.ResponseWriter, _ *http.Request) {
t.Errorf("should not be called")
})
mux.HandleFunc(alertManagerPath, func(w http.ResponseWriter, r *http.Request) {
// handlers run outside the test goroutine, so report failures with t.Errorf instead of t.Fatalf
var a []struct {
Labels map[string]string `json:"labels"`
}
if err := json.NewDecoder(r.Body).Decode(&a); err != nil {
t.Errorf("cannot unmarshal data into alert: %s", err)
return
}
if len(a) != 2 {
t.Errorf("expected 2 alerts in array; got %d", len(a))
return
}
if len(a[0].Labels) != 4 {
t.Errorf("expected 4 labels; got %d", len(a[0].Labels))
}
if a[0].Labels["env"] != "prod" {
t.Errorf("expected env label to be set to prod by relabeling; got %q", a[0].Labels["env"])
}
if a[0].Labels["c"] != "baz" {
t.Errorf("expected c label to be set to baz by relabeling; got %q", a[0].Labels["c"])
}
if len(a[1].Labels) != 1 {
t.Errorf("expected 1 label; got %d", len(a[1].Labels))
}
})
srv := httptest.NewServer(mux)
defer srv.Close()
f, err := os.CreateTemp("", "")
if err != nil {
t.Fatal(err)
}
defer fs.MustRemovePath(f.Name())
rawConfig := `
static_configs:
  - targets:
      - %s
    alert_relabel_configs:
      - source_labels: [b]
        target_label: "c"
alert_relabel_configs:
  - source_labels: [a]
    target_label: "b"
  - target_label: "env"
    replacement: "prod"
`
config := fmt.Sprintf(rawConfig, srv.URL+alertManagerPath)
writeToFile(f.Name(), config)
oldConfigPath := *configPath
defer func() { *configPath = oldConfigPath }()
*configPath = f.Name()
err = Init(nil, "")
if err != nil {
t.Fatalf("unexpected error when parse notifier config: %s", err)
}
firingAlerts := []Alert{
{
Name: "alert1",
Labels: map[string]string{"a": "baz"},
},
{
Name: "alert2",
Labels: map[string]string{},
},
}
errG := Send(context.Background(), firingAlerts, nil)
if errG.Err() != nil {
t.Fatalf("unexpected error when sending alerts: %s", errG.Err())
}
}

@@ -1,13 +1,17 @@
package notifier
import "context"
import (
"context"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
)
// Notifier is a common interface for alert manager provider
type Notifier interface {
// Send sends the given list of alerts.
// Returns an error if fails to send the alerts.
// Must unblock if the given ctx is cancelled.
Send(ctx context.Context, alerts []Alert, notifierHeaders map[string]string) error
Send(ctx context.Context, alerts []Alert, alertLabels [][]prompb.Label, notifierHeaders map[string]string) error
// Addr returns address where alerts are sent.
Addr() string
// LastError returns the error that occurred during the last attempt to send data

@@ -1,6 +1,10 @@
package notifier
import "context"
import (
"context"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
)
// blackHoleNotifier is a Notifier stub, used when no notifications need
// to be sent.
@@ -10,7 +14,7 @@ type blackHoleNotifier struct {
}
// Send will send no notifications, but increase the metric.
func (bh *blackHoleNotifier) Send(_ context.Context, alerts []Alert, _ map[string]string) error { //nolint:revive
func (bh *blackHoleNotifier) Send(_ context.Context, alerts []Alert, _ [][]prompb.Label, _ map[string]string) error { //nolint:revive
bh.metrics.alertsSent.Add(len(alerts))
return nil
}

@@ -5,6 +5,7 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
metricset "github.com/VictoriaMetrics/metrics"
)
@@ -16,7 +17,7 @@ func TestBlackHoleNotifier_Send(t *testing.T) {
Start: time.Now().UTC(),
End: time.Now().UTC(),
Annotations: map[string]string{"a": "b", "c": "d", "e": "f"},
}}, nil); err != nil {
}}, [][]prompb.Label{{}}, nil); err != nil {
t.Fatalf("unexpected error %s", err)
}
@@ -34,7 +35,7 @@ func TestBlackHoleNotifier_Close(t *testing.T) {
Start: time.Now().UTC(),
End: time.Now().UTC(),
Annotations: map[string]string{"a": "b", "c": "d", "e": "f"},
}}, nil); err != nil {
}}, [][]prompb.Label{{}}, nil); err != nil {
t.Fatalf("unexpected error %s", err)
}

@@ -0,0 +1,19 @@
consul_sd_configs:
  - server: localhost:8500
    scheme: http
    services:
      - alertmanager
    alert_relabel_configs:
      - action: keep
        source_labels: [env]
        regex: "prod"
  - server: localhost:8500
    services:
      - consul
    alert_relabel_configs:
      - action: keep
        source_labels: [env]
        regex: "(abc"
alert_relabel_configs:
  - target_label: "foo"
    replacement: "aaa"

@@ -0,0 +1,13 @@
dns_sd_configs:
- names:
- cloudflare.com
type: 'A'
port: 9093
relabel_configs:
- source_labels: [__meta_dns_name]
replacement: '${1}'
target_label: dns_name
alert_relabel_configs:
- action: keep
source_labels: [env]
regex: "(abc"

@@ -2,12 +2,19 @@ static_configs:
- targets:
- localhost:9093
- localhost:9095
alert_relabel_configs:
- action: keep
source_labels: [env]
regex: "static"
consul_sd_configs:
- server: localhost:8500
scheme: http
services:
- alertmanager
alert_relabel_configs:
- action: keep
source_labels: [env]
regex: "consul"
- server: localhost:8500
services:
- consul
@@ -17,6 +24,10 @@ dns_sd_configs:
- cloudflare.com
type: 'A'
port: 9093
alert_relabel_configs:
- action: keep
source_labels: [env]
regex: "dns"
relabel_configs:
- source_labels: [__meta_consul_tags]
@@ -25,4 +36,4 @@ relabel_configs:
target_label: __scheme__
- source_labels: [__meta_dns_name]
replacement: '${1}'
target_label: dns_name
target_label: dns_name

@@ -1,22 +1,14 @@
headers:
- 'CustomHeader: foo'
static_configs:
- targets:
- localhost:9093
- localhost:9095
- https://localhost:9093/test/api/v2/alerts
basic_auth:
username: foo
password: bar
- http://192.168.0.101:9093
alert_relabel_configs:
- target_label: "foo"
replacement: "aaa"
- targets:
- localhost:9096
- localhost:9097
basic_auth:
username: foo
password: baz
- http://192.168.0.101:9093
alert_relabel_configs:
- target_label: "foo"
replacement: "ccc"
alert_relabel_configs:
- target_label: "foo"
replacement: "aaa"

@@ -14,9 +14,9 @@ import (
)
var (
addr = flag.String("remoteRead.url", "", "Optional URL to datasource compatible with MetricsQL. It can be single node VictoriaMetrics or vmselect."+
"Remote read is used to restore alerts state."+
"This configuration makes sense only if `vmalert` was configured with `remoteWrite.url` before and has been successfully persisted its state. "+
addr = flag.String("remoteRead.url", "", "Optional URL to datasource compatible with MetricsQL. It can be single node VictoriaMetrics or vmselect. "+
"Remote read is used to restore alerts state. "+
"This configuration makes sense only if vmalert was configured with '-remoteWrite.url' before and has successfully persisted its state. "+
"Supports address in the form of IP address with a port (e.g., http://127.0.0.1:8428) or DNS SRV record. "+
"See also '-remoteRead.disablePathAppend', '-remoteRead.showURL'.")

@@ -173,9 +173,8 @@ func (c *Client) run(ctx context.Context) {
cancel()
}
c.wg.Add(1)
go func() {
defer c.wg.Done()
c.wg.Go(func() {
defer ticker.Stop()
for {
select {
@@ -197,7 +196,7 @@ func (c *Client) run(ctx context.Context) {
}
}
}
}()
})
}
var (

@@ -2,6 +2,7 @@ package rule
import (
"context"
"errors"
"fmt"
"hash/fnv"
"math"
@@ -246,16 +247,6 @@ func (ar *AlertingRule) GetAlerts() []*notifier.Alert {
return alerts
}
// GetAlert returns alert if id exists
func (ar *AlertingRule) GetAlert(id uint64) *notifier.Alert {
ar.alertsMu.RLock()
defer ar.alertsMu.RUnlock()
if ar.alerts == nil {
return nil
}
return ar.alerts[id]
}
func (ar *AlertingRule) logDebugf(at time.Time, a *notifier.Alert, format string, args ...any) {
if !ar.Debug {
return
@@ -321,6 +312,11 @@ type labelSet struct {
// On k conflicts in origin set, the original value is preferred and copied
// to processed with `exported_%k` key. The copy happens only if passed v isn't equal to origin[k] value.
func (ls *labelSet) add(k, v string) {
// do not add label with empty value, since it has no meaning.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
if v == "" {
return
}
ls.processed[k] = v
ov, ok := ls.origin[k]
if !ok {
@@ -355,9 +351,6 @@ func (ar *AlertingRule) toLabels(m datasource.Metric, qFn templates.QueryFn) (*l
Value: m.Values[0],
Expr: ar.Expr,
})
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %w", err)
}
for k, v := range extraLabels {
ls.add(k, v)
}
@@ -368,7 +361,7 @@ func (ar *AlertingRule) toLabels(m datasource.Metric, qFn templates.QueryFn) (*l
if !*disableAlertGroupLabel && ar.GroupName != "" {
ls.add(alertGroupNameLabel, ar.GroupName)
}
return ls, nil
return ls, err
}
// execRange executes alerting rule on the given time range similarly to exec.
@@ -461,7 +454,7 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
defer func() {
ar.state.add(curState)
if curState.Err != nil {
if curState.Err != nil && !errors.Is(curState.Err, context.Canceled) {
ar.metrics.errors.Inc()
}
}()
@@ -484,8 +477,9 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
for i, m := range res.Data {
ls, err := ar.expandLabelTemplates(m, qFn)
if err != nil {
// only set error in current state, but do not break alert processing
curState.Err = err
return nil, curState.Err
logger.Errorf("got templating error in rule %s: %q", ar.Name, err)
}
at := ts
alertID := hash(ls.processed)
@@ -497,8 +491,9 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
}
as, err := ar.expandAnnotationTemplates(m, qFn, at, ls)
if err != nil {
// only set error in current state, but do not break alert processing
curState.Err = err
return nil, curState.Err
logger.Errorf("got templating error in rule %s: %q", ar.Name, err)
}
expandedLabels[i] = ls
expandedAnnotations[i] = as
@@ -607,7 +602,7 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
func (ar *AlertingRule) expandLabelTemplates(m datasource.Metric, qFn templates.QueryFn) (*labelSet, error) {
ls, err := ar.toLabels(m, qFn)
if err != nil {
return nil, fmt.Errorf("failed to expand label templates: %s", err)
return ls, fmt.Errorf("failed to expand label templates: %s", err)
}
return ls, nil
}
@@ -625,7 +620,7 @@ func (ar *AlertingRule) expandAnnotationTemplates(m datasource.Metric, qFn templ
}
as, err := notifier.ExecTemplate(qFn, ar.Annotations, tplData)
if err != nil {
return nil, fmt.Errorf("failed to expand annotation templates: %s", err)
return as, fmt.Errorf("failed to expand annotation templates: %s", err)
}
return as, nil
}

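Both `labelSet.add` in the alerting rule and `toTimeSeries` in the recording rule now skip labels whose value is empty, per issue #9984 linked in the code comments. The hunk shows only the start of `add`, so the following is a sketch built from its doc comment. When a label already exists in `origin` with a different value, the original value is preserved under `exported_<name>`. The type and field names mirror the diff. The `exported_` branch is an assumption based on the comment and on `TestAlertingRule_ToLabels`, not copied code.

```go
package main

import "fmt"

// labelSet is a simplified, hypothetical mirror of the type in the diff: origin
// holds labels from the query result, processed holds the labels of the alert.
type labelSet struct {
	origin    map[string]string
	processed map[string]string
}

// add sets k=v in processed, skipping empty values. When origin already has k
// with a different value, that original value is kept as "exported_"+k.
func (ls *labelSet) add(k, v string) {
	if v == "" {
		return // an empty value carries no meaning, so the label is dropped
	}
	ls.processed[k] = v
	ov, ok := ls.origin[k]
	if !ok || ov == v {
		return
	}
	ls.processed["exported_"+k] = ov
}

func main() {
	origin := map[string]string{"instance": "0.0.0.0:8800", "group": "vmalert"}
	ls := &labelSet{origin: origin, processed: map[string]string{}}
	for k, v := range origin {
		ls.processed[k] = v
	}
	ls.add("instance", "override")
	ls.add("group", "vmalert") // same value: no exported_ copy
	ls.add("empty_label", "")  // dropped
	fmt.Println(ls.processed)
	// map[exported_instance:0.0.0.0:8800 group:vmalert instance:override]
}
```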

@@ -827,12 +827,9 @@ func TestGroup_Restore(t *testing.T) {
fg := NewGroup(config.Group{Name: "TestRestore", Rules: rules}, fqr, time.Second, nil)
fg.Init()
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
nts := func() []notifier.Notifier { return []notifier.Notifier{&notifier.FakeNotifier{}} }
fg.Start(context.Background(), nts, nil, fqr)
wg.Done()
}()
wg.Go(func() {
fg.Start(context.Background(), nil, fqr)
})
fg.Close()
wg.Wait()
@@ -1373,8 +1370,10 @@ func TestAlertingRule_ToLabels(t *testing.T) {
ar := &AlertingRule{
Labels: map[string]string{
"instance": "override", // this should override instance with new value
"group": "vmalert", // this shouldn't have effect since value in metric is equal
"instance": "override", // this should override instance with new value
"group": "vmalert", // this shouldn't have effect since value in metric is equal
"invalid_label": "{{ .Values.mustRuntimeFail }}",
"empty_label": "", // this should be dropped
},
Expr: "sum(vmalert_alerting_rules_error) by(instance, group, alertname) > 0",
Name: "AlertingRulesError",
@@ -1382,10 +1381,11 @@ func TestAlertingRule_ToLabels(t *testing.T) {
}
expectedOriginLabels := map[string]string{
"instance": "0.0.0.0:8800",
"group": "vmalert",
"alertname": "ConfigurationReloadFailure",
"alertgroup": "vmalert",
"instance": "0.0.0.0:8800",
"group": "vmalert",
"alertname": "ConfigurationReloadFailure",
"alertgroup": "vmalert",
"invalid_label": `error evaluating template: template: :1:268: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
}
expectedProcessedLabels := map[string]string{
@@ -1395,11 +1395,12 @@ func TestAlertingRule_ToLabels(t *testing.T) {
"exported_alertname": "ConfigurationReloadFailure",
"group": "vmalert",
"alertgroup": "vmalert",
"invalid_label": `error evaluating template: template: :1:268: executing "" at <.Values.mustRuntimeFail>: can't evaluate field Values in type notifier.tplData`,
}
ls, err := ar.toLabels(metric, nil)
if err != nil {
t.Fatalf("unexpected error: %s", err)
if err == nil || !strings.Contains(err.Error(), "error evaluating template") {
t.Fatalf("expected templating error; got %v", err)
}
if !reflect.DeepEqual(ls.origin, expectedOriginLabels) {

@@ -18,7 +18,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/vmalertutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
)
@@ -39,6 +38,8 @@ var (
disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's Name as label to generated alerts and time series.")
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries. "+
"For example, if lookback=1h then range from now() to now()-1h will be scanned.")
maxStartDelay = flag.Duration("group.maxStartDelay", 5*time.Minute, "Defines the max delay before starting the group evaluation. Group's start is artificially delayed for a random duration in the interval"+
" [0..min(-group.maxStartDelay, group.interval)]. This helps smooth out the load on the configured datasource, so evaluations aren't executed too close to each other.")
)
// Group is an entity for grouping rules
@@ -330,13 +331,13 @@ func (g *Group) Init() {
}
// Start starts group's evaluation
func (g *Group) Start(ctx context.Context, nts func() []notifier.Notifier, rw remotewrite.RWClient, rr datasource.QuerierBuilder) {
func (g *Group) Start(ctx context.Context, rw remotewrite.RWClient, rr datasource.QuerierBuilder) {
defer func() { close(g.finishedCh) }()
evalTS := time.Now()
// sleep random duration to spread group rules evaluation
// over time to reduce the load on datasource.
// over maxStartDelay to reduce the load on datasource.
if !SkipRandSleepOnGroupStart {
sleepBeforeStart := delayBeforeStart(evalTS, g.GetID(), g.Interval, g.EvalOffset)
sleepBeforeStart := g.delayBeforeStart(evalTS, *maxStartDelay)
g.infof("will start in %v", sleepBeforeStart)
sleepTimer := time.NewTimer(sleepBeforeStart)
@@ -368,7 +369,6 @@ func (g *Group) Start(ctx context.Context, nts func() []notifier.Notifier, rw re
e := &executor{
Rw: rw,
Notifiers: nts,
notifierHeaders: g.NotifierHeaders,
}
@@ -475,20 +475,31 @@ func (g *Group) UpdateWith(newGroup *Group) {
g.updateCh <- newGroup
}
// if offset is specified, delayBeforeStart returns a duration to help aligning timestamp with offset;
// otherwise, it returns a random duration between [0..interval] based on group key.
func delayBeforeStart(ts time.Time, key uint64, interval time.Duration, offset *time.Duration) time.Duration {
if offset != nil {
currentOffsetPoint := ts.Truncate(interval).Add(*offset)
// delayBeforeStart returns duration for delaying the evaluation start
// based on given ts and Group settings. The delay can't exceed maxDelay.
// maxDelay is ignored if g.EvalOffset != nil.
//
// Delaying is important to smooth out the load on the datasource when all groups start at the same time.
// delayBeforeStart derives the delay from the Group ID, so different groups start at different moments.
func (g *Group) delayBeforeStart(ts time.Time, maxDelay time.Duration) time.Duration {
if g.EvalOffset != nil {
// if offset is specified, ignore the maxDelay and return a duration aligned with offset
currentOffsetPoint := ts.Truncate(g.Interval).Add(*g.EvalOffset)
if currentOffsetPoint.Before(ts) {
// wait until the next offset point
return currentOffsetPoint.Add(interval).Sub(ts)
return currentOffsetPoint.Add(g.Interval).Sub(ts)
}
return currentOffsetPoint.Sub(ts)
}
// otherwise, return a random duration between [0..min(interval, maxDelay)] based on group ID
interval := g.Interval
if interval > maxDelay {
// artificially limit interval, so groups with big intervals could start sooner.
interval = maxDelay
}
var randSleep time.Duration
randSleep = time.Duration(float64(interval) * (float64(key) / (1 << 64)))
randSleep = time.Duration(float64(interval) * (float64(g.GetID()) / (1 << 64)))
sleepOffset := time.Duration(ts.UnixNano() % interval.Nanoseconds())
if randSleep < sleepOffset {
randSleep += interval
@@ -550,15 +561,13 @@ func (g *Group) Replay(start, end time.Time, rw remotewrite.RWClient, maxDataPoi
if !disableProgressBar {
bar = pb.StartNew(iterations * len(g.Rules))
}
for _, r := range g.Rules {
for i := range g.Rules {
rule := g.Rules[i]
sem <- struct{}{}
wg.Add(1)
go func(r Rule, ri rangeIterator) {
// pass ri as a copy, so it can be modified within the replayRuleRange
res <- replayRuleRange(r, ri, bar, rw, replayRuleRetryAttempts, ruleEvaluationConcurrency)
wg.Go(func() {
res <- replayRuleRange(rule, ri, bar, rw, replayRuleRetryAttempts, ruleEvaluationConcurrency)
<-sem
wg.Done()
}(r, ri)
})
}
wg.Wait()
@@ -588,10 +597,10 @@ func replayRuleRange(r Rule, ri rangeIterator, bar *pb.ProgressBar, rw remotewri
res := make(chan int, int(ri.end.Sub(ri.start)/ri.step)+1)
for ri.next() {
sem <- struct{}{}
wg.Add(1)
go func(s, e time.Time) {
n, err := replayRule(r, s, e, rw, replayRuleRetryAttempts)
start := ri.s
end := ri.e
wg.Go(func() {
n, err := replayRule(r, start, end, rw, replayRuleRetryAttempts)
if err != nil {
logger.Fatalf("rule %q: %s", r, err)
}
@@ -600,8 +609,7 @@ func replayRuleRange(r Rule, ri rangeIterator, bar *pb.ProgressBar, rw remotewri
}
res <- n
<-sem
wg.Done()
}(ri.s, ri.e)
})
}
wg.Wait()
close(res)
@@ -615,10 +623,9 @@ func replayRuleRange(r Rule, ri rangeIterator, bar *pb.ProgressBar, rw remotewri
}
// ExecOnce evaluates all the rules under group for once with given timestamp.
func (g *Group) ExecOnce(ctx context.Context, nts func() []notifier.Notifier, rw remotewrite.RWClient, evalTS time.Time) chan error {
func (g *Group) ExecOnce(ctx context.Context, rw remotewrite.RWClient, evalTS time.Time) chan error {
e := &executor{
Rw: rw,
Notifiers: nts,
notifierHeaders: g.NotifierHeaders,
}
if len(g.Rules) < 1 {
@@ -693,7 +700,6 @@ func (g *Group) getEvalDelay() time.Duration {
// executor contains group's notify and rw configs
type executor struct {
Notifiers func() []notifier.Notifier
notifierHeaders map[string]string
Rw remotewrite.RWClient
@@ -714,14 +720,13 @@ func (e *executor) execConcurrently(ctx context.Context, rules []Rule, ts time.T
sem := make(chan struct{}, concurrency)
go func() {
wg := sync.WaitGroup{}
for _, r := range rules {
for i := range rules {
rule := rules[i]
sem <- struct{}{}
wg.Add(1)
go func(r Rule) {
res <- e.exec(ctx, r, ts, resolveDuration, limit)
wg.Go(func() {
res <- e.exec(ctx, rule, ts, resolveDuration, limit)
<-sem
wg.Done()
}(r)
})
}
wg.Wait()
close(res)
@@ -775,17 +780,6 @@ func (e *executor) exec(ctx context.Context, r Rule, ts time.Time, resolveDurati
return nil
}
wg := sync.WaitGroup{}
errGr := new(vmalertutil.ErrGroup)
for _, nt := range e.Notifiers() {
wg.Add(1)
go func(nt notifier.Notifier) {
if err := nt.Send(ctx, alerts, e.notifierHeaders); err != nil {
errGr.Add(fmt.Errorf("rule %q: failed to send alerts to addr %q: %w", r, nt.Addr(), err))
}
wg.Done()
}(nt)
}
wg.Wait()
errGr := notifier.Send(ctx, alerts, e.notifierHeaders)
return errGr.Err()
}


@@ -262,7 +262,7 @@ func TestUpdateDuringRandSleep(t *testing.T) {
updateCh: make(chan *Group),
}
g.Init()
go g.Start(context.Background(), nil, nil, nil)
go g.Start(context.Background(), nil, nil)
rule1 := AlertingRule{
Name: "jobDown",
@@ -346,7 +346,8 @@ func TestGroupStart(t *testing.T) {
}
fs := &datasource.FakeQuerier{}
fn := &notifier.FakeNotifier{}
fn, cleanup := notifier.InitFakeNotifier()
defer cleanup()
const evalInterval = time.Millisecond
g := NewGroup(groups[0], fs, evalInterval, map[string]string{"cluster": "east-1"})
@@ -395,7 +396,7 @@ func TestGroupStart(t *testing.T) {
fs.Add(m2)
g.Init()
go func() {
g.Start(context.Background(), func() []notifier.Notifier { return []notifier.Notifier{fn} }, nil, fs)
g.Start(context.Background(), nil, fs)
close(finished)
}()
@@ -472,15 +473,10 @@ func TestFaultyNotifier(t *testing.T) {
r := newTestAlertingRule("instant", 0)
r.q = fq
fn := &notifier.FakeNotifier{}
e := &executor{
Notifiers: func() []notifier.Notifier {
return []notifier.Notifier{
&notifier.FaultyNotifier{},
fn,
}
},
}
fn, cleanup := notifier.InitFakeNotifier()
defer cleanup()
e := &executor{}
delay := 5 * time.Second
ctx, cancel := context.WithTimeout(context.Background(), delay)
defer cancel()
@@ -553,7 +549,7 @@ func TestCloseWithEvalInterruption(t *testing.T) {
g := NewGroup(groups[0], fq, evalInterval, nil)
g.Init()
go g.Start(context.Background(), nil, nil, nil)
go g.Start(context.Background(), nil, nil)
time.Sleep(evalInterval * 20)
@@ -571,9 +567,10 @@ func TestCloseWithEvalInterruption(t *testing.T) {
func TestGroupStartDelay(t *testing.T) {
g := &Group{}
g.id = uint64(math.MaxUint64 / 10)
// a 5m interval and this group ID produce a static delay of 30s
g.Interval = time.Minute * 5
key := uint64(math.MaxUint64 / 10)
maxDelay := time.Minute * 5
f := func(atS, expS string) {
t.Helper()
@@ -585,7 +582,7 @@ func TestGroupStartDelay(t *testing.T) {
if err != nil {
t.Fatal(err)
}
delay := delayBeforeStart(at, key, g.Interval, g.EvalOffset)
delay := g.delayBeforeStart(at, maxDelay)
gotStart := at.Add(delay)
if expTS != gotStart {
t.Fatalf("expected to get %v; got %v instead", expTS, gotStart)
@@ -606,6 +603,15 @@ func TestGroupStartDelay(t *testing.T) {
f("2023-01-01T00:01:00.000+00:00", "2023-01-01T00:03:00.000+00:00")
f("2023-01-01T00:03:30.000+00:00", "2023-01-01T00:08:00.000+00:00")
f("2023-01-01T00:08:00.000+00:00", "2023-01-01T00:08:00.000+00:00")
maxDelay = time.Minute * 1
g.EvalOffset = nil
// test group with maxDelay, and offset disabled
f("2023-01-01T00:00:00.000+00:00", "2023-01-01T00:00:06.000+00:00")
f("2023-01-01T00:00:01.000+00:00", "2023-01-01T00:00:06.000+00:00")
f("2023-01-01T00:00:06.100+00:00", "2023-01-01T00:01:06.000+00:00")
f("2023-01-01T00:00:11.000+00:00", "2023-01-01T00:01:06.000+00:00")
}
func TestGetPrometheusReqTimestamp(t *testing.T) {


@@ -2,6 +2,7 @@ package rule
import (
"context"
"errors"
"fmt"
"strings"
"time"
@@ -197,7 +198,7 @@ func (rr *RecordingRule) exec(ctx context.Context, ts time.Time, limit int) ([]p
defer func() {
rr.state.add(curState)
if curState.Err != nil {
if curState.Err != nil && !errors.Is(curState.Err, context.Canceled) {
rr.metrics.errors.Inc()
}
}()
@@ -236,7 +237,8 @@ func (rr *RecordingRule) exec(ctx context.Context, ts time.Time, limit int) ([]p
Labels: stringToLabels(k),
Samples: []prompb.Sample{
{Value: decimal.StaleNaN, Timestamp: ts.UnixNano() / 1e6},
}})
},
})
}
rr.lastEvaluation = curEvaluation
return tss, nil
@@ -291,6 +293,11 @@ func (rr *RecordingRule) toTimeSeries(m datasource.Metric) prompb.TimeSeries {
}
// add extra labels configured by user
for k := range rr.Labels {
// do not add label with empty value, since it has no meaning.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9984
if rr.Labels[k] == "" {
continue
}
existingLabel := promrelabel.GetLabelByName(m.Labels, k)
if existingLabel != nil { // there is a conflict between extra and existing label
if existingLabel.Value == rr.Labels[k] {


@@ -209,15 +209,6 @@ func (ar *AlertingRule) AlertsToAPI() []*ApiAlert {
return alerts
}
// AlertToAPI generates apiAlert object from alert by its id(hash)
func (ar *AlertingRule) AlertToAPI(id uint64) *ApiAlert {
a := ar.GetAlert(id)
if a == nil {
return nil
}
return NewAlertAPI(ar, a)
}
// NewAlertAPI creates apiAlert for notifier.Alert
func NewAlertAPI(ar *AlertingRule, a *notifier.Alert) *ApiAlert {
aa := &ApiAlert{

@@ -34,11 +34,12 @@ body {
padding-top: 4.5rem;
}
.group-items {
.vm-group {
cursor: pointer;
padding: 5px;
margin-top: 5px;
position: relative;
display: none;
}
.btn svg, .dropdown-item svg {
@@ -55,14 +56,22 @@ body {
height: 38px;
}
.group-items:not(:has(.sub-item:not(.d-none))) {
display: none !important;
.vm-item:not(.vm-found) {
display: none;
}
.group-items:hover {
.vm-group:has(.vm-item:is(.vm-found)), .vm-group:is(.vm-found) {
display: flex;
}
.vm-group:hover {
background-color: #f8f9fa!important;
}
.vm-group:is(.vm-found) .vm-item {
display: table-row;
}
.table {
table-layout: fixed;
}
@@ -111,3 +120,9 @@ textarea.curl-area {
.w-60 {
width: 60%;
}
.annotations {
white-space: pre-wrap;
color: gray;
word-wrap: break-word;
}

@@ -65,32 +65,34 @@ function getParamURL(key) {
return url.searchParams.get(key)
}
function matchText(search, item) {
const text = item.innerText.toLowerCase();
return text.indexOf(search) >= 0;
}
function filterRules(searchPhrase) {
document.querySelectorAll('.sub-items').forEach((rules) => {
let found = false;
rules.querySelectorAll('.sub-item').forEach((rule) => {
if (searchPhrase) {
const ruleName = rule.innerText.toLowerCase();
const matches = []
const hasValue = ruleName.indexOf(searchPhrase) >= 0;
rule.querySelectorAll('.label').forEach((label) => {
const text = label.innerText.toLowerCase();
if (text.indexOf(searchPhrase) >= 0) {
matches.push(text);
}
});
if (!matches.length && !hasValue) {
rule.classList.add('d-none');
return;
}
document.querySelectorAll('.vm-group').forEach((group) => {
if (!searchPhrase) {
group.classList.add('vm-found');
return;
}
for (const item of group.querySelectorAll('.vm-group-search')) {
if (matchText(searchPhrase, item)) {
group.classList.add('vm-found');
return;
}
rule.classList.remove('d-none');
found = true;
});
if (found && searchPhrase || !searchPhrase) {
rules.classList.remove('d-none');
} else {
rules.classList.add('d-none');
}
group.classList.remove('vm-found');
for (const item of group.querySelectorAll('.vm-item')) {
if (matchText(searchPhrase, item)) {
item.classList.add('vm-found');
continue;
}
if (Array.from(item.querySelectorAll('.label')).find(l => matchText(searchPhrase, l))) {
item.classList.add('vm-found');
continue;
}
item.classList.remove('vm-found');
}
});
}

@@ -485,6 +485,12 @@ func templateFuncs() textTpl.FuncMap {
/* Helpers */
// now returns the Unix timestamp in seconds at the time of the template evaluation.
// For example: {{ (now | toTime).Sub $activeAt }} will return the duration the alert has been active.
"now": func() float64 {
return float64(time.Now().Unix())
},
// Converts a list of objects to a map with keys arg0, arg1 etc.
// This is intended to allow multiple arguments to be passed to templates.
"args": func(args ...any) map[string]any {


@@ -412,18 +412,18 @@ func (rh *requestHandler) groupAlerts() []rule.GroupAlerts {
defer rh.m.groupsMu.RUnlock()
var gAlerts []rule.GroupAlerts
for _, g := range rh.m.groups {
for _, group := range rh.m.groups {
var alerts []*rule.ApiAlert
g := group.ToAPI()
for _, r := range g.Rules {
a, ok := r.(*rule.AlertingRule)
if !ok {
if r.Type != rule.TypeAlerting {
continue
}
alerts = append(alerts, a.AlertsToAPI()...)
alerts = append(alerts, r.Alerts...)
}
if len(alerts) > 0 {
gAlerts = append(gAlerts, rule.GroupAlerts{
Group: g.ToAPI(),
Group: g,
Alerts: alerts,
})
}
@@ -444,12 +444,12 @@ func (rh *requestHandler) listAlerts(rf *rulesFilter) ([]byte, error) {
if !rf.matchesGroup(group) {
continue
}
for _, r := range group.Rules {
a, ok := r.(*rule.AlertingRule)
if !ok {
g := group.ToAPI()
for _, r := range g.Rules {
if r.Type != rule.TypeAlerting {
continue
}
lr.Data.Alerts = append(lr.Data.Alerts, a.AlertsToAPI()...)
lr.Data.Alerts = append(lr.Data.Alerts, r.Alerts...)
}
}

@@ -114,14 +114,17 @@
{%= Controls(prefix, currentIcon, currentText, icons, filters, true) %}
{% if len(groups) > 0 %}
{% for _, g := range groups %}
<div id="group-{%s g.ID %}" class="d-flex w-100 border-0 flex-column group-items{% if g.Unhealthy > 0 %} alert-danger{% endif %}">
<div id="group-{%s g.ID %}" class="w-100 border-0 flex-column vm-group{% if g.Unhealthy > 0 %} alert-danger{% endif %}">
<span class="d-flex justify-content-between">
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s) #</a>
<a
class="vm-group-search"
href="#group-{%s g.ID %}"
>{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s) #</a>
<span
class="flex-grow-1 d-flex justify-content-end"
role="button"
data-bs-toggle="collapse"
data-bs-target="#sub-{%s g.ID %}"
data-bs-target="#item-{%s g.ID %}"
>
<span class="d-flex gap-2">
{% if g.Unhealthy > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d g.Unhealthy %}</span> {% endif %}
@@ -134,9 +137,9 @@
class="d-flex flex-column row-gap-2 mb-2"
role="button"
data-bs-toggle="collapse"
data-bs-target="#sub-{%s g.ID %}"
data-bs-target="#item-{%s g.ID %}"
>
<span class="fs-6 text-start w-100 fw-lighter">{%s g.File %}</span>
<span class="fs-6 text-start vm-group-search w-100 fw-lighter">{%s g.File %}</span>
{% if len(g.Params) > 0 %}
<span class="fs-6 text-start w-100 d-flex justify-content-between fw-lighter">
<span>Extra params</span>
@@ -158,7 +161,7 @@
</span>
{% endif %}
</span>
<div class="collapse sub-items" id="sub-{%s g.ID %}">
<div class="collapse" id="item-{%s g.ID %}">
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
@@ -169,7 +172,7 @@
</thead>
<tbody>
{% for _, r := range g.Rules %}
<tr class="sub-item{% if r.LastError != "" %} alert-danger{% endif %}">
<tr class="vm-item{% if r.LastError != "" %} alert-danger{% endif %}">
<td>
<div class="row">
<div class="col-12 mb-2">
@@ -206,7 +209,12 @@
</div>
</td>
<td class="text-center">{%d r.LastSamples %}</td>
<td class="text-center">{%f.3 time.Since(r.LastEvaluation).Seconds() %}s ago</td>
<td class="text-center">{% if r.LastEvaluation.IsZero() %}
Never
{% else %}
{%f.3 time.Since(r.LastEvaluation).Seconds() %}s ago
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
@@ -241,14 +249,14 @@
}
sort.Strings(keys)
%}
<div class="d-flex w-100 flex-column group-items alert-danger">
<div class="w-100 flex-column vm-group alert-danger">
<span id="group-{%s g.ID %}" class="d-flex justify-content-between">
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %}</a>
<span
class="flex-grow-1 d-flex justify-content-end"
role="button"
data-bs-toggle="collapse"
data-bs-target="#sub-{%s g.ID %}"
data-bs-target="#item-{%s g.ID %}"
>
<span class="badge bg-danger" title="Number of active alerts">{%d len(ga.Alerts) %}</span>
</span>
@@ -258,10 +266,10 @@
class="fs-6 text-start w-100 fw-lighter"
role="button"
data-bs-toggle="collapse"
data-bs-target="#sub-{%s g.ID %}"
data-bs-target="#item-{%s g.ID %}"
>{%s g.File %}</span>
</span>
<div class="collapse sub-items" id="sub-{%s g.ID %}">
<div class="collapse" id="item-{%s g.ID %}">
{% for _, ruleID := range keys %}
{%code
defaultAR := alertsByRule[ruleID][0]
@@ -272,7 +280,7 @@
sort.Strings(labelKeys)
%}
<br>
<div class="sub-item">
<div class="vm-item">
<b>alert:</b> {%s defaultAR.Name %} ({%d len(alertsByRule[ruleID]) %})
| <span><a target="_blank" href="{%s defaultAR.SourceLink %}">Source</a></span>
<br>
@@ -337,20 +345,20 @@
typeK, ns := keys[i], targets[notifier.TargetType(keys[i])]
count := len(ns)
%}
<div class="d-flex w-100 flex-column group-items">
<div class="w-100 flex-column vm-group">
<span class="d-flex justify-content-between" id="group-{%s typeK %}">
<a href="#group-{%s typeK %}">{%s typeK %} ({%d count %})</a>
<span
class="flex-grow-1"
role="button"
data-bs-toggle="collapse"
data-bs-target="#sub-{%s typeK %}"
data-bs-target="#item-{%s typeK %}"
></span>
</span>
<div id="sub-{%s typeK %}" class="collapse show sub-items">
<div id="item-{%s typeK %}" class="collapse show">
<table class="table table-striped table-hover table-sm">
<thead>
<tr class="sub-item">
<tr class="vm-item">
<th scope="col">Labels</th>
<th scope="col">Address</th>
</tr>
@@ -435,7 +443,7 @@
<div class="col">
{% for _, k := range annotationKeys %}
<b>{%s k %}:</b><br>
<p>{%s alert.Annotations[k] %}</p>
<p class="annotations">{%s alert.Annotations[k] %}</p>
{% endfor %}
</div>
</div>
@@ -549,7 +557,7 @@
<div class="col">
{% for _, k := range annotationKeys %}
<b>{%s k %}:</b><br>
<p>{%s rule.Annotations[k] %}</p>
<p class="annotations">{%s rule.Annotations[k] %}</p>
{% endfor %}
</div>
</div>
@@ -594,11 +602,11 @@
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" title="The time when the rule was executed">Updated at</th>
<th scope="col" class="w-10 text-center" title="How many series expression returns. Each series will represent an alert.">Series returned</th>
{% if seriesFetchedEnabled %}<th scope="col" class="w-10 text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" class="w-10 text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="The time used in execution query request">Execution timestamp</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
</tr>
</thead>

File diff suppressed because it is too large

@@ -23,6 +23,9 @@ func TestHandler(t *testing.T) {
Timestamps: []int64{0},
})
m := &manager{groups: map[uint64]*rule.Group{}}
_, cleanup := notifier.InitFakeNotifier()
defer cleanup()
var ar *rule.AlertingRule
var rr *rule.RecordingRule
var groupIDs []uint64
@@ -45,7 +48,7 @@ func TestHandler(t *testing.T) {
}, fq, 1*time.Minute, nil)
ar = g.Rules[0].(*rule.AlertingRule)
rr = g.Rules[1].(*rule.RecordingRule)
g.ExecOnce(context.Background(), func() []notifier.Notifier { return nil }, nil, time.Time{})
g.ExecOnce(context.Background(), nil, time.Time{})
id := g.CreateID()
m.groups[id] = g
groupIDs = append(groupIDs, id)

@@ -27,6 +27,9 @@ vmauth-linux-ppc64le-prod:
vmauth-linux-386-prod:
APP_NAME=vmauth $(MAKE) app-via-docker-linux-386
vmauth-linux-s390x-prod:
APP_NAME=vmauth $(MAKE) app-via-docker-linux-s390x
vmauth-darwin-amd64-prod:
APP_NAME=vmauth $(MAKE) app-via-docker-darwin-amd64

@@ -4,6 +4,7 @@ import (
"bytes"
"context"
"encoding/base64"
"errors"
"flag"
"fmt"
"math"
@@ -94,6 +95,8 @@ type UserInfo struct {
rt http.RoundTripper
requests *metrics.Counter
requestErrors *metrics.Counter
backendRequests *metrics.Counter
backendErrors *metrics.Counter
requestsDuration *metrics.Summary
}
@@ -105,13 +108,29 @@ type HeadersConf struct {
KeepOriginalHost *bool `yaml:"keep_original_host,omitempty"`
}
func (ui *UserInfo) beginConcurrencyLimit() error {
func (ui *UserInfo) beginConcurrencyLimit(ctx context.Context) error {
select {
case ui.concurrencyLimitCh <- struct{}{}:
return nil
default:
ui.concurrencyLimitReached.Inc()
return fmt.Errorf("cannot handle more than %d concurrent requests from user %s", ui.getMaxConcurrentRequests(), ui.name())
// The per-user limit for the number of concurrent requests is reached.
// Wait until the currently executed requests are finished, so the current request could be executed.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
select {
case ui.concurrencyLimitCh <- struct{}{}:
return nil
case <-ctx.Done():
err := ctx.Err()
if errors.Is(err, context.DeadlineExceeded) {
return fmt.Errorf("cannot start executing the request during -maxQueueDuration=%s because %d concurrent requests from the user %s are executed",
*maxQueueDuration, ui.getMaxConcurrentRequests(), ui.name())
}
return fmt.Errorf("cannot start executing the request because %d concurrent requests from the user %s are executed: %w",
ui.getMaxConcurrentRequests(), ui.name(), err)
}
}
}
@@ -127,6 +146,18 @@ func (ui *UserInfo) getMaxConcurrentRequests() int {
return mcr
}
func (ui *UserInfo) stopHealthChecks() {
if ui == nil {
return
}
if ui.URLPrefix == nil {
return
}
bus := ui.URLPrefix.bus.Load()
bus.stopHealthChecks()
}
// Header is `Name: Value` http header, which must be added to the proxied request.
type Header struct {
Name string
@@ -262,7 +293,7 @@ type URLPrefix struct {
// the list of backend urls
//
// the list can be dynamically updated if `discover_backend_ips` option is set.
bus atomic.Pointer[[]*backendURL]
bus atomic.Pointer[backendURLs]
// if this option is set, then backend ips for busOriginal are periodically re-discovered and put to bus.
discoverBackendIPs bool
@@ -286,21 +317,93 @@ func (up *URLPrefix) setLoadBalancingPolicy(loadBalancingPolicy string) error {
}
}
type backendURLs struct {
healthChecksContext context.Context
healthChecksCancel func()
healthChecksWG sync.WaitGroup
bus []*backendURL
}
func newBackendURLs() *backendURLs {
ctx, cancel := context.WithCancel(context.Background())
return &backendURLs{
healthChecksContext: ctx,
healthChecksCancel: cancel,
}
}
func (bus *backendURLs) add(u *url.URL) {
bus.bus = append(bus.bus, &backendURL{
url: u,
healthCheckContext: bus.healthChecksContext,
healthCheckWG: &bus.healthChecksWG,
})
}
func (bus *backendURLs) stopHealthChecks() {
bus.healthChecksCancel()
bus.healthChecksWG.Wait()
}
type backendURL struct {
brokenDeadline atomic.Uint64
broken atomic.Bool
healthCheckContext context.Context
healthCheckWG *sync.WaitGroup
concurrentRequests atomic.Int32
url *url.URL
}
func (bu *backendURL) isBroken() bool {
ct := fasttime.UnixTimestamp()
return ct < bu.brokenDeadline.Load()
return bu.broken.Load()
}
func (bu *backendURL) setBroken() {
deadline := fasttime.UnixTimestamp() + uint64((*failTimeout).Seconds())
bu.brokenDeadline.Store(deadline)
if bu.broken.CompareAndSwap(false, true) {
bu.healthCheckWG.Add(1)
go func() {
defer bu.healthCheckWG.Done()
bu.runHealthCheck()
bu.broken.Store(false)
}()
}
}
func (bu *backendURL) runHealthCheck() {
port := bu.url.Port()
if port == "" {
port = "80"
}
addr := net.JoinHostPort(bu.url.Hostname(), port)
t := time.NewTicker(*failTimeout)
defer t.Stop()
for {
select {
case <-t.C:
// Verify network connectivity via TCP dial before marking backend healthy.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9997
ctx, cancel := context.WithTimeout(bu.healthCheckContext, time.Second)
c, err := netutil.Dialer.DialContext(ctx, "tcp", addr)
cancel()
if err != nil {
if errors.Is(bu.healthCheckContext.Err(), context.Canceled) {
return
}
logger.Warnf("ignoring the backend at %s for %s because of dial error: %s", addr, *failTimeout, err)
continue
}
_ = c.Close()
return
case <-bu.healthCheckContext.Done():
return
}
}
}
func (bu *backendURL) get() {
@@ -312,8 +415,8 @@ func (bu *backendURL) put() {
}
func (up *URLPrefix) getBackendsCount() int {
pbus := up.bus.Load()
return len(*pbus)
bus := up.bus.Load()
return len(bus.bus)
}
// getBackendURL returns the backendURL depending on the load balance policy.
@@ -324,16 +427,15 @@ func (up *URLPrefix) getBackendsCount() int {
func (up *URLPrefix) getBackendURL() *backendURL {
up.discoverBackendAddrsIfNeeded()
pbus := up.bus.Load()
bus := *pbus
if len(bus) == 0 {
bus := up.bus.Load()
if len(bus.bus) == 0 {
return nil
}
if up.loadBalancingPolicy == "first_available" {
return getFirstAvailableBackendURL(bus)
return getFirstAvailableBackendURL(bus.bus)
}
return getLeastLoadedBackendURL(bus, &up.n)
return getLeastLoadedBackendURL(bus.bus, &up.n)
}
func (up *URLPrefix) discoverBackendAddrsIfNeeded() {
@@ -407,25 +509,24 @@ func (up *URLPrefix) discoverBackendAddrsIfNeeded() {
cancel()
// generate new backendURLs for the resolved IPs
var busNew []*backendURL
busNew := newBackendURLs()
for _, bu := range up.busOriginal {
host := bu.Hostname()
for _, addr := range hostToAddrs[host] {
buCopy := *bu
buCopy.Host = addr
busNew = append(busNew, &backendURL{
url: &buCopy,
})
busNew.add(&buCopy)
}
}
pbus := up.bus.Load()
if areEqualBackendURLs(*pbus, busNew) {
bus := up.bus.Load()
if areEqualBackendURLs(bus.bus, busNew.bus) {
return
}
// Store new backend urls
up.bus.Store(&busNew)
up.bus.Store(busNew)
bus.stopHealthChecks()
}
func areEqualBackendURLs(a, b []*backendURL) bool {
@@ -456,20 +557,23 @@ func getFirstAvailableBackendURL(bus []*backendURL) *backendURL {
for i := 1; i < len(bus); i++ {
if !bus[i].isBroken() {
bu = bus[i]
break
bu.get()
return bu
}
}
bu.get()
return bu
return nil
}
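With this change, `getFirstAvailableBackendURL` returns `nil` when every backend is broken, so callers can treat the request as having no usable backend. The following reduced sketch uses a hypothetical `backend` type with a plain `bool` as the broken flag.

```go
package main

import "fmt"

// backend is a hypothetical, single-goroutine stand-in for vmauth's backendURL.
type backend struct {
	name   string
	broken bool
}

// firstAvailable returns the first non-broken backend in list order,
// or nil if every backend is broken.
func firstAvailable(bs []*backend) *backend {
	for _, b := range bs {
		if !b.broken {
			return b
		}
	}
	return nil
}

func main() {
	a := &backend{name: "a", broken: true}
	b := &backend{name: "b"}
	if bu := firstAvailable([]*backend{a, b}); bu != nil {
		fmt.Println("selected:", bu.name)
	}
	b.broken = true
	fmt.Println("all broken:", firstAvailable([]*backend{a, b}) == nil)
}
```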
// getLeastLoadedBackendURL returns the backendURL with the minimum number of concurrent requests.
// getLeastLoadedBackendURL returns a non-broken backendURL with the lowest number of concurrent requests.
//
// backendURL.put() must be called on the returned backendURL after the request is complete.
func getLeastLoadedBackendURL(bus []*backendURL, atomicCounter *atomic.Uint32) *backendURL {
if len(bus) == 1 {
// Fast path - return the only backend url.
bu := bus[0]
if bu.isBroken() {
return nil
}
bu.get()
return bu
}
@@ -482,27 +586,37 @@ func getLeastLoadedBackendURL(bus []*backendURL, atomicCounter *atomic.Uint32) *
if bu.isBroken() {
continue
}
if bu.concurrentRequests.Load() == 0 {
// Fast path - return the backend with zero concurrently executed requests.
// Do not use CompareAndSwap() instead of Load(), since it is much slower on systems with many CPU cores.
bu.concurrentRequests.Add(1)
// The Load() in front of CompareAndSwap() avoids CAS overhead for items with values bigger than 0.
if bu.concurrentRequests.Load() == 0 && bu.concurrentRequests.CompareAndSwap(0, 1) {
atomicCounter.CompareAndSwap(n+1, idx+1)
// There is no need to call bu.get(), because bu.concurrentRequests was already incremented above.
return bu
}
}
// Slow path - return the backend with the minimum number of concurrently executed requests.
buMin := bus[n%uint32(len(bus))]
minRequests := buMin.concurrentRequests.Load()
for _, bu := range bus {
buMinIdx := n % uint32(len(bus))
minRequests := bus[buMinIdx].concurrentRequests.Load()
for i := uint32(1); i < uint32(len(bus)); i++ {
idx := (n + i) % uint32(len(bus))
bu := bus[idx]
if bu.isBroken() {
continue
}
if n := bu.concurrentRequests.Load(); n < minRequests || buMin.isBroken() {
buMin = bu
minRequests = n
reqs := bu.concurrentRequests.Load()
if reqs < minRequests || bus[buMinIdx].isBroken() {
buMinIdx = idx
minRequests = reqs
}
}
buMin := bus[buMinIdx]
if buMin.isBroken() {
return nil
}
buMin.get()
atomicCounter.CompareAndSwap(n+1, buMinIdx+1)
return buMin
}
@@ -725,6 +839,11 @@ func reloadAuthConfigData(data []byte) (bool, error) {
acPrev := authConfig.Load()
if acPrev != nil {
acPrev.UnauthorizedUser.stopHealthChecks()
for i := range acPrev.Users {
acPrev.Users[i].stopHealthChecks()
}
metrics.UnregisterSet(acPrev.ms, true)
}
metrics.RegisterSet(ac.ms)
@@ -771,6 +890,8 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
return nil, fmt.Errorf("cannot parse metric_labels for unauthorized_user: %w", err)
}
ui.requests = ac.ms.NewCounter(`vmauth_unauthorized_user_requests_total` + metricLabels)
ui.requestErrors = ac.ms.NewCounter(`vmauth_unauthorized_user_request_errors_total` + metricLabels)
ui.backendRequests = ac.ms.NewCounter(`vmauth_unauthorized_user_request_backend_requests_total` + metricLabels)
ui.backendErrors = ac.ms.NewCounter(`vmauth_unauthorized_user_request_backend_errors_total` + metricLabels)
ui.requestsDuration = ac.ms.NewSummary(`vmauth_unauthorized_user_request_duration_seconds` + metricLabels)
ui.concurrencyLimitCh = make(chan struct{}, ui.getMaxConcurrentRequests())
@@ -819,6 +940,8 @@ func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
return nil, fmt.Errorf("cannot parse metric_labels: %w", err)
}
ui.requests = ac.ms.GetOrCreateCounter(`vmauth_user_requests_total` + metricLabels)
ui.requestErrors = ac.ms.GetOrCreateCounter(`vmauth_user_request_errors_total` + metricLabels)
ui.backendRequests = ac.ms.GetOrCreateCounter(`vmauth_user_request_backend_requests_total` + metricLabels)
ui.backendErrors = ac.ms.GetOrCreateCounter(`vmauth_user_request_backend_errors_total` + metricLabels)
ui.requestsDuration = ac.ms.GetOrCreateSummary(`vmauth_user_request_duration_seconds` + metricLabels)
mcr := ui.getMaxConcurrentRequests()
@@ -1053,13 +1176,11 @@ func (up *URLPrefix) sanitizeAndInitialize() error {
}
// Initialize up.bus
bus := make([]*backendURL, len(up.busOriginal))
for i, bu := range up.busOriginal {
bus[i] = &backendURL{
url: bu,
}
bus := newBackendURLs()
for _, bu := range up.busOriginal {
bus.add(bu)
}
up.bus.Store(&bus)
up.bus.Store(bus)
return nil
}

@@ -752,10 +752,12 @@ func TestGetLeastLoadedBackendURL(t *testing.T) {
})
up.loadBalancingPolicy = "least_loaded"
pbus := up.bus.Load()
bus := pbus.bus
fn := func(ns ...int) {
t.Helper()
pbus := up.bus.Load()
bus := *pbus
for i, b := range bus {
got := int(b.concurrentRequests.Load())
exp := ns[i]
@@ -767,45 +769,52 @@ func TestGetLeastLoadedBackendURL(t *testing.T) {
up.getBackendURL()
fn(1, 0, 0)
up.getBackendURL()
fn(1, 1, 0)
up.getBackendURL()
fn(1, 1, 1)
up.getBackendURL()
up.getBackendURL()
fn(2, 2, 1)
bus := up.bus.Load()
pbus := *bus
pbus[0].concurrentRequests.Add(2)
pbus[2].concurrentRequests.Add(5)
fn(4, 2, 6)
bus[1].put()
bus[2].put()
fn(1, 0, 0)
up.getBackendURL()
fn(4, 3, 6)
fn(1, 1, 0)
bus[1].put()
up.getBackendURL()
fn(4, 4, 6)
up.getBackendURL()
fn(4, 5, 6)
up.getBackendURL()
fn(5, 5, 6)
up.getBackendURL()
fn(6, 5, 6)
up.getBackendURL()
fn(6, 6, 6)
up.getBackendURL()
fn(6, 6, 7)
fn(1, 0, 1)
up.getBackendURL()
up.getBackendURL()
fn(7, 7, 7)
fn(1, 1, 2)
bus[0].concurrentRequests.Add(2)
bus[2].concurrentRequests.Add(2)
fn(3, 1, 4)
up.getBackendURL()
fn(3, 2, 4)
up.getBackendURL()
fn(3, 3, 4)
up.getBackendURL()
fn(4, 3, 4)
up.getBackendURL()
fn(4, 4, 4)
bus[0].put()
bus[2].put()
up.getBackendURL()
fn(3, 4, 4)
up.getBackendURL()
fn(4, 4, 4)
}
func TestBrokenBackend(t *testing.T) {
@@ -816,7 +825,7 @@ func TestBrokenBackend(t *testing.T) {
})
up.loadBalancingPolicy = "least_loaded"
pbus := up.bus.Load()
bus := *pbus
bus := pbus.bus
// explicitly mark one of the backends as broken
bus[1].setBroken()
@@ -839,7 +848,7 @@ func TestDiscoverBackendIPsWithIPV6(t *testing.T) {
up.discoverBackendAddrsIfNeeded()
pbus := up.bus.Load()
bus := *pbus
bus := pbus.bus
if len(bus) != 1 {
t.Fatalf("expected url list to be of size 1; got %d instead", len(bus))
@@ -933,16 +942,14 @@ func mustParseURL(u string) *URLPrefix {
}
func mustParseURLs(us []string) *URLPrefix {
bus := make([]*backendURL, len(us))
bus := newBackendURLs()
urls := make([]*url.URL, len(us))
for i, u := range us {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
}
bus[i] = &backendURL{
url: pu,
}
bus.add(pu)
urls[i] = pu
}
up := &URLPrefix{}
@@ -951,7 +958,7 @@ func mustParseURLs(us []string) *URLPrefix {
} else {
up.vOriginal = us
}
up.bus.Store(&bus)
up.bus.Store(bus)
up.busOriginal = urls
return up
}

@@ -44,12 +44,17 @@ var (
"See also -maxConcurrentRequests")
idleConnTimeout = flag.Duration("idleConnTimeout", 50*time.Second, "The timeout for HTTP keep-alive connections to backend services. "+
"It is recommended setting this value to values smaller than -http.idleConnTimeout set at backend services")
responseTimeout = flag.Duration("responseTimeout", 5*time.Minute, "The timeout for receiving a response from backend")
responseTimeout = flag.Duration("responseTimeout", 5*time.Minute, "The timeout for receiving a response from backend")
maxConcurrentRequests = flag.Int("maxConcurrentRequests", 1000, "The maximum number of concurrent requests vmauth can process. Other requests are rejected with "+
"'429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options")
"'429 Too Many Requests' http status code. See also -maxQueueDuration, -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options")
maxConcurrentPerUserRequests = flag.Int("maxConcurrentPerUserRequests", 300, "The maximum number of concurrent requests vmauth can process per each configured user. "+
"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option "+
"in per-user config")
"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxQueueDuration and -maxConcurrentRequests command-line options "+
"and max_concurrent_requests option in per-user config")
maxQueueDuration = flag.Duration("maxQueueDuration", 10*time.Second, "The maximum duration the request waits for execution when the number of concurrently executed "+
"requests reaches -maxConcurrentRequests or -maxConcurrentPerUserRequests before returning a '429 Too Many Requests' error. "+
"This allows graceful handling of short spikes in the number of concurrent requests")
reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed via authKey query arg. It overrides -httpAuth.*")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
`Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
@@ -151,7 +156,6 @@ func requestHandlerWithInternalRoutes(w http.ResponseWriter, r *http.Request) bo
}
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
ats := getAuthTokensFromRequest(r)
if len(ats) == 0 {
// Process requests for unauthorized users
@@ -208,20 +212,45 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
ui.requests.Inc()
ctx, cancel := context.WithTimeout(r.Context(), *maxQueueDuration)
defer cancel()
// Limit the concurrency of requests to backends
concurrencyLimitOnce.Do(concurrencyLimitInit)
select {
case concurrencyLimitCh <- struct{}{}:
if err := ui.beginConcurrencyLimit(); err != nil {
if err := ui.beginConcurrencyLimit(ctx); err != nil {
handleConcurrencyLimitError(w, r, err)
<-concurrencyLimitCh
return
}
default:
concurrentRequestsLimitReached.Inc()
err := fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh))
handleConcurrencyLimitError(w, r, err)
return
// -maxConcurrentRequests requests are already executing. Wait until some of them finish,
// so the current request can be executed.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10078
select {
case concurrencyLimitCh <- struct{}{}:
if err := ui.beginConcurrencyLimit(ctx); err != nil {
handleConcurrencyLimitError(w, r, err)
<-concurrencyLimitCh
return
}
case <-ctx.Done():
err := ctx.Err()
concurrentRequestsLimitReached.Inc()
if errors.Is(err, context.DeadlineExceeded) {
err = fmt.Errorf("cannot start executing the request during -maxQueueDuration=%s because -maxConcurrentRequests=%d concurrent requests are executed",
*maxQueueDuration, cap(concurrencyLimitCh))
handleConcurrencyLimitError(w, r, err)
return
}
err = fmt.Errorf("cannot start executing the request because -maxConcurrentRequests=%d concurrent requests are executed: %w", cap(concurrencyLimitCh), err)
handleConcurrencyLimitError(w, r, err)
return
}
}
processRequest(w, r, ui)
ui.endConcurrencyLimit()
@@ -285,16 +314,18 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
return
}
bu.setBroken()
ui.backendErrors.Inc()
}
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("all the %d backends for the user %q are unavailable", up.getBackendsCount(), ui.name()),
StatusCode: http.StatusBadGateway,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
ui.requestErrors.Inc()
}
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, hc HeadersConf, retryStatusCodes []int, ui *UserInfo) (bool, bool) {
ui.backendRequests.Inc()
req := sanitizeRequestHeaders(r)
req.URL = targetURL
@@ -310,15 +341,21 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
rtb, rtbOK := req.Body.(*readTrackingBody)
res, err := ui.rt.RoundTrip(req)
if ctxErr := r.Context().Err(); ctxErr != nil {
// Override the error returned by the RoundTrip with the context error if the latter is non-nil.
// This ensures proper logging of canceled and timed out requests: the real cause of the error is logged
// instead of a random error, which RoundTrip could return because the request was canceled or timed out.
err = ctxErr
}
if err != nil {
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
// Do not retry canceled or timed out requests
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
if errors.Is(err, context.DeadlineExceeded) {
// Timed out requests must be counted as errors, since this usually means that the backend is slow.
ui.backendErrors.Inc()
logger.Warnf("remoteAddr: %s; requestURI: %s; timeout while proxying the response from %s: %s", remoteAddr, requestURI, targetURL, err)
}
return false, false
}
@@ -330,6 +367,7 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
ui.requestErrors.Inc()
return true, false
}
if netutil.IsTrivialNetworkError(err) {
@@ -337,11 +375,11 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
return false, true
}
// Retry the request if its body wasn't read yet. This usually means that the backend isn't reachable.
// Request body wasn't read yet, this usually means that the backend isn't reachable; retry the request at another backend
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
// NOTE: do not use httpserver.GetRequestURI
// it explicitly reads request body, which may fail retries.
logger.Warnf("remoteAddr: %s; requestURI: %s; retrying the request to %s because of response error: %s", remoteAddr, req.URL, targetURL, err)
logger.Warnf("remoteAddr: %s; requestURI: %s; request to %s failed: %s, retrying the request at another backend", remoteAddr, req.URL, targetURL, err)
return false, false
}
if slices.Contains(retryStatusCodes, res.StatusCode) {
@@ -350,12 +388,13 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
// If we get an error from the retry_status_codes list, but cannot execute retry,
// we consider such a request an error as well.
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("got response status code=%d from %s, but cannot retry the request on another backend, because the request has been already consumed",
Err: fmt.Errorf("got response status code=%d from %s, but cannot retry the request at another backend, because the request has been already consumed",
res.StatusCode, targetURL),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
ui.requestErrors.Inc()
return true, false
}
// Retry requests at other backends if it matches retryStatusCodes.
@@ -363,7 +402,7 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
// NOTE: do not use httpserver.GetRequestURI
// it explicitly reads request body, which may fail retries.
logger.Warnf("remoteAddr: %s; requestURI: %s; retrying the request to %s because response status code=%d belongs to retry_status_codes=%d",
logger.Warnf("remoteAddr: %s; requestURI: %s; request to %s failed, retrying the request at another backend because response status code=%d belongs to retry_status_codes=%d",
remoteAddr, req.URL, targetURL, res.StatusCode, retryStatusCodes)
return false, false
}
@@ -379,6 +418,7 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
ui.requestErrors.Inc()
return true, false
}
return true, false
@@ -589,6 +629,13 @@ func handleMissingAuthorizationError(w http.ResponseWriter) {
}
func handleConcurrencyLimitError(w http.ResponseWriter, r *http.Request, err error) {
ctx := r.Context()
if errors.Is(ctx.Err(), context.Canceled) {
// Do not return any response for the request canceled by the client,
// since the connection to the client is already closed.
return
}
w.Header().Add("Retry-After", "10")
err = &httpserver.ErrorWithStatusCode{
Err: err,
@@ -645,6 +692,7 @@ type zeroReader struct{}
func (r *zeroReader) Read(_ []byte) (int, error) {
return 0, io.EOF
}
func (r *zeroReader) Close() error {
return nil
}

@@ -31,6 +31,9 @@ vmbackup-linux-ppc64le-prod:
vmbackup-linux-386-prod:
APP_NAME=vmbackup EXTRA_GO_BUILD_TAGS=$(VMBACKUP_GO_BUILD_TAGS) $(MAKE) app-via-docker-linux-386
vmbackup-linux-s390x-prod:
APP_NAME=vmbackup EXTRA_GO_BUILD_TAGS=$(VMBACKUP_GO_BUILD_TAGS) $(MAKE) app-via-docker-linux-s390x
vmbackup-darwin-amd64-prod:
APP_NAME=vmbackup EXTRA_GO_BUILD_TAGS=$(VMBACKUP_GO_BUILD_TAGS) $(MAKE) app-via-docker-darwin-amd64

@@ -212,7 +212,7 @@ func newSrcFS() (*fslocal.FS, error) {
}
func newDstFS(ctx context.Context) (common.RemoteFS, error) {
fs, err := actions.NewRemoteFS(ctx, *dst)
fs, err := actions.NewRemoteFS(ctx, *dst, nil)
if err != nil {
return nil, fmt.Errorf("cannot parse `-dst`=%q: %w", *dst, err)
}
@@ -255,7 +255,7 @@ func newOriginFS(ctx context.Context) (common.OriginFS, error) {
if len(*origin) == 0 {
return &fsnil.FS{}, nil
}
fs, err := actions.NewRemoteFS(ctx, *origin)
fs, err := actions.NewRemoteFS(ctx, *origin, nil)
if err != nil {
return nil, fmt.Errorf("cannot parse `-origin`=%q: %w", *origin, err)
}
@@ -266,7 +266,7 @@ func newRemoteOriginFS(ctx context.Context) (common.RemoteFS, error) {
if len(*origin) == 0 {
return nil, fmt.Errorf("-origin cannot be empty when -snapshotName and -snapshot.createURL aren't set")
}
fs, err := actions.NewRemoteFS(ctx, *origin)
fs, err := actions.NewRemoteFS(ctx, *origin, nil)
if err != nil {
return nil, fmt.Errorf("cannot parse `-origin`=%q: %w", *origin, err)
}

@@ -27,6 +27,9 @@ vmctl-linux-ppc64le-prod:
vmctl-linux-386-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-linux-386
vmctl-linux-s390x-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-linux-s390x
vmctl-darwin-amd64-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-darwin-amd64

@@ -689,15 +689,15 @@ var (
Usage: "The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'",
Layout: time.RFC3339,
},
&cli.StringFlag{
Name: remoteReadFilterLabel,
Usage: "Prometheus label name to filter timeseries by. E.g. '__name__' will filter timeseries by name.",
Value: "__name__",
&cli.StringSliceFlag{
Name: remoteReadFilterLabel,
Usage: "Prometheus label name to filter timeseries by. E.g. '__name__' will filter timeseries by name.",
DefaultText: "__name__",
},
&cli.StringFlag{
Name: remoteReadFilterLabelValue,
Usage: fmt.Sprintf("Prometheus regular expression to filter label from %q flag.", remoteReadFilterLabelValue),
Value: ".*",
&cli.StringSliceFlag{
Name: remoteReadFilterLabelValue,
Usage: fmt.Sprintf("Prometheus regular expression to filter label from %q flag.", remoteReadFilterLabel),
DefaultText: ".*",
},
&cli.BoolFlag{
Name: remoteRead,

@@ -1,6 +1,7 @@
package main
import (
"context"
"fmt"
"io"
"log"
@@ -37,7 +38,7 @@ func newInfluxProcessor(ic *influx.Client, im *vm.Importer, cc int, separator st
}
}
func (ip *influxProcessor) run() error {
func (ip *influxProcessor) run(ctx context.Context) error {
series, err := ip.ic.Explore()
if err != nil {
return fmt.Errorf("explore query failed: %s", err)
@@ -47,7 +48,7 @@ func (ip *influxProcessor) run() error {
}
question := fmt.Sprintf("Found %d timeseries to import. Continue?", len(series))
if !prompt(question) {
if !prompt(ctx, question) {
return nil
}

@@ -103,7 +103,7 @@ func main() {
}
otsdbProcessor := newOtsdbProcessor(otsdbClient, importer, c.Int(otsdbConcurrency), c.Bool(globalVerbose))
return otsdbProcessor.run()
return otsdbProcessor.run(ctx)
},
},
{
@@ -164,7 +164,7 @@ func main() {
c.Bool(influxSkipDatabaseLabel),
c.Bool(influxPrometheusMode),
c.Bool(globalVerbose))
return processor.run()
return processor.run(ctx)
},
},
{
@@ -192,6 +192,14 @@ func main() {
return fmt.Errorf("failed to create transport for -%s=%q: %s", remoteReadSrcAddr, addr, err)
}
// Fall back to backward-compatible defaults (__name__=~".*") if the user provided no filters
rrLabelNames := c.StringSlice(remoteReadFilterLabel)
rrLabelValues := c.StringSlice(remoteReadFilterLabelValue)
if len(rrLabelNames) == 0 && len(rrLabelValues) == 0 {
rrLabelNames = []string{"__name__"}
rrLabelValues = []string{".*"}
}
rr, err := remoteread.NewClient(remoteread.Config{
Addr: addr,
Transport: tr,
@@ -200,8 +208,8 @@ func main() {
Timeout: c.Duration(remoteReadHTTPTimeout),
UseStream: c.Bool(remoteReadUseStream),
Headers: c.String(remoteReadHeaders),
LabelName: c.String(remoteReadFilterLabel),
LabelValue: c.String(remoteReadFilterLabelValue),
LabelNames: rrLabelNames,
LabelValues: rrLabelValues,
DisablePathAppend: c.Bool(remoteReadDisablePathAppend),
})
if err != nil {
@@ -271,7 +279,7 @@ func main() {
cc: c.Int(promConcurrency),
isVerbose: c.Bool(globalVerbose),
}
return pp.run()
return pp.run(ctx)
},
},
{

@@ -1,6 +1,7 @@
package main
import (
"context"
"fmt"
"log"
"sync"
@@ -37,7 +38,7 @@ func newOtsdbProcessor(oc *opentsdb.Client, im *vm.Importer, otsdbcc int, verbos
}
}
func (op *otsdbProcessor) run() error {
func (op *otsdbProcessor) run(ctx context.Context) error {
log.Println("Loading all metrics from OpenTSDB for filters: ", op.oc.Filters)
var metrics []string
for _, filter := range op.oc.Filters {
@@ -53,7 +54,7 @@ func (op *otsdbProcessor) run() error {
}
question := fmt.Sprintf("Found %d metrics to import. Continue?", len(metrics))
if !prompt(question) {
if !prompt(ctx, question) {
return nil
}
op.im.ResetStats()

@@ -1,6 +1,7 @@
package main
import (
"context"
"fmt"
"log"
"sync"
@@ -30,7 +31,7 @@ type prometheusProcessor struct {
isVerbose bool
}
func (pp *prometheusProcessor) run() error {
func (pp *prometheusProcessor) run(ctx context.Context) error {
blocks, err := pp.cl.Explore()
if err != nil {
return fmt.Errorf("explore failed: %s", err)
@@ -39,7 +40,7 @@ func (pp *prometheusProcessor) run() error {
return fmt.Errorf("found no blocks to import")
}
question := fmt.Sprintf("Found %d blocks to import. Continue?", len(blocks))
if !prompt(question) {
if !prompt(ctx, question) {
return nil
}

@@ -47,7 +47,7 @@ func (rrp *remoteReadProcessor) run(ctx context.Context) error {
question := fmt.Sprintf("Selected time range %q - %q will be split into %d ranges according to %q step. Continue?",
rrp.filter.timeStart.String(), rrp.filter.timeEnd.String(), len(ranges), rrp.filter.chunk)
if !prompt(question) {
if !prompt(ctx, question) {
return nil
}

@@ -11,14 +11,15 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/gogo/protobuf/proto"
"github.com/golang/snappy"
"github.com/prometheus/prometheus/config"
"github.com/prometheus/prometheus/prompb"
"github.com/prometheus/prometheus/storage/remote"
"github.com/prometheus/prometheus/tsdb/chunkenc"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
)
const (
@@ -63,9 +64,9 @@ type Config struct {
UseStream bool
// Headers optional HTTP headers to send with each request to the corresponding remote storage
Headers string
// LabelName, LabelValue stands for label=~value pair used for read requests.
// LabelNames and LabelValues are paired element-wise into label=~value matchers used for read requests.
// They are optional, but both slices must have the same length.
LabelName, LabelValue string
LabelNames, LabelValues []string
}
// Filter defines a list of filters applied to requested data
@@ -94,12 +95,22 @@ func NewClient(cfg Config) (*Client, error) {
return nil, err
}
var m *prompb.LabelMatcher
if cfg.LabelName != "" && cfg.LabelValue != "" {
m = &prompb.LabelMatcher{
Type: prompb.LabelMatcher_RE,
Name: cfg.LabelName,
Value: cfg.LabelValue,
var matchers []*prompb.LabelMatcher
if len(cfg.LabelNames) > 0 || len(cfg.LabelValues) > 0 {
if len(cfg.LabelNames) != len(cfg.LabelValues) {
return nil, fmt.Errorf("the number of label names and label values must be the same")
}
for i := range cfg.LabelNames {
if cfg.LabelNames[i] == "" {
return nil, fmt.Errorf("label name cannot be empty")
}
matcher := &prompb.LabelMatcher{
Type: prompb.LabelMatcher_RE,
Name: cfg.LabelNames[i],
Value: cfg.LabelValues[i],
}
matchers = append(matchers, matcher)
}
}
@@ -116,7 +127,7 @@ func NewClient(cfg Config) (*Client, error) {
password: cfg.Password,
useStream: cfg.UseStream,
headers: headers,
matchers: []*prompb.LabelMatcher{m},
matchers: matchers,
}
return c, nil

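With `remoteReadFilterLabel` and `remoteReadFilterLabelValue` turned into `cli.StringSliceFlag`s, each flag can be repeated, and the i-th name is paired with the i-th value. The sketch below condenses the defaulting done in `main` and the validation done in `NewClient` into a single function. `labelMatcher` and `buildMatchers` are hypothetical names; the code in the diff builds `*prompb.LabelMatcher` values of type `LabelMatcher_RE` instead.

```go
package main

import (
	"errors"
	"fmt"
)

// labelMatcher is a hypothetical, simplified stand-in for prompb.LabelMatcher.
// It only models the regexp (=~) matcher type that the remote-read client builds.
type labelMatcher struct {
	Name  string
	Value string // regular expression
}

// buildMatchers pairs names[i] with values[i] into label=~value matchers.
// When both slices are empty it falls back to __name__=~".*", mirroring the
// backward-compatible defaults applied in vmctl's main.
func buildMatchers(names, values []string) ([]labelMatcher, error) {
	if len(names) == 0 && len(values) == 0 {
		names = []string{"__name__"}
		values = []string{".*"}
	}
	if len(names) != len(values) {
		return nil, fmt.Errorf("the number of label names (%d) and label values (%d) must be the same", len(names), len(values))
	}
	matchers := make([]labelMatcher, 0, len(names))
	for i := range names {
		if names[i] == "" {
			return nil, errors.New("label name cannot be empty")
		}
		matchers = append(matchers, labelMatcher{Name: names[i], Value: values[i]})
	}
	return matchers, nil
}

func main() {
	ms, err := buildMatchers([]string{"__name__", "job"}, []string{"up|go_.*", "node"})
	if err != nil {
		panic(err)
	}
	for _, m := range ms {
		fmt.Printf("%s=~%q\n", m.Name, m.Value)
	}
}
```

Applying the defaults only when both slices are empty keeps the old single-flag behaviour, while a lone name without a value is still rejected as a length mismatch.
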
@@ -2,6 +2,7 @@ package main
import (
"bufio"
"context"
"fmt"
"os"
"strings"
@@ -15,7 +16,7 @@ const barTpl = `{{ blue "%s:" }} {{ counters . }} {{ bar . "[" "█" (cycle . "
// isSilent should be inited in main
var isSilent bool
func prompt(question string) bool {
func prompt(ctx context.Context, question string) bool {
if isSilent {
return true
}
@@ -25,15 +26,32 @@ func prompt(question string) bool {
}
reader := bufio.NewReader(os.Stdin)
fmt.Print(question, " [Y/n] ")
answer, err := reader.ReadString('\n')
if err != nil {
answerCh := make(chan string, 1)
errCh := make(chan error, 1)
go func() {
answer, err := reader.ReadString('\n')
if err != nil {
errCh <- err
return
}
answerCh <- answer
}()
select {
case <-ctx.Done():
fmt.Println("\nCanceled.")
return false
case err := <-errCh:
panic(err)
case answer := <-answerCh:
answer = strings.TrimSpace(strings.ToLower(answer))
if answer == "" || answer == "yes" || answer == "y" {
return true
}
return false
}
answer = strings.TrimSpace(strings.ToLower(answer))
if answer == "" || answer == "yes" || answer == "y" {
return true
}
return false
}
func wrapErr(vmErr *vm.ImportError, verbose bool) error {

@@ -79,7 +79,7 @@ func (p *vmNativeProcessor) run(ctx context.Context) error {
return fmt.Errorf("failed to get tenants: %w", err)
}
question := fmt.Sprintf("The following tenants were discovered: %s.\n Continue?", tenants)
if !prompt(question) {
if !prompt(ctx, question) {
return nil
}
}
@@ -233,7 +233,7 @@ func (p *vmNativeProcessor) runBackfilling(ctx context.Context, tenantID string,
// do not prompt for intercluster because there could be many tenants,
// and we don't want to interrupt the process when moving to the next tenant.
question := foundSeriesMsg + ". Continue?"
if !prompt(question) {
if !prompt(ctx, question) {
return nil
}
} else {

@@ -11,9 +11,11 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/ratelimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/slicesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeserieslimits"
)
@@ -50,8 +52,9 @@ var (
type InsertCtx struct {
Labels sortedLabels
mrs []storage.MetricRow
metricNamesBuf []byte
mrs []storage.MetricRow
mms []metricsmetadata.Row
metricNameBuf []byte
relabelCtx relabel.Ctx
streamAggrCtx streamAggrCtx
@@ -73,8 +76,13 @@ func (ctx *InsertCtx) Reset(rowsLen int) {
}
mrs = slicesutil.SetLength(mrs, rowsLen)
ctx.mrs = mrs[:0]
mms := ctx.mms
for i := range mms {
cleanMetricMetadata(&mms[i])
}
ctx.mms = mms[:0]
ctx.metricNamesBuf = ctx.metricNamesBuf[:0]
ctx.metricNameBuf = ctx.metricNameBuf[:0]
ctx.relabelCtx.Reset()
ctx.streamAggrCtx.Reset()
ctx.skipStreamAggr = false
@@ -84,11 +92,20 @@ func cleanMetricRow(mr *storage.MetricRow) {
mr.MetricNameRaw = nil
}
func cleanMetricMetadata(mm *metricsmetadata.Row) {
mm.MetricFamilyName = nil
mm.Unit = nil
mm.Help = nil
mm.Type = 0
mm.ProjectID = 0
mm.AccountID = 0
}
func (ctx *InsertCtx) marshalMetricNameRaw(prefix []byte, labels []prompb.Label) []byte {
start := len(ctx.metricNamesBuf)
ctx.metricNamesBuf = append(ctx.metricNamesBuf, prefix...)
ctx.metricNamesBuf = storage.MarshalMetricNameRaw(ctx.metricNamesBuf, labels)
metricNameRaw := ctx.metricNamesBuf[start:]
start := len(ctx.metricNameBuf)
ctx.metricNameBuf = append(ctx.metricNameBuf, prefix...)
ctx.metricNameBuf = storage.MarshalMetricNameRaw(ctx.metricNameBuf, labels)
metricNameRaw := ctx.metricNameBuf[start:]
return metricNameRaw[:len(metricNameRaw):len(metricNameRaw)]
}
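`marshalMetricNameRaw` returns `metricNameRaw[:len(metricNameRaw):len(metricNameRaw)]`, a full slice expression that caps the returned slice's capacity at its length. Since many metric names share `ctx.metricNameBuf`, this guards against a later `append` on one returned name silently overwriting the bytes of the next one. A minimal illustration, with a hypothetical `appendName` helper standing in for the marshaling step:

```go
package main

import "fmt"

// appendName appends name to buf and returns the grown buffer together with a
// view of just the appended bytes. The three-index slice caps the view's
// capacity at its length, so appending to the view must reallocate instead of
// writing into bytes that later entries occupy in buf.
func appendName(buf []byte, name string) ([]byte, []byte) {
	start := len(buf)
	buf = append(buf, name...)
	view := buf[start:]
	return buf, view[:len(view):len(view)]
}

func main() {
	buf := make([]byte, 0, 64)
	buf, a := appendName(buf, "foo")
	buf, b := appendName(buf, "bar")
	// Without the capacity cap, this append would overwrite the "b" of "bar".
	a = append(a, '!')
	fmt.Println(string(a), string(b), string(buf)) // foo! bar foobar
}
```
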
@@ -143,7 +160,7 @@ func (ctx *InsertCtx) addRow(metricNameRaw []byte, timestamp int64, value float6
mr.MetricNameRaw = metricNameRaw
mr.Timestamp = timestamp
mr.Value = value
if len(ctx.metricNamesBuf) > 16*1024*1024 {
if len(ctx.metricNameBuf) > 16*1024*1024 {
if err := ctx.FlushBufs(); err != nil {
return err
}
@@ -151,6 +168,55 @@ func (ctx *InsertCtx) addRow(metricNameRaw []byte, timestamp int64, value float6
return nil
}
// WriteMetadata writes given prometheus protobuf metadata into the storage.
func (ctx *InsertCtx) WriteMetadata(mmpbs []prompb.MetricMetadata) error {
if len(mmpbs) == 0 {
return nil
}
mms := ctx.mms
mms = slicesutil.SetLength(mms, len(mmpbs))
for idx, mmpb := range mmpbs {
mm := &mms[idx]
mm.MetricFamilyName = bytesutil.ToUnsafeBytes(mmpb.MetricFamilyName)
mm.Help = bytesutil.ToUnsafeBytes(mmpb.Help)
mm.Type = mmpb.Type
mm.Unit = bytesutil.ToUnsafeBytes(mmpb.Unit)
}
err := vmstorage.AddMetadataRows(mms)
if err != nil {
return &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot store metrics metadata: %w", err),
StatusCode: http.StatusServiceUnavailable,
}
}
return nil
}
// WritePromMetadata writes given prometheus metric metadata into the storage
func (ctx *InsertCtx) WritePromMetadata(mmps []prometheus.Metadata) error {
if len(mmps) == 0 {
return nil
}
mms := ctx.mms
mms = slicesutil.SetLength(mms, len(mmps))
for idx, mmpb := range mmps {
mm := &mms[idx]
mm.MetricFamilyName = bytesutil.ToUnsafeBytes(mmpb.Metric)
mm.Help = bytesutil.ToUnsafeBytes(mmpb.Help)
mm.Type = mmpb.Type
}
err := vmstorage.AddMetadataRows(mms)
if err != nil {
return &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot store prometheus metrics metadata: %w", err),
StatusCode: http.StatusServiceUnavailable,
}
}
return nil
}
// AddLabelBytes adds (name, value) label to ctx.Labels.
//
// name and value must exist until ctx.Labels is used.
@@ -221,7 +287,7 @@ func (ctx *InsertCtx) FlushBufs() error {
}
}
func (ctx *InsertCtx) dropAggregatedRows(matchIdxs []byte) {
func (ctx *InsertCtx) dropAggregatedRows(matchIdxs []uint32) {
dst := ctx.mrs[:0]
src := ctx.mrs
if !*streamAggrDropInput {
@@ -239,4 +305,4 @@ func (ctx *InsertCtx) dropAggregatedRows(matchIdxs []byte) {
ctx.mrs = dst
}
var matchIdxsPool bytesutil.ByteBufferPool
var matchIdxsPool slicesutil.BufferPool[uint32]

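`matchIdxs` changes from `[]byte` to `[]uint32` in both `dropAggregatedRows` and `streamAggrCtx.push`, and the pool switches to `slicesutil.BufferPool[uint32]`. When only deduplication is enabled, `push` marks every row with `1`. The loop body of `dropAggregatedRows` is not part of this diff, so the sketch below rests on an assumption: a non-zero entry means "consumed by aggregation", and the remaining rows are compacted in place, reusing the backing array as the `dst := ctx.mrs[:0]` line suggests. `row` and `dropMatched` are hypothetical.

```go
package main

import "fmt"

// row is a hypothetical stand-in for storage.MetricRow.
type row struct {
	Name  string
	Value float64
}

// dropMatched keeps only rows whose matchIdxs entry is zero, compacting them
// in place. Assumption: matchIdxs[i] != 0 means row i was consumed by aggregation.
func dropMatched(rows []row, matchIdxs []uint32) []row {
	dst := rows[:0]
	for i, r := range rows {
		if matchIdxs[i] != 0 {
			continue
		}
		// len(dst) <= i, so this never overwrites a row not yet visited.
		dst = append(dst, r)
	}
	// Zero the tail so dropped rows don't keep their strings reachable.
	for i := len(dst); i < len(rows); i++ {
		rows[i] = row{}
	}
	return dst
}

func main() {
	rows := []row{{"a", 1}, {"b", 2}, {"c", 3}}
	kept := dropMatched(rows, []uint32{0, 1, 0})
	fmt.Println(kept) // [{a 1} {c 3}]
}
```
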
@@ -13,6 +13,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/slicesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/metrics"
@@ -22,11 +23,11 @@ var (
streamAggrConfig = flag.String("streamAggr.config", "", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/victoriametrics/stream-aggregation/ . "+
"See also -streamAggr.keepInput, -streamAggr.dropInput and -streamAggr.dedupInterval")
streamAggrKeepInput = flag.Bool("streamAggr.keepInput", false, "Whether to keep all the input samples after the aggregation with -streamAggr.config. "+
"By default, only aggregated samples are dropped, while the remaining samples are stored in the database. "+
streamAggrKeepInput = flag.Bool("streamAggr.keepInput", false, "Whether to keep input samples that match any rule in -streamAggr.config. "+
"By default, matched raw samples are aggregated and dropped, while unmatched samples are written to the remote storage. "+
"See also -streamAggr.dropInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrDropInput = flag.Bool("streamAggr.dropInput", false, "Whether to drop all the input samples after the aggregation with -streamAggr.config. "+
"By default, only aggregated samples are dropped, while the remaining samples are stored in the database. "+
streamAggrDropInput = flag.Bool("streamAggr.dropInput", false, "Whether to drop input samples that do not match any rule in -streamAggr.config. "+
"By default, only matched raw samples are dropped, while unmatched samples are written to the remote storage. "+
"See also -streamAggr.keepInput and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/")
streamAggrDedupInterval = flag.Duration("streamAggr.dedupInterval", 0, "Input samples are de-duplicated with this interval before optional aggregation with -streamAggr.config . "+
"See also -streamAggr.dropInputLabels and -dedup.minScrapeInterval and https://docs.victoriametrics.com/victoriametrics/stream-aggregation/#deduplication")
@@ -189,7 +190,7 @@ func (ctx *streamAggrCtx) Reset() {
ctx.buf = ctx.buf[:0]
}
func (ctx *streamAggrCtx) push(mrs []storage.MetricRow, matchIdxs []byte) []byte {
func (ctx *streamAggrCtx) push(mrs []storage.MetricRow, matchIdxs []uint32) []uint32 {
mn := &ctx.mn
tss := ctx.tss
labels := ctx.labels
@@ -248,7 +249,7 @@ func (ctx *streamAggrCtx) push(mrs []storage.MetricRow, matchIdxs []byte) []byte
if sas.IsEnabled() {
matchIdxs = sas.Push(tss, matchIdxs)
} else if deduplicator != nil {
matchIdxs = bytesutil.ResizeNoCopyMayOverallocate(matchIdxs, len(tss))
matchIdxs = slicesutil.SetLength(matchIdxs, len(tss))
for i := range matchIdxs {
matchIdxs[i] = 1
}

@@ -27,6 +27,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/zabbixconnector"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
@@ -231,6 +232,17 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
firehose.WriteSuccessResponse(w, r)
return true
case "/zabbixconnector/api/v1/history":
zabbixconnectorHistoryRequests.Inc()
if err := zabbixconnector.InsertHandlerForHTTP(r); err != nil {
zabbixconnectorHistoryErrors.Inc()
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusBadRequest)
fmt.Fprintf(w, `{"error":%q}`, err.Error())
return true
}
w.WriteHeader(http.StatusAccepted)
return true
case "/newrelic":
newrelicCheckRequest.Inc()
w.Header().Set("Content-Type", "application/json")
@@ -423,6 +435,9 @@ var (
opentelemetryPushRequests = metrics.NewCounter(`vm_http_requests_total{path="/opentelemetry/v1/metrics", protocol="opentelemetry"}`)
opentelemetryPushErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/opentelemetry/v1/metrics", protocol="opentelemetry"}`)
zabbixconnectorHistoryRequests = metrics.NewCounter(`vm_http_requests_total{path="/zabbixconnector/api/v1/history", protocol="zabbixconnector"}`)
zabbixconnectorHistoryErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/zabbixconnector/api/v1/history", protocol="zabbixconnector"}`)
newrelicWriteRequests = metrics.NewCounter(`vm_http_requests_total{path="/newrelic/infra/v2/metrics/events/bulk", protocol="newrelic"}`)
newrelicWriteErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/newrelic/infra/v2/metrics/events/bulk", protocol="newrelic"}`)

@@ -6,6 +6,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prommetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/firehose"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentelemetry/stream"
@@ -14,8 +15,9 @@ import (
)
var (
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="opentelemetry"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="opentelemetry"}`)
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="opentelemetry"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="opentelemetry"}`)
metadataInserted = metrics.NewCounter(`vm_metadata_rows_inserted_total{type="opentelemetry"}`)
)
// InsertHandler processes opentelemetry metrics.
@@ -33,12 +35,12 @@ func InsertHandler(req *http.Request) error {
return fmt.Errorf("json encoding isn't supported for opentelemetry format. Use protobuf encoding")
}
}
return stream.ParseStream(req.Body, encoding, processBody, func(tss []prompb.TimeSeries, _ []prompb.MetricMetadata) error {
return insertRows(tss, extraLabels)
return stream.ParseStream(req.Body, encoding, processBody, func(tss []prompb.TimeSeries, mms []prompb.MetricMetadata) error {
return insertRows(tss, mms, extraLabels)
})
}
func insertRows(tss []prompb.TimeSeries, extraLabels []prompb.Label) error {
func insertRows(tss []prompb.TimeSeries, mms []prompb.MetricMetadata, extraLabels []prompb.Label) error {
ctx := common.GetInsertCtx()
defer common.PutInsertCtx(ctx)
@@ -75,5 +77,14 @@ func insertRows(tss []prompb.TimeSeries, extraLabels []prompb.Label) error {
}
rowsInserted.Add(rowsTotal)
rowsPerInsert.Update(float64(rowsTotal))
return ctx.FlushBufs()
if err := ctx.FlushBufs(); err != nil {
return fmt.Errorf("cannot flush metric bufs: %w", err)
}
if prommetadata.IsEnabled() {
if err := ctx.WriteMetadata(mms); err != nil {
return err
}
metadataInserted.Add(len(mms))
}
return nil
}

@@ -1,6 +1,7 @@
package prometheusimport
import (
"fmt"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
@@ -15,8 +16,9 @@ import (
)
var (
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="prometheus"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="prometheus"}`)
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="prometheus"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="prometheus"}`)
metadataInserted = metrics.NewCounter(`vm_metadata_rows_inserted_total{type="prometheus"}`)
)
// InsertHandler processes `/api/v1/import/prometheus` request.
@@ -30,14 +32,14 @@ func InsertHandler(req *http.Request) error {
return err
}
encoding := req.Header.Get("Content-Encoding")
return stream.Parse(req.Body, defaultTimestamp, encoding, true, prommetadata.IsEnabled(), func(rows []prometheus.Row, _ []prometheus.Metadata) error {
return insertRows(rows, extraLabels)
return stream.Parse(req.Body, defaultTimestamp, encoding, true, prommetadata.IsEnabled(), func(rows []prometheus.Row, mms []prometheus.Metadata) error {
return insertRows(rows, mms, extraLabels)
}, func(s string) {
httpserver.LogError(req, s)
})
}
func insertRows(rows []prometheus.Row, extraLabels []prompb.Label) error {
func insertRows(rows []prometheus.Row, mms []prometheus.Metadata, extraLabels []prompb.Label) error {
ctx := common.GetInsertCtx()
defer common.PutInsertCtx(ctx)
@@ -64,5 +66,15 @@ func insertRows(rows []prometheus.Row, extraLabels []prompb.Label) error {
}
rowsInserted.Add(len(rows))
rowsPerInsert.Update(float64(len(rows)))
return ctx.FlushBufs()
if err := ctx.FlushBufs(); err != nil {
return fmt.Errorf("cannot flush metric bufs: %w", err)
}
if prommetadata.IsEnabled() {
if err := ctx.WritePromMetadata(mms); err != nil {
return err
}
metadataInserted.Add(len(mms))
}
return nil
}

@@ -4,13 +4,15 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prommetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="promscrape"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="promscrape"}`)
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="promscrape"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="promscrape"}`)
metadataRowsInserted = metrics.NewCounter(`vm_metadata_rows_inserted_total{type="promscrape"}`)
)
const maxRowsPerBlock = 10000
@@ -41,6 +43,13 @@ func Push(wr *prompb.WriteRequest) {
}
push(ctx, tssBlock)
}
if prommetadata.IsEnabled() {
if err := ctx.WriteMetadata(wr.Metadata); err != nil {
logger.Errorf("cannot write promscrape metrics metadata to storage: %s", err)
} else {
metadataRowsInserted.Add(len(wr.Metadata))
}
}
}
func push(ctx *common.InsertCtx, tss []prompb.TimeSeries) {

@@ -1,10 +1,12 @@
package promremotewrite
import (
"fmt"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prommetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/protoparserutil"
@@ -12,8 +14,9 @@ import (
)
var (
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="promremotewrite"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="promremotewrite"}`)
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="promremotewrite"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="promremotewrite"}`)
metadataInserted = metrics.NewCounter(`vm_metadata_rows_inserted_total{type="promremotewrite"}`)
)
// InsertHandler processes remote write for prometheus.
@@ -23,12 +26,12 @@ func InsertHandler(req *http.Request) error {
return err
}
isVMRemoteWrite := req.Header.Get("Content-Encoding") == "zstd"
return stream.Parse(req.Body, isVMRemoteWrite, func(tss []prompb.TimeSeries, _ []prompb.MetricMetadata) error {
return insertRows(tss, extraLabels)
return stream.Parse(req.Body, isVMRemoteWrite, func(tss []prompb.TimeSeries, mms []prompb.MetricMetadata) error {
return insertRows(tss, mms, extraLabels)
})
}
func insertRows(timeseries []prompb.TimeSeries, extraLabels []prompb.Label) error {
func insertRows(timeseries []prompb.TimeSeries, mms []prompb.MetricMetadata, extraLabels []prompb.Label) error {
ctx := common.GetInsertCtx()
defer common.PutInsertCtx(ctx)
@@ -68,5 +71,15 @@ func insertRows(timeseries []prompb.TimeSeries, extraLabels []prompb.Label) erro
}
rowsInserted.Add(rowsTotal)
rowsPerInsert.Update(float64(rowsTotal))
return ctx.FlushBufs()
if err := ctx.FlushBufs(); err != nil {
return fmt.Errorf("cannot flush metric bufs: %w", err)
}
if prommetadata.IsEnabled() {
if err := ctx.WriteMetadata(mms); err != nil {
return err
}
metadataInserted.Add(len(mms))
}
return nil
}

@@ -86,7 +86,7 @@ func loadRelabelConfig() (*promrelabel.ParsedConfigs, error) {
if len(*relabelConfig) == 0 {
return nil, nil
}
pcs, err := promrelabel.LoadRelabelConfigs(*relabelConfig)
pcs, _, err := promrelabel.LoadRelabelConfigs(*relabelConfig)
if err != nil {
return nil, fmt.Errorf("error when reading -relabelConfig=%q: %w", *relabelConfig, err)
}

@@ -0,0 +1,67 @@
package zabbixconnector
import (
"net/http"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/protoparserutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/zabbixconnector"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/zabbixconnector/stream"
)
var (
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="zabbixconnector"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="zabbixconnector"}`)
)
// InsertHandlerForHTTP processes Zabbix connector history data sent via POST /zabbixconnector/api/v1/history.
func InsertHandlerForHTTP(req *http.Request) error {
extraLabels, err := protoparserutil.GetExtraLabels(req)
if err != nil {
return err
}
encoding := req.Header.Get("Content-Encoding")
return stream.Parse(req.Body, encoding, func(rows []zabbixconnector.Row) error {
return insertRows(rows, extraLabels)
})
}
func insertRows(rows []zabbixconnector.Row, extraLabels []prompb.Label) error {
ctx := common.GetInsertCtx()
defer common.PutInsertCtx(ctx)
rowsTotal := len(rows)
ctx.Reset(rowsTotal)
hasRelabeling := relabel.HasRelabeling()
for i := range rows {
r := &rows[i]
ctx.Labels = ctx.Labels[:0]
for k := range r.Tags {
t := &r.Tags[k]
ctx.AddLabelBytes(t.Key, t.Value)
}
for k := range extraLabels {
label := &extraLabels[k]
ctx.AddLabel(label.Name, label.Value)
}
if hasRelabeling {
ctx.ApplyRelabeling()
}
if len(ctx.Labels) == 0 {
// Skip metric without labels.
continue
}
ctx.SortLabelsIfNeeded()
if err := ctx.WriteDataPoint(nil, ctx.Labels, r.Timestamp, r.Value); err != nil {
return err
}
}
rowsInserted.Add(rowsTotal)
rowsPerInsert.Update(float64(rowsTotal))
return ctx.FlushBufs()
}

@@ -31,6 +31,9 @@ vmrestore-linux-ppc64le-prod:
vmrestore-linux-386-prod:
APP_NAME=vmrestore EXTRA_GO_BUILD_TAGS=$(VMRESTORE_GO_BUILD_TAGS) $(MAKE) app-via-docker-linux-386
vmrestore-linux-s390x-prod:
APP_NAME=vmrestore EXTRA_GO_BUILD_TAGS=$(VMRESTORE_GO_BUILD_TAGS) $(MAKE) app-via-docker-linux-s390x
vmrestore-darwin-amd64-prod:
APP_NAME=vmrestore EXTRA_GO_BUILD_TAGS=$(VMRESTORE_GO_BUILD_TAGS) $(MAKE) app-via-docker-darwin-amd64

@@ -104,7 +104,7 @@ func newDstFS() (*fslocal.FS, error) {
}
func newSrcFS(ctx context.Context) (common.RemoteFS, error) {
fs, err := actions.NewRemoteFS(ctx, *src)
fs, err := actions.NewRemoteFS(ctx, *src, nil)
if err != nil {
return nil, fmt.Errorf("cannot parse `-src`=%q: %w", *src, err)
}

@@ -421,6 +421,16 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
w.WriteHeader(http.StatusNoContent)
return true
case "/api/v1/metadata":
// Serve metric metadata, see https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
metadataRequests.Inc()
if err := prometheus.MetadataHandler(qt, startTime, w, r); err != nil {
metadataErrors.Inc()
httpserver.SendPrometheusError(w, r, err)
return true
}
return true
default:
return false
}
@@ -574,12 +584,6 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
w.Header().Set("Content-Type", "application/json")
fmt.Fprint(w, `{"status":"success","data":{"notifiers":[]}}`)
return true
case "/api/v1/metadata":
// Return dumb placeholder for https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
metadataRequests.Inc()
w.Header().Set("Content-Type", "application/json")
fmt.Fprintf(w, "%s", `{"status":"success","data":{}}`)
return true
case "/api/v1/status/buildinfo":
buildInfoRequests.Inc()
w.Header().Set("Content-Type", "application/json")
@@ -708,7 +712,9 @@ var (
alertsRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/alerts"}`)
notifiersRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/notifiers"}`)
metadataRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/metadata"}`)
metadataRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/metadata"}`)
metadataErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/api/v1/metadata"}`)
buildInfoRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/buildinfo"}`)
queryExemplarsRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/query_exemplars"}`)

@@ -20,6 +20,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
)
var (
@@ -865,6 +866,23 @@ func LabelValues(qt *querytracer.Tracer, labelName string, sq *storage.SearchQue
return labelValues, nil
}
// GetMetricsMetadata returns time series metric names metadata for the given args
func GetMetricsMetadata(qt *querytracer.Tracer, limit int, metricName string) ([]*metricsmetadata.Row, error) {
qt = qt.NewChild("get metrics metadata: limit=%d, metric_name=%q", limit, metricName)
defer qt.Done()
metadata := vmstorage.Storage.GetMetadataRows(qt, limit, metricName)
sort.Slice(metadata, func(i, j int) bool {
return string(metadata[i].MetricFamilyName) < string(metadata[j].MetricFamilyName)
})
if limit > 0 && len(metadata) >= limit {
metadata = metadata[:limit]
}
return metadata, nil
}
// GraphiteTagValues returns tag values for the given tagName until the given deadline.
func GraphiteTagValues(qt *querytracer.Tracer, tagName, filter string, limit int, deadline searchutil.Deadline) ([]string, error) {
qt = qt.NewChild("get graphite tag values for tagName=%s, filter=%s, limit=%d", tagName, filter, limit)

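`GetMetricsMetadata` sorts the rows returned by `vmstorage.Storage.GetMetadataRows` by metric family name, then truncates them to `limit` when it is positive. Here are the same two steps applied to a simplified, hypothetical `metadataRow` type:

```go
package main

import (
	"fmt"
	"sort"
)

// metadataRow is a hypothetical, simplified stand-in for metricsmetadata.Row.
type metadataRow struct {
	MetricFamilyName []byte
	Type             string
}

// sortAndLimit orders rows by metric family name and, when limit > 0,
// keeps at most limit rows. rows is sorted in place.
func sortAndLimit(rows []*metadataRow, limit int) []*metadataRow {
	sort.Slice(rows, func(i, j int) bool {
		return string(rows[i].MetricFamilyName) < string(rows[j].MetricFamilyName)
	})
	if limit > 0 && len(rows) > limit {
		rows = rows[:limit]
	}
	return rows
}

func main() {
	rows := []*metadataRow{
		{MetricFamilyName: []byte("up"), Type: "gauge"},
		{MetricFamilyName: []byte("go_gc_duration_seconds"), Type: "summary"},
		{MetricFamilyName: []byte("http_requests_total"), Type: "counter"},
	}
	for _, r := range sortAndLimit(rows, 2) {
		fmt.Println(string(r.MetricFamilyName), r.Type)
	}
}
```

Since `limit` is also passed to `GetMetadataRows`, whether the result is the alphabetically first `limit` names depends on how the storage applies that limit, which this diff does not show.
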
@@ -0,0 +1,35 @@
{% import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
) %}
{% stripspace %}
MetadataResponse generates response for /api/v1/metadata
See https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
{% func MetadataResponse( result []*metricsmetadata.Row, qt *querytracer.Tracer) %}
{
"status":"success",
"data": {
{% code
mapItems := len(result)
currentItem := 0
%}
{% for _, row := range result %}
"{%s string(row.MetricFamilyName) %}": [
{
"type": {%q= row.Type.String() %},
{% if len(row.Unit) > 0 -%}
"unit": {%q= string(row.Unit) %},
{% endif -%}
"help": {%q= string(row.Help) %}
}
]
{% if currentItem != mapItems-1 %},{% endif %}
{% code currentItem++ %}
{% endfor %}
}
{%= dumpQueryTrace(qt) %}
}
{% endfunc %}
{% endstripspace %}

@@ -0,0 +1,108 @@
// Code generated by qtc from "metadata_response.qtpl". DO NOT EDIT.
// See https://github.com/valyala/quicktemplate for details.
//line app/vmselect/prometheus/metadata_response.qtpl:1
package prometheus
//line app/vmselect/prometheus/metadata_response.qtpl:1
import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
)
// MetadataResponse generates response for /api/v1/metadataSee https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
//line app/vmselect/prometheus/metadata_response.qtpl:9
import (
qtio422016 "io"
qt422016 "github.com/valyala/quicktemplate"
)
//line app/vmselect/prometheus/metadata_response.qtpl:9
var (
_ = qtio422016.Copy
_ = qt422016.AcquireByteBuffer
)
//line app/vmselect/prometheus/metadata_response.qtpl:9
func StreamMetadataResponse(qw422016 *qt422016.Writer, result []*metricsmetadata.Row, qt *querytracer.Tracer) {
//line app/vmselect/prometheus/metadata_response.qtpl:9
qw422016.N().S(`{"status":"success","data": {`)
//line app/vmselect/prometheus/metadata_response.qtpl:14
mapItems := len(result)
currentItem := 0
//line app/vmselect/prometheus/metadata_response.qtpl:17
for _, row := range result {
//line app/vmselect/prometheus/metadata_response.qtpl:17
qw422016.N().S(`"`)
//line app/vmselect/prometheus/metadata_response.qtpl:18
qw422016.E().S(string(row.MetricFamilyName))
//line app/vmselect/prometheus/metadata_response.qtpl:18
qw422016.N().S(`": [{"type":`)
//line app/vmselect/prometheus/metadata_response.qtpl:20
qw422016.N().Q(row.Type.String())
//line app/vmselect/prometheus/metadata_response.qtpl:20
qw422016.N().S(`,`)
//line app/vmselect/prometheus/metadata_response.qtpl:21
if len(row.Unit) > 0 {
//line app/vmselect/prometheus/metadata_response.qtpl:21
qw422016.N().S(`"unit":`)
//line app/vmselect/prometheus/metadata_response.qtpl:22
qw422016.N().Q(string(row.Unit))
//line app/vmselect/prometheus/metadata_response.qtpl:22
qw422016.N().S(`,`)
//line app/vmselect/prometheus/metadata_response.qtpl:23
}
//line app/vmselect/prometheus/metadata_response.qtpl:23
qw422016.N().S(`"help":`)
//line app/vmselect/prometheus/metadata_response.qtpl:24
qw422016.N().Q(string(row.Help))
//line app/vmselect/prometheus/metadata_response.qtpl:24
qw422016.N().S(`}]`)
//line app/vmselect/prometheus/metadata_response.qtpl:27
if currentItem != mapItems-1 {
//line app/vmselect/prometheus/metadata_response.qtpl:27
qw422016.N().S(`,`)
//line app/vmselect/prometheus/metadata_response.qtpl:27
}
//line app/vmselect/prometheus/metadata_response.qtpl:28
currentItem++
//line app/vmselect/prometheus/metadata_response.qtpl:29
}
//line app/vmselect/prometheus/metadata_response.qtpl:29
qw422016.N().S(`}`)
//line app/vmselect/prometheus/metadata_response.qtpl:31
streamdumpQueryTrace(qw422016, qt)
//line app/vmselect/prometheus/metadata_response.qtpl:31
qw422016.N().S(`}`)
//line app/vmselect/prometheus/metadata_response.qtpl:33
}
//line app/vmselect/prometheus/metadata_response.qtpl:33
func WriteMetadataResponse(qq422016 qtio422016.Writer, result []*metricsmetadata.Row, qt *querytracer.Tracer) {
//line app/vmselect/prometheus/metadata_response.qtpl:33
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/metadata_response.qtpl:33
StreamMetadataResponse(qw422016, result, qt)
//line app/vmselect/prometheus/metadata_response.qtpl:33
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/metadata_response.qtpl:33
}
//line app/vmselect/prometheus/metadata_response.qtpl:33
func MetadataResponse(result []*metricsmetadata.Row, qt *querytracer.Tracer) string {
//line app/vmselect/prometheus/metadata_response.qtpl:33
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/metadata_response.qtpl:33
WriteMetadataResponse(qb422016, result, qt)
//line app/vmselect/prometheus/metadata_response.qtpl:33
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/metadata_response.qtpl:33
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/metadata_response.qtpl:33
return qs422016
//line app/vmselect/prometheus/metadata_response.qtpl:33
}

@@ -56,7 +56,7 @@ var (
maxTSDBStatusSeries = flag.Int("search.maxTSDBStatusSeries", 10e6, "The maximum number of time series, which can be processed during the call to /api/v1/status/tsdb. This option allows limiting memory usage")
maxSeriesLimit = flag.Int("search.maxSeries", 30e3, "The maximum number of time series, which can be returned from /api/v1/series. This option allows limiting memory usage")
maxDeleteSeries = flag.Int("search.maxDeleteSeries", 1e6, "The maximum number of time series, which can be deleted using /api/v1/admin/tsdb/delete_series. This option allows limiting memory usage")
maxTSDBStatusTopNSeries = flag.Int("search.maxTSDBStatusTopNSeries", 1000, "The maximum value of `topN` argument that can be passed to /api/v1/status/tsdb API. This option allows limiting memory usage. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#tsdb-stats")
maxTSDBStatusTopNSeries = flag.Int("search.maxTSDBStatusTopNSeries", 1000, "The maximum value of 'topN' argument that can be passed to /api/v1/status/tsdb API. This option allows limiting memory usage. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#tsdb-stats")
maxLabelsAPISeries = flag.Int("search.maxLabelsAPISeries", 1e6, "The maximum number of time series, which could be scanned when searching for the matching time series "+
"at /api/v1/labels and /api/v1/label/.../values. This option allows limiting memory usage and CPU usage. See also -search.maxLabelsAPIDuration, "+
"-search.maxTagKeys, -search.maxTagValues and -search.ignoreExtraFiltersAtLabelsAPI")
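This hunk (and the maxBinaryOpPushdownLabelValues hunk further down) replaces back-quoted words in flag usage strings, such as topN and instance, with single-quoted ones. The diff does not state why, but one plausible reason is Go's flag package: flag.UnquoteUsage, which PrintDefaults uses for -help output, treats the first back-quoted word in a usage string as the flag's argument placeholder. A minimal standard-library demonstration (placeholderFor is a hypothetical helper):

```go
package main

import (
	"flag"
	"fmt"
)

// placeholderFor registers an int flag with the given usage string on a
// throwaway FlagSet and returns what flag.UnquoteUsage extracts from it.
func placeholderFor(usage string) (name, unquoted string) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.Int("search.maxTSDBStatusTopNSeries", 1000, usage)
	return flag.UnquoteUsage(fs.Lookup("search.maxTSDBStatusTopNSeries"))
}

func main() {
	// Back quotes: "topN" becomes the argument placeholder shown in -help output.
	name, usage := placeholderFor("The maximum value of `topN` argument")
	fmt.Printf("%q %q\n", name, usage)

	// Single quotes: the placeholder falls back to the flag's type name, "int".
	name, usage = placeholderFor("The maximum value of 'topN' argument")
	fmt.Printf("%q %q\n", name, usage)
}
```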
@@ -639,6 +639,37 @@ func LabelsHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseW
return nil
}
// MetadataHandler processes /api/v1/metadata request.
//
// See https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata
func MetadataHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWriter, r *http.Request) error {
limit, err := httputil.GetInt(r, "limit")
if err != nil {
return err
}
if limit < 0 {
limit = 0
}
metricName := r.FormValue("metric")
metadata, err := netstorage.GetMetricsMetadata(qt, limit, metricName)
if err != nil {
return fmt.Errorf("cannot get metadata: %w", err)
}
qt.Done()
w.Header().Set("Content-Type", "application/json")
bw := bufferedwriter.Get(w)
defer bufferedwriter.Put(bw)
WriteMetadataResponse(bw, metadata, qt)
if err := bw.Flush(); err != nil {
return fmt.Errorf("cannot send metadata response to remote client: %w", err)
}
return nil
}
var labelsDuration = metrics.NewSummary(`vm_request_duration_seconds{path="/api/v1/labels"}`)
// SeriesCountHandler processes /api/v1/series/count request.

@@ -49,7 +49,7 @@ var (
minWindowForInstantRollupOptimization = flag.Duration("search.minWindowForInstantRollupOptimization", time.Hour*3, "Enable cache-based optimization for repeated queries "+
"to /api/v1/query (aka instant queries), which contain rollup functions with lookbehind window exceeding the given value")
maxBinaryOpPushdownLabelValues = flag.Int("search.maxBinaryOpPushdownLabelValues", 100, "The maximum number of values for a label in the first expression that can be extracted as a common label filter and pushed down to the second expression in a binary operation. "+
"A larger value makes the pushed-down filter more complex but fewer time series will be returned. This flag is useful when selective label contains numerous values, for example `instance`, and storage resources are abundant.")
"A larger value makes the pushed-down filter more complex but fewer time series will be returned. This flag is useful when selective label (e.g., 'instance') contains numerous values, and storage resources are abundant.")
)
// The minimum number of points per timeseries for enabling time rounding.
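The help text for -search.maxBinaryOpPushdownLabelValues describes extracting the values of a label from the first expression of a binary operation and pushing them into the second expression as a common label filter. The hypothetical pushdownFilter below sketches that idea under simplifying assumptions: it builds one regexp filter such as instance=~"a|b" from the distinct values and skips the pushdown when their count exceeds the limit. It is not VictoriaMetrics' implementation, which may build filters differently (for example, for a single value).

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// pushdownFilter is a hypothetical helper that turns the distinct values of a
// label, as seen in the left-hand side result, into a regexp label filter for
// the right-hand side. It returns false when there are no values or when the
// number of distinct values exceeds maxValues, meaning nothing is pushed down.
func pushdownFilter(label string, values []string, maxValues int) (string, bool) {
	seen := make(map[string]struct{}, len(values))
	var uniq []string
	for _, v := range values {
		if _, ok := seen[v]; ok {
			continue
		}
		seen[v] = struct{}{}
		uniq = append(uniq, regexp.QuoteMeta(v)) // escape regexp metacharacters
	}
	if len(uniq) == 0 || len(uniq) > maxValues {
		return "", false
	}
	sort.Strings(uniq) // deterministic filter text
	return fmt.Sprintf("%s=~%q", label, strings.Join(uniq, "|")), true
}

func main() {
	f, ok := pushdownFilter("instance", []string{"b:9100", "a:9100", "a:9100"}, 100)
	fmt.Println(f, ok)
	_, ok = pushdownFilter("instance", []string{"a:9100", "b:9100"}, 1)
	fmt.Println(ok)
}
```

A larger limit lets more selective filters reach the second expression, at the cost of a longer regexp, which matches the trade-off the flag description states.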
@@ -1169,60 +1169,6 @@ func evalInstantRollup(qt *querytracer.Tracer, ec *EvalConfig, funcName string,
},
}
return evalExpr(qt, ec, be)
case "rate":
if iafc != nil {
if !strings.EqualFold(iafc.ae.Name, "sum") {
qt.Printf("do not apply instant rollup optimization for incremental aggregate %s()", iafc.ae.Name)
return evalAt(qt, timestamp, window)
}
qt.Printf("optimized calculation for sum(rate(m[d])) as (sum(increase(m[d])) / d)")
afe := expr.(*metricsql.AggrFuncExpr)
fe := afe.Args[0].(*metricsql.FuncExpr)
feIncrease := *fe
feIncrease.Name = "increase"
// copy RollupExpr to drop possible offset,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9762
newArg := copyRollupExpr(fe.Args[0].(*metricsql.RollupExpr))
newArg.Offset = nil
feIncrease.Args = []metricsql.Expr{newArg}
d := newArg.Window.Duration(ec.Step)
if d == 0 {
d = ec.Step
}
afeIncrease := *afe
afeIncrease.Args = []metricsql.Expr{&feIncrease}
be := &metricsql.BinaryOpExpr{
Op: "/",
KeepMetricNames: true,
Left: &afeIncrease,
Right: &metricsql.NumberExpr{
N: float64(d) / 1000,
},
}
return evalExpr(qt, ec, be)
}
qt.Printf("optimized calculation for instant rollup rate(m[d]) as (increase(m[d]) / d)")
fe := expr.(*metricsql.FuncExpr)
feIncrease := *fe
feIncrease.Name = "increase"
// copy RollupExpr to drop possible offset,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9762
newArg := copyRollupExpr(fe.Args[0].(*metricsql.RollupExpr))
newArg.Offset = nil
feIncrease.Args = []metricsql.Expr{newArg}
d := newArg.Window.Duration(ec.Step)
if d == 0 {
d = ec.Step
}
be := &metricsql.BinaryOpExpr{
Op: "/",
KeepMetricNames: fe.KeepMetricNames,
Left: &feIncrease,
Right: &metricsql.NumberExpr{
N: float64(d) / 1000,
},
}
return evalExpr(qt, ec, be)
case "max_over_time":
if iafc != nil {
if !strings.EqualFold(iafc.ae.Name, "max") {
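The removed rate branch rewrote instant rate(m[d]) as increase(m[d]) / d, and sum(rate(m[d])) as sum(increase(m[d])) / d, converting d from milliseconds (the unit returned by Window.Duration) to seconds and falling back to the step when no window is given. A small numeric sketch of that arithmetic follows; the helper names are hypothetical, the per-series increases are made-up inputs, and it assumes rate equals increase divided by the window, as the removed code did.

```go
package main

import "fmt"

// windowSeconds converts a lookbehind window given in milliseconds into
// seconds, falling back to the query step when the window is 0.
func windowSeconds(windowMs, stepMs int64) float64 {
	d := windowMs
	if d == 0 {
		d = stepMs
	}
	return float64(d) / 1000
}

// sumRateViaIncrease evaluates sum(increase(m[d])) / d.
func sumRateViaIncrease(increases []float64, windowMs, stepMs int64) float64 {
	total := 0.0
	for _, inc := range increases {
		total += inc
	}
	return total / windowSeconds(windowMs, stepMs)
}

// sumOfRates evaluates sum(increase(m[d]) / d), i.e. per-series rates summed.
func sumOfRates(increases []float64, windowMs, stepMs int64) float64 {
	d := windowSeconds(windowMs, stepMs)
	total := 0.0
	for _, inc := range increases {
		total += inc / d
	}
	return total
}

func main() {
	increases := []float64{150, 75} // per-series increase over a 5m window
	fmt.Println(sumRateViaIncrease(increases, 5*60*1000, 60*1000))
	fmt.Println(sumOfRates(increases, 5*60*1000, 60*1000))
	fmt.Println(windowSeconds(0, 60*1000))
}
```

Dividing by the constant d after summing gives the same result as summing per-series rates, which is why the optimization only applied to sum() and fell back to evalAt for other incremental aggregates.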
