Compare commits

...

52 Commits

Author SHA1 Message Date
Zhu Jiekun
f32cb253dd Refactor apptest target in Makefile
Removed VM_APPTEST variable from apptest target.

Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
2026-06-11 21:57:38 +08:00
Zhu Jiekun
0b53dbf2ab Delete apptest/tests/apptest_test.go
Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
2026-06-11 21:57:10 +08:00
Zhu Jiekun
703f36cc55 Remove unused 'fmt' import from apptest_test.go
Removed unused import for 'fmt' in apptest_test.go

Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
2026-06-11 20:48:03 +08:00
Zhu Jiekun
ea72cd1c59 Rename integration test environment check function
Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
2026-06-11 20:43:25 +08:00
Zhu Jiekun
c0d5bb2183 Refactor TestMain to remove binary checks
Removed binary requirement checks from TestMain function.

Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
2026-06-11 20:42:22 +08:00
Pablo (Tomas) Fernandez
d82550a6ea Update docs/victoriametrics/CONTRIBUTING.md
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
2026-06-11 13:12:45 +01:00
Pablo (Tomas) Fernandez
bc8e218889 Apply suggestions from grammar review
Co-authored-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
Signed-off-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
2026-06-11 11:37:30 +01:00
Zhu Jiekun
b217fa9aa9 Apply suggestions from code review
Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
2026-06-10 17:29:04 +08:00
Jiekun
eb45399841 apptest: rename integration test to apptest 2026-06-10 16:34:56 +08:00
Jiekun
fff6c00f82 integration test: improve binary check and doc 2026-06-10 16:15:37 +08:00
Jiekun
1ab6e8e34d doc: remove unnecessary commands in testing guide 2026-06-10 15:50:12 +08:00
Jiekun
264ef58178 integration test: fix typo in Makefile 2026-06-10 15:40:46 +08:00
Jiekun
54a9038cfc integration test: add testing doc in contributing guide. add integration test alias 2026-06-10 15:38:38 +08:00
Pablo (Tomas) Fernandez
e35bd268ff docs/vmanomaly: fix broken links/anchors in anomaly detection (#11071)
This PR fixes broken links or anchors in the Anomaly Detection pages:
- changelog
- anomaly-detection/UI
- Find a heading substitute to "Sampling Period"
- Correct "vlogs-reader" with "victorialogs-reader"
2026-06-08 17:10:15 +03:00
hagen1778
6f26f9c090 dashboards: run make dashboards-sync
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-06-08 14:40:08 +02:00
hagen1778
03350c836a dashboards: rm RPC network usage panel from Interconnection row
The panel `RPC network usage` was misleading and didn't bring any value.
See related ticket https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11048

The interconnection traffic between vminsert-vmstorage and vmselect-vmstorage is already
displayed as separate panels within `vminsert` and `vmselect` rows respectively.
These panels are correctly displaying in/out traffic between components. There is no need
in additional `RPC network usage` panel.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-06-08 14:38:33 +02:00
hagen1778
ef70e2b119 dashboards: fix the typo with wrong legend tooltip
Initially noticed and fixed in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11068
by @immanuwell.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-06-08 14:28:57 +02:00
hagen1778
c3070bb446 dashboards: consistently use Lower is better hint
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-06-08 14:25:30 +02:00
hagen1778
8ce5ac094d dashboards: update and sync PSI panels descriptions
The descriptions were updated based on the feedback
received from https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11063.

The change removes relation to % of the pressure and clarifies that pressure
is measured as amount of time within 1s time bucket.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-06-08 14:22:09 +02:00
Max Kotliar
da680ace8f docs: run make docs-update-flags 2026-06-08 14:11:14 +03:00
Max Kotliar
3afcdf2ae9 docs: update vmctl thanos and mimir sub commands flags 2026-06-08 14:09:44 +03:00
Max Kotliar
a3a94260d2 docs: bump version to v1.145.0
Signed-off-by: Max Kotliar <mkotlyar@victoriametrics.com>
2026-06-08 13:59:02 +03:00
Max Kotliar
983f7d0f6a deplyoment/docker: bump version to v1.145.0
Signed-off-by: Max Kotliar <mkotlyar@victoriametrics.com>
2026-06-08 13:55:15 +03:00
Max Kotliar
021fa25808 docs/changelog: port forward LTS changelogs to master 2026-06-08 13:50:52 +03:00
Artem Fetishev
89db66573b vmstorage: Extract vmstorage code common for single node and cluster into a separate VMStorage type (#11046)
This allows to unify the search limit logic and flags across the two
branches.

- The vmcluster version of this PR can be found here:
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11042
- This PR fixes
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10969

---------

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-06-07 22:03:57 +02:00
Fred Navruzov
31129b9d8c docs/vmanomaly: fix outdated anchors and param names (#11064)
Updated /docs/anomaly-detection/ section content for stale anchors, metrics renames, and typos in param names

PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11064
2026-06-05 21:45:31 +03:00
Max Kotliar
3147f9c24b go.mod: security update golang.org/x/crypto
Fixes:

NAME                 INSTALLED  FIXED IN  TYPE       VULNERABILITY
SEVERITY  EPSS           RISK
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5006
Critical  < 0.1% (21st)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5023
Critical  < 0.1% (16th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5017
Critical  < 0.1% (17th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5020
Critical  < 0.1% (17th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5013   High
< 0.1% (17th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5005
Critical  < 0.1% (13th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5021
Critical  < 0.1% (11th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5019
Critical  < 0.1% (9th)   < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5018   High
< 0.1% (10th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5033
Medium    < 0.1% (16th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5014
Medium    < 0.1% (10th)  < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5015
Medium    < 0.1% (8th)   < 0.1
golang.org/x/crypto  v0.51.0    0.52.0    go-module  GO-2026-5016
Medium    < 0.1% (6th)   < 0.1
2026-06-05 20:48:57 +03:00
Max Kotliar
db8c9badbf docs/changelog: cut release v1.145.0
Signed-off-by: Max Kotliar <mkotlyar@victoriametrics.com>
2026-06-05 15:42:48 +03:00
Max Kotliar
2dc487d125 app/vmselect: run make vmui-update
Signed-off-by: Max Kotliar <mkotlyar@victoriametrics.com>
2026-06-05 15:34:42 +03:00
Nikolay
4cb5fae347 lib/storage: add scheduled rows metrics for calculation background merge completion (#1048)
Previously, only the size and number of partitions were exposed as metrics. It's not enough to estimate the possible completion time.

New metrics `vm_downsampling_partitions_scheduled_rows` and `vm_retention_filters_partitions_scheduled_rows` allow predicting completion time by the following query:

```
sum(vm_downsampling_partitions_scheduled_rows)
/
sum(rate(vm_rows_merged_total{type="storage/big"}))
```
and

```
sum(vm_retention_filters_partitions_scheduled_rows)
/
sum(rate(vm_rows_merged_total{type="storage/big"}))
```

Note, `vm_rows_merged_total` accounts for all possible merges, and it could slightly overestimate completion time.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10960
PR https://github.com/VictoriaMetrics/VictoriaMetrics-enterprise/pull/1048
2026-06-05 14:51:46 +03:00
Max Kotliar
3c192f9238 docs/changelog: add link to issue 2026-06-04 17:47:19 +03:00
Max Kotliar
159bc15825 docs/changelog: add link to pr 2026-06-04 17:41:09 +03:00
Max Kotliar
8db58ac410 docs/changelog: add link to mimir page 2026-06-04 17:38:51 +03:00
Max Kotliar
42c1f729db dashboards: show short_version when available, fall back to long version otherwise (#11047)
The `Version` panel shows an empty value when a service is built from a
feature branch. In such cases, `short_version` label is not available,
so `by(job, short_version)` grouping produces no visible value in the
panel.

Previously, to see the actual version, one had to manually edit the
panel query and add the relevant `version` label to the `by()` clause.

The fix tries `short_version` and when empty falls back to a long
`version` label.

This is applied uniformly across all dashboards:
- victoriametrics (single-node)
- victoriametrics-cluster
- vmagent
- vmalert
- vmauth
- operator

PR simplifies debugging with custom or feature-branch builds.
2026-06-04 17:28:25 +03:00
Max Kotliar
6851a75c71 lib/netutil: detect silently dropped idle TCP connections before reuse (#11024)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10735
Related to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10646

Connections silently dropped at the network layer (no RST/FIN) remain
alive from the OS perspective until TCP_USER_TIMEOUT fires. It happens while the connection stays in the pool. When such a stale connection is reused from the pool, the next write fails immediately (<100 microseconds) with a `write: connection timed out` error. Since timeout errors are not retriable, it results in a user-visible query error like:
```
cannot flush requestData to conn: write tcp4 ...: write: connection timed out
```

Before returning a pooled connection to a caller, probe it with a non-blocking read, but only if it has been idle for 5 or more seconds. Fresh connections (idle < 5s) are returned without an extra alive check to avoid unnecessary overhead on the fast path.

The alive probe sets ReadDeadline=now+5ms and reads 1 byte. A deadline-exceeded error means the connection is alive. Any other result (EOF, ETIMEDOUT, unexpected data) means the connection is dead, and it is discarded.

PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11024

This is useful for deployments with unstable network infrastructure,
where established connections can be silently broken while sitting in
the pool.

When in doubt if it's your case or not, look for `write: connection
timed out` errors in logs, or monitor the error ratio with:

```
sum(
  increase(vm_request_errors_total{action="search",type="rpcClient",name="vmselect"}[1m])
) by (job)
/
sum(
  increase(vm_requests_total{action="search",type="rpcClient",name="vmselect"}[1m])
) by (job)
* 100
```
2026-06-04 17:13:44 +03:00
JAYICE
1baaaf3b31 app/vmagent: introduce metrics for kafka saturation breakdown
This commit adds new metrics for kafka remotewrite:

* vmagent_remotewrite_kafka_outbuf_latency_seconds
* vmagent_remotewrite_kafka_rtt_seconds

 It helps to detect possible network saturation between vmagent and kafka brokers.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10730
2026-06-04 13:28:05 +02:00
Zhu Jiekun
ec88b9cac6 app/vmauth: properly log user information when a missing route error occurs
properly log user information when a missing route error occurs.

examples:
- `user foo missing route for "http://foo:secret@some-host.com/a/b`
- `user unauthorized missing route for "http://some-host.com/abc?de=fg"`

---

Previously, the error log was not good enough to identify the source of
the request:
```
2026-06-03T06:48:09.020Z	warn	VictoriaMetrics/app/vmauth/main.go:427	remoteAddr: "**.**.**.**, X-Forwarded-For: **.**.**.**"; requestURI: /prometheus/api/v1/write; missing route for "/prometheus/api/v1/write" 
```
It takes extra efforts to check it by metrics such as
`vmauth_user_requests_total`.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11052
2026-06-04 13:14:55 +02:00
Max Kotliar
661fbf947c docs/changelog: add empty line
follow-up on prev commit
5c176838d1
2026-06-03 13:26:14 +03:00
JAYICE
5c176838d1 deployment: upgrade golang builder from 1.26.3 to 1.26.4 (#11053)
upgrade golang builder to fix failed PR pipeline
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/26870814471/job/79245400038?pr=11052

PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11053
2026-06-03 13:21:38 +03:00
Zakhar Bessarab
9d4c06210c app/vmauth: take JWT users into a total count of users for startup message
Previously, config like this:
```
users:
  - jwt:
      skip_verify: true
    url_prefix:
      - "http://backend"
```
Would print a message saying that 0 users were loaded which leads to
confusion.

Related PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11050/
2026-06-02 17:51:11 +02:00
Dmytro Kozlov
347d2e0fef app/vmctl: implement migration from mimir object storage
This commit adds the ability to read blocks from Mimir object storage, process
them, and store them in VictoriaMetrics.
This new version of the migration can read Mimir object storage from the
file system and S3, GCP and Azure.

Under the hood, the vmctl tries to read the bucket-index.json file and
collect blocks via the defined filters. Depending on the used path vmctl
decides how to retrieve data. If it is not the file system, it will
download each file from the block and store it in the `tmp` folder.
Process this block folder and remove it from the file system.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7717
2026-06-02 16:51:54 +02:00
Alexander Frolov
f3de1f4ac7 vmselect: allow partial responses for storage node groups
Previously vmselect returned http.StatusServiceUnavailable instead of a partial response when a single storage group is unavailable, even when partial responses are allowed.

According to the cluster documentation, vmselect should continue serving queries from available vmstorage nodes and mark incomplete responses as partial.

This commit fixes behavior and allows partial respones for grouped storage nodes.

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11009
2026-06-02 16:17:02 +02:00
f41gh7
a9032ecd1d docs/changelog: sort changelog entries 2026-06-02 15:44:08 +02:00
Nikolay
ce712f0bc9 docs/vmauth: add security recommendations for vmselect multitenant reads
Currently, vmselect uses `OR` logic to filter out tenants for
 `multitenant` requests. And if vmauth is used to enforce tenant
 requests with `extra_label` or `extra_filters`. It's required to set
 `extra_label`, `extra_filters` and `extra_filters[]` explicitly at
 configuration file to prevent possible enforcement bypass.
2026-06-02 15:34:17 +02:00
Nikolay
23a3ff4174 app/vmselect: fixes utf-8 escaping at federate API
Prometheus compatible scrapers failed to parse `/federate` API responses
from VictoriaMetrics, if it contains any utf-8 characters.

This commit adds best effort negotiation check for /federate API
requests. By default, VictoriaMetrics emmits output in the same format.
But in case of Accept: allow-utf-8 header, VictoriaMetrics switches into
quoted output for time series name and label names according to
Prometheus v3.0 text exposition format.
https://prometheus.io/docs/guides/utf8/#querying

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10968
2026-06-02 14:41:07 +02:00
Hui Wang
b638f95aba app/vmalert: reset group evaluation timestamp when system clock is modified backwards
The issue could happen subtly and be hard to debug. It might also
generate confusing results. For example, if the system clock for vmalert
is moved 30m backward, vmalert would reset the evaluation timestamps to
30m ago, causing duplicate evaluations for timestamps within that 30m
window. Exposing a new group metric `vmalert_iteration_reset_total` to
help debuging this issue.

fixes:
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10985
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10423.
2026-06-02 14:40:15 +02:00
Alexander Frolov
2345b2b4ed vmui: clarify partial results warning text
Update the VMUI partial results warning to say `storage nodes` instead
of `vmstorage nodes`.

People operating VictoriaMetrics and people primarily using the UI are
often different audiences. UI users may not know what `vmstorage` means,
but `storage nodes` is easier to understand while preserving the meaning
of the warning.
2026-06-02 14:39:13 +02:00
Hui Wang
b6edf40198 app/vmalert: support match[] parameter in /api/v1/rules and /api/v1/alerts APIs
If multiple match[] parameters are passed, rules that match any of the
provided label selector sets are returned.
For `/api/v1/rules`, the matching is performed against the labels
defined in each rule configuration;
For `/api/v1/alerts`, the matching is performed against the labels
attached in each alert.

The current `search` in [VMUI
alerting](https://play.victoriametrics.com/select/0/prometheus/graph/#/rules)
can only filter by group and rule name. The label filtering in the
vmalert UI is implemented on the frontend and matches exact label
content rather than label selectors. Both UI could be improved after
support this parameter.

fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11020
2026-06-02 14:38:36 +02:00
Nikolay
b220066049 lib/prompb: properly reuse memory for native histogram parsing
Previously, bucket spans were incorrectly re-used due to missing `clear`
call.
It could produce unexpected results with span bucket value from
previously parsed histogram.
 In addition, if timeseries had a multiple data values for the same
 histogram, metric name could add unexpected `bucket` prefix multiple
 times, producing `_bucket_bucket` metric names.

  This commit properly clears reused memory.

 Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11041
2026-06-02 14:36:58 +02:00
Aliaksandr Valialkin
9ea1770ba4 app/vmctl: mention --vm-significant-figures option in the description for the --vm-round-digits option
These options are related, so it is better from the discoveribility PoV to cross-mention them.
2026-06-01 22:18:11 +02:00
Aliaksandr Valialkin
24d176fd2c app/vmagent: mention the -remoteWrite.significantFigures command-line flag in the description for the -remoteWrite.roundDigits flag
These flags are related, so it is better from the discoverity PoV to cross-mention them.
2026-06-01 22:17:16 +02:00
Hui Wang
e5c277237e app/vmalert: fix notifiers page in web UI appearing blank (#11036)
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11035.

There is no search box on the vmalert UI Notifiers page, but the notifier group containers have the `vm-group` CSS class applied. This class is hidden by default and only becomes visible after `filterRules()` adds the `vm-found` class. Since `filterRules()` is only called when a search box exists, it is never invoked on the Notifiers
page, leaving all content invisible.

The bug was introduced in v1.129.0 in 32ed45b672 (diff-e349265135dddcf960e58d2ada6be0fc18b76603f74c05107cbd1f348eb4d62b)

PR: https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11036

Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
2026-06-01 18:25:52 +03:00
113 changed files with 3772 additions and 5862 deletions

View File

@@ -34,8 +34,21 @@ var (
"This can be changed with -promscrape.config.strictParse=false command-line flag")
maxIngestionRate = flag.Int("maxIngestionRate", 0, "The maximum number of samples vmsingle can receive per second. Data ingestion is paused when the limit is exceeded. "+
"By default there are no limits on samples ingestion rate.")
vmselectMaxConcurrentRequests = flag.Int("search.maxConcurrentRequests", getDefaultMaxConcurrentRequests(), "The maximum number of concurrent search requests. "+
"It shouldn't be high, since a single request can saturate all the CPU cores, while many concurrently executed requests may require high amounts of memory. "+
"See also -search.maxQueueDuration and -search.maxMemoryPerQuery")
vmselectMaxQueueDuration = flag.Duration("search.maxQueueDuration", 10*time.Second, "The maximum time the request waits for execution when -search.maxConcurrentRequests "+
"limit is reached; see also -search.maxQueryDuration")
)
func getDefaultMaxConcurrentRequests() int {
// A single request can saturate all the CPU cores, so there is no sense
// in allowing higher number of concurrent requests - they will just contend
// for unavailable CPU time.
n := min(cgroup.AvailableCPUs()*2, 16)
return n
}
func main() {
// VictoriaMetrics is optimized for reduced memory allocations,
// so it can run with the reduced GOGC in order to reduce the used memory,
@@ -76,8 +89,8 @@ func main() {
}
logger.Infof("starting VictoriaMetrics at %q...", listenAddrs)
startTime := time.Now()
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
vmselect.Init()
vmstorage.Init(*vmselectMaxConcurrentRequests, promql.ResetRollupResultCacheIfNeeded)
vmselect.Init(*vmselectMaxConcurrentRequests, *vmselectMaxQueueDuration)
vminsertcommon.StartIngestionRateLimiter(*maxIngestionRate)
vminsert.Init()

View File

@@ -93,7 +93,7 @@ func selfScraper(scrapeInterval time.Duration) {
mr.Value = r.Value
}
}
if err := vmstorage.AddRows(mrs); err != nil {
if err := vmstorage.VMInsertAPI.WriteRows(mrs); err != nil {
logger.Errorf("cannot store self-scraped metrics: %s", err)
}
if len(metadataRows.Rows) > 0 {
@@ -105,7 +105,7 @@ func selfScraper(scrapeInterval time.Duration) {
Type: mm.Type,
})
}
if err := vmstorage.AddMetadataRows(mms); err != nil {
if err := vmstorage.VMInsertAPI.WriteMetadata(mms); err != nil {
logger.Errorf("cannot store self-scraped metrics metadata: %s", err)
}
}

View File

@@ -79,7 +79,8 @@ var (
"writing them to remote storage. "+
"Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. "+
"By default, digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. "+
"This option may be used for improving data compression for the stored metrics")
"This option may be used for improving data compression for the stored metrics. "+
"See also -remoteWrite.significantFigures")
sortLabels = flag.Bool("sortLabels", false, `Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. `+
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+

View File

@@ -52,7 +52,7 @@ func writeInputSeries(input []series, interval *promutil.Duration, startStamp ti
data := testutil.Compress(r)
// write input series to vm
httpWrite(dst, bytes.NewBuffer(data))
vmstorage.Storage.DebugFlush()
vmstorage.DebugFlush()
return nil
}

View File

@@ -108,7 +108,9 @@ func UnitTest(files []string, disableGroupLabel bool, externalLabels []string, e
storagePath = tmpFolder
processFlags()
vminsert.Init()
vmselect.Init()
const maxConcurrentRequests = 4
maxQueueDuration := 5 * time.Second
vmselect.Init(maxConcurrentRequests, maxQueueDuration)
// storagePath will be created again when closing vmselect, so remove it again.
defer fs.MustRemoveDir(storagePath)
defer vminsert.Stop()
@@ -279,7 +281,8 @@ func processFlags() {
}
func setUp() {
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
const maxConcurrentRequests = 4
vmstorage.Init(maxConcurrentRequests, promql.ResetRollupResultCacheIfNeeded)
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
readyCheckFunc := func() bool {
@@ -384,7 +387,7 @@ func (tg *testGroup) test(evalInterval time.Duration, groupOrderMap map[string]i
}
}
// flush series after each group evaluation
vmstorage.Storage.DebugFlush()
vmstorage.DebugFlush()
}
// check alert_rule_test case at every eval time

View File

@@ -95,6 +95,7 @@ type groupMetrics struct {
iterationTotal *metrics.Counter
iterationDuration *metrics.Summary
iterationMissed *metrics.Counter
iterationReset *metrics.Counter
iterationInterval *metrics.Gauge
}
@@ -330,6 +331,7 @@ func (g *Group) Init() {
g.metrics.iterationTotal = g.metrics.set.NewCounter(fmt.Sprintf(`vmalert_iteration_total{%s}`, labels))
g.metrics.iterationDuration = g.metrics.set.NewSummary(fmt.Sprintf(`vmalert_iteration_duration_seconds{%s}`, labels))
g.metrics.iterationMissed = g.metrics.set.NewCounter(fmt.Sprintf(`vmalert_iteration_missed_total{%s}`, labels))
g.metrics.iterationReset = g.metrics.set.NewCounter(fmt.Sprintf(`vmalert_iteration_reset_total{%s}`, labels))
g.metrics.iterationInterval = g.metrics.set.NewGauge(fmt.Sprintf(`vmalert_iteration_interval_seconds{%s}`, labels), func() float64 {
i := g.Interval.Seconds()
return i
@@ -474,14 +476,16 @@ func (g *Group) Start(ctx context.Context, rw remotewrite.RWClient, rr datasourc
if missed < 0 {
// missed can become < 0 due to irregular delays during evaluation
// which can result in time.Since(evalTS) < g.Interval;
// or the system wall clock was changed backward
missed = 0
// or the system wall clock was changed backward,
// Reset the evalTS to the current time.
evalTS = time.Now()
g.metrics.iterationReset.Inc()
} else {
evalTS = evalTS.Add((missed + 1) * g.Interval)
}
if missed > 0 {
g.metrics.iterationMissed.Inc()
}
evalTS = evalTS.Add((missed + 1) * g.Interval)
eval(evalCtx, evalTS)
}

View File

@@ -11,6 +11,8 @@ import (
"strconv"
"strings"
"github.com/VictoriaMetrics/metricsql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/rule"
@@ -160,12 +162,12 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/vmalert/api/v1/alerts", "/api/v1/alerts":
// path used by Grafana for ng alerting
gf, err := newGroupsFilter(r)
af, err := newAlertsFilter(r)
if err != nil {
errJson(w, r, err)
return true
}
data, err := rh.listAlerts(gf)
data, err := rh.listAlerts(af)
if err != nil {
errJson(w, r, err)
return true
@@ -325,6 +327,48 @@ func (gf *groupsFilter) matches(group *rule.Group) bool {
return true
}
type alertsFilter struct {
gf *groupsFilter
match [][]metricsql.LabelFilter
}
func getMatchFilters(matches []string) ([][]metricsql.LabelFilter, *httpserver.ErrorWithStatusCode) {
if len(matches) == 0 {
return nil, nil
}
tfss := make([][]metricsql.LabelFilter, 0, len(matches))
for _, s := range matches {
expr, err := metricsql.Parse(s)
if err != nil {
return nil, errResponse(fmt.Errorf(`invalid parameter "match[]": failed to parse %q: %w`, s, err), http.StatusBadRequest)
}
me, ok := expr.(*metricsql.MetricExpr)
if !ok {
return nil, errResponse(fmt.Errorf(`invalid parameter "match[]": expecting metricSelector; got %q`, expr.AppendString(nil)), http.StatusBadRequest)
}
if len(me.LabelFilterss) == 0 {
return nil, errResponse(fmt.Errorf(`invalid parameter "match[]": labelFilterss cannot be empty`), http.StatusBadRequest)
}
tfss = append(tfss, me.LabelFilterss...)
}
return tfss, nil
}
func newAlertsFilter(r *http.Request) (*alertsFilter, *httpserver.ErrorWithStatusCode) {
gf, err := newGroupsFilter(r)
if err != nil {
return nil, err
}
var af alertsFilter
af.gf = gf
af.match, err = getMatchFilters(r.Form["match[]"])
if err != nil {
return nil, err
}
return &af, nil
}
// see https://prometheus.io/docs/prometheus/latest/querying/api/#rules
type rulesFilter struct {
gf *groupsFilter
@@ -335,6 +379,7 @@ type rulesFilter struct {
maxGroups int
pageNum int
search string
match [][]metricsql.LabelFilter
extendedStates bool
}
@@ -355,7 +400,10 @@ func newRulesFilter(r *http.Request) (*rulesFilter, *httpserver.ErrorWithStatusC
return nil, errResponse(fmt.Errorf(`invalid parameter "type": not supported value %q`, ruleTypeParam), http.StatusBadRequest)
}
}
rf.match, err = getMatchFilters(r.Form["match[]"])
if err != nil {
return nil, err
}
states := vs["state"]
if len(states) == 0 {
states = vs["filter"]
@@ -416,12 +464,47 @@ func (rf *rulesFilter) matchesRule(r *rule.ApiRule) bool {
if len(rf.ruleNames) > 0 && !slices.Contains(rf.ruleNames, r.Name) {
return false
}
if !areLabelsMatch(r.Labels, rf.match) {
return false
}
if len(rf.states) == 0 {
return true
}
return slices.Contains(rf.states, r.State)
}
func areLabelsMatch(labels map[string]string, matches [][]metricsql.LabelFilter) bool {
if len(matches) == 0 {
return true
}
// labels need to match at least one of the provided match[] arg
return slices.ContainsFunc(matches, func(filters []metricsql.LabelFilter) bool {
for _, mf := range filters {
if !isLabelFilterMatch(labels[mf.Label], mf) {
return false
}
}
return true
})
}
func isLabelFilterMatch(s string, match metricsql.LabelFilter) bool {
if !match.IsRegexp {
if match.IsNegative {
return s != match.Value
}
return s == match.Value
}
re, err := metricsql.CompileRegexpAnchored(match.Value)
if err != nil {
return false
}
if match.IsNegative {
return !re.MatchString(s)
}
return re.MatchString(s)
}
func (rh *requestHandler) groups(rf *rulesFilter) *listGroupsResponse {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
@@ -543,14 +626,14 @@ func (rh *requestHandler) groupAlerts() []rule.GroupAlerts {
return gAlerts
}
func (rh *requestHandler) listAlerts(gf *groupsFilter) ([]byte, *httpserver.ErrorWithStatusCode) {
func (rh *requestHandler) listAlerts(af *alertsFilter) ([]byte, *httpserver.ErrorWithStatusCode) {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
lr := listAlertsResponse{Status: "success"}
lr.Data.Alerts = make([]*rule.ApiAlert, 0)
for _, group := range rh.m.groups {
if !gf.matches(group) {
if !af.gf.matches(group) {
continue
}
g := group.ToAPI()
@@ -558,7 +641,11 @@ func (rh *requestHandler) listAlerts(gf *groupsFilter) ([]byte, *httpserver.Erro
if r.Type != rule.TypeAlerting {
continue
}
lr.Data.Alerts = append(lr.Data.Alerts, r.Alerts...)
for _, alert := range r.Alerts {
if areLabelsMatch(alert.Labels, af.match) {
lr.Data.Alerts = append(lr.Data.Alerts, alert)
}
}
}
}

View File

@@ -348,7 +348,7 @@
typeK, ns := keys[i], targets[notifier.TargetType(keys[i])]
count := len(ns)
%}
<div class="w-100 flex-column vm-group">
<div class="w-100 flex-column">
<span class="d-flex justify-content-between" id="group-{%s typeK %}">
<a href="#group-{%s typeK %}">{%s typeK %} ({%d count %})</a>
<span
@@ -361,7 +361,7 @@
<div id="item-{%s typeK %}" class="collapse show">
<table class="table table-striped table-hover table-sm">
<thead>
<tr class="vm-item">
<tr>
<th scope="col">Labels</th>
<th scope="col">Address</th>
</tr>

View File

@@ -1115,7 +1115,7 @@ func StreamListTargets(qw422016 *qt422016.Writer, r *http.Request, targets map[n
//line app/vmalert/web.qtpl:350
qw422016.N().S(`
<div class="w-100 flex-column vm-group">
<div class="w-100 flex-column">
<span class="d-flex justify-content-between" id="group-`)
//line app/vmalert/web.qtpl:352
qw422016.E().S(typeK)
@@ -1152,7 +1152,7 @@ func StreamListTargets(qw422016 *qt422016.Writer, r *http.Request, targets map[n
qw422016.N().S(`" class="collapse show">
<table class="table table-striped table-hover table-sm">
<thead>
<tr class="vm-item">
<tr>
<th scope="col">Labels</th>
<th scope="col">Address</th>
</tr>

View File

@@ -10,6 +10,8 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/metricsql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
@@ -37,12 +39,14 @@ func TestHandler(t *testing.T) {
Concurrency: 1,
Rules: []config.Rule{
{
ID: 0,
Alert: "alert",
ID: 0,
Alert: "alert",
Labels: map[string]string{"job": "foo"},
},
{
ID: 1,
Record: "record",
Labels: map[string]string{"job": "bar"},
},
},
}, fq, 1*time.Minute, nil)
@@ -128,6 +132,18 @@ func TestHandler(t *testing.T) {
if length := len(lr.Data.Alerts); length != 2 {
t.Fatalf("expected 2 alert got %d", length)
}
lr = listAlertsResponse{}
getResp(t, ts.URL+`/api/v1/alerts?match[]={job="foo"}`, &lr, 200)
if length := len(lr.Data.Alerts); length != 3 {
t.Fatalf("expected 3 alerts got %d", length)
}
lr = listAlertsResponse{}
getResp(t, ts.URL+`/api/v1/alerts?match[]={job="bar"}`, &lr, 200)
if length := len(lr.Data.Alerts); length != 0 {
t.Fatalf("expected 0 alerts got %d", length)
}
})
t.Run("/api/v1/alert?alertID&groupID", func(t *testing.T) {
expAlert := rule.NewAlertAPI(ar, ar.GetAlerts()[0])
@@ -242,6 +258,13 @@ func TestHandler(t *testing.T) {
check("/vmalert/api/v1/rules?datasource_type=graphite", 200, 1, 2)
check("/vmalert/api/v1/rules?datasource_type=graphiti", 400, 0, 0)
// invalid match[] params
check(`/vmalert/api/v1/rules?match[]={job=!"foo"}`, 400, 0, 0)
check(`/vmalert/api/v1/rules?match[]={job="foo"}`, 200, 3, 3)
check(`/vmalert/api/v1/rules?match[]={job="bar"}`, 200, 3, 3)
check(`/vmalert/api/v1/rules?match[]={job="bar"}&match[]={job="foo"}`, 200, 3, 6)
check(`/vmalert/api/v1/rules?match[]={job="barzz"}`, 200, 0, 0)
// no filtering expected due to bad params
check("/api/v1/rules?type=badParam", 400, 0, 0)
check("/api/v1/rules?foo=bar", 200, 3, 6)
@@ -367,3 +390,116 @@ func TestEmptyResponse(t *testing.T) {
}
})
}
func TestMatchesRule(t *testing.T) {
parseMatch := func(t *testing.T, selectors []string) [][]metricsql.LabelFilter {
t.Helper()
var match [][]metricsql.LabelFilter
for _, s := range selectors {
expr, err := metricsql.Parse(s)
if err != nil {
t.Fatalf("failed to parse selector %q: %v", s, err)
}
me, ok := expr.(*metricsql.MetricExpr)
if !ok {
t.Fatalf("expected MetricExpr for %q, got %T", s, expr)
}
match = append(match, me.LabelFilterss...)
}
return match
}
f := func(t *testing.T, selectors []string, labels map[string]string, wantMatch bool) {
t.Helper()
rf := &rulesFilter{
gf: &groupsFilter{},
match: parseMatch(t, selectors),
}
r := &rule.ApiRule{Labels: labels}
got := rf.matchesRule(r)
if got != wantMatch {
t.Fatalf("matchesRule(%v) with selectors %v: got %v, want %v",
labels, selectors, got, wantMatch)
}
}
f(t, nil, map[string]string{"foo": "bar"}, true)
f(t, []string{`{foo="bar"}`}, map[string]string{"foo": "bar"}, true)
f(t, []string{`{foo="bar"}`}, map[string]string{"foo": "baz"}, false)
f(t, []string{`{foo="bar"}`}, map[string]string{"bar": "baz"}, false)
f(t, []string{`{foo=""}`}, map[string]string{"bar": "baz"}, true)
f(t, []string{`{foo!="bar"}`}, map[string]string{"foo": "baz"}, true)
f(t, []string{`{foo!="bar"}`}, map[string]string{"foo": "bar"}, false)
f(t, []string{`{foo=~"bar.*"}`}, map[string]string{"foo": "bar"}, true)
f(t, []string{`{foo=~"bar.*"}`}, map[string]string{"foo": "baz"}, false)
f(t, []string{`{bar=~"baz|bar"}`}, map[string]string{"bar": "baz"}, true)
f(t, []string{`{bar=~"baz|bar"}`}, map[string]string{"bar": "bar"}, true)
f(t, []string{`{bar=~"baz|bar"}`}, map[string]string{"bar": "foo"}, false)
f(t, []string{`{foo!~"bar.*"}`}, map[string]string{"foo": "baz"}, true)
f(t, []string{`{foo!~"bar.*"}`}, map[string]string{"foo": "bar"}, false)
// single match[] with multiple filters
f(t,
[]string{`{job="foo",instance="bar"}`},
map[string]string{"job": "foo", "instance": "bar"},
true,
)
f(t,
[]string{`{job="foo",instance="bar"}`},
map[string]string{"job": "other", "instance": "bar"},
false,
)
f(t,
[]string{`{foo="bar",baz=~"b.*"}`},
map[string]string{"foo": "bar", "baz": "bazinga"},
true,
)
f(t,
[]string{`{foo="bar",baz=~"b.*"}`},
map[string]string{"foo": "other", "baz": "bazinga"},
false,
)
// multiple matches[]
f(t,
[]string{`{foo="bar"}`, `{foo="baz"}`},
map[string]string{"foo": "baz"},
true,
)
f(t,
[]string{`{foo="bar"}`, `{foo="baz"}`},
map[string]string{"foo": "unknown"},
false,
)
f(t,
[]string{`{foo=~"bar.*"}`, `{bar=~"baz.*"}`},
map[string]string{"bar": "bazinga"},
true,
)
f(t,
[]string{`{foo=~"bar.*"}`, `{bar=~"baz.*"}`},
map[string]string{"foo": "bartender"},
true,
)
f(t,
[]string{`{foo=~"bar.*"}`, `{bar=~"baz.*"}`},
map[string]string{"foo": "other", "bar": "other"},
false,
)
f(t,
[]string{`{job="foo",instance="bar"}`, `{foo="bar"}`},
map[string]string{"foo": "bar"},
true,
)
f(t,
[]string{`{job="foo", instance="bar"}`, `{foo="bar"}`},
map[string]string{"instance": "barr", "job": "foo"},
false,
)
}

View File

@@ -889,7 +889,8 @@ func reloadAuthConfig() (bool, error) {
}
mp := authUsers.Load()
logger.Infof("loaded information about %d users from -auth.config=%q", len(*mp), *authConfigPath)
jwtc := jwtAuthCache.Load()
logger.Infof("loaded information about %d users from -auth.config=%q", len(*mp)+len(jwtc.users), *authConfigPath)
return true, nil
}

View File

@@ -317,7 +317,7 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tk
defer ui.endConcurrencyLimit()
// Process the request.
processRequest(w, r, ui, tkn)
processRequest(w, r, ui, tkn, userName)
}
func beginConcurrencyLimit(ctx context.Context) error {
@@ -391,7 +391,7 @@ func bufferRequestBody(ctx context.Context, r io.ReadCloser, userName string) (i
return bb, nil
}
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tkn *jwt.Token) {
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tkn *jwt.Token, userName string) {
u := normalizeURL(r.URL)
up, hc := ui.getURLPrefixAndHeaders(u, r.Host, r.Header)
isDefault := false
@@ -409,7 +409,7 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tkn *j
if ui.DumpRequestOnErrors {
di = debugInfo(u, r)
}
httpserver.Errorf(w, r, "missing route for %q%s", u.String(), di)
httpserver.Errorf(w, r, "user %s missing route for %q%s", userName, u.String(), di)
return
}
up, hc = ui.DefaultURL, ui.HeadersConf
@@ -455,7 +455,7 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo, tkn *j
ui.backendErrors.Inc()
}
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("all the %d backends for the user %q are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend", up.getBackendsCount(), ui.name()),
Err: fmt.Errorf("all the %d backends for the user %q are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend", up.getBackendsCount(), userName),
StatusCode: http.StatusBadGateway,
}
httpserver.Errorf(w, r, "%s", err)

View File

@@ -307,6 +307,24 @@ statusCode=200
requested_url={BACKEND}/bar/a/b`
f(cfgStr, requestURL, backendHandler, responseExpected)
// correct authorization but unexisted path, hence missing route error.
cfgStr = `
users:
- username: foo
password: secret
url_map:
- src_paths:
- "/api/v1/write"
url_prefix: "{BACKEND}/bar"`
requestURL = "http://foo:secret@some-host.com/a/b"
backendHandler = func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "requested_url=http://%s%s", r.Host, r.URL)
}
responseExpected = `
statusCode=400
user foo missing route for "http://foo:secret@some-host.com/a/b"`
f(cfgStr, requestURL, backendHandler, responseExpected)
// verify how path cleanup works
cfgStr = `
unauthorized_user:
@@ -403,7 +421,7 @@ unauthorized_user:
}
responseExpected = `
statusCode=400
missing route for "http://some-host.com/abc?de=fg"`
user unauthorized missing route for "http://some-host.com/abc?de=fg"`
f(cfgStr, requestURL, backendHandler, responseExpected)
// missing default_url and default url_prefix for unauthorized user with dump_request_on_errors enabled
@@ -419,7 +437,7 @@ unauthorized_user:
}
responseExpected = `
statusCode=400
missing route for "http://some-host.com/abc?de=fg" (host: "some-host.com"; path: "/abc"; args: "de=fg"; headers:Connection: Some-Header,Other-Header
user unauthorized missing route for "http://some-host.com/abc?de=fg" (host: "some-host.com"; path: "/abc"; args: "de=fg"; headers:Connection: Some-Header,Other-Header
Pass-Header: abc
Some-Header: foobar
X-Forwarded-For: 12.34.56.78
@@ -461,7 +479,7 @@ unauthorized_user:
}
responseExpected = `
statusCode=502
all the 2 backends for the user "" are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend`
all the 2 backends for the user "unauthorized" are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend`
f(cfgStr, requestURL, backendHandler, responseExpected)
// all the backend_urls are unavailable for authorized user
@@ -501,7 +519,7 @@ unauthorized_user:
}
responseExpected = `
statusCode=502
all the 0 backends for the user "" are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend`
all the 0 backends for the user "unauthorized" are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend`
f(cfgStr, requestURL, backendHandler, responseExpected)
netutil.Resolver = origResolver
@@ -518,7 +536,7 @@ unauthorized_user:
}
responseExpected = `
statusCode=502
all the 2 backends for the user "" are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend`
all the 2 backends for the user "unauthorized" are unavailable for proxying the request - check previous WARN logs to see the exact error for each failed backend`
f(cfgStr, requestURL, backendHandler, responseExpected)
if n := retries.Load(); n != 2 {
t.Fatalf("unexpected number of retries; got %d; want 2", n)
@@ -545,6 +563,31 @@ requested_url={BACKEND}/path2/foo/?de=fg`
if n := retries.Load(); n != 2 {
t.Fatalf("unexpected number of retries; got %d; want 2", n)
}
// make sure that empty config value erases client extra filters and extra labels
cfgStr = `
unauthorized_user:
url_prefix: {BACKEND}/foo?bar=baz&extra_filters[]=&extra_label=&extra_filters=`
requestURL = "http://some-host.com/abc/def?some_arg=some_value&extra_filters[]=baz&extra_label=tenant=admin&extra_filters=bar"
backendHandler = func(w http.ResponseWriter, r *http.Request) {
h := w.Header()
h.Set("Connection", "close")
h.Set("Foo", "bar")
var bb bytes.Buffer
if err := r.Header.Write(&bb); err != nil {
panic(fmt.Errorf("unexpected error when marshaling headers: %w", err))
}
fmt.Fprintf(w, "requested_url=http://%s%s\n%s", r.Host, r.URL, bb.String())
}
responseExpected = `
statusCode=200
Foo: bar
requested_url={BACKEND}/foo/abc/def?bar=baz&extra_filters=&extra_filters%5B%5D=&extra_label=&some_arg=some_value
Pass-Header: abc
User-Agent: vmauth
X-Forwarded-For: 12.34.56.78, 42.2.3.84`
f(cfgStr, requestURL, backendHandler, responseExpected)
}
func TestJWTRequestHandler(t *testing.T) {

View File

@@ -146,7 +146,8 @@ var (
Name: vmRoundDigits,
Value: 100,
Usage: "Round metric values to the given number of decimal digits after the point. " +
"This option may be used for increasing on-disk compression level for the stored metrics",
"This option may be used for increasing on-disk compression level for the stored metrics. " +
"See also --vm-significant-figures option",
},
&cli.StringSliceFlag{
Name: vmExtraLabel,
@@ -500,6 +501,96 @@ var (
}
)
const (
mimirPath = "mimir-path"
mimirTenantID = "mimir-tenant-id"
mimirConcurrency = "mimir-concurrency"
mimirFilterTimeStart = "mimir-filter-time-start"
mimirFilterTimeEnd = "mimir-filter-time-end"
mimirFilterLabel = "mimir-filter-label"
mimirFilterLabelValue = "mimir-filter-label-value"
mimirCredsFilePath = "mimir-creds-file-path"
mimirConfigFilePath = "mimir-config-file-path"
mimirConfigProfile = "mimir-config-profile"
mimirCustomS3Endpoint = "mimir-custom-s3-endpoint"
mimirS3ForcePathStyle = "mimir-s3-force-path-style"
mimirS3TLSInsecureSkipVerify = "mimir-s3-tls-insecure-skip-verify"
mimirSSEKMSKeyID = "mimir-s3-sse-kms-key-id"
mimirSSEAlgorithm = "mimir-s3-sse-algorithm"
)
var (
mimirFlags = []cli.Flag{
&cli.StringFlag{
Name: mimirPath,
Usage: "Path to Mimir storage bucket or local folder.",
Required: true,
},
&cli.StringFlag{
Name: mimirTenantID,
Usage: "Tenant ID for Mimir storage",
},
&cli.IntFlag{
Name: mimirConcurrency,
Usage: "Number of concurrently running block readers",
Value: 1,
},
&cli.StringFlag{
Name: mimirFilterTimeStart,
Usage: "The time filter in RFC3339 format to select timeseries with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'",
Required: true,
},
&cli.StringFlag{
Name: mimirFilterTimeEnd,
Usage: "The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'",
Required: true,
},
&cli.StringFlag{
Name: mimirFilterLabel,
Usage: "Mimir label name to filter timeseries by. E.g. '__name__' will filter timeseries by name.",
},
&cli.StringFlag{
Name: mimirFilterLabelValue,
Usage: fmt.Sprintf("Regular expression to filter label from %q flag.", mimirFilterLabel),
Value: ".*",
},
&cli.StringFlag{
Name: mimirCredsFilePath,
Usage: "Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set. See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html",
},
&cli.StringFlag{
Name: mimirConfigFilePath,
Usage: "Path to file with S3 configs. Configs are loaded from default location if not set. See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html",
},
&cli.StringFlag{
Name: mimirConfigProfile,
Usage: "Profile name for S3 configs. If no set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both not set, DefaultSharedConfigProfile is used",
},
&cli.StringFlag{
Name: mimirCustomS3Endpoint,
Usage: "Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set",
},
&cli.BoolFlag{
Name: mimirS3ForcePathStyle,
Usage: "Prefixing endpoint with bucket name when set false, true by default.",
Value: true,
},
&cli.BoolFlag{
Name: mimirS3TLSInsecureSkipVerify,
Usage: "Whether to skip TLS verification when connecting to the S3 endpoint.",
},
&cli.StringFlag{
Name: mimirSSEKMSKeyID,
Usage: "SSE KMS Key ID for use with S3-compatible storages.",
},
&cli.StringFlag{
Name: mimirSSEAlgorithm,
Usage: "SSE algorithm for use with S3-compatible storages.",
},
}
)
const (
vmNativeFilterMatch = "vm-native-filter-match"
vmNativeFilterTimeStart = "vm-native-filter-time-start"

View File

@@ -18,6 +18,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/backoff"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/barpool"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/mimir"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/native"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/remoteread"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -297,12 +298,12 @@ func main() {
},
},
{
Name: "thanos",
Usage: "Migrate time series from Thanos blocks (supports raw and downsampled data)",
Flags: mergeFlags(globalFlags, thanosFlags, vmFlags),
Name: "mimir",
Usage: "Migrate time series from Mimir object storage or local filesystem",
Flags: mergeFlags(globalFlags, mimirFlags, vmFlags),
Before: beforeFn,
Action: func(c *cli.Context) error {
fmt.Println("Thanos import mode")
fmt.Println("Mimir import mode")
vmCfg, err := initConfigVM(c)
if err != nil {
@@ -314,6 +315,54 @@ func main() {
return fmt.Errorf("failed to create VM importer: %s", err)
}
mCfg := mimir.Config{
Filter: mimir.Filter{
TimeMin: c.String(mimirFilterTimeStart),
TimeMax: c.String(mimirFilterTimeEnd),
Label: c.String(mimirFilterLabel),
LabelValue: c.String(mimirFilterLabelValue),
},
Path: c.String(mimirPath),
TenantID: c.String(mimirTenantID),
CredsFilePath: c.String(mimirCredsFilePath),
ConfigFilePath: c.String(mimirConfigFilePath),
ConfigProfile: c.String(mimirConfigProfile),
CustomS3Endpoint: c.String(mimirCustomS3Endpoint),
S3ForcePathStyle: c.Bool(mimirS3ForcePathStyle),
S3TLSInsecureSkipVerify: c.Bool(mimirS3TLSInsecureSkipVerify),
SSEKMSKeyID: c.String(mimirSSEKMSKeyID),
SSEAlgorithm: c.String(mimirSSEAlgorithm),
}
cl, err := mimir.NewClient(ctx, mCfg)
if err != nil {
return fmt.Errorf("failed to create mimir client: %s", err)
}
pp := prometheusProcessor{
cl: cl,
im: importer,
cc: c.Int(mimirConcurrency),
isVerbose: c.Bool(globalVerbose),
}
return pp.run(ctx)
},
},
{
Name: "thanos",
Usage: "Migrate time series from Thanos blocks (supports raw and downsampled data)",
Flags: mergeFlags(globalFlags, thanosFlags, vmFlags),
Before: beforeFn,
Action: func(c *cli.Context) error {
fmt.Println("Thanos import mode")
vmCfg, err := initConfigVM(c)
if err != nil {
return fmt.Errorf("failed to init VM configuration: %s", err)
}
importer, err = vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
thanosCfg := thanos.Config{
Snapshot: c.String(thanosSnapshot),
Filter: thanos.Filter{

View File

@@ -0,0 +1,195 @@
package mimir
import (
"fmt"
"log"
"os"
"path/filepath"
"sync"
"github.com/oklog/ulid/v2"
"github.com/prometheus/prometheus/tsdb"
"github.com/prometheus/prometheus/tsdb/tombstones"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/common"
)
var _ tsdb.BlockReader = (*lazyBlockReader)(nil)
// lazyBlockReader is stores block id and segment num information.
// It is used to lazily fetch and parse block data.
// It implements tsdb.BlockReader interface.
type lazyBlockReader struct {
// Block ID.
ID ulid.ULID
// SegmentsNum stores the number of chunks segments in the block.
SegmentsNum int
mu sync.Mutex
reader *tsdb.Block
tempDirPath string
fs common.RemoteFS
err error
}
// newLazyBlockReader returns a new LazyBlockReader for the given block.
func newLazyBlockReader(block *Block, fs common.RemoteFS) (*lazyBlockReader, error) {
if block.SegmentsFormat != "1b6d" {
return nil, fmt.Errorf("unsupported segments format: %s", block.SegmentsFormat)
}
return &lazyBlockReader{
ID: block.ID,
SegmentsNum: block.SegmentsNum,
fs: fs,
}, nil
}
func (lbr *lazyBlockReader) initialize() error {
lbr.mu.Lock()
defer lbr.mu.Unlock()
if lbr.reader != nil {
return nil
}
// fetching block and parse it and store it in lbr.reader
temp, err := lbr.mkTempDir()
if err != nil {
return fmt.Errorf("failed to create temp dir: %s", err)
}
lbr.tempDirPath = temp
// TODO: replace fetchFile and writeFile with buffered IO if needed
meta, err := lbr.fetchFile(metaFilename)
if err != nil {
return err
}
if err := lbr.writeFile(temp, metaFilename, meta); err != nil {
return fmt.Errorf("failed to write meta file: %w", err)
}
idx, err := lbr.fetchFile(indexFilename)
if err != nil {
return fmt.Errorf("failed to fetch index file %q: %w", indexFilename, err)
}
if err := lbr.writeFile(temp, indexFilename, idx); err != nil {
return err
}
for i := 1; i <= lbr.SegmentsNum; i++ {
// segments formats has format 1b06d
// https://github.com/grafana/mimir/blob/main/pkg/storage/tsdb/bucketindex/index.go#L32
chunkName := fmt.Sprintf("%06d", i)
blockChunkPath := filepath.Join("chunks", chunkName)
chunk, err := lbr.fetchFile(blockChunkPath)
if err != nil {
return fmt.Errorf("failed to fetch chunk file: %q: %w", chunkName, err)
}
if err := lbr.writeFile(temp, blockChunkPath, chunk); err != nil {
return fmt.Errorf("failed to write chunk file: %q: %s", chunkName, err)
}
}
// Set postingDecoder to nil because
// If it is nil then a default decoder is used, compatible with Prometheus v2.
pb, err := tsdb.OpenBlock(nil, temp, nil, nil)
if err != nil {
return fmt.Errorf("failed to open block %q: %w", lbr.ID, err)
}
lbr.reader = pb
return nil
}
// Index returns an IndexReader over the block's data.
func (lbr *lazyBlockReader) Index() (tsdb.IndexReader, error) {
if err := lbr.initialize(); err != nil {
return nil, err
}
return lbr.reader.Index()
}
// Chunks returns a ChunkReader over the block's data.
func (lbr *lazyBlockReader) Chunks() (tsdb.ChunkReader, error) {
if err := lbr.initialize(); err != nil {
return nil, err
}
return lbr.reader.Chunks()
}
// Tombstones returns a tombstones.Reader over the block's deleted data.
func (lbr *lazyBlockReader) Tombstones() (tombstones.Reader, error) {
if err := lbr.initialize(); err != nil {
return nil, err
}
return lbr.reader.Tombstones()
}
// Meta provides meta information about the block reader.
func (lbr *lazyBlockReader) Meta() tsdb.BlockMeta {
if err := lbr.initialize(); err != nil {
lbr.err = fmt.Errorf("cannot get BlockMeta: %w", err)
return tsdb.BlockMeta{}
}
return lbr.reader.Meta()
}
// Size returns the number of bytes that the block takes up on disk.
func (lbr *lazyBlockReader) Size() int64 {
if err := lbr.initialize(); err != nil {
lbr.err = fmt.Errorf("error get Size of the block: %s, return zero size", err)
return 0
}
return lbr.reader.Size()
}
// Err returns the last error that occurred on the block reader.
func (lbr *lazyBlockReader) Err() error {
return lbr.err
}
// Close closes block and releases all resources
func (lbr *lazyBlockReader) Close() error {
lbr.mu.Lock()
defer lbr.mu.Unlock()
if lbr.reader == nil {
return nil
}
err := lbr.reader.Close()
if err := os.RemoveAll(lbr.tempDirPath); err != nil {
log.Printf("failed to remove temp dir: %s", err)
}
lbr.reader = nil
lbr.tempDirPath = ""
return err
}
func (lbr *lazyBlockReader) mkTempDir() (string, error) {
temp, err := os.MkdirTemp("", lbr.ID.String())
if err != nil {
return "", fmt.Errorf("failed to create temp dir: %s", err)
}
err = os.Mkdir(filepath.Join(temp, "chunks"), os.ModePerm)
if err != nil {
return "", fmt.Errorf("failed to create temp dir: %s", err)
}
return temp, nil
}
func (lbr *lazyBlockReader) fetchFile(filePath string) ([]byte, error) {
blockID := lbr.ID.String()
blockPath := filepath.Join(blockID, filePath)
has, err := lbr.fs.HasFile(blockPath)
if err != nil {
return nil, err
}
if !has {
return nil, fmt.Errorf("block meta %s not found", blockID)
}
return lbr.fs.ReadFile(blockPath)
}
func (lbr *lazyBlockReader) writeFile(folder string, filename string, file []byte) error {
fileName := filepath.Join(folder, filename)
return os.WriteFile(fileName, file, os.ModePerm)
}

238
app/vmctl/mimir/mimir.go Normal file
View File

@@ -0,0 +1,238 @@
package mimir
import (
"bytes"
"compress/gzip"
"context"
"encoding/json"
"fmt"
"log"
"github.com/oklog/ulid/v2"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/tsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/prometheus"
utils "github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vmctlutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/common"
)
const (
bucketIndex = "bucket-index.json"
bucketIndexCompressedFilename = bucketIndex + ".gz"
metaFilename = "meta.json"
indexFilename = "index"
)
// BlockDeletionMark holds the information about a block's deletion mark in the index.
// This type was copied from the mimir repository https://github.com/grafana/mimir/blob/main/pkg/storage/tsdb/bucketindex/index.go#L234.
type BlockDeletionMark struct {
// Block ID.
ID ulid.ULID `json:"block_id"`
// DeletionTime is a unix timestamp (seconds precision) of when the block was marked to be deleted.
DeletionTime int64 `json:"deletion_time"`
}
// Block holds the information about a block in the index.
// This is a partial implementation of the https://github.com/grafana/mimir/blob/main/pkg/storage/tsdb/bucketindex/index.go#L73
type Block struct {
// Block ID.
ID ulid.ULID `json:"block_id"`
// MinTime and MaxTime specify the time range all samples in the block are in (millis precision).
MinTime int64 `json:"min_time"`
MaxTime int64 `json:"max_time"`
// SegmentsFormat and SegmentsNum stores the format and number of chunks segments
// in the block.
SegmentsFormat string `json:"segments_format,omitempty"`
SegmentsNum int `json:"segments_num,omitempty"`
}
// Index contains all known blocks and markers of a tenant.
// This is a partial implementation pof the https://github.com/grafana/mimir/blob/main/pkg/storage/tsdb/bucketindex/index.go#L36
type Index struct {
// Version of the index format.
Version int `json:"version"`
// List of complete blocks (partial blocks are excluded from the index).
Blocks []*Block `json:"blocks"`
}
// Config contains a list of params needed
// for reading mimir snapshots
type Config struct {
// Path to remote storage bucket
Path string
// TenantID is the tenant id for the storage
TenantID string
Filter Filter
CredsFilePath string
ConfigFilePath string
ConfigProfile string
CustomS3Endpoint string
S3ForcePathStyle bool
S3TLSInsecureSkipVerify bool
SSEKMSKeyID string
SSEAlgorithm string
}
// Filter contains configuration for filtering
// the timeseries
type Filter struct {
TimeMin string
TimeMax string
Label string
LabelValue string
}
// Client is a wrapper over Prometheus tsdb.DBReader
type Client struct {
common.RemoteFS
filter filter
}
type filter struct {
min, max int64
label string
labelValue string
}
func (f filter) inRange(minTime, maxTime int64) bool {
fmin, fmax := f.min, f.max
if minTime == 0 {
fmin = minTime
}
if fmax == 0 {
fmax = maxTime
}
return minTime <= fmax && fmin <= maxTime
}
// NewClient creates and validates new Client
// with given Config
func NewClient(ctx context.Context, cfg Config) (*Client, error) {
if cfg.Path == "" {
return nil, fmt.Errorf("path cannot be empty")
}
if cfg.TenantID != "" {
cfg.Path = fmt.Sprintf("%s/%s", cfg.Path, cfg.TenantID)
}
var c Client
rfs, err := newRemoteFS(ctx, cfg)
if err != nil {
return nil, fmt.Errorf("cannot parse `-src`=%q: %w", cfg.Path, err)
}
c.RemoteFS = rfs
timeMin, err := utils.ParseTime(cfg.Filter.TimeMin)
if err != nil {
return nil, fmt.Errorf("failed to parse min time in filter: %s", err)
}
timeMax, err := utils.ParseTime(cfg.Filter.TimeMax)
if err != nil {
return nil, fmt.Errorf("failed to parse max time in filter: %s", err)
}
c.filter = filter{
min: timeMin.UnixMilli(),
max: timeMax.UnixMilli(),
label: cfg.Filter.Label,
labelValue: cfg.Filter.LabelValue,
}
return &c, nil
}
// Explore a fetches bucket-index.json file from a remote storage or local filesystem
// and filter blocks via the defined time range, but does not take into account label filters.
func (c *Client) Explore() ([]tsdb.BlockReader, error) {
log.Printf("Fetching blocks from remote storage")
indexFile, err := c.fetchIndexFile()
if err != nil {
return nil, fmt.Errorf("failed to fetch index file: %s", err)
}
var blocksToImport []tsdb.BlockReader
for _, block := range indexFile.Blocks {
if !c.filter.inRange(block.MinTime, block.MaxTime) {
// Skipping block outside of time range
continue
}
if block.ID.String() == "" {
continue
}
lazyBlockReader, err := newLazyBlockReader(block, c.RemoteFS)
if err != nil {
return nil, fmt.Errorf("failed to create lazy block reader: %s", err)
}
blocksToImport = append(blocksToImport, lazyBlockReader)
}
return blocksToImport, nil
}
// Read reads the given BlockReader according to configured
// time and label filters.
func (c *Client) Read(ctx context.Context, block tsdb.BlockReader) (*prometheus.CloseableSeriesSet, error) {
meta := block.Meta()
if b, ok := block.(*lazyBlockReader); ok && b.Err() != nil {
return nil, fmt.Errorf("failed to read block: %s", b.Err())
}
if meta.ULID.String() == "" {
return nil, fmt.Errorf("unexpected block without id")
}
minTime, maxTime := meta.MinTime, meta.MaxTime
if c.filter.min != 0 {
minTime = c.filter.min
}
if c.filter.max != 0 {
maxTime = c.filter.max
}
q, err := tsdb.NewBlockQuerier(block, minTime, maxTime)
if err != nil {
return nil, err
}
ss := q.Select(ctx, false, nil, labels.MustNewMatcher(labels.MatchRegexp, c.filter.label, c.filter.labelValue))
return &prometheus.CloseableSeriesSet{SeriesSet: ss, Close: q.Close}, nil
}
func (c *Client) fetchIndexFile() (*Index, error) {
has, err := c.HasFile(bucketIndexCompressedFilename)
if err != nil {
return nil, err
}
if !has {
return nil, fmt.Errorf("bucket-index.json.gz not found")
}
file, err := c.ReadFile(bucketIndexCompressedFilename)
if err != nil {
return nil, fmt.Errorf("failed to read bucket index: %s", err)
}
r := bytes.NewReader(file)
// Read all the content.
gzipReader, err := gzip.NewReader(r)
if err != nil {
return nil, fmt.Errorf("failed to create gzip reader: %s", err)
}
var indexFile Index
err = json.NewDecoder(gzipReader).Decode(&indexFile)
if err != nil {
return nil, fmt.Errorf("failed to decode bucket index: %s", err)
}
return &indexFile, nil
}

View File

@@ -0,0 +1,93 @@
package mimir
import (
"context"
"fmt"
"path/filepath"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/azremote"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/fsremote"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/gcsremote"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/backup/s3remote"
)
// newRemoteFS returns new remote fs from the given Config.
func newRemoteFS(ctx context.Context, cfg Config) (common.RemoteFS, error) {
if len(cfg.Path) == 0 {
return nil, fmt.Errorf("path cannot be empty")
}
n := strings.Index(cfg.Path, "://")
if n < 0 {
return nil, fmt.Errorf("missing scheme in path %q. Supported schemes: `gs://`, `s3://`, `azblob://`, `fs://`", cfg.Path)
}
scheme := cfg.Path[:n]
dir := cfg.Path[n+len("://"):]
switch scheme {
case "fs":
if !filepath.IsAbs(dir) {
return nil, fmt.Errorf("dir must be absolute; got %q", dir)
}
fsr := &fsremote.FS{
Dir: filepath.Clean(dir),
}
return fsr, nil
case "gcs", "gs":
n := strings.Index(dir, "/")
if n < 0 {
return nil, fmt.Errorf("missing directory on the gcs bucket %q", dir)
}
bucket := dir[:n]
dir = dir[n:]
fsr := &gcsremote.FS{
CredsFilePath: cfg.CredsFilePath,
Bucket: bucket,
Dir: dir,
}
if err := fsr.Init(ctx); err != nil {
return nil, fmt.Errorf("cannot initialize connection to gcs: %w", err)
}
return fsr, nil
case "azblob":
n := strings.Index(dir, "/")
if n < 0 {
return nil, fmt.Errorf("missing directory on the AZBlob container %q", dir)
}
bucket := dir[:n]
dir = dir[n:]
fsr := &azremote.FS{
Container: bucket,
Dir: dir,
}
if err := fsr.Init(ctx); err != nil {
return nil, fmt.Errorf("cannot initialize connection to AZBlob: %w", err)
}
return fsr, nil
case "s3":
n := strings.Index(dir, "/")
if n < 0 {
return nil, fmt.Errorf("missing directory on the s3 bucket %q", dir)
}
bucket := dir[:n]
dir = dir[n:]
fsr := &s3remote.FS{
CredsFilePath: cfg.CredsFilePath,
ConfigFilePath: cfg.ConfigFilePath,
CustomEndpoint: cfg.CustomS3Endpoint,
TLSInsecureSkipVerify: cfg.S3TLSInsecureSkipVerify,
S3ForcePathStyle: cfg.S3ForcePathStyle,
ProfileName: cfg.ConfigProfile,
Bucket: bucket,
Dir: dir,
SSEKMSKeyId: cfg.SSEKMSKeyID,
SSEAlgorithm: s3remote.StringToEncryptionAlgorithm(cfg.SSEAlgorithm),
}
if err := fsr.Init(ctx); err != nil {
return nil, fmt.Errorf("cannot initialize connection to s3: %w", err)
}
return fsr, nil
default:
return nil, fmt.Errorf("unsupported scheme %q", scheme)
}
}

View File

@@ -3,6 +3,7 @@ package main
import (
"context"
"fmt"
"io"
"log"
"strings"
"sync"
@@ -18,10 +19,17 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
)
// Runner is an interface for fetching and reading
// snapshot blocks
type Runner interface {
Explore() ([]tsdb.BlockReader, error)
Read(context.Context, tsdb.BlockReader) (*prometheus.CloseableSeriesSet, error)
}
type prometheusProcessor struct {
// prometheus client fetches and reads
// Runner fetches and reads
// snapshot blocks
cl *prometheus.Client
cl Runner
// importer performs import requests
// for timeseries data returned from
// snapshot blocks
@@ -48,7 +56,7 @@ func (pp *prometheusProcessor) run(ctx context.Context) error {
return nil
}
if err := pp.processBlocks(blocks); err != nil {
if err := pp.processBlocks(ctx, blocks); err != nil {
return fmt.Errorf("migration failed: %s", err)
}
@@ -57,11 +65,17 @@ func (pp *prometheusProcessor) run(ctx context.Context) error {
return nil
}
func (pp *prometheusProcessor) do(b tsdb.BlockReader) error {
ss, err := pp.cl.Read(b)
func (pp *prometheusProcessor) do(ctx context.Context, b tsdb.BlockReader) error {
css, err := pp.cl.Read(ctx, b)
if err != nil {
return fmt.Errorf("failed to read block: %s", err)
}
defer func() {
if err := css.Close(); err != nil {
log.Printf("cannot close SeriesSet for block: %q : %s\n", b.Meta().ULID, err)
}
}()
ss := css.SeriesSet
var it chunkenc.Iterator
for ss.Next() {
var name string
@@ -114,7 +128,7 @@ func (pp *prometheusProcessor) do(b tsdb.BlockReader) error {
return ss.Err()
}
func (pp *prometheusProcessor) processBlocks(blocks []tsdb.BlockReader) error {
func (pp *prometheusProcessor) processBlocks(ctx context.Context, blocks []tsdb.BlockReader) error {
promBlocksTotal.Add(len(blocks))
bar := barpool.AddWithTemplate(fmt.Sprintf(barTpl, "Processing blocks"), len(blocks))
if err := barpool.Start(); err != nil {
@@ -130,11 +144,16 @@ func (pp *prometheusProcessor) processBlocks(blocks []tsdb.BlockReader) error {
for range pp.cc {
wg.Go(func() {
for br := range blockReadersCh {
if err := pp.do(br); err != nil {
if err := pp.do(ctx, br); err != nil {
promErrorsTotal.Inc()
errCh <- fmt.Errorf("read failed for block %q: %s", br.Meta().ULID, err)
errCh <- fmt.Errorf("cannot read block %q: %s", br.Meta().ULID, err)
return
}
if cb, ok := br.(io.Closer); ok {
if err := cb.Close(); err != nil {
errCh <- fmt.Errorf("cannot close block: %q: %w", br.Meta().ULID, err)
}
}
promBlocksProcessed.Inc()
bar.Increment()
}

View File

@@ -8,6 +8,8 @@ import (
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/storage"
"github.com/prometheus/prometheus/tsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vmctlutil"
)
// Config contains a list of params needed
@@ -60,13 +62,13 @@ func NewClient(cfg Config) (*Client, error) {
return nil, fmt.Errorf("failed to open snapshot %q: %s", cfg.Snapshot, err)
}
c := &Client{DBReadOnly: db}
minTime, maxTime, err := parseTime(cfg.Filter.TimeMin, cfg.Filter.TimeMax)
timeMin, timeMax, err := parseTime(cfg.Filter.TimeMin, cfg.Filter.TimeMax)
if err != nil {
return nil, fmt.Errorf("failed to parse time in filter: %s", err)
}
c.filter = filter{
min: minTime,
max: maxTime,
min: timeMin,
max: timeMax,
label: cfg.Filter.Label,
labelValue: cfg.Filter.LabelValue,
}
@@ -83,7 +85,7 @@ func (c *Client) Explore() ([]tsdb.BlockReader, error) {
if err != nil {
return nil, fmt.Errorf("failed to fetch blocks: %s", err)
}
s := &Stats{
s := &vmctlutil.Stats{
Filtered: c.filter.min != 0 || c.filter.max != 0 || c.filter.label != "",
Blocks: len(blocks),
}
@@ -108,9 +110,15 @@ func (c *Client) Explore() ([]tsdb.BlockReader, error) {
return blocksToImport, nil
}
// CloseableSeriesSet defines a SeriesSet with Close method
type CloseableSeriesSet struct {
SeriesSet storage.SeriesSet
Close func() error
}
// Read reads the given BlockReader according to configured
// time and label filters.
func (c *Client) Read(block tsdb.BlockReader) (storage.SeriesSet, error) {
func (c *Client) Read(ctx context.Context, block tsdb.BlockReader) (*CloseableSeriesSet, error) {
minTime, maxTime := block.Meta().MinTime, block.Meta().MaxTime
if c.filter.min != 0 {
minTime = c.filter.min
@@ -122,8 +130,8 @@ func (c *Client) Read(block tsdb.BlockReader) (storage.SeriesSet, error) {
if err != nil {
return nil, err
}
ss := q.Select(context.Background(), false, nil, labels.MustNewMatcher(labels.MatchRegexp, c.filter.label, c.filter.labelValue))
return ss, nil
ss := q.Select(ctx, false, nil, labels.MustNewMatcher(labels.MatchRegexp, c.filter.label, c.filter.labelValue))
return &CloseableSeriesSet{ss, q.Close}, nil
}
func parseTime(start, end string) (int64, int64, error) {

View File

@@ -1,4 +1,4 @@
package prometheus
package vmctlutil
import (
"fmt"
@@ -18,7 +18,7 @@ type Stats struct {
// String returns string representation for s.
func (s Stats) String() string {
str := fmt.Sprintf("Prometheus snapshot stats:\n"+
str := fmt.Sprintf("Snapshot stats:\n"+
" blocks found: %d;\n"+
" blocks skipped by time filter: %d;\n"+
" min time: %d (%v);\n"+

View File

@@ -184,7 +184,7 @@ func (ctx *InsertCtx) WriteMetadata(mmpbs []prompb.MetricMetadata) error {
}
ctx.mms = mms
err := vmstorage.AddMetadataRows(mms)
err := vmstorage.VMInsertAPI.WriteMetadata(mms)
if err != nil {
return &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot store metrics metadata: %w", err),
@@ -209,7 +209,7 @@ func (ctx *InsertCtx) WritePromMetadata(mmps []prometheus.Metadata) error {
}
ctx.mms = mms
err := vmstorage.AddMetadataRows(mms)
err := vmstorage.VMInsertAPI.WriteMetadata(mms)
if err != nil {
return &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot store prometheus metrics metadata: %w", err),
@@ -278,7 +278,7 @@ func (ctx *InsertCtx) FlushBufs() error {
// since the number of concurrent FlushBufs() calls should be already limited via writeconcurrencylimiter
// used at every stream.Parse() call under lib/protoparser/*
err := vmstorage.AddRows(ctx.mrs)
err := vmstorage.VMInsertAPI.WriteRows(ctx.mrs)
ctx.Reset(0)
if err == nil {
return nil

View File

@@ -283,7 +283,7 @@ func pushAggregateSeries(tss []prompb.TimeSeries) {
}
// There is no need in limiting the number of concurrent calls to vmstorage.AddRows() here,
// since the number of concurrent pushAggregateSeries() calls should be already limited by lib/streamaggr.
if err := vmstorage.AddRows(ctx.mrs); err != nil {
if err := vmstorage.VMInsertAPI.WriteRows(ctx.mrs); err != nil {
logger.Errorf("cannot flush aggregate series: %s", err)
}
}

View File

@@ -1,7 +1,6 @@
package graphite
import (
"flag"
"fmt"
"math"
"net/http"
@@ -21,8 +20,6 @@ import (
"github.com/VictoriaMetrics/metricsql"
)
var maxTagValueSuffixes = flag.Int("search.maxTagValueSuffixesPerSearch", 100e3, "The maximum number of tag value suffixes returned from /metrics/find")
// MetricsFindHandler implements /metrics/find handler.
//
// See https://graphite-api.readthedocs.io/en/latest/api.html#metrics-find
@@ -222,10 +219,11 @@ func MetricsIndexHandler(startTime time.Time, w http.ResponseWriter, r *http.Req
// metricsFind searches for label values that match the given qHead and qTail.
func metricsFind(tr storage.TimeRange, label, qHead, qTail string, delimiter byte, isExpand bool, deadline searchutil.Deadline) ([]string, error) {
maxSuffixes := 0 // let vmstorage use its maxTagValueSuffixesPerSearch limit
n := strings.IndexAny(qTail, "*{[")
if n < 0 {
query := qHead + qTail
suffixes, err := netstorage.TagValueSuffixes(nil, tr, label, query, delimiter, *maxTagValueSuffixes, deadline)
suffixes, err := netstorage.TagValueSuffixes(nil, tr, label, query, delimiter, maxSuffixes, deadline)
if err != nil {
return nil, err
}
@@ -245,7 +243,7 @@ func metricsFind(tr storage.TimeRange, label, qHead, qTail string, delimiter byt
}
if n == len(qTail)-1 && strings.HasSuffix(qTail, "*") {
query := qHead + qTail[:len(qTail)-1]
suffixes, err := netstorage.TagValueSuffixes(nil, tr, label, query, delimiter, *maxTagValueSuffixes, deadline)
suffixes, err := netstorage.TagValueSuffixes(nil, tr, label, query, delimiter, maxSuffixes, deadline)
if err != nil {
return nil, err
}

View File

@@ -138,7 +138,9 @@ func registerMetrics(startTime time.Time, w http.ResponseWriter, r *http.Request
mr.MetricNameRaw = storage.MarshalMetricNameRaw(mr.MetricNameRaw[:0], labels)
mr.Timestamp = ct
}
vmstorage.RegisterMetricNames(nil, mrs)
if err := vmstorage.VMSelectAPI.RegisterMetricNames(nil, mrs, 0); err != nil {
return err
}
// Return response
contentType := "text/plain; charset=utf-8"

View File

@@ -21,7 +21,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/stats"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
@@ -36,12 +35,6 @@ var (
deleteAuthKey = flagutil.NewPassword("deleteAuthKey", "authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries. It could be passed via authKey query arg. It overrides -httpAuth.*")
metricNamesStatsResetAuthKey = flagutil.NewPassword("metricNamesStatsResetAuthKey", "authKey for resetting metric names usage cache via /api/v1/admin/status/metric_names_stats/reset. It overrides -httpAuth.*. "+
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#track-ingested-metrics-usage")
maxConcurrentRequests = flag.Int("search.maxConcurrentRequests", getDefaultMaxConcurrentRequests(), "The maximum number of concurrent search requests. "+
"It shouldn't be high, since a single request can saturate all the CPU cores, while many concurrently executed requests may require high amounts of memory. "+
"See also -search.maxQueueDuration and -search.maxMemoryPerQuery")
maxQueueDuration = flag.Duration("search.maxQueueDuration", 10*time.Second, "The maximum time the request waits for execution when -search.maxConcurrentRequests "+
"limit is reached; see also -search.maxQueryDuration")
resetCacheAuthKey = flagutil.NewPassword("search.resetCacheAuthKey", "Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call. It could be passed via authKey query arg. It overrides -httpAuth.*")
logSlowQueryDuration = flag.Duration("search.logSlowQueryDuration", 5*time.Second, "Log queries with execution time exceeding this value. Zero disables slow query logging. "+
"See also -search.logQueryMemoryUsage")
@@ -50,23 +43,17 @@ var (
var slowQueries = metrics.NewCounter(`vm_slow_queries_total`)
func getDefaultMaxConcurrentRequests() int {
// A single request can saturate all the CPU cores, so there is no sense
// in allowing higher number of concurrent requests - they will just contend
// for unavailable CPU time.
n := min(cgroup.AvailableCPUs()*2, 16)
return n
}
// Init initializes vmselect
func Init() {
func Init(vmselectMaxConcurrentRequests int, vmselectMaxQueueDuration time.Duration) {
tmpDirPath := vmstorage.DataPath() + "/tmp"
fs.MustRemoveDirContents(tmpDirPath)
netstorage.InitTmpBlocksDir(tmpDirPath)
promql.InitRollupResultCache(vmstorage.DataPath() + "/cache/rollupResult")
prometheus.InitMaxUniqueTimeseries(*maxConcurrentRequests)
concurrencyLimitCh = make(chan struct{}, *maxConcurrentRequests)
maxConcurrentRequests = vmselectMaxConcurrentRequests
maxQueueDuration = vmselectMaxQueueDuration
concurrencyLimitCh = make(chan struct{}, maxConcurrentRequests)
initVMUIConfig()
initVMAlertProxy()
@@ -78,7 +65,11 @@ func Stop() {
promql.StopRollupResultCache()
}
var concurrencyLimitCh chan struct{}
var (
maxConcurrentRequests int
maxQueueDuration time.Duration
concurrencyLimitCh chan struct{}
)
var (
concurrencyLimitReached = metrics.NewCounter(`vm_concurrent_select_limit_reached_total`)
@@ -90,9 +81,6 @@ var (
_ = metrics.NewGauge(`vm_concurrent_select_current`, func() float64 {
return float64(len(concurrencyLimitCh))
})
_ = metrics.NewGauge(`vm_search_max_unique_timeseries`, func() float64 {
return float64(prometheus.GetMaxUniqueTimeSeries())
})
)
//go:embed vmui
@@ -131,12 +119,12 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
default:
// Sleep for a while until giving up. This should resolve short bursts in requests.
concurrencyLimitReached.Inc()
d := min(searchutil.GetMaxQueryDuration(r), *maxQueueDuration)
d := min(searchutil.GetMaxQueryDuration(r), maxQueueDuration)
t := timerpool.Get(d)
select {
case concurrencyLimitCh <- struct{}{}:
timerpool.Put(t)
qt.Printf("wait in queue because -search.maxConcurrentRequests=%d concurrent requests are executed", *maxConcurrentRequests)
qt.Printf("wait in queue because -search.maxConcurrentRequests=%d concurrent requests are executed", maxConcurrentRequests)
defer func() { <-concurrencyLimitCh }()
case <-r.Context().Done():
timerpool.Put(t)
@@ -152,7 +140,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
Err: fmt.Errorf("couldn't start executing the request in %.3f seconds, since -search.maxConcurrentRequests=%d concurrent requests "+
"are executed. Possible solutions: to reduce query load; to add more compute resources to the server; "+
"to increase -search.maxQueueDuration=%s; to increase -search.maxQueryDuration; to increase -search.maxConcurrentRequests",
d.Seconds(), *maxConcurrentRequests, maxQueueDuration),
d.Seconds(), maxConcurrentRequests, maxQueueDuration),
StatusCode: http.StatusTooManyRequests,
}
w.Header().Add("Retry-After", "10")

View File

@@ -27,10 +27,6 @@ import (
)
var (
maxTagKeysPerSearch = flag.Int("search.maxTagKeys", 100e3, "The maximum number of tag keys returned from /api/v1/labels . "+
"See also -search.maxLabelsAPISeries and -search.maxLabelsAPIDuration")
maxTagValuesPerSearch = flag.Int("search.maxTagValues", 100e3, "The maximum number of tag values returned from /api/v1/label/<label_name>/values . "+
"See also -search.maxLabelsAPISeries and -search.maxLabelsAPIDuration")
maxSamplesPerSeries = flag.Int("search.maxSamplesPerSeries", 30e6, "The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage")
maxSamplesPerQuery = flag.Int("search.maxSamplesPerQuery", 1e9, "The maximum number of raw samples a single query can process across all time series. "+
"This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries")
@@ -80,7 +76,7 @@ func (rss *Results) Cancel() {
}
func (rss *Results) mustClose() {
putStorageSearch(rss.sr)
vmstorage.PutSearch(rss.sr)
rss.sr = nil
putTmpBlocksFile(rss.tbf)
rss.tbf = nil
@@ -758,12 +754,7 @@ var sbhPool sync.Pool
func DeleteSeries(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline searchutil.Deadline) (int, error) {
qt = qt.NewChild("delete series: %s", sq)
defer qt.Done()
tr := sq.GetTimeRange()
tfss, err := setupTfss(qt, tr, sq.TagFilterss, sq.MaxMetrics, deadline)
if err != nil {
return 0, err
}
return vmstorage.DeleteSeries(qt, tfss, sq.MaxMetrics)
return vmstorage.VMSelectAPI.DeleteSeries(qt, sq, deadline.Deadline())
}
// LabelNames returns label names matching the given sq until the given deadline.
@@ -773,15 +764,7 @@ func LabelNames(qt *querytracer.Tracer, sq *storage.SearchQuery, maxLabelNames i
if deadline.Exceeded() {
return nil, fmt.Errorf("timeout exceeded before starting the query processing: %s", deadline.String())
}
if maxLabelNames > *maxTagKeysPerSearch || maxLabelNames <= 0 {
maxLabelNames = *maxTagKeysPerSearch
}
tr := sq.GetTimeRange()
tfss, err := setupTfss(qt, tr, sq.TagFilterss, sq.MaxMetrics, deadline)
if err != nil {
return nil, err
}
labels, err := vmstorage.SearchLabelNames(qt, tfss, tr, maxLabelNames, sq.MaxMetrics, deadline.Deadline())
labels, err := vmstorage.VMSelectAPI.LabelNames(qt, sq, maxLabelNames, deadline.Deadline())
if err != nil {
return nil, fmt.Errorf("error during labels search on time range: %w", err)
}
@@ -841,15 +824,7 @@ func LabelValues(qt *querytracer.Tracer, labelName string, sq *storage.SearchQue
if deadline.Exceeded() {
return nil, fmt.Errorf("timeout exceeded before starting the query processing: %s", deadline.String())
}
if maxLabelValues > *maxTagValuesPerSearch || maxLabelValues <= 0 {
maxLabelValues = *maxTagValuesPerSearch
}
tr := sq.GetTimeRange()
tfss, err := setupTfss(qt, tr, sq.TagFilterss, sq.MaxMetrics, deadline)
if err != nil {
return nil, err
}
labelValues, err := vmstorage.SearchLabelValues(qt, labelName, tfss, tr, maxLabelValues, sq.MaxMetrics, deadline.Deadline())
labelValues, err := vmstorage.VMSelectAPI.LabelValues(qt, sq, labelName, maxLabelValues, deadline.Deadline())
if err != nil {
return nil, fmt.Errorf("error during label values search on time range for labelName=%q: %w", labelName, err)
}
@@ -864,7 +839,10 @@ func GetMetricsMetadata(qt *querytracer.Tracer, limit int, metricName string) ([
qt = qt.NewChild("get metrics metadata: limit=%d, metric_name=%q", limit, metricName)
defer qt.Done()
metadata := vmstorage.Storage.GetMetadataRows(qt, limit, metricName)
metadata, err := vmstorage.VMSelectAPI.GetMetadataRecords(qt, nil, limit, metricName, 0)
if err != nil {
return nil, err
}
sort.Slice(metadata, func(i, j int) bool {
return string(metadata[i].MetricFamilyName) < string(metadata[j].MetricFamilyName)
@@ -912,16 +890,11 @@ func TagValueSuffixes(qt *querytracer.Tracer, tr storage.TimeRange, tagKey, tagV
if deadline.Exceeded() {
return nil, fmt.Errorf("timeout exceeded before starting the query processing: %s", deadline.String())
}
suffixes, err := vmstorage.SearchTagValueSuffixes(qt, tr, tagKey, tagValuePrefix, delimiter, maxSuffixes, deadline.Deadline())
suffixes, err := vmstorage.VMSelectAPI.TagValueSuffixes(qt, 0, 0, tr, tagKey, tagValuePrefix, delimiter, maxSuffixes, deadline.Deadline())
if err != nil {
return nil, fmt.Errorf("error during search for suffixes for tagKey=%q, tagValuePrefix=%q, delimiter=%c on time range %s: %w",
tagKey, tagValuePrefix, delimiter, tr.String(), err)
}
if len(suffixes) >= maxSuffixes {
return nil, fmt.Errorf("more than -search.maxTagValueSuffixesPerSearch=%d tag value suffixes found for tagKey=%q, tagValuePrefix=%q, delimiter=%c on time range %s; "+
"either narrow down the query or increase -search.maxTagValueSuffixesPerSearch command-line flag value",
maxSuffixes, tagKey, tagValuePrefix, delimiter, tr.String())
}
return suffixes, nil
}
@@ -934,13 +907,7 @@ func TSDBStatus(qt *querytracer.Tracer, sq *storage.SearchQuery, focusLabel stri
if deadline.Exceeded() {
return nil, fmt.Errorf("timeout exceeded before starting the query processing: %s", deadline.String())
}
tr := sq.GetTimeRange()
tfss, err := setupTfss(qt, tr, sq.TagFilterss, sq.MaxMetrics, deadline)
if err != nil {
return nil, err
}
date := uint64(tr.MinTimestamp) / (3600 * 24 * 1000)
status, err := vmstorage.GetTSDBStatus(qt, tfss, date, focusLabel, topN, sq.MaxMetrics, deadline.Deadline())
status, err := vmstorage.VMSelectAPI.TSDBStatus(qt, sq, focusLabel, topN, deadline.Deadline())
if err != nil {
return nil, fmt.Errorf("error during tsdb status request: %w", err)
}
@@ -954,28 +921,13 @@ func SeriesCount(qt *querytracer.Tracer, deadline searchutil.Deadline) (uint64,
if deadline.Exceeded() {
return 0, fmt.Errorf("timeout exceeded before starting the query processing: %s", deadline.String())
}
n, err := vmstorage.GetSeriesCount(deadline.Deadline())
n, err := vmstorage.VMSelectAPI.SeriesCount(qt, 0, 0, deadline.Deadline())
if err != nil {
return 0, fmt.Errorf("error during series count request: %w", err)
}
return n, nil
}
func getStorageSearch() *storage.Search {
v := ssPool.Get()
if v == nil {
return &storage.Search{}
}
return v.(*storage.Search)
}
func putStorageSearch(sr *storage.Search) {
sr.MustClose()
ssPool.Put(sr)
}
var ssPool sync.Pool
// ExportBlocks searches for time series matching sq and calls f for each found block.
//
// f is called in parallel from multiple goroutines.
@@ -989,18 +941,13 @@ func ExportBlocks(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline sear
if deadline.Exceeded() {
return fmt.Errorf("timeout exceeded before starting data export: %s", deadline.String())
}
tr := sq.GetTimeRange()
tfss, err := setupTfss(qt, tr, sq.TagFilterss, sq.MaxMetrics, deadline)
sr, _, err := vmstorage.GetSearch(qt, sq, deadline.Deadline())
if err != nil {
return err
}
vmstorage.WG.Add(1)
defer vmstorage.WG.Done()
sr := getStorageSearch()
defer putStorageSearch(sr)
sr.Init(qt, vmstorage.Storage, tfss, tr, sq.MaxMetrics, deadline.Deadline())
defer vmstorage.PutSearch(sr)
// Start workers that call f in parallel on available CPU cores.
workCh := make(chan *exportWork, gomaxprocs*8)
@@ -1093,14 +1040,7 @@ func SearchMetricNames(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline
return nil, fmt.Errorf("timeout exceeded before starting to search metric names: %s", deadline.String())
}
// Setup search.
tr := sq.GetTimeRange()
tfss, err := setupTfss(qt, tr, sq.TagFilterss, sq.MaxMetrics, deadline)
if err != nil {
return nil, err
}
metricNames, err := vmstorage.SearchMetricNames(qt, tfss, tr, sq.MaxMetrics, deadline.Deadline())
metricNames, err := vmstorage.VMSelectAPI.SearchMetricNames(qt, sq, deadline.Deadline())
if err != nil {
return nil, fmt.Errorf("cannot find metric names: %w", err)
}
@@ -1119,18 +1059,11 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
return nil, fmt.Errorf("timeout exceeded before starting the query processing: %s", deadline.String())
}
// Setup search.
tr := sq.GetTimeRange()
tfss, err := setupTfss(qt, tr, sq.TagFilterss, sq.MaxMetrics, deadline)
sr, maxSeriesCount, err := vmstorage.GetSearch(qt, sq, deadline.Deadline())
if err != nil {
return nil, err
}
vmstorage.WG.Add(1)
defer vmstorage.WG.Done()
sr := getStorageSearch()
maxSeriesCount := sr.Init(qt, vmstorage.Storage, tfss, tr, sq.MaxMetrics, deadline.Deadline())
type blockRefs struct {
brs []blockRef
}
@@ -1168,7 +1101,7 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
blocksRead++
if deadline.Exceeded() {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
vmstorage.PutSearch(sr)
return nil, fmt.Errorf("timeout exceeded while fetching data block #%d from storage: %s", blocksRead, deadline.String())
}
br := sr.MetricBlockRef.BlockRef
@@ -1180,7 +1113,7 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
samples += br.RowsCount()
if *maxSamplesPerQuery > 0 && samples > *maxSamplesPerQuery {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
vmstorage.PutSearch(sr)
return nil, fmt.Errorf("cannot select more than -search.maxSamplesPerQuery=%d samples; possible solutions: increase the -search.maxSamplesPerQuery; "+
"reduce time range for the query; use more specific label filters in order to select fewer series", *maxSamplesPerQuery)
}
@@ -1189,7 +1122,7 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
addr, err := tbf.WriteBlockRefData(buf)
if err != nil {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
vmstorage.PutSearch(sr)
return nil, fmt.Errorf("cannot write %d bytes to temporary file: %w", len(buf), err)
}
@@ -1247,7 +1180,7 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
if err := sr.Error(); err != nil {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
vmstorage.PutSearch(sr)
if errors.Is(err, storage.ErrDeadlineExceeded) {
return nil, fmt.Errorf("timeout exceeded during the query: %s", deadline.String())
}
@@ -1255,13 +1188,13 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
}
if err := tbf.Finalize(); err != nil {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
vmstorage.PutSearch(sr)
return nil, fmt.Errorf("cannot finalize temporary file: %w", err)
}
qt.Printf("fetch unique series=%d, blocks=%d, samples=%d, bytes=%d", len(m), blocksRead, samples, tbf.Len())
var rss Results
rss.tr = tr
rss.tr = sq.GetTimeRange()
rss.deadline = deadline
pts := make([]packedTimeseries, len(orderedMetricNames))
for i, metricName := range orderedMetricNames {
@@ -1302,35 +1235,6 @@ func getBlockRefsEnd(a []blockRef) uintptr {
return uintptr(unsafe.Pointer(unsafe.SliceData(a))) + uintptr(len(a))*unsafe.Sizeof(blockRef{})
}
func setupTfss(qt *querytracer.Tracer, tr storage.TimeRange, tagFilterss [][]storage.TagFilter, maxMetrics int, deadline searchutil.Deadline) ([]*storage.TagFilters, error) {
tfss := make([]*storage.TagFilters, 0, len(tagFilterss))
for _, tagFilters := range tagFilterss {
tfs := storage.NewTagFilters()
for i := range tagFilters {
tf := &tagFilters[i]
if string(tf.Key) == "__graphite__" {
query := tf.Value
paths, err := vmstorage.SearchGraphitePaths(qt, tr, query, maxMetrics, deadline.Deadline())
if err != nil {
return nil, fmt.Errorf("error when searching for Graphite paths for query %q: %w", query, err)
}
if len(paths) >= maxMetrics {
return nil, fmt.Errorf("more than %d time series match Graphite query %q; "+
"either narrow down the query or increase the corresponding -search.max* command-line flag value; "+
"see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#resource-usage-limits", maxMetrics, query)
}
tfs.AddGraphiteQuery(query, paths, tf.IsNegative)
continue
}
if err := tfs.Add(tf.Key, tf.Value, tf.IsNegative, tf.IsRegexp); err != nil {
return nil, fmt.Errorf("cannot parse tag filter %s: %w", tf, err)
}
}
tfss = append(tfss, tfs)
}
return tfss, nil
}
func applyGraphiteRegexpFilter(filter string, ss []string) ([]string, error) {
// Anchor filter regexp to the beginning of the string as Graphite does.
// See https://github.com/graphite-project/graphite-web/blob/3ad279df5cb90b211953e39161df416e54a84948/webapp/graphite/tags/localdatabase.py#L157
@@ -1357,13 +1261,12 @@ const maxFastAllocBlockSize = 32 * 1024
func GetMetricNamesStats(qt *querytracer.Tracer, limit, le int, matchPattern string) (metricnamestats.StatsResult, error) {
qt = qt.NewChild("get metric names usage statistics with limit: %d, less or equal to: %d, match pattern=%q", limit, le, matchPattern)
defer qt.Done()
return vmstorage.GetMetricNamesStats(qt, limit, le, matchPattern)
return vmstorage.VMSelectAPI.GetMetricNamesUsageStats(qt, nil, limit, le, matchPattern, 0)
}
// ResetMetricNamesStats resets state of metric names usage
func ResetMetricNamesStats(qt *querytracer.Tracer) error {
qt = qt.NewChild("reset metric names usage stats")
defer qt.Done()
vmstorage.ResetMetricNamesStats(qt)
return nil
return vmstorage.VMSelectAPI.ResetMetricNamesUsageStats(qt, 0)
}

View File

@@ -2,13 +2,16 @@
"math"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
) %}
{% stripspace %}
// Federate writes rs in /federate format.
// See https://prometheus.io/docs/prometheus/latest/federation/
{% func Federate(rs *netstorage.Result) %}
{% func Federate(rs *netstorage.Result, escapeScheme string) %}
{% code
values := rs.Values
timestamps := rs.Timestamps
@@ -24,10 +27,54 @@
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3185
{% endcomment %}
{% return %}
{% endif %}
{%= prometheusMetricName(&rs.MetricName) %}{% space %}
{% endif %}
{% switch escapeScheme %}
{% case federateEscapeSchemeUTF8 %}
{%= prometheusFederateMetricNameUTF8(&rs.MetricName) %}{% space %}
{% case federateEscapeSchemeUnderscore %}
{%= prometheusFederateMetricNameEscapeUnderscore(&rs.MetricName) %}{% space %}
{% case "" %}
{%= prometheusMetricName(&rs.MetricName) %}{% space %}
{% endswitch %}
{%f= lastValue %}{% space %}
{%dl= timestamps[len(timestamps)-1] %}{% newline %}
{% endfunc %}
{% func prometheusFederateMetricNameEscapeUnderscore(mn *storage.MetricName) %}
{%s= promrelabel.SanitizeMetricName(bytesutil.ToUnsafeString(mn.MetricGroup)) %}
{% if len(mn.Tags) > 0 %}
{
{% code tags := mn.Tags %}
{%s= promrelabel.SanitizeLabelName(bytesutil.ToUnsafeString(tags[0].Key)) %}={%= escapePrometheusLabel(tags[0].Value) %}
{% code tags = tags[1:] %}
{% for i := range tags %}
{% code tag := &tags[i] %}
,{%s= promrelabel.SanitizeLabelName(bytesutil.ToUnsafeString(tag.Key)) %}={%= escapePrometheusLabel(tag.Value) %}
{% endfor %}
}
{% endif %}
{% endfunc %}
{% func prometheusFederateMetricNameUTF8(mn *storage.MetricName) %}
{
{%= escapePrometheusLabel(mn.MetricGroup) %}
{% if len(mn.Tags) > 0 %}
,
{% code tags := mn.Tags %}
{%= escapePrometheusLabel(tags[0].Key) %}={%= escapePrometheusLabel(tags[0].Value) %}
{% code tags = tags[1:] %}
{% for i := range tags %}
{% code tag := &tags[i] %}
,{%= escapePrometheusLabel(tag.Key) %}={%= escapePrometheusLabel(tag.Value) %}
{% endfor %}
{% endif %}
}
{% endfunc %}
{% endstripspace %}

View File

@@ -9,82 +9,241 @@ import (
"math"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
)
// Federate writes rs in /federate format.// See https://prometheus.io/docs/prometheus/latest/federation/
//line app/vmselect/prometheus/federate.qtpl:11
//line app/vmselect/prometheus/federate.qtpl:14
import (
qtio422016 "io"
qt422016 "github.com/valyala/quicktemplate"
)
//line app/vmselect/prometheus/federate.qtpl:11
//line app/vmselect/prometheus/federate.qtpl:14
var (
_ = qtio422016.Copy
_ = qt422016.AcquireByteBuffer
)
//line app/vmselect/prometheus/federate.qtpl:11
func StreamFederate(qw422016 *qt422016.Writer, rs *netstorage.Result) {
//line app/vmselect/prometheus/federate.qtpl:13
//line app/vmselect/prometheus/federate.qtpl:14
func StreamFederate(qw422016 *qt422016.Writer, rs *netstorage.Result, escapeScheme string) {
//line app/vmselect/prometheus/federate.qtpl:16
values := rs.Values
timestamps := rs.Timestamps
//line app/vmselect/prometheus/federate.qtpl:16
//line app/vmselect/prometheus/federate.qtpl:19
if len(timestamps) == 0 || len(values) == 0 {
//line app/vmselect/prometheus/federate.qtpl:16
//line app/vmselect/prometheus/federate.qtpl:19
return
//line app/vmselect/prometheus/federate.qtpl:16
//line app/vmselect/prometheus/federate.qtpl:19
}
//line app/vmselect/prometheus/federate.qtpl:18
//line app/vmselect/prometheus/federate.qtpl:21
lastValue := values[len(values)-1]
//line app/vmselect/prometheus/federate.qtpl:20
//line app/vmselect/prometheus/federate.qtpl:23
if math.IsNaN(lastValue) {
//line app/vmselect/prometheus/federate.qtpl:26
//line app/vmselect/prometheus/federate.qtpl:29
return
//line app/vmselect/prometheus/federate.qtpl:27
//line app/vmselect/prometheus/federate.qtpl:30
}
//line app/vmselect/prometheus/federate.qtpl:28
streamprometheusMetricName(qw422016, &rs.MetricName)
//line app/vmselect/prometheus/federate.qtpl:28
qw422016.N().S(` `)
//line app/vmselect/prometheus/federate.qtpl:29
//line app/vmselect/prometheus/federate.qtpl:32
switch escapeScheme {
//line app/vmselect/prometheus/federate.qtpl:33
case federateEscapeSchemeUTF8:
//line app/vmselect/prometheus/federate.qtpl:34
streamprometheusFederateMetricNameUTF8(qw422016, &rs.MetricName)
//line app/vmselect/prometheus/federate.qtpl:34
qw422016.N().S(` `)
//line app/vmselect/prometheus/federate.qtpl:36
case federateEscapeSchemeUnderscore:
//line app/vmselect/prometheus/federate.qtpl:37
streamprometheusFederateMetricNameEscapeUnderscore(qw422016, &rs.MetricName)
//line app/vmselect/prometheus/federate.qtpl:37
qw422016.N().S(` `)
//line app/vmselect/prometheus/federate.qtpl:39
case "":
//line app/vmselect/prometheus/federate.qtpl:40
streamprometheusMetricName(qw422016, &rs.MetricName)
//line app/vmselect/prometheus/federate.qtpl:40
qw422016.N().S(` `)
//line app/vmselect/prometheus/federate.qtpl:41
}
//line app/vmselect/prometheus/federate.qtpl:43
qw422016.N().F(lastValue)
//line app/vmselect/prometheus/federate.qtpl:29
//line app/vmselect/prometheus/federate.qtpl:43
qw422016.N().S(` `)
//line app/vmselect/prometheus/federate.qtpl:30
//line app/vmselect/prometheus/federate.qtpl:44
qw422016.N().DL(timestamps[len(timestamps)-1])
//line app/vmselect/prometheus/federate.qtpl:30
//line app/vmselect/prometheus/federate.qtpl:44
qw422016.N().S(`
`)
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
}
//line app/vmselect/prometheus/federate.qtpl:31
func WriteFederate(qq422016 qtio422016.Writer, rs *netstorage.Result) {
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
func WriteFederate(qq422016 qtio422016.Writer, rs *netstorage.Result, escapeScheme string) {
//line app/vmselect/prometheus/federate.qtpl:45
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/federate.qtpl:31
StreamFederate(qw422016, rs)
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
StreamFederate(qw422016, rs, escapeScheme)
//line app/vmselect/prometheus/federate.qtpl:45
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
}
//line app/vmselect/prometheus/federate.qtpl:31
func Federate(rs *netstorage.Result) string {
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
func Federate(rs *netstorage.Result, escapeScheme string) string {
//line app/vmselect/prometheus/federate.qtpl:45
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/federate.qtpl:31
WriteFederate(qb422016, rs)
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
WriteFederate(qb422016, rs, escapeScheme)
//line app/vmselect/prometheus/federate.qtpl:45
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
return qs422016
//line app/vmselect/prometheus/federate.qtpl:31
//line app/vmselect/prometheus/federate.qtpl:45
}
//line app/vmselect/prometheus/federate.qtpl:47
func streamprometheusFederateMetricNameEscapeUnderscore(qw422016 *qt422016.Writer, mn *storage.MetricName) {
//line app/vmselect/prometheus/federate.qtpl:48
qw422016.N().S(promrelabel.SanitizeMetricName(bytesutil.ToUnsafeString(mn.MetricGroup)))
//line app/vmselect/prometheus/federate.qtpl:49
if len(mn.Tags) > 0 {
//line app/vmselect/prometheus/federate.qtpl:49
qw422016.N().S(`{`)
//line app/vmselect/prometheus/federate.qtpl:51
tags := mn.Tags
//line app/vmselect/prometheus/federate.qtpl:52
qw422016.N().S(promrelabel.SanitizeLabelName(bytesutil.ToUnsafeString(tags[0].Key)))
//line app/vmselect/prometheus/federate.qtpl:52
qw422016.N().S(`=`)
//line app/vmselect/prometheus/federate.qtpl:52
streamescapePrometheusLabel(qw422016, tags[0].Value)
//line app/vmselect/prometheus/federate.qtpl:53
tags = tags[1:]
//line app/vmselect/prometheus/federate.qtpl:54
for i := range tags {
//line app/vmselect/prometheus/federate.qtpl:55
tag := &tags[i]
//line app/vmselect/prometheus/federate.qtpl:55
qw422016.N().S(`,`)
//line app/vmselect/prometheus/federate.qtpl:56
qw422016.N().S(promrelabel.SanitizeLabelName(bytesutil.ToUnsafeString(tag.Key)))
//line app/vmselect/prometheus/federate.qtpl:56
qw422016.N().S(`=`)
//line app/vmselect/prometheus/federate.qtpl:56
streamescapePrometheusLabel(qw422016, tag.Value)
//line app/vmselect/prometheus/federate.qtpl:57
}
//line app/vmselect/prometheus/federate.qtpl:57
qw422016.N().S(`}`)
//line app/vmselect/prometheus/federate.qtpl:59
}
//line app/vmselect/prometheus/federate.qtpl:60
}
//line app/vmselect/prometheus/federate.qtpl:60
func writeprometheusFederateMetricNameEscapeUnderscore(qq422016 qtio422016.Writer, mn *storage.MetricName) {
//line app/vmselect/prometheus/federate.qtpl:60
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/federate.qtpl:60
streamprometheusFederateMetricNameEscapeUnderscore(qw422016, mn)
//line app/vmselect/prometheus/federate.qtpl:60
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/federate.qtpl:60
}
//line app/vmselect/prometheus/federate.qtpl:60
func prometheusFederateMetricNameEscapeUnderscore(mn *storage.MetricName) string {
//line app/vmselect/prometheus/federate.qtpl:60
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/federate.qtpl:60
writeprometheusFederateMetricNameEscapeUnderscore(qb422016, mn)
//line app/vmselect/prometheus/federate.qtpl:60
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/federate.qtpl:60
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/federate.qtpl:60
return qs422016
//line app/vmselect/prometheus/federate.qtpl:60
}
//line app/vmselect/prometheus/federate.qtpl:62
func streamprometheusFederateMetricNameUTF8(qw422016 *qt422016.Writer, mn *storage.MetricName) {
//line app/vmselect/prometheus/federate.qtpl:62
qw422016.N().S(`{`)
//line app/vmselect/prometheus/federate.qtpl:64
streamescapePrometheusLabel(qw422016, mn.MetricGroup)
//line app/vmselect/prometheus/federate.qtpl:65
if len(mn.Tags) > 0 {
//line app/vmselect/prometheus/federate.qtpl:65
qw422016.N().S(`,`)
//line app/vmselect/prometheus/federate.qtpl:67
tags := mn.Tags
//line app/vmselect/prometheus/federate.qtpl:68
streamescapePrometheusLabel(qw422016, tags[0].Key)
//line app/vmselect/prometheus/federate.qtpl:68
qw422016.N().S(`=`)
//line app/vmselect/prometheus/federate.qtpl:68
streamescapePrometheusLabel(qw422016, tags[0].Value)
//line app/vmselect/prometheus/federate.qtpl:69
tags = tags[1:]
//line app/vmselect/prometheus/federate.qtpl:70
for i := range tags {
//line app/vmselect/prometheus/federate.qtpl:71
tag := &tags[i]
//line app/vmselect/prometheus/federate.qtpl:71
qw422016.N().S(`,`)
//line app/vmselect/prometheus/federate.qtpl:72
streamescapePrometheusLabel(qw422016, tag.Key)
//line app/vmselect/prometheus/federate.qtpl:72
qw422016.N().S(`=`)
//line app/vmselect/prometheus/federate.qtpl:72
streamescapePrometheusLabel(qw422016, tag.Value)
//line app/vmselect/prometheus/federate.qtpl:73
}
//line app/vmselect/prometheus/federate.qtpl:74
}
//line app/vmselect/prometheus/federate.qtpl:74
qw422016.N().S(`}`)
//line app/vmselect/prometheus/federate.qtpl:76
}
//line app/vmselect/prometheus/federate.qtpl:76
func writeprometheusFederateMetricNameUTF8(qq422016 qtio422016.Writer, mn *storage.MetricName) {
//line app/vmselect/prometheus/federate.qtpl:76
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/federate.qtpl:76
streamprometheusFederateMetricNameUTF8(qw422016, mn)
//line app/vmselect/prometheus/federate.qtpl:76
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/federate.qtpl:76
}
//line app/vmselect/prometheus/federate.qtpl:76
func prometheusFederateMetricNameUTF8(mn *storage.MetricName) string {
//line app/vmselect/prometheus/federate.qtpl:76
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/federate.qtpl:76
writeprometheusFederateMetricNameUTF8(qb422016, mn)
//line app/vmselect/prometheus/federate.qtpl:76
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/federate.qtpl:76
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/federate.qtpl:76
return qs422016
//line app/vmselect/prometheus/federate.qtpl:76
}

View File

@@ -8,15 +8,15 @@ import (
)
func TestFederate(t *testing.T) {
f := func(rs *netstorage.Result, expectedResult string) {
f := func(rs *netstorage.Result, escapeScheme string, expectedResult string) {
t.Helper()
result := Federate(rs)
result := Federate(rs, escapeScheme)
if result != expectedResult {
t.Fatalf("unexpected result; got\n%s\nwant\n%s", result, expectedResult)
}
}
f(&netstorage.Result{}, ``)
f(&netstorage.Result{}, ``, ``)
f(&netstorage.Result{
MetricName: storage.MetricName{
@@ -39,5 +39,60 @@ func TestFederate(t *testing.T) {
},
Values: []float64{1.23},
Timestamps: []int64{123},
}, `foo{a="b",qqq="\\",abc="a<b\"\\c"} 1.23 123`+"\n")
}, ``, `foo{a="b",qqq="\\",abc="a<b\"\\c"} 1.23 123`+"\n")
f(&netstorage.Result{
MetricName: storage.MetricName{
MetricGroup: []byte("foo.bar"),
Tags: []storage.Tag{
{
Key: []byte("some.!other"),
Value: []byte("value.unchanged!."),
},
{
Key: []byte("qqq"),
Value: []byte("\\"),
},
{
Key: []byte("!key"),
Value: []byte("value"),
},
{
Key: []byte("abc"),
// Verify that < isn't encoded. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5431
Value: []byte("a<b\"\\c"),
},
},
},
Values: []float64{1.23},
Timestamps: []int64{123},
}, federateEscapeSchemeUnderscore, `foo_bar{some__other="value.unchanged!.",qqq="\\",_key="value",abc="a<b\"\\c"} 1.23 123`+"\n")
f(&netstorage.Result{
MetricName: storage.MetricName{
MetricGroup: []byte("foo.bar"),
Tags: []storage.Tag{
{
Key: []byte("some.!other"),
Value: []byte("value.unchanged!."),
},
{
Key: []byte("qqq"),
Value: []byte("\\"),
},
{
Key: []byte("!key"),
Value: []byte("value"),
},
{
Key: []byte(`ab"c`),
// Verify that < isn't encoded. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5431
Value: []byte("a<b\"\\c"),
},
},
},
Values: []float64{1.23},
Timestamps: []int64{123},
}, federateEscapeSchemeUTF8, `{"foo.bar","some.!other"="value.unchanged!.","qqq"="\\","!key"="value","ab\"c"="a<b\"\\c"} 1.23 123`+"\n")
}

View File

@@ -9,16 +9,17 @@ import (
)
func BenchmarkFederate(b *testing.B) {
rs := &netstorage.Result{
MetricName: storage.MetricName{
MetricGroup: []byte("foo_bar_bazaaaa_total"),
MetricGroup: []byte("foo_bar_?_._bazaaaa_total"),
Tags: []storage.Tag{
{
Key: []byte("instance"),
Key: []byte("instance:job"),
Value: []byte("foobarbaz:2344"),
},
{
Key: []byte("job"),
Key: []byte("job.name"),
Value: []byte("aaabbbccc"),
},
},
@@ -27,12 +28,22 @@ func BenchmarkFederate(b *testing.B) {
Timestamps: []int64{1234567890},
}
b.ReportAllocs()
b.RunParallel(func(pb *testing.PB) {
var bb bytes.Buffer
for pb.Next() {
bb.Reset()
WriteFederate(&bb, rs)
}
})
f := func(name, escapeScheme string) {
b.Helper()
b.Run(name, func(b *testing.B) {
b.ReportAllocs()
b.RunParallel(func(pb *testing.PB) {
var bb bytes.Buffer
for pb.Next() {
bb.Reset()
WriteFederate(&bb, rs, escapeScheme)
}
})
})
}
f("without escape", "")
f("allow-utf-8", federateEscapeSchemeUTF8)
f("legacy-underscore", federateEscapeSchemeUnderscore)
}

View File

@@ -28,8 +28,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
@@ -50,9 +48,6 @@ var (
"If set to true, the query model becomes closer to InfluxDB data model. If set to true, then -search.maxLookback and -search.maxStalenessInterval are ignored")
maxStepForPointsAdjustment = flag.Duration("search.maxStepForPointsAdjustment", time.Minute, "The maximum step when /api/v1/query_range handler adjusts "+
"points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data")
maxUniqueTimeseries = flag.Int("search.maxUniqueTimeseries", 0, "The maximum number of unique time series, which can be selected during /api/v1/query and /api/v1/query_range queries. This option allows limiting memory usage. "+
"When set to zero, the limit is automatically calculated based on -search.maxConcurrentRequests (inversely proportional) and memory available to the process (proportional).")
maxFederateSeries = flag.Int("search.maxFederateSeries", 1e6, "The maximum number of time series, which can be returned from /federate. This option allows limiting memory usage")
maxExportSeries = flag.Int("search.maxExportSeries", 10e6, "The maximum number of time series, which can be returned from /api/v1/export* APIs. This option allows limiting memory usage")
maxTSDBStatusSeries = flag.Int("search.maxTSDBStatusSeries", 10e6, "The maximum number of time series, which can be processed during the call to /api/v1/status/tsdb. This option allows limiting memory usage")
@@ -108,6 +103,11 @@ func PrettifyQuery(w http.ResponseWriter, r *http.Request) {
_ = bw.Flush()
}
const (
federateEscapeSchemeUnderscore = "underscore"
federateEscapeSchemeUTF8 = "utf-8"
)
// FederateHandler implements /federate . See https://prometheus.io/docs/prometheus/latest/federation/
func FederateHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) error {
defer federateDuration.UpdateDuration(startTime)
@@ -132,6 +132,21 @@ func FederateHandler(startTime time.Time, w http.ResponseWriter, r *http.Request
return fmt.Errorf("cannot fetch data for %q: %w", sq, err)
}
// add best-effort format negotiation
// modern version of Prometheus always set allow-utf-8 in order to properly parse utf-8 names and labels
// prometheus below v3 uses underscore escaping by default and it's the most common standard
var escapeScheme string
accept := r.Header.Get("Accept")
if len(accept) > 0 && strings.Contains(accept, "allow-utf-8") {
escapeScheme = federateEscapeSchemeUTF8
}
// try fallback to legacy underscore escaping if needed for Prometheus only,
// it's not widely used after Prometheus v3.0 release
// most of the Prometheus scrapers already use allow-utf-8 header
isPrometheus := strings.HasPrefix(r.UserAgent(), "Prometheus")
if len(escapeScheme) == 0 && isPrometheus {
escapeScheme = federateEscapeSchemeUnderscore
}
w.Header().Set("Content-Type", "text/plain; charset=utf-8")
bw := bufferedwriter.Get(w)
defer bufferedwriter.Put(bw)
@@ -141,7 +156,7 @@ func FederateHandler(startTime time.Time, w http.ResponseWriter, r *http.Request
return err
}
bb := sw.getBuffer(workerID)
WriteFederate(bb, rs)
WriteFederate(bb, rs, escapeScheme)
return sw.maybeFlushBuffer(bb)
})
if err == nil {
@@ -853,7 +868,7 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
End: start,
Step: step,
MaxPointsPerSeries: *maxPointsPerTimeseries,
MaxSeries: GetMaxUniqueTimeSeries(),
MaxSeries: 0, // let vmstorage use maxUniqueTimeseries by default
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline,
MayCache: mayCache,
@@ -964,7 +979,7 @@ func queryRangeHandler(qt *querytracer.Tracer, startTime time.Time, w http.Respo
End: end,
Step: step,
MaxPointsPerSeries: *maxPointsPerTimeseries,
MaxSeries: GetMaxUniqueTimeSeries(),
MaxSeries: 0, // let vmstorage use maxUniqueTimeseries by default
QuotedRemoteAddr: httpserver.GetQuotedRemoteAddr(r),
Deadline: deadline,
MayCache: mayCache,
@@ -1300,43 +1315,6 @@ func (sw *scalableWriter) flush() error {
return sw.bw.Flush()
}
var (
maxUniqueTimeseriesValueOnce sync.Once
maxUniqueTimeseriesValue int
)
// InitMaxUniqueTimeseries init the max metrics limit calculated by available resources.
// The calculation is split into calculateMaxUniqueTimeSeriesForResource for unit testing.
func InitMaxUniqueTimeseries(maxConcurrentRequests int) {
maxUniqueTimeseriesValueOnce.Do(func() {
maxUniqueTimeseriesValue = *maxUniqueTimeseries
if maxUniqueTimeseriesValue <= 0 {
maxUniqueTimeseriesValue = calculateMaxUniqueTimeSeriesForResource(maxConcurrentRequests, memory.Remaining())
}
})
}
// calculateMaxUniqueTimeSeriesForResource calculate the max metrics limit calculated by available resources.
func calculateMaxUniqueTimeSeriesForResource(maxConcurrentRequests, remainingMemory int) int {
if maxConcurrentRequests <= 0 {
// This line should NOT be reached unless the user has set an incorrect `search.maxConcurrentRequests`.
// In such cases, fallback to unlimited.
logger.Warnf("limiting -search.maxUniqueTimeseries to %v because -search.maxConcurrentRequests=%d.", 2e9, maxConcurrentRequests)
return 2e9
}
// Calculate the max metrics limit for a single request in the worst-case concurrent scenario.
// The approximate size of 1 unique series that could occupy in the vmstorage is 200 bytes.
mts := remainingMemory / 200 / maxConcurrentRequests
logger.Infof("limiting -search.maxUniqueTimeseries to %d according to -search.maxConcurrentRequests=%d and remaining memory=%d bytes. To increase the limit, reduce -search.maxConcurrentRequests or increase memory available to the process.", mts, maxConcurrentRequests, remainingMemory)
return mts
}
// GetMaxUniqueTimeSeries returns the max metrics limit calculated by available resources.
func GetMaxUniqueTimeSeries() int {
return maxUniqueTimeseriesValue
}
// copied from https://github.com/prometheus/common/blob/adea6285c1c7447fcb7bfdeb6abfc6eff893e0a7/model/metric.go#L483
// it's not possible to use direct import due to increased binary size
func unescapePrometheusLabelName(name string) string {

View File

@@ -4,7 +4,6 @@ import (
"math"
"net/http"
"reflect"
"runtime"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
@@ -230,29 +229,3 @@ func TestGetLatencyOffsetMillisecondsFailure(t *testing.T) {
}
f("http://localhost?latency_offset=foobar")
}
func TestCalculateMaxMetricsLimitByResource(t *testing.T) {
f := func(maxConcurrentRequest, remainingMemory, expect int) {
t.Helper()
maxMetricsLimit := calculateMaxUniqueTimeSeriesForResource(maxConcurrentRequest, remainingMemory)
if maxMetricsLimit != expect {
t.Fatalf("unexpected max metrics limit: got %d, want %d", maxMetricsLimit, expect)
}
}
// Skip when GOARCH=386
if runtime.GOARCH != "386" {
// 8 CPU & 32 GiB
f(16, int(math.Round(32*1024*1024*1024*0.4)), 4294967)
// 4 CPU & 32 GiB
f(8, int(math.Round(32*1024*1024*1024*0.4)), 8589934)
}
// 2 CPU & 4 GiB
f(4, int(math.Round(4*1024*1024*1024*0.4)), 2147483)
// other edge cases
f(0, int(math.Round(4*1024*1024*1024*0.4)), 2e9)
f(4, 0, 0)
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -37,11 +37,11 @@
<meta property="og:title" content="UI for VictoriaMetrics">
<meta property="og:url" content="https://victoriametrics.com/">
<meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data">
<script type="module" crossorigin src="./assets/index-BjJ7fDL7.js"></script>
<script type="module" crossorigin src="./assets/index-CoGukb-x.js"></script>
<link rel="modulepreload" crossorigin href="./assets/rolldown-runtime-COnpUsM8.js">
<link rel="modulepreload" crossorigin href="./assets/vendor-C8Kwp93_.js">
<link rel="stylesheet" crossorigin href="./assets/vendor-CnsZ1jie.css">
<link rel="stylesheet" crossorigin href="./assets/index-BL7jEFBa.css">
<link rel="stylesheet" crossorigin href="./assets/index-BBUnmLOr.css">
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>

View File

@@ -1,7 +1,6 @@
package vmstorage
import (
"errors"
"flag"
"fmt"
"io"
@@ -9,12 +8,10 @@ import (
"net/http"
"strconv"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
@@ -23,11 +20,9 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricnamestats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/stringsutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/syncwg"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vminsertapi"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vmselectapi"
)
var (
@@ -39,11 +34,8 @@ var (
snapshotAuthKey = flagutil.NewPassword("snapshotAuthKey", "authKey, which must be passed in query string to /snapshot* pages. It overrides -httpAuth.*")
forceMergeAuthKey = flagutil.NewPassword("forceMergeAuthKey", "authKey, which must be passed in query string to /internal/force_merge pages. It overrides -httpAuth.*")
forceFlushAuthKey = flagutil.NewPassword("forceFlushAuthKey", "authKey, which must be passed in query string to /internal/force_flush pages. It overrides -httpAuth.*")
snapshotsMaxAge = flagutil.NewRetentionDuration("snapshotsMaxAge", "3d", "Automatically delete snapshots older than -snapshotsMaxAge if it is set to non-zero duration. Make sure that backup process has enough time to finish the backup before the corresponding snapshot is automatically deleted")
_ = flag.Duration("snapshotCreateTimeout", 0, "Deprecated: this flag does nothing")
precisionBits = flag.Int("precisionBits", 64, "The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss")
_ = flag.Duration("finalMergeDelay", 0, "Deprecated: this flag does nothing")
_ = flag.Int("bigMergeConcurrency", 0, "Deprecated: this flag does nothing")
_ = flag.Int("smallMergeConcurrency", 0, "Deprecated: this flag does nothing")
@@ -117,11 +109,7 @@ func DataPath() string {
}
// Init initializes vmstorage.
func Init(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
if err := encoding.CheckPrecisionBits(uint8(*precisionBits)); err != nil {
logger.Fatalf("invalid `-precisionBits`: %s", err)
}
func Init(vmselectMaxConcurrentRequests int, resetCacheIfNeeded func(mrs []storage.MetricRow)) {
storage.SetDedupInterval(*minScrapeInterval)
storage.SetDataFlushInterval(*inmemoryDataFlushInterval)
storage.LegacySetRetentionTimezoneOffset(*retentionTimezoneOffset)
@@ -165,7 +153,7 @@ func Init(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
LogNewSeries: *logNewSeries,
}
strg := storage.MustOpenStorage(*storageDataPath, opts)
initStaleSnapshotsRemover(strg)
vmStorage = newVMStorageSingleNode(strg, vmselectMaxConcurrentRequests, resetCacheIfNeeded)
var m storage.Metrics
strg.UpdateMetrics(&m)
@@ -179,151 +167,32 @@ func Init(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
// register storage metrics
storageMetrics = metrics.NewSet()
storageMetrics.RegisterMetricsWriter(func(w io.Writer) {
writeStorageMetrics(w, strg)
})
storageMetrics.RegisterMetricsWriter(vmStorage.writeStorageMetrics)
metrics.RegisterSet(storageMetrics)
WG = syncwg.WaitGroup{}
resetResponseCacheIfNeeded = resetCacheIfNeeded
Storage = strg
VMInsertAPI = vmStorage
VMSelectAPI = vmStorage
GetSearch = vmStorage.GetSearch
PutSearch = vmStorage.PutSearch
RequestHandler = vmStorage.requestHandler
DebugFlush = vmStorage.vms.s.DebugFlush
}
var storageMetrics *metrics.Set
// Storage is a storage.
//
// Every storage call must be wrapped into WG.Add(1) ... WG.Done()
// for proper graceful shutdown when Stop is called.
var Storage *storage.Storage
var (
// vmStorageSingleNode is an instance of vmstorage used by vminsert and
// vmselect for writing and reading data.
vmStorage *VMStorageSingleNode
VMInsertAPI vminsertapi.API
VMSelectAPI vmselectapi.API
GetSearch func(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (*storage.Search, int, error)
PutSearch func(sr *storage.Search)
RequestHandler func(w http.ResponseWriter, r *http.Request) bool
// WG must be incremented before Storage call.
//
// Use syncwg instead of sync, since Add is called from concurrent goroutines.
var WG syncwg.WaitGroup
// resetResponseCacheIfNeeded is a callback for automatic resetting of response cache if needed.
var resetResponseCacheIfNeeded func(mrs []storage.MetricRow)
// AddRows adds mrs to the storage.
//
// The caller should limit the number of concurrent calls to AddRows() in order to limit memory usage.
func AddRows(mrs []storage.MetricRow) error {
if Storage.IsReadOnly() {
return errReadOnly
}
resetResponseCacheIfNeeded(mrs)
WG.Add(1)
Storage.AddRows(mrs, uint8(*precisionBits))
WG.Done()
return nil
}
// AddMetadataRows adds mrs to the storage.
//
// The caller should limit the number of concurrent calls to AddMetadataRows() in order to limit memory usage.
func AddMetadataRows(mms []metricsmetadata.Row) error {
if Storage.IsReadOnly() {
return errReadOnly
}
WG.Add(1)
Storage.AddMetadataRows(mms)
WG.Done()
return nil
}
var errReadOnly = errors.New("the storage is in read-only mode; check -storage.minFreeDiskSpaceBytes command-line flag value")
// RegisterMetricNames registers all the metrics from mrs in the storage.
func RegisterMetricNames(qt *querytracer.Tracer, mrs []storage.MetricRow) {
WG.Add(1)
Storage.RegisterMetricNames(qt, mrs)
WG.Done()
}
// DeleteSeries deletes series matching tfss.
//
// Returns the number of deleted series.
func DeleteSeries(qt *querytracer.Tracer, tfss []*storage.TagFilters, maxMetrics int) (int, error) {
WG.Add(1)
n, err := Storage.DeleteSeries(qt, tfss, maxMetrics)
WG.Done()
return n, err
}
// GetMetricNamesStats returns metric names usage stats with give limit and lte predicate
func GetMetricNamesStats(qt *querytracer.Tracer, limit, le int, matchPattern string) (metricnamestats.StatsResult, error) {
WG.Add(1)
r := Storage.GetMetricNamesStats(qt, limit, le, matchPattern)
WG.Done()
return r, nil
}
// ResetMetricNamesStats resets state for metric names usage tracker
func ResetMetricNamesStats(qt *querytracer.Tracer) {
WG.Add(1)
Storage.ResetMetricNamesStats(qt)
WG.Done()
}
// SearchMetricNames returns metric names for the given tfss on the given tr.
func SearchMetricNames(qt *querytracer.Tracer, tfss []*storage.TagFilters, tr storage.TimeRange, maxMetrics int, deadline uint64) ([]string, error) {
WG.Add(1)
metricNames, err := Storage.SearchMetricNames(qt, tfss, tr, maxMetrics, deadline)
WG.Done()
return metricNames, err
}
// SearchLabelNames searches for tag keys matching the given tfss on tr.
func SearchLabelNames(qt *querytracer.Tracer, tfss []*storage.TagFilters, tr storage.TimeRange, maxTagKeys, maxMetrics int, deadline uint64) ([]string, error) {
WG.Add(1)
labelNames, err := Storage.SearchLabelNames(qt, tfss, tr, maxTagKeys, maxMetrics, deadline)
WG.Done()
return labelNames, err
}
// SearchLabelValues searches for label values for the given labelName, tfss and
// tr.
func SearchLabelValues(qt *querytracer.Tracer, labelName string, tfss []*storage.TagFilters, tr storage.TimeRange, maxLabelValues, maxMetrics int, deadline uint64) ([]string, error) {
WG.Add(1)
labelValues, err := Storage.SearchLabelValues(qt, labelName, tfss, tr, maxLabelValues, maxMetrics, deadline)
WG.Done()
return labelValues, err
}
// SearchTagValueSuffixes returns all the tag value suffixes for the given tagKey and tagValuePrefix on the given tr.
//
// This allows implementing https://graphite-api.readthedocs.io/en/latest/api.html#metrics-find or similar APIs.
func SearchTagValueSuffixes(qt *querytracer.Tracer, tr storage.TimeRange, tagKey, tagValuePrefix string, delimiter byte, maxTagValueSuffixes int, deadline uint64) ([]string, error) {
WG.Add(1)
suffixes, err := Storage.SearchTagValueSuffixes(qt, tr, tagKey, tagValuePrefix, delimiter, maxTagValueSuffixes, deadline)
WG.Done()
return suffixes, err
}
// SearchGraphitePaths returns all the metric names matching the given Graphite query.
func SearchGraphitePaths(qt *querytracer.Tracer, tr storage.TimeRange, query []byte, maxPaths int, deadline uint64) ([]string, error) {
WG.Add(1)
paths, err := Storage.SearchGraphitePaths(qt, tr, query, maxPaths, deadline)
WG.Done()
return paths, err
}
// GetTSDBStatus returns TSDB status for given filters on the given date.
func GetTSDBStatus(qt *querytracer.Tracer, tfss []*storage.TagFilters, date uint64, focusLabel string, topN, maxMetrics int, deadline uint64) (*storage.TSDBStatus, error) {
WG.Add(1)
status, err := Storage.GetTSDBStatus(qt, tfss, date, focusLabel, topN, maxMetrics, deadline)
WG.Done()
return status, err
}
// GetSeriesCount returns the number of time series in the storage.
func GetSeriesCount(deadline uint64) (uint64, error) {
WG.Add(1)
n, err := Storage.GetSeriesCount(deadline)
WG.Done()
return n, err
}
// TODO(@rtm0): Remove this dependency from vmalert-tool unit tests.
DebugFlush func()
)
// Stop stops the vmstorage
func Stop() {
@@ -333,17 +202,22 @@ func Stop() {
logger.Infof("gracefully closing the storage at %s", *storageDataPath)
startTime := time.Now()
WG.WaitAndBlock()
stopStaleSnapshotsRemover()
Storage.MustClose()
vmStorage.Stop()
logger.Infof("successfully closed the storage in %.3f seconds", time.Since(startTime).Seconds())
fs.MustStopDirRemover()
logger.Infof("the storage has been stopped")
logger.Infof("the vmstorage has been stopped")
}
// RequestHandler is a storage request handler.
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
func (vmssn *VMStorageSingleNode) requestHandler(w http.ResponseWriter, r *http.Request) bool {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.requestHandler(w, r)
}
// requestHandler is a storage request handler.
// TODO(@rtm0): Move to a separate file, request_handler.go
func (vms *VMStorage) requestHandler(w http.ResponseWriter, r *http.Request) bool {
path := r.URL.Path
if path == "/internal/force_merge" {
if !httpserver.CheckAuthFlag(w, r, forceMergeAuthKey) {
@@ -356,7 +230,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
defer activeForceMerges.Dec()
logger.Infof("forced merge for partition_prefix=%q has been started", partitionNamePrefix)
startTime := time.Now()
if err := Storage.ForceMergePartitions(partitionNamePrefix); err != nil {
if err := vms.s.ForceMergePartitions(partitionNamePrefix); err != nil {
logger.Errorf("error in forced merge for partition_prefix=%q: %s", partitionNamePrefix, err)
return
}
@@ -369,7 +243,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
logger.Infof("flushing storage to make pending data available for reading")
Storage.DebugFlush()
vms.s.DebugFlush()
return true
}
@@ -389,7 +263,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
logger.Infof("enabling logging of new series for the next %s. This may increase resource usage during this period.", time.Duration(dealine)*time.Second)
endTime := fasttime.UnixTimestamp() + uint64(dealine)
Storage.SetLogNewSeriesUntil(endTime)
vms.s.SetLogNewSeriesUntil(endTime)
fmt.Fprintf(w, `{"status":"success","data":{"logEndTime":%q}}`, time.Unix(int64(endTime), 0))
return true
}
@@ -411,13 +285,13 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case "/create":
snapshotsCreateTotal.Inc()
w.Header().Set("Content-Type", "application/json")
snapshotName := Storage.MustCreateSnapshot()
snapshotName := vms.s.MustCreateSnapshot()
// Verify whether the client already closed the connection.
// In this case it is better to drop the created snapshot, since the client isn't interested in it.
if err := r.Context().Err(); err != nil {
logger.Infof("deleting already created snapshot at %s because the client canceled the request", snapshotName)
if err := deleteSnapshot(snapshotName); err != nil {
if err := vms.deleteSnapshot(snapshotName); err != nil {
logger.Infof("cannot delete just created snapshot: %s", err)
return true
}
@@ -433,7 +307,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case "/list":
snapshotsListTotal.Inc()
w.Header().Set("Content-Type", "application/json")
snapshots := Storage.MustListSnapshots()
snapshots := vms.s.MustListSnapshots()
fmt.Fprintf(w, `{"status":"ok","snapshots":[`)
if len(snapshots) > 0 {
for _, snapshot := range snapshots[:len(snapshots)-1] {
@@ -447,7 +321,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
snapshotsDeleteTotal.Inc()
w.Header().Set("Content-Type", "application/json")
snapshotName := r.FormValue("snapshot")
if err := deleteSnapshot(snapshotName); err != nil {
if err := vms.deleteSnapshot(snapshotName); err != nil {
jsonResponseError(w, err)
snapshotsDeleteErrorsTotal.Inc()
return true
@@ -457,9 +331,9 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case "/delete_all":
snapshotsDeleteAllTotal.Inc()
w.Header().Set("Content-Type", "application/json")
snapshots := Storage.MustListSnapshots()
snapshots := vms.s.MustListSnapshots()
for _, snapshotName := range snapshots {
if err := Storage.DeleteSnapshot(snapshotName); err != nil {
if err := vms.s.DeleteSnapshot(snapshotName); err != nil {
err = fmt.Errorf("cannot delete snapshot %q: %w", snapshotName, err)
jsonResponseError(w, err)
snapshotsDeleteAllErrorsTotal.Inc()
@@ -473,50 +347,6 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
}
func deleteSnapshot(snapshotName string) error {
snapshots := Storage.MustListSnapshots()
for _, snName := range snapshots {
if snName == snapshotName {
if err := Storage.DeleteSnapshot(snName); err != nil {
return fmt.Errorf("cannot delete snapshot %q: %w", snName, err)
}
return nil
}
}
return fmt.Errorf("cannot find snapshot %q", snapshotName)
}
func initStaleSnapshotsRemover(strg *storage.Storage) {
staleSnapshotsRemoverCh = make(chan struct{})
if snapshotsMaxAge.Duration() <= 0 {
return
}
snapshotsMaxAgeDur := snapshotsMaxAge.Duration()
staleSnapshotsRemoverWG.Go(func() {
d := timeutil.AddJitterToDuration(time.Second * 11)
t := time.NewTicker(d)
defer t.Stop()
for {
select {
case <-staleSnapshotsRemoverCh:
return
case <-t.C:
}
strg.MustDeleteStaleSnapshots(snapshotsMaxAgeDur)
}
})
}
func stopStaleSnapshotsRemover() {
close(staleSnapshotsRemoverCh)
staleSnapshotsRemoverWG.Wait()
}
var (
staleSnapshotsRemoverCh chan struct{}
staleSnapshotsRemoverWG sync.WaitGroup
)
var (
activeForceMerges = metrics.NewCounter("vm_active_force_merges")
@@ -531,7 +361,16 @@ var (
snapshotsDeleteAllErrorsTotal = metrics.NewCounter(`vm_http_request_errors_total{path="/snapshot/delete_all"}`)
)
func writeStorageMetrics(w io.Writer, strg *storage.Storage) {
// TODO(@rtm0): Move to metrics.go.
func (vmssn *VMStorageSingleNode) writeStorageMetrics(w io.Writer) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
vmssn.vms.writeStorageMetrics(w)
}
// TODO(@rtm0): Move to metrics.go.
func (vms *VMStorage) writeStorageMetrics(w io.Writer) {
strg := vms.s
var m storage.Metrics
strg.UpdateMetrics(&m)
tm := &m.TableMetrics
@@ -755,6 +594,8 @@ func writeStorageMetrics(w io.Writer, strg *storage.Storage) {
metrics.WriteGaugeUint64(w, `vm_downsampling_partitions_scheduled`, tm.ScheduledDownsamplingPartitions)
metrics.WriteGaugeUint64(w, `vm_downsampling_partitions_scheduled_size_bytes`, tm.ScheduledDownsamplingPartitionsSize)
metrics.WriteGaugeUint64(w, `vm_search_max_unique_timeseries`, uint64(vms.maxUniqueTimeSeriesCalculated))
metrics.WriteGaugeUint64(w, `vm_metrics_metadata_storage_items`, m.MetadataStorageItemsCurrent)
metrics.WriteCounterUint64(w, `vm_metrics_metadata_storage_size_bytes`, m.MetadataStorageCurrentSizeBytes)
metrics.WriteCounterUint64(w, `vm_metrics_metadata_storage_max_size_bytes`, m.MetadataStorageMaxSizeBytes)

391
app/vmstorage/vmstorage.go Normal file
View File

@@ -0,0 +1,391 @@
package vmstorage
import (
"flag"
"fmt"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricnamestats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vmselectapi"
)
var (
precisionBits = flag.Int("precisionBits", 64, "The number of precision bits to store per each value. Lower precision bits improves data compression "+
"at the cost of precision loss")
maxUniqueTimeseries = flag.Int("search.maxUniqueTimeseries", 0, "The maximum number of unique time series, which can be scanned during every query. "+
"This allows protecting against heavy queries, which select unexpectedly high number of series. When set to zero, the limit is automatically calculated based on -search.maxConcurrentRequests (inversely proportional) and memory available to the process (proportional). See also -search.max* command-line flags at vmselect")
maxTagKeys = flag.Int("search.maxTagKeys", 100e3, "The maximum number of tag keys returned per search. "+
"See also -search.maxLabelsAPISeries and -search.maxLabelsAPIDuration")
maxTagValues = flag.Int("search.maxTagValues", 100e3, "The maximum number of tag values returned per search. "+
"See also -search.maxLabelsAPISeries and -search.maxLabelsAPIDuration")
maxTagValueSuffixesPerSearch = flag.Int("search.maxTagValueSuffixesPerSearch", 100e3, "The maximum number of tag value suffixes returned from /metrics/find")
snapshotsMaxAge = flagutil.NewRetentionDuration("snapshotsMaxAge", "3d", "Automatically delete snapshots older than -snapshotsMaxAge if it is set to non-zero duration. Make sure that backup process has enough time to finish the backup before the corresponding snapshot is automatically deleted")
)
// newVMStorage creates a new instance of of VMStorage.
//
// The created VMStorage instance takes ownership of s.
func newVMStorage(s *storage.Storage, vmselectMaxConcurrentRequests int) *VMStorage {
if err := encoding.CheckPrecisionBits(uint8(*precisionBits)); err != nil {
logger.Fatalf("invalid -precisionBits=%d: %s", *precisionBits, err)
}
maxUniqueTimeseriesCalculated := *maxUniqueTimeseries
if maxUniqueTimeseriesCalculated <= 0 {
maxUniqueTimeseriesCalculated = calculateMaxUniqueTimeseries(vmselectMaxConcurrentRequests, memory.Remaining())
}
vms := &VMStorage{
s: s,
maxUniqueTimeseries: *maxUniqueTimeseries,
maxUniqueTimeSeriesCalculated: maxUniqueTimeseriesCalculated,
staleSnapshotsRemoverCh: make(chan struct{}),
}
vms.initStaleSnapshotsRemover()
return vms
}
// calculateMaxUniqueTimeseries calculates the maxUniqueTimeseries based on the
// available system resources.
func calculateMaxUniqueTimeseries(maxConcurrentRequests, remainingMemory int) int {
if maxConcurrentRequests <= 0 {
// This line should NOT be reached unless the user has set an incorrect `search.maxConcurrentRequests`.
// In such cases, fallback to unlimited.
logger.Warnf("limiting -search.maxUniqueTimeseries to %v because -search.maxConcurrentRequests=%d.", 2e9, maxConcurrentRequests)
return 2e9
}
// Calculate the max metrics limit for a single request in the worst-case concurrent scenario.
// The approximate size of 1 unique series that could occupy in the vmstorage is 200 bytes.
mts := remainingMemory / 200 / maxConcurrentRequests
logger.Infof("limiting -search.maxUniqueTimeseries to %d according to -search.maxConcurrentRequests=%d and remaining memory=%d bytes. To increase the limit, reduce -search.maxConcurrentRequests or increase memory available to the process.", mts, maxConcurrentRequests, remainingMemory)
return mts
}
// VMStorage impelements vmselectapi.API and vminsertapi.API.
type VMStorage struct {
s *storage.Storage
maxUniqueTimeseries int
maxUniqueTimeSeriesCalculated int
staleSnapshotsRemoverCh chan struct{}
staleSnapshotsRemoverWG sync.WaitGroup
}
func (vms *VMStorage) initStaleSnapshotsRemover() {
if snapshotsMaxAge.Duration() <= 0 {
return
}
snapshotsMaxAgeDuration := snapshotsMaxAge.Duration()
vms.staleSnapshotsRemoverWG.Go(func() {
d := timeutil.AddJitterToDuration(time.Second * 11)
t := time.NewTicker(d)
defer t.Stop()
for {
select {
case <-vms.staleSnapshotsRemoverCh:
return
case <-t.C:
}
vms.s.MustDeleteStaleSnapshots(snapshotsMaxAgeDuration)
}
})
}
func (vms *VMStorage) Stop() {
close(vms.staleSnapshotsRemoverCh)
vms.staleSnapshotsRemoverWG.Wait()
vms.s.MustClose()
}
// WriteRows writes metric rows to the storage.
//
// The caller should limit the number of concurrent calls to WriteRows() in
// order to limit memory usage.
func (vms *VMStorage) WriteRows(rows []storage.MetricRow) error {
vms.s.AddRows(rows, uint8(*precisionBits))
return nil
}
// WriteMetadata writes metrics metadata to storage.
//
// The caller should limit the number of concurrent calls to WriteMetadata() in
// order to limit memory usage.
func (vms *VMStorage) WriteMetadata(rows []metricsmetadata.Row) error {
vms.s.AddMetadataRows(rows)
return nil
}
// IsReadOnly returns true is the storage is in read-only mode.
func (vms *VMStorage) IsReadOnly() bool {
return vms.s.IsReadOnly()
}
func (vms *VMStorage) InitSearch(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (vmselectapi.BlockIterator, error) {
tr := sq.GetTimeRange()
maxMetrics := vms.getMaxMetrics(sq.MaxMetrics)
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
return nil, err
}
if len(tfss) == 0 {
return nil, fmt.Errorf("missing tag filters")
}
bi := getBlockIterator()
bi.sr.Init(qt, vms.s, tfss, tr, maxMetrics, deadline)
if err := bi.sr.Error(); err != nil {
bi.MustClose()
return nil, err
}
return bi, nil
}
func (vms *VMStorage) getMaxMetrics(searchQueryLimit int) int {
if searchQueryLimit <= 0 {
return vms.maxUniqueTimeSeriesCalculated
}
// searchQueryLimit cannot exceed `-search.maxUniqueTimeseries`
if vms.maxUniqueTimeseries != 0 && searchQueryLimit > vms.maxUniqueTimeseries {
searchQueryLimit = vms.maxUniqueTimeseries
}
return searchQueryLimit
}
// blockIterator implements vmselectapi.BlockIterator
type blockIterator struct {
sr storage.Search
mb storage.MetricBlock
}
var blockIteratorsPool sync.Pool
func (bi *blockIterator) MustClose() {
bi.sr.MustClose()
bi.mb.MetricName = nil
bi.mb.Block.Reset()
blockIteratorsPool.Put(bi)
}
func getBlockIterator() *blockIterator {
v := blockIteratorsPool.Get()
if v == nil {
v = &blockIterator{}
}
return v.(*blockIterator)
}
func (bi *blockIterator) NextBlock(dst []byte) ([]byte, bool) {
if !bi.sr.NextMetricBlock() {
return dst, false
}
mb := bi.mb
mb.MetricName = bi.sr.MetricBlockRef.MetricName
bi.sr.MetricBlockRef.BlockRef.MustReadBlock(&mb.Block)
dst = mb.Marshal(dst[:0])
return dst, true
}
func (bi *blockIterator) Error() error {
return bi.sr.Error()
}
// SearchMetricNames returns metric names for the given tfss on the given tr.
func (vms *VMStorage) SearchMetricNames(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) ([]string, error) {
tr := sq.GetTimeRange()
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
// fallback to maxUniqueTimeSeries if no limit is provided,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857
maxMetrics = vms.maxUniqueTimeSeriesCalculated
}
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
return nil, err
}
if len(tfss) == 0 {
return nil, fmt.Errorf("missing tag filters")
}
return vms.s.SearchMetricNames(qt, tfss, tr, maxMetrics, deadline)
}
// SearchLabelValues searches for label values for the given labelName, tfss and
// tr.
func (vms *VMStorage) LabelValues(qt *querytracer.Tracer, sq *storage.SearchQuery, labelName string, maxLabelValues int, deadline uint64) ([]string, error) {
tr := sq.GetTimeRange()
if maxLabelValues <= 0 || maxLabelValues > *maxTagValues {
maxLabelValues = *maxTagValues
}
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
// fallback to maxUniqueTimeSeries if no limit is provided,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857
maxMetrics = vms.maxUniqueTimeSeriesCalculated
}
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
return nil, err
}
return vms.s.SearchLabelValues(qt, labelName, tfss, tr, maxLabelValues, maxMetrics, deadline)
}
// TagValueSuffixes returns all the tag value suffixes for the given tagKey and
// tagValuePrefix on the given tr.
//
// This allows implementing
// https://graphite-api.readthedocs.io/en/latest/api.html#metrics-find or
// similar APIs.
func (vms *VMStorage) TagValueSuffixes(qt *querytracer.Tracer, _, _ uint32, tr storage.TimeRange, tagKey, tagValuePrefix string, delimiter byte,
maxSuffixes int, deadline uint64) ([]string, error) {
if maxSuffixes <= 0 || maxSuffixes > *maxTagValueSuffixesPerSearch {
maxSuffixes = *maxTagValueSuffixesPerSearch
}
suffixes, err := vms.s.SearchTagValueSuffixes(qt, tr, tagKey, tagValuePrefix, delimiter, maxSuffixes, deadline)
if err != nil {
return nil, err
}
if len(suffixes) >= maxSuffixes {
return nil, fmt.Errorf("more than -search.maxTagValueSuffixesPerSearch=%d suffixes returned; "+
"either narrow down the search or increase -search.maxTagValueSuffixesPerSearch command-line flag value", maxSuffixes)
}
return suffixes, nil
}
// SearchLabelNames searches for tag keys matching the given tfss on tr.
func (vms *VMStorage) LabelNames(qt *querytracer.Tracer, sq *storage.SearchQuery, maxLabelNames int, deadline uint64) ([]string, error) {
tr := sq.GetTimeRange()
if maxLabelNames <= 0 || maxLabelNames > *maxTagKeys {
maxLabelNames = *maxTagKeys
}
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
// fallback to maxUniqueTimeSeries if no limit is provided,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857
maxMetrics = vms.maxUniqueTimeSeriesCalculated
}
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
return nil, err
}
return vms.s.SearchLabelNames(qt, tfss, tr, maxLabelNames, maxMetrics, deadline)
}
func (vms *VMStorage) SeriesCount(_ *querytracer.Tracer, _, _ uint32, deadline uint64) (uint64, error) {
return vms.s.GetSeriesCount(deadline)
}
func (vms *VMStorage) Tenants(_ *querytracer.Tracer, _ storage.TimeRange, _ uint64) ([]string, error) {
return nil, nil
}
// GetTSDBStatus returns TSDB status for given filters on the given date.
func (vms *VMStorage) TSDBStatus(qt *querytracer.Tracer, sq *storage.SearchQuery, focusLabel string, topN int, deadline uint64) (*storage.TSDBStatus, error) {
tr := sq.GetTimeRange()
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
// fallback to maxUniqueTimeSeries if no limit is provided,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857
maxMetrics = vms.maxUniqueTimeSeriesCalculated
}
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
return nil, err
}
date := uint64(sq.MinTimestamp) / (24 * 3600 * 1000)
return vms.s.GetTSDBStatus(qt, tfss, date, focusLabel, topN, maxMetrics, deadline)
}
// DeleteSeries deletes series matching tfss.
//
// Returns the number of deleted series.
func (vms *VMStorage) DeleteSeries(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (int, error) {
tr := sq.GetTimeRange()
maxMetrics := sq.MaxMetrics
if maxMetrics <= 0 {
// fallback to maxUniqueTimeSeries if no limit is provided,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7857
maxMetrics = vms.maxUniqueTimeSeriesCalculated
}
tfss, err := vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
return 0, err
}
if len(tfss) == 0 {
return 0, fmt.Errorf("missing tag filters")
}
return vms.s.DeleteSeries(qt, tfss, maxMetrics)
}
func (vms *VMStorage) RegisterMetricNames(qt *querytracer.Tracer, mrs []storage.MetricRow, _ uint64) error {
vms.s.RegisterMetricNames(qt, mrs)
return nil
}
// GetMetricNamesUsageStats returns metric name usage stats.
func (vms *VMStorage) GetMetricNamesUsageStats(qt *querytracer.Tracer, _ *storage.TenantToken, limit, le int, matchPattern string, _ uint64) (metricnamestats.StatsResult, error) {
return vms.s.GetMetricNamesStats(qt, limit, le, matchPattern), nil
}
// ResetMetricNamesStats resets state for metric names usage tracker
func (vms *VMStorage) ResetMetricNamesUsageStats(qt *querytracer.Tracer, _ uint64) error {
vms.s.ResetMetricNamesStats(qt)
return nil
}
func (vms *VMStorage) setupTfss(qt *querytracer.Tracer, sq *storage.SearchQuery, tr storage.TimeRange, maxMetrics int, deadline uint64) ([]*storage.TagFilters, error) {
tfss := make([]*storage.TagFilters, 0, len(sq.TagFilterss))
for _, tagFilters := range sq.TagFilterss {
tfs := storage.NewTagFilters()
for i := range tagFilters {
tf := &tagFilters[i]
if string(tf.Key) == "__graphite__" {
query := tf.Value
qtChild := qt.NewChild("searching for series matching __graphite__=%q", query)
paths, err := vms.s.SearchGraphitePaths(qtChild, tr, query, maxMetrics, deadline)
qtChild.Donef("found %d series", len(paths))
if err != nil {
return nil, fmt.Errorf("error when searching for Graphite paths for query %q: %w", query, err)
}
if len(paths) >= maxMetrics {
return nil, fmt.Errorf("more than %d time series match Graphite query %q; "+
"either narrow down the query or increase the corresponding -search.max* command-line flag value at vmselect nodes; "+
"see https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#resource-usage-limits", maxMetrics, query)
}
tfs.AddGraphiteQuery(query, paths, tf.IsNegative)
continue
}
if err := tfs.Add(tf.Key, tf.Value, tf.IsNegative, tf.IsRegexp); err != nil {
return nil, fmt.Errorf("cannot parse tag filter %s: %w", tf, err)
}
}
tfss = append(tfss, tfs)
}
return tfss, nil
}
func (vms *VMStorage) GetMetadataRecords(qt *querytracer.Tracer, _ *storage.TenantToken, limit int, metricName string, _ uint64) ([]*metricsmetadata.Row, error) {
return vms.s.GetMetadataRows(qt, limit, metricName), nil
}
// deleteSnapshot deletes a snapshot by its name.
//
// Callers must wrap the call with wg.Add(1)...wg.Done().
func (vms *VMStorage) deleteSnapshot(snapshotName string) error {
snapshots := vms.s.MustListSnapshots()
for _, snName := range snapshots {
if snName == snapshotName {
if err := vms.s.DeleteSnapshot(snName); err != nil {
return fmt.Errorf("cannot delete snapshot %q: %w", snName, err)
}
return nil
}
}
return fmt.Errorf("cannot find snapshot %q", snapshotName)
}

View File

@@ -0,0 +1,213 @@
package vmstorage
import (
"errors"
"fmt"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricnamestats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage/metricsmetadata"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/syncwg"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/vmselectapi"
)
// newVMStorageSingleNode creates a new instance of of VMStorage for vmsingle.
func newVMStorageSingleNode(s *storage.Storage, maxConcurrentRequests int, resetCacheIfNeeded func(mrs []storage.MetricRow)) *VMStorageSingleNode {
vms := newVMStorage(s, maxConcurrentRequests)
return &VMStorageSingleNode{
vms: vms,
wg: syncwg.WaitGroup{},
resetCacheIfNeeded: resetCacheIfNeeded,
}
}
type VMStorageSingleNode struct {
vms *VMStorage
// wg is used to wrap every storage call into wg.Add(1) ... wg.Done()
// for proper graceful shutdown when Stop is called.
//
// Use syncwg instead of sync, since Add is called from concurrent
// goroutines.
wg syncwg.WaitGroup
// resetCacheIfNeeded is a callback for automatic resetting of response
// cache if needed.
resetCacheIfNeeded func(mrs []storage.MetricRow)
}
func (vmssn *VMStorageSingleNode) Stop() {
vmssn.wg.WaitAndBlock()
vmssn.vms.Stop()
}
// WriteRows writes metric rows to the storage.
//
// Returns an error if the storage is in read-only mode.
//
// The caller should limit the number of concurrent calls to WriteRows() in
// order to limit memory usage.
func (vmssn *VMStorageSingleNode) WriteRows(rows []storage.MetricRow) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
if vmssn.vms.IsReadOnly() {
return errReadOnly
}
vmssn.resetCacheIfNeeded(rows)
return vmssn.vms.WriteRows(rows)
}
// WriteMetadata writes metrics metadata to storage.
//
// Returns an error if the storage is in read-only mode.
//
// The caller should limit the number of concurrent calls to WriteMetadata() in
// order to limit memory usage.
func (vmssn *VMStorageSingleNode) WriteMetadata(rows []metricsmetadata.Row) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
if vmssn.vms.IsReadOnly() {
return errReadOnly
}
return vmssn.vms.WriteMetadata(rows)
}
var errReadOnly = errors.New("the storage is in read-only mode; check -storage.minFreeDiskSpaceBytes command-line flag value")
func (vmssn *VMStorageSingleNode) IsReadOnly() bool {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.IsReadOnly()
}
func (vmssn *VMStorageSingleNode) InitSearch(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (vmselectapi.BlockIterator, error) {
return nil, fmt.Errorf("not implemented in vmsingle")
}
// GetSearch sets up an instance of storage search and returns it to the caller
// along with the max series count that the search can return.
//
// This method is not part of the vmselectapi.API and must only be used by
// vmsingle HTTP handlers.
//
// Callers of this method must call PutSearch() once the search instance is not
// needed anymore.
func (vmssn *VMStorageSingleNode) GetSearch(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (*storage.Search, int, error) {
vmssn.wg.Add(1)
tr := sq.GetTimeRange()
maxMetrics := vmssn.vms.getMaxMetrics(sq.MaxMetrics)
tfss, err := vmssn.vms.setupTfss(qt, sq, tr, maxMetrics, deadline)
if err != nil {
vmssn.wg.Done()
return nil, 0, err
}
sr := getSearch()
maxSeriesCount := sr.Init(qt, vmssn.vms.s, tfss, tr, sq.MaxMetrics, deadline)
return sr, maxSeriesCount, nil
}
// PutSearch resets the search once it is not needed anymore and puts it aside
// for future reuse.
//
// This method is not part of the vmselectapi.API and must only be used by
// vmsingle HTTP handlers.
//
// The method must only be used on search instances that have been created with
// GetSearch().
func (vmssn *VMStorageSingleNode) PutSearch(sr *storage.Search) {
putSearch(sr)
vmssn.wg.Done()
}
func getSearch() *storage.Search {
v := ssPool.Get()
if v == nil {
return &storage.Search{}
}
return v.(*storage.Search)
}
func putSearch(sr *storage.Search) {
sr.MustClose()
ssPool.Put(sr)
}
var ssPool sync.Pool
func (vmssn *VMStorageSingleNode) SearchMetricNames(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.SearchMetricNames(qt, sq, deadline)
}
func (vmssn *VMStorageSingleNode) LabelValues(qt *querytracer.Tracer, sq *storage.SearchQuery, labelName string, maxLabelValues int, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.LabelValues(qt, sq, labelName, maxLabelValues, deadline)
}
func (vmssn *VMStorageSingleNode) TagValueSuffixes(qt *querytracer.Tracer, accountID, projectID uint32, tr storage.TimeRange, tagKey, tagValuePrefix string, delimiter byte, maxSuffixes int, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.TagValueSuffixes(qt, accountID, projectID, tr, tagKey, tagValuePrefix, delimiter, maxSuffixes, deadline)
}
func (vmssn *VMStorageSingleNode) LabelNames(qt *querytracer.Tracer, sq *storage.SearchQuery, maxLabelNames int, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.LabelNames(qt, sq, maxLabelNames, deadline)
}
func (vmssn *VMStorageSingleNode) SeriesCount(qt *querytracer.Tracer, accountID, projectID uint32, deadline uint64) (uint64, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.SeriesCount(qt, accountID, projectID, deadline)
}
func (vmssn *VMStorageSingleNode) Tenants(qt *querytracer.Tracer, tr storage.TimeRange, deadline uint64) ([]string, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.Tenants(qt, tr, deadline)
}
func (vmssn *VMStorageSingleNode) TSDBStatus(qt *querytracer.Tracer, sq *storage.SearchQuery, focusLabel string, topN int, deadline uint64) (*storage.TSDBStatus, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.TSDBStatus(qt, sq, focusLabel, topN, deadline)
}
func (vmssn *VMStorageSingleNode) DeleteSeries(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline uint64) (int, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.DeleteSeries(qt, sq, deadline)
}
func (vmssn *VMStorageSingleNode) RegisterMetricNames(qt *querytracer.Tracer, mrs []storage.MetricRow, deadline uint64) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.RegisterMetricNames(qt, mrs, deadline)
}
func (vmssn *VMStorageSingleNode) GetMetricNamesUsageStats(qt *querytracer.Tracer, tt *storage.TenantToken, limit, le int, matchPattern string, deadline uint64) (metricnamestats.StatsResult, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.GetMetricNamesUsageStats(qt, tt, limit, le, matchPattern, deadline)
}
func (vmssn *VMStorageSingleNode) ResetMetricNamesUsageStats(qt *querytracer.Tracer, deadline uint64) error {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.ResetMetricNamesUsageStats(qt, deadline)
}
func (vmssn *VMStorageSingleNode) GetMetadataRecords(qt *querytracer.Tracer, tt *storage.TenantToken, limit int, metricName string, deadline uint64) ([]*metricsmetadata.Row, error) {
vmssn.wg.Add(1)
defer vmssn.wg.Done()
return vmssn.vms.GetMetadataRecords(qt, tt, limit, metricName, deadline)
}

View File

@@ -0,0 +1,62 @@
package vmstorage
import (
"math"
"strconv"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
)
func TestCalculateMaxMetricsLimitByResource(t *testing.T) {
f := func(maxConcurrentRequest, remainingMemory, expect int) {
t.Helper()
maxMetricsLimit := calculateMaxUniqueTimeseries(maxConcurrentRequest, remainingMemory)
if maxMetricsLimit != expect {
t.Fatalf("unexpected max metrics limit: got %d, want %d", maxMetricsLimit, expect)
}
}
// 64-bit architectures support memory sizes > 4GB.
if strconv.IntSize == 64 {
// 8 CPU & 32 GiB
f(16, int(math.Round(32*1024*1024*1024*0.4)), 4294967)
// 4 CPU & 32 GiB
f(8, int(math.Round(32*1024*1024*1024*0.4)), 8589934)
}
// 2 CPU & 4 GiB
f(4, int(math.Round(4*1024*1024*1024*0.4)), 2147483)
// other edge cases
f(0, int(math.Round(4*1024*1024*1024*0.4)), 2e9)
f(4, 0, 0)
}
func TestGetMaxMetrics(t *testing.T) {
originalMaxUniqueTimeSeries := *maxUniqueTimeseries
defer func() {
*maxUniqueTimeseries = originalMaxUniqueTimeSeries
fs.MustRemoveDir(t.Name())
}()
maxConcurrentRequests := 2 * cgroup.AvailableCPUs()
f := func(searchQueryLimit, storageMaxUniqueTimeseries, expect int) {
t.Helper()
*maxUniqueTimeseries = storageMaxUniqueTimeseries
s := storage.MustOpenStorage(t.Name(), storage.OpenOptions{})
vms := newVMStorage(s, maxConcurrentRequests)
defer vms.Stop()
maxMetrics := vms.getMaxMetrics(searchQueryLimit)
if maxMetrics != expect {
t.Fatalf("unexpected max metrics: got %d, want %d", maxMetrics, expect)
}
}
f(0, 1e6, 1e6)
f(2e6, 0, 2e6)
f(2e6, 1e6, 1e6)
}

View File

@@ -1,4 +1,4 @@
FROM golang:1.26.3 AS build-web-stage
FROM golang:1.26.4 AS build-web-stage
COPY build /build
WORKDIR /build

View File

@@ -1,7 +1,7 @@
export const seriesFetchedWarning = `No match!
export const seriesFetchedWarning = `No match!
This query hasn't selected any time series from database.
Either the requested metrics are missing in the database,
or there is a typo in series selector.`;
export const partialWarning = `The shown results are marked as PARTIAL.
The result is marked as partial if one or more vmstorage nodes failed to respond to the query.`;
The result is marked as partial if one or more storage nodes failed to respond to the query.`;

View File

@@ -71,7 +71,7 @@ const RulesHeader = ({
<TextField
label="Search"
value={search}
placeholder="Filter by rule, name or labels"
placeholder="Filter by group or rule name"
startIcon={<SearchIcon />}
onChange={onChangeSearch}
/>

View File

@@ -79,24 +79,25 @@ type PrometheusWriteQuerier interface {
// QueryOpts contains various params used for querying or ingesting data
type QueryOpts struct {
Tenant string
Timeout string
Start string
End string
Time string
Step string
ExtraFilters []string
ExtraLabels []string
Trace string
ReduceMemUsage string
MaxLookback string
LatencyOffset string
Format string
NoCache string
Headers http.Header
From string
Until string
StorageStep string
Tenant string
Timeout string
Start string
End string
Time string
Step string
ExtraFilters []string
ExtraLabels []string
Trace string
ReduceMemUsage string
MaxLookback string
LatencyOffset string
Format string
NoCache string
Headers http.Header
From string
Until string
StorageStep string
DenyPartialResponse string
}
func (qos *QueryOpts) getHeaders() http.Header {
@@ -132,6 +133,7 @@ func (qos *QueryOpts) asURLValues() url.Values {
addNonEmpty("from", qos.From)
addNonEmpty("until", qos.Until)
addNonEmpty("storage_step", qos.StorageStep)
addNonEmpty("deny_partial_response", qos.DenyPartialResponse)
return uv
}

View File

@@ -1015,35 +1015,42 @@ func testGroupSkipSlowReplicas(tc *apptest.TestCase, opts *testGroupReplicationO
func testGroupPartialResponse(tc *apptest.TestCase, opts *testGroupReplicationOpts) {
t := tc.T()
assertSeries := func(app *apptest.Vmselect, wantPartial bool) {
assertSeries := func(app *apptest.Vmselect, denyPartialResponse string, want *apptest.PrometheusAPIV1SeriesResponse) {
t.Helper()
tc.Assert(&apptest.AssertOptions{
Msg: "unexpected /api/v1/series response",
Got: func() any {
return app.PrometheusAPIV1Series(t, `{__name__=~".*"}`, apptest.QueryOpts{
Start: "2024-01-01T00:00:00Z",
End: "2024-01-31T00:00:00Z",
Start: "2024-01-01T00:00:00Z",
End: "2024-01-31T00:00:00Z",
DenyPartialResponse: denyPartialResponse,
}).Sort()
},
Want: &apptest.PrometheusAPIV1SeriesResponse{
Status: "success",
IsPartial: wantPartial,
},
Want: want,
CmpOpts: []cmp.Option{
cmpopts.IgnoreFields(apptest.PrometheusAPIV1SeriesResponse{}, "Data"),
cmpopts.IgnoreFields(apptest.PrometheusAPIV1SeriesResponse{}, "Data", "Error"),
},
})
}
mustReturnPartialResponse := true
mustReturnFullResponse := false
allowPartialResponse := ""
denyPartialResponse := "1"
mustReturnPartialResponse := &apptest.PrometheusAPIV1SeriesResponse{
Status: "success",
IsPartial: true,
}
mustReturnFullResponse := &apptest.PrometheusAPIV1SeriesResponse{
Status: "success",
IsPartial: false,
}
// All vmstorage replicas are available so both vmselects must return full
// response.
assertSeries(opts.c.vmselect, mustReturnFullResponse)
assertSeries(opts.c.vmselectGroupRF, mustReturnFullResponse)
assertSeries(opts.c.vmselectGlobalRF, mustReturnFullResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, mustReturnFullResponse)
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnFullResponse)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnFullResponse)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnFullResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnFullResponse)
// Stop groupRF-1 vmstorage nodes in first group.
//
@@ -1053,10 +1060,10 @@ func testGroupPartialResponse(tc *apptest.TestCase, opts *testGroupReplicationOp
// about the replication factor and therefore they must still be able to
// return full dataset.
opts.c.storageGroups[0].stopNodes(tc, opts.groupRF-1)
assertSeries(opts.c.vmselect, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, mustReturnFullResponse)
assertSeries(opts.c.vmselectGlobalRF, mustReturnFullResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, mustReturnFullResponse)
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnFullResponse)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnFullResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnFullResponse)
// Stop groupRF-1 vmstorages in the remaining groups.
//
@@ -1066,10 +1073,10 @@ func testGroupPartialResponse(tc *apptest.TestCase, opts *testGroupReplicationOp
for g := 1; g < len(opts.c.storageGroups); g++ {
opts.c.storageGroups[g].stopNodes(tc, opts.groupRF-1)
}
assertSeries(opts.c.vmselect, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, mustReturnFullResponse)
assertSeries(opts.c.vmselectGlobalRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, mustReturnFullResponse)
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnFullResponse)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnFullResponse)
// Stop one more vmstorage in the first group.
//
@@ -1077,10 +1084,10 @@ func testGroupPartialResponse(tc *apptest.TestCase, opts *testGroupReplicationOp
// because it is unaware of replication across groups. vmselectGroupGlobalRF
// will continue retuning full dataset.
opts.c.storageGroups[0].stopNodes(tc, 1)
assertSeries(opts.c.vmselect, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGlobalRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, mustReturnFullResponse)
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnFullResponse)
// Stop one more vmstoarge in remaining globarRF-1 groups.
//
@@ -1089,19 +1096,56 @@ func testGroupPartialResponse(tc *apptest.TestCase, opts *testGroupReplicationOp
for g := 1; g < opts.globalRF-1; g++ {
opts.c.storageGroups[g].stopNodes(tc, 1)
}
assertSeries(opts.c.vmselect, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGlobalRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, mustReturnFullResponse)
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnFullResponse)
// Stop one more vmstoarge in one more group.
//
// vmselectGroupGlobalRF must now return partial dataset.
opts.c.storageGroups[opts.globalRF].stopNodes(tc, 1)
assertSeries(opts.c.vmselect, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGlobalRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, mustReturnPartialResponse)
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnPartialResponse)
// Stop all the remaining vmstorage nodes except a single node.
//
// At this point vmselects still must be able to return partial response
// because at least one vmstorage node has successfully returned results.
n := len(opts.c.storageGroups[0].vmstorages)
opts.c.storageGroups[0].stopNodes(tc, n-1)
for g := 1; g < len(opts.c.storageGroups); g++ {
n := len(opts.c.storageGroups[g].vmstorages)
opts.c.storageGroups[g].stopNodes(tc, n)
}
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnPartialResponse)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnPartialResponse)
mustReturnUnavailableError := &apptest.PrometheusAPIV1SeriesResponse{
Status: "error",
ErrorType: "503",
}
// vmselects must return an error for the same request when partial
// responses are denied explicitly.
assertSeries(opts.c.vmselect, denyPartialResponse, mustReturnUnavailableError)
assertSeries(opts.c.vmselectGroupRF, denyPartialResponse, mustReturnUnavailableError)
assertSeries(opts.c.vmselectGlobalRF, denyPartialResponse, mustReturnUnavailableError)
assertSeries(opts.c.vmselectGroupGlobalRF, denyPartialResponse, mustReturnUnavailableError)
// Stop the last remaining vmstorage node.
//
// vmselects must return an error when there are no successful vmstorage
// responses.
opts.c.storageGroups[0].stopNodes(tc, 1)
assertSeries(opts.c.vmselect, allowPartialResponse, mustReturnUnavailableError)
assertSeries(opts.c.vmselectGroupRF, allowPartialResponse, mustReturnUnavailableError)
assertSeries(opts.c.vmselectGlobalRF, allowPartialResponse, mustReturnUnavailableError)
assertSeries(opts.c.vmselectGroupGlobalRF, allowPartialResponse, mustReturnUnavailableError)
}
// TestClusterReplication_PartialResponseMultitenant checks how vmselect handles some

View File

@@ -0,0 +1,51 @@
{
"ulid": "01JFJBS3YP1SHZ3PJQ6HK76EC3",
"minTime": 1734709200000,
"maxTime": 1734709320000,
"stats": {
"numSamples": 400,
"numSeries": 100,
"numChunks": 100
},
"compaction": {
"level": 1,
"sources": [
"01JFJBS3YP1SHZ3PJQ6HK76EC3"
],
"parents": [
{
"ulid": "00000000000000000000000000",
"minTime": 0,
"maxTime": 0
}
],
"hints": [
"from-out-of-order"
]
},
"version": 1,
"out_of_order": false,
"thanos": {
"labels": {},
"downsample": {
"resolution": 0
},
"source": "receive",
"segment_files": [
"000001"
],
"files": [
{
"rel_path": "chunks/000001",
"size_bytes": 4808
},
{
"rel_path": "index",
"size_bytes": 55021
},
{
"rel_path": "meta.json"
}
]
}
}

Binary file not shown.

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,139 @@
package tests
import (
"encoding/json"
"fmt"
"io"
"os"
"testing"
"github.com/google/go-cmp/cmp"
"github.com/google/go-cmp/cmp/cmpopts"
"github.com/VictoriaMetrics/VictoriaMetrics/apptest"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
)
const (
testMimirPath = "testdata/mimir-tsdb"
expectedMimirResponseFile = "./testdata/mimir-tsdb/expected_response.json"
)
func TestSingleVmctlMimirProtocol(t *testing.T) {
fs.MustRemoveDir(t.Name())
tc := apptest.NewTestCase(t)
defer tc.Stop()
vmsingleDst := tc.MustStartDefaultVmsingle()
vmAddr := fmt.Sprintf("http://%s/", vmsingleDst.HTTPAddr())
dir, err := os.Getwd()
if err != nil {
t.Fatalf("cannot get current working directory: %s", err)
}
path := fmt.Sprintf("fs://%s/%s", dir, testMimirPath)
vmctlFlags := []string{
`mimir`,
`--mimir-tenant-id=anonymous`,
`--mimir-filter-time-start=2024-12-01T00:00:00Z`,
`--mimir-filter-time-end=2024-12-31T23:59:59Z`,
`--mimir-custom-s3-endpoint=http://localhost:9000`,
`--mimir-path=` + path,
`--vm-addr=` + vmAddr,
`--disable-progress-bar=true`,
`--vm-concurrency=6`,
`--mimir-concurrency=6`,
}
testMimirProtocol(tc, vmsingleDst, vmctlFlags)
}
func TestClusterVmctlMimirProtocol(t *testing.T) {
fs.MustRemoveDir(t.Name())
tc := apptest.NewTestCase(t)
defer tc.Stop()
cluster := tc.MustStartDefaultCluster()
vmAddr := fmt.Sprintf("http://%s/", cluster.Vminsert.HTTPAddr())
dir, err := os.Getwd()
if err != nil {
t.Fatalf("cannot get current working directory: %s", err)
}
path := fmt.Sprintf("fs://%s/%s", dir, testMimirPath)
vmctlFlags := []string{
`mimir`,
`--mimir-tenant-id=anonymous`,
`--mimir-filter-time-start=2024-12-01T00:00:00Z`,
`--mimir-filter-time-end=2024-12-31T23:59:59Z`,
`--mimir-custom-s3-endpoint=http://localhost:9000`,
`--mimir-path=` + path,
`--vm-addr=` + vmAddr,
`--disable-progress-bar=true`,
`--vm-concurrency=6`,
`--mimir-concurrency=6`,
}
testMimirProtocol(tc, cluster, vmctlFlags)
}
func testMimirProtocol(tc *apptest.TestCase, sut apptest.PrometheusWriteQuerier, vmctlFlags []string) {
t := tc.T()
t.Helper()
cmpOpt := cmpopts.IgnoreFields(apptest.PrometheusAPIV1QueryResponse{}, "Status", "Data.ResultType")
// test for empty data request
got := sut.PrometheusAPIV1Query(t, `{__name__=~".*"}`, apptest.QueryOpts{
Step: "5m",
Time: "2025-06-02T17:14:00Z",
})
want := apptest.NewPrometheusAPIV1QueryResponse(t, `{"data":{"result":[]}}`)
if diff := cmp.Diff(want, got, cmpOpt); diff != "" {
t.Errorf("unexpected response (-want, +got):\n%s", diff)
}
tc.MustStartVmctl("vmctl", vmctlFlags)
sut.ForceFlush(t)
// open the expected series response file
file, err := os.Open(expectedMimirResponseFile)
if err != nil {
t.Fatalf("cannot open expected series response file: %s", err)
}
defer file.Close()
bytes, err := io.ReadAll(file)
if err != nil {
t.Fatalf("cannot read expected series response file: %s", err)
}
var wantResponse apptest.PrometheusAPIV1QueryResponse
if err := json.Unmarshal(bytes, &wantResponse); err != nil {
t.Fatalf("cannot unmarshal expected series response file: %s", err)
}
wantResponse.Sort()
tc.Assert(&apptest.AssertOptions{
// For cluster version, we need to wait longer for the metrics to be stored
Retries: 300,
Msg: `unexpected metrics stored on vmsingle via the prometheus protocol`,
Got: func() any {
expected := sut.PrometheusAPIV1Export(t, `{__name__=~".*"}`, apptest.QueryOpts{
Start: "2024-12-01T15:31:10Z",
End: "2024-12-31T15:32:20Z",
})
expected.Sort()
return expected.Data.Result
},
Want: wantResponse.Data.Result,
CmpOpts: []cmp.Option{
cmpopts.IgnoreFields(apptest.PrometheusAPIV1QueryResponse{}, "Status", "Data.ResultType"),
},
})
}

View File

@@ -130,7 +130,7 @@
"calcs": [
"lastNotNull"
],
"fields": "/^short_version$/",
"fields": "/^version$/",
"values": false
},
"showPercentChange": false,
@@ -146,11 +146,10 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "vm_app_version{job=~\"$job\",instance=~\"$instance\"}",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"interval": "",
"legendFormat": "{{short_version}}",
"range": false,
"refId": "A"
}
@@ -1696,7 +1695,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -791,7 +791,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -1960,7 +1960,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2083,7 +2083,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.\n\nSee also major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2727,7 +2727,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nThe lower the better.\n\n- waiting: at least one runnable thread blocked on block-`I/O` (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`; no useful user code ran during these periods → true `I/O` thrashing.\n\nIf stalled > 0 while querying, it's recommended to increase queue depth on NVMe, raise blk-mq budgets, or relax cgroup I/O limits.",
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one runnable thread blocked on IO (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`.\n\nIf stalled > 0, consider increasing queue depth on NVMe, raising blk-mq budgets, or relaxing cgroup I/O limits.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3391,7 +3391,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {
@@ -4110,7 +4110,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows cache miss ratio. Lower is better.",
"description": "Shows cache miss ratio.\n\n**Lower is better.**",
"fieldConfig": {
"defaults": {
"color": {
@@ -5715,106 +5715,6 @@
],
"title": "Pending",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"description": "Network usage by internal VictoriaMetrics RPC protocol",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"links": [],
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green"
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8388
},
"id": 74,
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull",
"max"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "9.1.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"editorMode": "code",
"expr": "sum(rate(vm_tcpdialer_written_bytes_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) * 8",
"legendFormat": "network usage",
"range": true,
"refId": "A"
}
],
"title": "RPC network usage ($instance)",
"type": "timeseries"
}
],
"title": "Interconnection ($job)",

View File

@@ -789,7 +789,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -1954,7 +1954,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2063,7 +2063,7 @@
"format": "time_series",
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{job}}-{{instance}} - waiting",
"legendFormat": "{{job}}-{{instance}} - stalled",
"range": true,
"refId": "B"
}
@@ -2388,7 +2388,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.\n\nSee also major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3023,7 +3023,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\nThe lower the better.",
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one runnable thread blocked on IO (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`.\n\nIf stalled > 0, consider increasing queue depth on NVMe, raising blk-mq budgets, or relaxing cgroup I/O limits.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3690,7 +3690,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {
@@ -4510,7 +4510,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows cache miss ratio. Lower is better.",
"description": "Shows cache miss ratio.\n\n**Lower is better.**",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -131,7 +131,7 @@
"calcs": [
"lastNotNull"
],
"fields": "/^short_version$/",
"fields": "/^version$/",
"values": false
},
"showPercentChange": false,
@@ -147,11 +147,10 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "vm_app_version{job=~\"$job\",instance=~\"$instance\"}",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"interval": "",
"legendFormat": "{{short_version}}",
"range": false,
"refId": "A"
}
@@ -1697,7 +1696,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -792,7 +792,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -1961,7 +1961,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2084,7 +2084,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.\n\nSee also major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2728,7 +2728,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nThe lower the better.\n\n- waiting: at least one runnable thread blocked on block-`I/O` (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`; no useful user code ran during these periods → true `I/O` thrashing.\n\nIf stalled > 0 while querying, it's recommended to increase queue depth on NVMe, raise blk-mq budgets, or relax cgroup I/O limits.",
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one runnable thread blocked on IO (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`.\n\nIf stalled > 0, consider increasing queue depth on NVMe, raising blk-mq budgets, or relaxing cgroup I/O limits.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3392,7 +3392,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {
@@ -4111,7 +4111,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows cache miss ratio. Lower is better.",
"description": "Shows cache miss ratio.\n\n**Lower is better.**",
"fieldConfig": {
"defaults": {
"color": {
@@ -5716,106 +5716,6 @@
],
"title": "Pending",
"type": "timeseries"
},
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Network usage by internal VictoriaMetrics RPC protocol",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"links": [],
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green"
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8388
},
"id": 74,
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull",
"max"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "9.1.0",
"targets": [
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"editorMode": "code",
"expr": "sum(rate(vm_tcpdialer_written_bytes_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) * 8",
"legendFormat": "network usage",
"range": true,
"refId": "A"
}
],
"title": "RPC network usage ($instance)",
"type": "timeseries"
}
],
"title": "Interconnection ($job)",

View File

@@ -790,7 +790,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -1955,7 +1955,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2064,7 +2064,7 @@
"format": "time_series",
"interval": "",
"intervalFactor": 2,
"legendFormat": "{{job}}-{{instance}} - waiting",
"legendFormat": "{{job}}-{{instance}} - stalled",
"range": true,
"refId": "B"
}
@@ -2389,7 +2389,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.\n\nSee also major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3024,7 +3024,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\nThe lower the better.",
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one runnable thread blocked on IO (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`.\n\nIf stalled > 0, consider increasing queue depth on NVMe, raising blk-mq budgets, or relaxing cgroup I/O limits.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3691,7 +3691,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {
@@ -4511,7 +4511,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows cache miss ratio. Lower is better.",
"description": "Shows cache miss ratio.\n\n**Lower is better.**",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -785,7 +785,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -2042,7 +2042,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2165,7 +2165,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2542,7 +2542,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\nThe lower the better.",
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one runnable thread blocked on IO (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`.\n\nIf stalled > 0, consider increasing queue depth on NVMe, raising blk-mq budgets, or relaxing cgroup I/O limits.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3015,7 +3015,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -566,7 +566,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -1719,7 +1719,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -1840,7 +1840,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2177,7 +2177,7 @@
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2804,10 +2804,10 @@
"overrides": []
},
"gridPos": {
"h": 8,
"h": 7,
"w": 12,
"x": 0,
"y": 352
"y": 11
},
"id": 63,
"options": {
@@ -2843,7 +2843,113 @@
],
"title": "Restarts ($job)",
"type": "timeseries"
}
},
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"description": "Group iteration reset can be caused by irregular delays during evaluation or by the system wall clock being moved backward.\nIf it is caused by host clock changes, vmalert could generate duplicate results for the group rules, since some evaluations could be repeated.\nCheck the host clock time synchronization configuration if this happens frequently.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "bars",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 11
},
"id": 70,
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull",
"max"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "12.2.0",
"targets": [
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(increase(vmalert_iteration_reset_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by(job, group, file) > 0",
"interval": "1m",
"legendFormat": "({{job}}) {{group}}({{file}})",
"range": true,
"refId": "A"
}
],
"title": "Group Iteration Reset ($instance)",
"type": "timeseries"
}
],
"title": "Troubleshooting",
"type": "row"

View File

@@ -492,14 +492,14 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by (job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
"refId": "A"
}
],
"title": "Version",
"title": "",
"type": "table"
},
{

View File

@@ -784,7 +784,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -2041,7 +2041,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2164,7 +2164,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2541,7 +2541,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\nThe lower the better.",
"description": "Shows IO pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one runnable thread blocked on IO (disk, NVMe, network-storage) while others could still make progress.\n- stalled: all non-idle threads simultaneously waiting on `I/O`.\n\nIf stalled > 0, consider increasing queue depth on NVMe, raising blk-mq budgets, or relaxing cgroup I/O limits.",
"fieldConfig": {
"defaults": {
"color": {
@@ -3014,7 +3014,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {

View File

@@ -565,7 +565,7 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by(job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
@@ -1718,7 +1718,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html). Helps troubleshoot high CPU usage or throttling.\nLower is better.\n\n- waiting: The percentage of time at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: The percentage of time all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 1%. It only becomes a concern if it consistently climbs above 510% and aligns with latency spikes or GC slowdowns.",
"description": "CPU pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one task in the process was ready to run (runnable) but couldn't get scheduled on the CPU.\n- stalled: all tasks in the process (except idle ones) were unable to get CPU time — a full CPU stall.\n\nIf there's a CPU burst, it's normal to see waiting or stalled > 100ms. It only becomes a concern if it consistently climbs above 50-100ms and aligns with latency spikes or GC slowdowns.",
"fieldConfig": {
"defaults": {
"color": {
@@ -1839,7 +1839,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\nLower is better.\n\n- waiting: Time fraction where at least one thread was blocked on memory.\n- stalled: Time fraction where every thread was blocked on memory (severe pressure).\n\nIf queries slow down and both series spike, the host is likely limited by RAM or I/O throughput.",
"description": "Shows memory pressure based on [Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).\n\n**Lower is better.**\n\nPressure is measured as amount of time within 1sec time window the process was:\n- waiting: at least one thread was blocked on memory.\n- stalled: every thread was blocked on memory (severe pressure).\n\nElevated memory pressure can slowdown the process performance by utilizing more disk IO. Consider increasing amount of available RAM limit or decreasing the load on the process.\n\nSeу major page faults rate panel in Troubleshooting section if this metric continued to be high.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2176,7 +2176,7 @@
"type": "prometheus",
"uid": "$ds"
},
"description": "Shows the time goroutines have spent in runnable state before actually running. The lower is better.\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"description": "Shows the time goroutines have spent in runnable state before actually running.\n\n**Lower is better.**\n\nHigh values or values exceeding the threshold is usually a sign of insufficient CPU resources or CPU throttling. \n\nVerify that service has enough CPU resources. Otherwise, the service could work unreliably with delays in processing.",
"fieldConfig": {
"defaults": {
"color": {
@@ -2803,10 +2803,10 @@
"overrides": []
},
"gridPos": {
"h": 8,
"h": 7,
"w": 12,
"x": 0,
"y": 352
"y": 11
},
"id": 63,
"options": {
@@ -2842,7 +2842,113 @@
],
"title": "Restarts ($job)",
"type": "timeseries"
}
},
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"description": "Group iteration reset can be caused by irregular delays during evaluation or by the system wall clock being moved backward.\nIf it is caused by host clock changes, vmalert could generate duplicate results for the group rules, since some evaluations could be repeated.\nCheck the host clock time synchronization configuration if this happens frequently.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "bars",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"showValues": false,
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
},
{
"color": "red",
"value": 80
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 11
},
"id": 70,
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull",
"max"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"hideZeros": false,
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "12.2.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(increase(vmalert_iteration_reset_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by(job, group, file) > 0",
"interval": "1m",
"legendFormat": "({{job}}) {{group}}({{file}})",
"range": true,
"refId": "A"
}
],
"title": "Group Iteration Reset ($instance)",
"type": "timeseries"
}
],
"title": "Troubleshooting",
"type": "row"

View File

@@ -491,14 +491,14 @@
},
"editorMode": "code",
"exemplar": false,
"expr": "sum(vm_app_version{job=~\"$job\", instance=~\"$instance\"}) by (job, short_version)",
"expr": "sum by(job, version) (label_replace(vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version!=\"\"}, \"version\", \"$1\", \"short_version\", \"(.*)\") or vm_app_version{job=~\"$job\", instance=~\"$instance\", short_version=\"\"})",
"format": "table",
"instant": true,
"range": false,
"refId": "A"
}
],
"title": "Version",
"title": "",
"type": "table"
},
{

View File

@@ -7,7 +7,7 @@ ROOT_IMAGE ?= alpine:3.23.4
ROOT_IMAGE_SCRATCH ?= scratch
CERTS_IMAGE := alpine:3.23.4
GO_BUILDER_IMAGE := golang:1.26.3
GO_BUILDER_IMAGE := golang:1.26.4
BUILDER_IMAGE := local/builder:2.0.0-$(shell echo $(GO_BUILDER_IMAGE) | tr :/ __)-1
BASE_IMAGE := local/base:1.1.4-$(shell echo $(ROOT_IMAGE) | tr :/ __)-$(shell echo $(CERTS_IMAGE) | tr :/ __)

View File

@@ -3,7 +3,7 @@ services:
# It scrapes targets defined in --promscrape.config
# And forward them to --remoteWrite.url
vmagent:
image: victoriametrics/vmagent:v1.144.0
image: victoriametrics/vmagent:v1.145.0
depends_on:
- "vmauth"
ports:
@@ -42,14 +42,14 @@ services:
# vmstorage shards. Each shard receives 1/N of all metrics sent to vminserts,
# where N is number of vmstorages (2 in this case).
vmstorage-1:
image: victoriametrics/vmstorage:v1.144.0-cluster
image: victoriametrics/vmstorage:v1.145.0-cluster
volumes:
- strgdata-1:/storage
command:
- "--storageDataPath=/storage"
restart: always
vmstorage-2:
image: victoriametrics/vmstorage:v1.144.0-cluster
image: victoriametrics/vmstorage:v1.145.0-cluster
volumes:
- strgdata-2:/storage
command:
@@ -59,7 +59,7 @@ services:
# vminsert is ingestion frontend. It receives metrics pushed by vmagent,
# pre-process them and distributes across configured vmstorage shards.
vminsert-1:
image: victoriametrics/vminsert:v1.144.0-cluster
image: victoriametrics/vminsert:v1.145.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -68,7 +68,7 @@ services:
- "--storageNode=vmstorage-2:8400"
restart: always
vminsert-2:
image: victoriametrics/vminsert:v1.144.0-cluster
image: victoriametrics/vminsert:v1.145.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -80,7 +80,7 @@ services:
# vmselect is a query fronted. It serves read queries in MetricsQL or PromQL.
# vmselect collects results from configured `--storageNode` shards.
vmselect-1:
image: victoriametrics/vmselect:v1.144.0-cluster
image: victoriametrics/vmselect:v1.145.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -90,7 +90,7 @@ services:
- "--vmalert.proxyURL=http://vmalert:8880"
restart: always
vmselect-2:
image: victoriametrics/vmselect:v1.144.0-cluster
image: victoriametrics/vmselect:v1.145.0-cluster
depends_on:
- "vmstorage-1"
- "vmstorage-2"
@@ -105,7 +105,7 @@ services:
# read requests from Grafana, vmui, vmalert among vmselects.
# It can be used as an authentication proxy.
vmauth:
image: victoriametrics/vmauth:v1.144.0
image: victoriametrics/vmauth:v1.145.0
depends_on:
- "vmselect-1"
- "vmselect-2"
@@ -119,7 +119,7 @@ services:
# vmalert executes alerting and recording rules
vmalert:
image: victoriametrics/vmalert:v1.144.0
image: victoriametrics/vmalert:v1.145.0
depends_on:
- "vmauth"
ports:

View File

@@ -3,7 +3,7 @@ services:
# It scrapes targets defined in --promscrape.config
# And forward them to --remoteWrite.url
vmagent:
image: victoriametrics/vmagent:v1.144.0
image: victoriametrics/vmagent:v1.145.0
depends_on:
- "victoriametrics"
ports:
@@ -18,7 +18,7 @@ services:
# VictoriaMetrics instance, a single process responsible for
# storing metrics and serve read requests.
victoriametrics:
image: victoriametrics/victoria-metrics:v1.144.0
image: victoriametrics/victoria-metrics:v1.145.0
ports:
- 8428:8428
- 8089:8089
@@ -59,7 +59,7 @@ services:
# vmalert executes alerting and recording rules
vmalert:
image: victoriametrics/vmalert:v1.144.0
image: victoriametrics/vmalert:v1.145.0
depends_on:
- "victoriametrics"
- "alertmanager"

View File

@@ -64,6 +64,18 @@ groups:
group \"{{ $labels.group }}\". See https://docs.victoriametrics.com/victoriametrics/vmalert/#groups.
If rule expressions are taking longer than expected, please see https://docs.victoriametrics.com/victoriametrics/troubleshooting/#slow-queries."
- alert: GroupIterationReset
expr: increase(vmalert_iteration_reset_total[5m]) > 0
for: 5m
labels:
severity: warning
annotations:
summary: "Evaluation iteration for group {{ $labels.group }} in file {{ $labels.file }} is reset"
description: "Evaluation iteration for group \"{{ $labels.group }}\" in file \"{{ $labels.file }}\" is reset on vmalert instance {{ $labels.instance }}.
This can be caused by irregular delays during evaluation or by the system wall clock being moved backward. If it is caused by host clock changes, vmalert could
generate duplicate results for the group rules since some evaluations could be repeated. Check host clock time synchronization configurations if this happens frequently."
- alert: RemoteWriteErrors
expr: increase(vmalert_remotewrite_errors_total[5m]) > 0
for: 15m
@@ -108,4 +120,3 @@ groups:
summary: "vmalert instance {{ $labels.instance }} is failing to send notifications to Alertmanager"
description: "vmalert instance {{ $labels.instance }} is failing to send alert notifications to \"{{ $labels.addr }}\".
Check vmalert's logs for detailed error message."

View File

@@ -1,6 +1,6 @@
services:
vmagent:
image: victoriametrics/vmagent:v1.144.0
image: victoriametrics/vmagent:v1.145.0
depends_on:
- "victoriametrics"
ports:
@@ -14,7 +14,7 @@ services:
restart: always
victoriametrics:
image: victoriametrics/victoria-metrics:v1.144.0
image: victoriametrics/victoria-metrics:v1.145.0
ports:
- 8428:8428
volumes:
@@ -40,7 +40,7 @@ services:
restart: always
vmalert:
image: victoriametrics/vmalert:v1.144.0
image: victoriametrics/vmalert:v1.145.0
depends_on:
- "victoriametrics"
ports:

View File

@@ -271,6 +271,8 @@ endif
(cd /tmp/vm-opensource && ./bin/vmctl remote-read -help > /tmp/vmctl_remote-read_flags_tmp.md)
(cd /tmp/vm-opensource && ./bin/vmctl prometheus -help > /tmp/vmctl_prometheus_flags_tmp.md)
(cd /tmp/vm-opensource && ./bin/vmctl vm-native -help > /tmp/vmctl_vm-native_flags_tmp.md)
(cd /tmp/vm-opensource && ./bin/vmctl thanos -help > /tmp/vmctl_thanos_flags_tmp.md)
(cd /tmp/vm-opensource && ./bin/vmctl mimir -help > /tmp/vmctl_mimir_flags_tmp.md)
echo "$$FLAGS_HEADER" > docs/victoriametrics/vmctl/vmctl_flags.md && \
cat /tmp/vmctl_flags_tmp.md >> docs/victoriametrics/vmctl/vmctl_flags.md && \
@@ -296,6 +298,14 @@ endif
cat /tmp/vmctl_vm-native_flags_tmp.md >> docs/victoriametrics/vmctl/vmctl_vm-native_flags.md && \
printf '```\n' >> docs/victoriametrics/vmctl/vmctl_vm-native_flags.md
echo "$$FLAGS_HEADER" > docs/victoriametrics/vmctl/vmctl_thanos_flags.md && \
cat /tmp/vmctl_thanos_flags_tmp.md >> docs/victoriametrics/vmctl/vmctl_thanos_flags.md && \
printf '```\n' >> docs/victoriametrics/vmctl/vmctl_thanos_flags.md
echo "$$FLAGS_HEADER" > docs/victoriametrics/vmctl/vmctl_mimir_flags.md && \
cat /tmp/vmctl_mimir_flags_tmp.md >> docs/victoriametrics/vmctl/vmctl_mimir_flags.md && \
printf '```\n' >> docs/victoriametrics/vmctl/vmctl_mimir_flags.md
# remove Total time line from all vmctl flag files to reduce diffs noise
sed -i '/Total time:/d' docs/victoriametrics/vmctl/vmctl_flags.md
sed -i '/Total time:/d' docs/victoriametrics/vmctl/vmctl_opentsdb_flags.md
@@ -303,6 +313,8 @@ endif
sed -i '/Total time:/d' docs/victoriametrics/vmctl/vmctl_remote-read_flags.md
sed -i '/Total time:/d' docs/victoriametrics/vmctl/vmctl_prometheus_flags.md
sed -i '/Total time:/d' docs/victoriametrics/vmctl/vmctl_vm-native_flags.md
sed -i '/Total time:/d' docs/victoriametrics/vmctl/vmctl_thanos_flags.md
sed -i '/Total time:/d' docs/victoriametrics/vmctl/vmctl_mimir_flags.md
# remove Version line and the actual version line from vmctl_flags.md to reduce diffs noise
sed -i '/^VERSION:/,+1d' docs/victoriametrics/vmctl/vmctl_flags.md

View File

@@ -130,7 +130,7 @@ Released: 2025-11-05
## v1.27.0
Released: 2025-10-31
- FEATURE: Added runtime state compatibility guard for [stateful](https://docs.victoriametrics.com/anomaly-detection/components/settings/#restore-state) deployments. The service now persists normalized versions, evaluates an [upgrade/downgrade compatibility matrix](https://docs.victoriametrics.com/anomaly-detection/migration/#compatibility-matrix), and selectively drops or reuses DB records and on-disk artifacts to keep migrations safe and automatic. Please refer to the [migration page](https://docs.victoriametrics.com/anomaly-detection/migration/) for more details.
- FEATURE: Added runtime state compatibility guard for [stateful](https://docs.victoriametrics.com/anomaly-detection/components/settings/#state-restoration) deployments. The service now persists normalized versions, evaluates an [upgrade/downgrade compatibility matrix](https://docs.victoriametrics.com/anomaly-detection/migration/#compatibility-matrix), and selectively drops or reuses DB records and on-disk artifacts to keep migrations safe and automatic. Please refer to the [migration page](https://docs.victoriametrics.com/anomaly-detection/migration/) for more details.
- IMPROVEMENT: Parallelization now honours container cgroup CPU/RAM limits, so `settings.n_workers` in the [settings section](https://docs.victoriametrics.com/anomaly-detection/components/settings/#parallelization), internal routines and the `vmanomaly_available_memory_bytes`/`vmanomaly_cpu_cores_available` [startup metrics](https://docs.victoriametrics.com/anomaly-detection/components/monitoring/#startup-metrics) report or use container resources instead of host totals, keeping the [self-monitoring dashboard](https://docs.victoriametrics.com/anomaly-detection/self-monitoring/#grafana-dashboard) accurate.
@@ -164,7 +164,7 @@ Released: 2025-10-02
- FEATURE: Introduced vmui-like [UI](https://docs.victoriametrics.com/anomaly-detection/ui/) for `vmanomaly` service to simplify the configuration and backtesting of anomaly detection models before it goes to production. It provides an intuitive interface to finetune model configurations, visualize its predictions and anomaly scores, and perform backtesting on historical data. The UI is accessible via a web browser and can be run as a [standalone service](https://docs.victoriametrics.com/anomaly-detection/ui/#preset-usage) or [integrated with productionalized deployments](https://docs.victoriametrics.com/anomaly-detection/ui/#mixed-usage). For more details, refer to the [documentation](https://docs.victoriametrics.com/anomaly-detection/ui/).
- FEATURE: Added support for reading data from [VictoriaLogs stats queries](https://docs.victoriametrics.com/victorialogs/querying/#querying-log-range-stats) with `VLogsReader`. This reader allows querying and analyzing log data stored in VictoriaLogs, enabling anomaly detection on metrics generated from logs. It supports similar configuration options as `VmReader`, including `datasource_url`, `tenant_id`, `queries`, etc. For more details, refer to the [documentation](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vlogs-reader). It can be also used in [UI mode](https://docs.victoriametrics.com/anomaly-detection/ui/) for backtesting log-based anomaly detection configurations.
- FEATURE: Added support for reading data from [VictoriaLogs stats queries](https://docs.victoriametrics.com/victorialogs/querying/#querying-log-range-stats) with `VLogsReader`. This reader allows querying and analyzing log data stored in VictoriaLogs, enabling anomaly detection on metrics generated from logs. It supports similar configuration options as `VmReader`, including `datasource_url`, `tenant_id`, `queries`, etc. For more details, refer to the [documentation](https://docs.victoriametrics.com/anomaly-detection/components/reader/#victorialogs-reader). It can be also used in [UI mode](https://docs.victoriametrics.com/anomaly-detection/ui/) for backtesting log-based anomaly detection configurations.
- IMPROVEMENT: Resolved the case in the [`IsolationForestModel`](https://docs.victoriametrics.com/anomaly-detection/components/models/#isolation-forest-multivariate) with `provide_series` common model [argument](https://docs.victoriametrics.com/anomaly-detection/components/models/#provide-series) including `yhat.*` series (prediction and confidence boundaries), which are not produced by this model. Now config validation will fail with a clear error message if such series names are requested.
@@ -190,7 +190,7 @@ Released: 2025-08-19
## v1.25.2
Released: 2025-07-30
- BUGFIX: Resolved inconsistent state between in-memory models and state database (if [stateful mode](https://docs.victoriametrics.com/anomaly-detection/components/settings/#stateful-mode) is enabled). This bug caused `Model instance not found` warnings during inference calls and prevented proper cleanup of stale models from disk. The fix also prevents state updates when operations are terminated mid-execution of scheduled fit/infer jobs.
- BUGFIX: Resolved inconsistent state between in-memory models and state database (if [stateful mode](https://docs.victoriametrics.com/anomaly-detection/components/settings/#state-restoration) is enabled). This bug caused `Model instance not found` warnings during inference calls and prevented proper cleanup of stale models from disk. The fix also prevents state updates when operations are terminated mid-execution of scheduled fit/infer jobs.
- BUGFIX: Added explicit handling for inference calls on models that were deleted from disk by the time of their usage, but still referenced in the state database, preventing `'NoneType' object has no attribute 'infer'` rows in logs. Now a warning is logged and the inference call is skipped, which is expected behavior for deleted models.
@@ -210,7 +210,7 @@ Released: 2025-07-24
- BUGFIX: Prevented `OneOffScheduler` and `BacktestingScheduler` [schedulers](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/) from receiving no data (when [state restoration](https://docs.victoriametrics.com/anomaly-detection/components/settings/#state-restoration) is enabled). Now a warning is logged and such scheduler types are implicitly used without state restoration, which is expected behavior for these one-time-job schedulers.
- BUGFIX: Now the paths to artifact database (if [stateful mode](https://docs.victoriametrics.com/anomaly-detection/components/settings/#stateful-mode) is enabled) are properly resolved to absolute, preventing errors at initialization time (like `sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file`) or warnings (like `SAWarning: fully NULL primary key identity cannot load any object.`).
- BUGFIX: Now the paths to artifact database (if [stateful mode](https://docs.victoriametrics.com/anomaly-detection/components/settings/#state-restoration) is enabled) are properly resolved to absolute, preventing errors at initialization time (like `sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file`) or warnings (like `SAWarning: fully NULL primary key identity cannot load any object.`).
## v1.25.0
Released: 2025-07-17
@@ -273,7 +273,7 @@ Released: 2025-06-05
- FEATURE: Added `decay` [argument](https://docs.victoriametrics.com/anomaly-detection/components/models/#decay) to [online models](https://docs.victoriametrics.com/anomaly-detection/components/models/#online-models). This parameters allows for newer data to be weighted more heavily in online models. By default this is set to 1 which means all data points are weighted the same to maintain backward compatibility with existing configs. The closer this value is to 0 the more important new data is.
- IMPROVEMENT: **Restored back parallelization** in the read/fit/infer pipeline, previously disabled in [v1.22.0](#v1220-experimental) due to deadlock issues. The new implementation prevents deadlocks, allowing to control the parallelization level via `n_workers` in [settings section](https://docs.victoriametrics.com/anomaly-detection/components/settings/). It's suggested to upgrade from [v1.22.0](#v1220) - [v1.22.1](#v1221) to this version to regain the performance benefits of parallel processing.
- IMPROVEMENT: **Restored back parallelization** in the read/fit/infer pipeline, previously disabled in [v1.22.0](#v1220-experimental) due to deadlock issues. The new implementation prevents deadlocks, allowing to control the parallelization level via `n_workers` in [settings section](https://docs.victoriametrics.com/anomaly-detection/components/settings/). It's suggested to upgrade from [v1.22.0](#v1220-experimental) - [v1.22.1](#v1221) to this version to regain the performance benefits of parallel processing.
- IMPROVEMENT: Added `--dryRun` [argument](https://docs.victoriametrics.com/anomaly-detection/quickstart/#command-line-arguments) to `vmanomaly` to enable dry run mode. This mode allows to validate configuration without executing any actual operations and doesn't require a license. It is particularly useful to test the configurations before deploying them in a production environment.
@@ -559,7 +559,7 @@ Released: 2024-08-10
- **Lowest anomaly scores** (=0) when the *model's predictions (`yhat`) fall outside the expected range*, signaling uncertain predictions.
- For more details, please refer to the [documentation](https://docs.victoriametrics.com/anomaly-detection/components/reader/#per-query-parameters).
- IMPROVEMENT: Added `latency_offset` argument to the [VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader) to override the default `-search.latencyOffset` [flag of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/#list-of-command-line-flags) (30s). The default value is set to 1ms, which should help in cases where `sampling_frequency` is low (10-60s) and `sampling_frequency` equals `infer_every` in the [PeriodicScheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler). This prevents users from receiving `service - WARNING - [Scheduler [scheduler_alias]] No data available for inference.` warnings in logs and allows for consecutive `infer` calls without gaps. To restore the backward compatible behavior, set it equal to your `-search.latencyOffset` value in [VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader) config section.
- IMPROVEMENT: Added `latency_offset` argument to the [VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader) to override the default `-search.latencyOffset` [flag of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/#list-of-command-line-flags) (30s). The default value is set to 1ms, which should help in cases where `sampling_period` is low (10-60s) and `sampling_period` equals `infer_every` in the [PeriodicScheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler). This prevents users from receiving `service - WARNING - [Scheduler [scheduler_alias]] No data available for inference.` warnings in logs and allows for consecutive `infer` calls without gaps. To restore the backward compatible behavior, set it equal to your `-search.latencyOffset` value in [VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader) config section.
- BUGFIX: Ensure the `use_transform` argument of the [`OnlineQuantileModel`](https://docs.victoriametrics.com/anomaly-detection/components/models/#online-seasonal-quantile) functions as intended.
- BUGFIX: Add a docstring for `query_from_last_seen_timestamp` arg of [VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader).

View File

@@ -53,7 +53,7 @@ Please see example graph illustrating this logic below:
**VictoriaMetrics (metrics):** use full [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/) for selection, sampling, and processing; [global filters](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#prometheus-querying-api-enhancements) are also supported. See the [VmReader](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader) for the details.
**VictoriaLogs (logs → metrics):** {{% available_from "v1.26.0" anomaly %}} use [LogsQL](https://docs.victoriametrics.com/victorialogs/logsql/) via the [`VLogsReader`](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vlogs-reader) to create log-derived or traces-derived metrics for anomaly detection (e.g., error rates, request latencies, error spans count).
**VictoriaLogs (logs → metrics):** {{% available_from "v1.26.0" anomaly %}} use [LogsQL](https://docs.victoriametrics.com/victorialogs/logsql/) via the [`VLogsReader`](https://docs.victoriametrics.com/anomaly-detection/components/reader/#victorialogs-reader) to create log-derived or traces-derived metrics for anomaly detection (e.g., error rates, request latencies, error spans count).
> [!NOTE]
> Please note that only LogsQL queries with [stats pipe](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe) functions [subset](https://docs.victoriametrics.com/anomaly-detection/components/reader/#valid-stats-functions) are supported, as they produce **numeric** time series.
@@ -281,7 +281,7 @@ reader:
datasource_url: 'some_url_to_read_data_from'
queries:
query_alias1: 'some_metricsql_query'
sampling_frequency: '1m' # change to whatever you need in data granularity
sampling_period: '1m' # change to whatever you need in data granularity
# other params if needed
# https://docs.victoriametrics.com/anomaly-detection/components/reader/#vm-reader
@@ -294,7 +294,7 @@ writer:
# https://docs.victoriametrics.com/anomaly-detection/components/monitoring/
```
Configuration above will produce N intervals of full length (`fit_window`=14d + `fit_every`=1h) until `to_iso` timestamp is reached to run N consecutive `fit` calls to train models; Then these models will be used to produce `M = [fit_every / sampling_frequency]` infer datapoints for `fit_every` range at the end of each such interval, imitating M consecutive calls of `infer_every` in `PeriodicScheduler` [config](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler). These datapoints then will be written back to VictoriaMetrics TSDB, defined in `writer` [section](https://docs.victoriametrics.com/anomaly-detection/components/writer/#vm-writer) for further visualization (i.e. in VMUI or Grafana)
Configuration above will produce N intervals of full length (`fit_window`=14d + `fit_every`=1h) until `to_iso` timestamp is reached to run N consecutive `fit` calls to train models; Then these models will be used to produce `M = [fit_every / sampling_period]` infer datapoints for `fit_every` range at the end of each such interval, imitating M consecutive calls of `infer_every` in `PeriodicScheduler` [config](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler). These datapoints then will be written back to VictoriaMetrics TSDB, defined in `writer` [section](https://docs.victoriametrics.com/anomaly-detection/components/writer/#vm-writer) for further visualization (i.e. in VMUI or Grafana)
## Forecasting
@@ -499,7 +499,7 @@ schedulers:
models:
zscore_example:
class: 'zscore_online'
min_n_samples_seen: 120 # i.e. minimal relevant seasonality or (initial) fit_window / sampling_frequency
min_n_samples_seen: 120 # i.e. minimal relevant seasonality or (initial) fit_window / sampling_period
decay: 0.999 # decay factor to control how fast the model adapts to new data, the lower, the faster it adapts
schedulers: ['periodic']
# other model params ...

View File

@@ -406,10 +406,10 @@ For optimal service behavior, consider the following tweaks when configuring `vm
**Reader**:
- Setup the datasource to read data from in the [reader](https://docs.victoriametrics.com/anomaly-detection/components/reader/) section. Include tenant ID if using a [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) (`multitenant` value {{% available_from "v1.16.2" anomaly %}} can be also used here).
- Define queries for input data using [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/) under `reader.queries` section. Note, it's possible to override reader-level arguments at query level for increased flexibility, e.g. specifying per-query [timezone](https://docs.victoriametrics.com/anomaly-detection/faq/#handling-timezones) or [sampling period](https://docs.victoriametrics.com/anomaly-detection/components/reader/#sampling-period).
- Define queries for input data using [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/) under `reader.queries` section. Note, it's possible to override reader-level arguments at query level for increased flexibility, e.g. specifying per-query [timezone](https://docs.victoriametrics.com/anomaly-detection/faq/#handling-timezones) or [sampling period](https://docs.victoriametrics.com/anomaly-detection/components/reader/#config-parameters).
- For longer `fit_window` intervals in scheduler, consider splitting queries into smaller time ranges to avoid excessive memory usage, timeouts and hitting server-side constraints, so they can be queried separately and reconstructed on `vmanomaly` side. Please refer to this [example](https://docs.victoriametrics.com/anomaly-detection/faq/#handling-large-queries-in-vmanomaly) for more details.
> If applicable - consider [`VLogsReader`](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vlogs-reader) {{% available_from "v1.26.0" anomaly %}} to perform anomaly detection on **log-derived metrics**. This is particularly useful for scenarios where log data needs to be analyzed for unusual patterns or behaviors, such as error rates or request latencies.
> If applicable - consider [`VLogsReader`](https://docs.victoriametrics.com/anomaly-detection/components/reader/#victorialogs-reader) {{% available_from "v1.26.0" anomaly %}} to perform anomaly detection on **log-derived metrics**. This is particularly useful for scenarios where log data needs to be analyzed for unusual patterns or behaviors, such as error rates or request latencies.
**Writer**:
- Specify where and how to store anomaly detection metrics in the [writer](https://docs.victoriametrics.com/anomaly-detection/components/writer/) section.

View File

@@ -694,7 +694,7 @@ vmanomaly version: [v1.29.1](https://docs.victoriametrics.com/anomaly-detection/
- BUGFIX: Now Visualization Panel correctly switches in between "query" and "detect" modes when respective buttons are hit in the [Visualization Panel](#visualization-panel), without showing stale results from the previous mode, once running anomaly detection task is explicitly cancelled (regression introduced in [v1.5.0](#v150)).
- BUGFIX: Fixed an issue with [crypto.randomUUID](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID) introduced in [v1.29.0](#v1290) in [UI copilot](https://docs.victoriametrics.com/anomaly-detection/ui/#ai-assistance) that led to the front app showing a blank page.
- BUGFIX: Fixed an issue with [crypto.randomUUID](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID) introduced in [v1.29.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1290) in [UI copilot](https://docs.victoriametrics.com/anomaly-detection/ui/#ai-assistance) that led to the front app showing a blank page.
### v1.5.0
Released: 2026-03-05

View File

@@ -41,7 +41,7 @@ settings:
restore_state: True # restore state from previous run, if available
retention: # how long to keep stale models on disk/in memory
ttl: "1d" # time-to-live duration, if the model was not used for inference within this duration, it will be considered stale
check_every: "1h" # how often to check for stale models and remove them
check_interval: "1h" # how often to check for stale models and remove them
# how and when to run the models is defined by schedulers
# https://docs.victoriametrics.com/anomaly-detection/components/scheduler/
@@ -143,15 +143,15 @@ server:
> This feature is better used in conjunction with [stateful service](https://docs.victoriametrics.com/anomaly-detection/components/settings/#state-restoration) to preserve the state of the models and schedulers between restarts and reuse what can be reused, thus avoiding unnecessary re-training of models, re-initialization of schedulers and re-reading of data.
{{% available_from "v1.25.0" anomaly %}} Service supports hot reload of configuration files, which allows for automatic reloading of configurations on config files change filesystem events without the need of explicit service restart. This can be enabled via the `--watch` [CLI argument](https://docs.victoriametrics.com/anomaly-detection/quickstart/#command-line-arguments). `vmanomaly_hot_reload_enabled` flag in [self-monitoring metrics](https://docs.victoriametrics.com/anomaly-detection/components/monitoring/#startup-metrics) will be set to 1 (if enabled) or 0 (if disabled).
{{% available_from "v1.25.0" anomaly %}} Service supports hot reload of configuration files, which allows for automatic reloading of configurations on config files change filesystem events without the need of explicit service restart. This can be enabled via the `--watch` [CLI argument](https://docs.victoriametrics.com/anomaly-detection/quickstart/#command-line-arguments). `vmanomaly_config_reload_enabled` flag in [self-monitoring metrics](https://docs.victoriametrics.com/anomaly-detection/components/monitoring/#startup-metrics) will be set to 1 (if enabled) or 0 (if disabled).
### How it works
It works by watching for file system events, such as modifications, creations, or deletions of `.yml|.yaml` files in the specified directories. When a change is detected, the service will attempt to reload the configuration files, rebuild the [global config](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#global-config) and reinitialize the components. If the reload is successful, the `vmanomaly_hot_reload_events_total` metric will be incremented for `status="success"` label, otherwise it will be incremented with `status="failure"` label and a respective error message on config validation failure(s) will be logged.
It works by watching for file system events, such as modifications, creations, or deletions of `.yml|.yaml` files in the specified directories. When a change is detected, the service will attempt to reload the configuration files, rebuild the [global config](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#global-configuration) and reinitialize the components. If the reload is successful, the `vmanomaly_config_reloads_total` metric will be incremented for `status="success"` label, otherwise it will be incremented with `status="failure"` label and a respective error message on config validation failure(s) will be logged.
> If the reload fails, the service will log an error message indicating the reason for the failure, and the **previous configuration will remain active until a successful reload occurs** to preserve the service's stability. This means that if there are errors in the new configuration, the service will continue to operate with the last valid configuration until the issues are resolved.
If used on [sharded setup](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#horizontal-scalability), upon [global config](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#global-config) change, all shards will be reinitialized with the new configurations.
If used on [sharded setup](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#horizontal-scalability), upon [global config](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#global-configuration) change, all shards will be reinitialized with the new configurations.
> Please note, that even if [state restoration](https://docs.victoriametrics.com/anomaly-detection/components/settings/#state-restoration) is enabled, the models, queries and schedulers might "migrate" to new shards if the order or the amount of [sub-configs](https://docs.victoriametrics.com/anomaly-detection/scaling-vmanomaly/#sub-configuration) changes after new config is hot-reloaded, so the state restoration won't be **fully** efficient in this case.
@@ -219,7 +219,7 @@ reader:
# ... (rest of the config remains unchanged)
```
After saving the changes, hot reload will automatically detect the changes in `config.yaml` and attempt to reload the configuration. As the changes are valid, the service will log a success message and increment the `vmanomaly_hot_reload_events_total` metric with `status="success"` label:
After saving the changes, hot reload will automatically detect the changes in `config.yaml` and attempt to reload the configuration. As the changes are valid, the service will log a success message and increment the `vmanomaly_config_reloads_total` metric with `status="success"` label:
- All the model instances of class `zscore_online`, that were trained on `host_network_receive_errors` can be reused as they are still valid and "fresh" for making inference on new datapoints until the next `fit_every` happens.
- All the model instances of class `zscore_online`, that were trained on `cpu_seconds_total` will be re-trained with the new query expression and frequency, as old model instances are not valid anymore.

View File

@@ -41,7 +41,7 @@ models:
# ...
```
Old-style configs (< [1.10.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#1100))
Old-style configs (< [1.10.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1100))
```yaml
model:
@@ -66,7 +66,7 @@ models:
## Common args
From [1.10.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#1100), **common args**, supported by *every model (and model type)* were introduced.
From [1.10.0](https://docs.victoriametrics.com/anomaly-detection/changelog/#v1100), **common args**, supported by *every model (and model type)* were introduced.
### Queries

View File

@@ -369,7 +369,7 @@ If True, then query will be performed from the last seen timestamp for a given s
`1ms`
</td>
<td>
It allows overriding the default `-search.latencyOffset`{{% available_from "v1.15.1" anomaly %}} [flag of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/#list-of-command-line-flags) (30s). The default value is set to 1ms, which should help in cases where `sampling_frequency` is low (10-60s) and `sampling_frequency` equals `infer_every` in the [PeriodicScheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler). This prevents users from receiving `service - WARNING - [Scheduler [scheduler_alias]] No data available for inference.` warnings in logs and allows for consecutive `infer` calls without gaps. To restore the old behavior, set it equal to your `-search.latencyOffset` [flag value](https://docs.victoriametrics.com/victoriametrics/#list-of-command-line-flags).
It allows overriding the default `-search.latencyOffset`{{% available_from "v1.15.1" anomaly %}} [flag of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/#list-of-command-line-flags) (30s). The default value is set to 1ms, which should help in cases where `sampling_period` is low (10-60s) and `sampling_period` equals `infer_every` in the [PeriodicScheduler](https://docs.victoriametrics.com/anomaly-detection/components/scheduler/#periodic-scheduler). This prevents users from receiving `service - WARNING - [Scheduler [scheduler_alias]] No data available for inference.` warnings in logs and allows for consecutive `infer` calls without gaps. To restore the old behavior, set it equal to your `-search.latencyOffset` [flag value](https://docs.victoriametrics.com/victoriametrics/#list-of-command-line-flags).
</td>
</tr>
<tr>
@@ -760,7 +760,7 @@ Frequency of the points returned. Will be converted to `/select/stats_query_rang
`10000`
</td>
<td>
(Optional) For splitting long `fit_window` [queries](https://docs.victoriametrics.com/anomaly-detection/components/reader/#vlogs-reader) into smaller sub-intervals. This helps users avoid hitting the timeout limits for individual queries by distributing initial query across multiple subquery requests with minimal overhead. Can be also set on [per-query](#per-query-parameters-1) basis to override reader-level settings.
(Optional) For splitting long `fit_window` [queries](https://docs.victoriametrics.com/anomaly-detection/components/reader/#victorialogs-reader) into smaller sub-intervals. This helps users avoid hitting the timeout limits for individual queries by distributing initial query across multiple subquery requests with minimal overhead. Can be also set on [per-query](#per-query-parameters-1) basis to override reader-level settings.
</td>
</tr>
<tr>

View File

@@ -240,23 +240,23 @@ vmagent will write data into VictoriaMetrics single-node and cluster (with tenan
# compose.yaml
services:
vmsingle:
image: victoriametrics/victoria-metrics:v1.144.0
image: victoriametrics/victoria-metrics:v1.145.0
vmstorage:
image: victoriametrics/vmstorage:v1.144.0-cluster
image: victoriametrics/vmstorage:v1.145.0-cluster
vminsert:
image: victoriametrics/vminsert:v1.144.0-cluster
image: victoriametrics/vminsert:v1.145.0-cluster
command:
- -storageNode=vmstorage:8400
vmselect:
image: victoriametrics/vmselect:v1.144.0-cluster
image: victoriametrics/vmselect:v1.145.0-cluster
command:
- -storageNode=vmstorage:8401
vmagent:
image: victoriametrics/vmagent:v1.144.0
image: victoriametrics/vmagent:v1.145.0
volumes:
- ./scrape.yaml:/etc/vmagent/config.yaml
command:
@@ -308,7 +308,7 @@ Now add the vmauth service to `compose.yaml`:
# compose.yaml
services:
vmauth:
image: docker.io/victoriametrics/vmauth:v1.144.0
image: docker.io/victoriametrics/vmauth:v1.145.0
ports:
- 8427:8427
volumes:

View File

@@ -155,15 +155,15 @@ These services will store and query the metrics scraped by vmagent.
# compose.yaml
services:
vmstorage:
image: victoriametrics/vmstorage:v1.144.0-cluster
image: victoriametrics/vmstorage:v1.145.0-cluster
vminsert:
image: victoriametrics/vminsert:v1.144.0-cluster
image: victoriametrics/vminsert:v1.145.0-cluster
command:
- -storageNode=vmstorage:8400
vmselect:
image: victoriametrics/vmselect:v1.144.0-cluster
image: victoriametrics/vmselect:v1.145.0-cluster
command:
- -storageNode=vmstorage:8401
ports:
@@ -196,7 +196,7 @@ Add the vmauth service to `compose.yaml`:
# compose.yaml
services:
vmauth:
image: victoriametrics/vmauth:v1.144.0-enterprise
image: victoriametrics/vmauth:v1.145.0-enterprise
ports:
- 8427:8427
volumes:
@@ -251,7 +251,7 @@ Add the vmagent service to `compose.yaml` with OAuth2 configuration:
# compose.yaml
services:
vmagent:
image: victoriametrics/vmagent:v1.144.0
image: victoriametrics/vmagent:v1.145.0
volumes:
- ./scrape.yaml:/etc/vmagent/config.yaml
command:

View File

@@ -107,7 +107,7 @@ The final piece is the Docker Compose file. This ties all the services together
# compose.yml
services:
victoriametrics:
image: victoriametrics/victoria-metrics:v1.144.0
image: victoriametrics/victoria-metrics:v1.145.0
command:
- "--storageDataPath=/victoria-metrics-data"
- "--selfScrapeInterval=10s"
@@ -128,7 +128,7 @@ services:
- ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
vmalert:
image: victoriametrics/vmalert:v1.144.0
image: victoriametrics/vmalert:v1.145.0
depends_on:
- victoriametrics
- alertmanager

View File

@@ -28,7 +28,7 @@ If you like VictoriaMetrics and want to contribute, then it would be great:
## Issues
When making a new issue, make sure to create no duplicates. Use GitHub search to find whether similar issues exist already.
The new issue should be written in English and contain concise description of the problem and environment where it exists.
The new issue should be written in English and contain a concise description of the problem and the environment where it exists.
We'd very much prefer to have a specific use-case included in the description, since it could have workaround or alternative solutions.
When looking for an issue to contribute, always prefer working on [bugs](https://github.com/VictoriaMetrics/VictoriaMetrics/issues?q=is%3Aopen+is%3Aissue+label%3Abug)
@@ -48,7 +48,7 @@ We use [labels](https://docs.github.com/en/issues/using-labels-and-milestones-to
1. `need more info`, assigned to issues that require elaboration from the issue creator.
For example, if we weren't able to reproduce the reported bug based on the ticket description then we ask additional
questions which could help to reproduce the issue and add `need more info` label. This label helps other maintainers
to understand that this issue wasn't forgotten but waits for the feedback from user.
to understand that this issue wasn't forgotten but waits for the feedback from the user.
1. `completed`, assigned to issues that required code changes and those changes were merged to upstream, but not released yet.
Once a release is made, maintainers go through all labeled issues, leave a comment about the new release, and close the issue.
1. `vmui`, assigned to issues related to [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui) or [VictoriaLogs webui](https://docs.victoriametrics.com/victorialogs/querying/#web-ui)
@@ -63,32 +63,31 @@ Pull requests requirements:
1. Don't use `master` branch for making PRs, as it makes it impossible for reviewers to modify the changes.
1. All commits need to be [signed](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits).
1. Pull request title should be prefixed with `<dir>/<component>:` to show what component has been changed, i.e. `app/vmalert: fix...`.
Pull request description should contain clear and concise description of what was done, why it is needed and for what purpose.
Pull request description should contain a clear and concise description of what was done, why it is needed and for what purpose.
Use clear language, so reviewers can quickly understand the change and its impact.
1. A link to the issue(s) related to the change, if any. Use `Fixes [issue link]` if the PR resolves the issue, or `Related to [issue link]` for reference.
1. Tests proving that the change is effective. Tests are expected for non-trivial new functionality or non-trivial modifications.
Bug fixes must include tests unless a maintainer explicitly agrees otherwise.
See [this style guide](https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e) for tests.
To run tests and code checks locally, execute commands `make test-full` and `make check-all`.
See [this style guide](https://itnext.io/f-tests-as-a-replacement-for-table-driven-tests-in-go-8814a8b19e9e) for tests. See [this section](#testing) for how to run tests.
1. Try to not extend the scope of the pull requests outside the issue, do not make unrelated changes.
1. Update [docs](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/docs) if needed. For example, adding a new flag or changing behavior of existing flags or features
1. Update [docs](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/docs) if needed. For example, adding a new flag or changing the behavior of existing flags or features
requires reflecting these changes in the documentation. For new features add `{{%/* available_from "#" */%}}` shortcode to the documentation.
It will be later automatically replaced with an actual release version.
1. A line in the [changelog](https://docs.victoriametrics.com/victoriametrics/changelog/#tip) mentioning the change and related issue in a way
that would be clear to other readers even if they don't have the full context.
1. Avoid modifying code in the `/vendor` folder manually, even when the vendored package originates are from the VictoriaMetrics GitHub organization.
1. Avoid modifying code in the `/vendor` folder manually, even when the vendored package originates from the VictoriaMetrics GitHub organization.
For instance, VictoriaLogs vendors packages under the `/lib` folder from VictoriaMetrics, and VictoriaTraces vendors the `/lib/logstorage` package from VictoriaLogs.
Submit a pull request to the upstream repository first. Afterward, a separate pull request can be opened to update the version of the vendored folder in downstream repository.
Submit a pull request to the upstream repository first. Afterward, a separate pull request can be opened to update the version of the vendored folder in the downstream repository.
* For common packages, the vendored package can be updated with this command: `go get <dependency>@vX.Y.Z`.
* For VictoriaMetrics packages, use `go get <dependency>@canonical_commit_hash`.
Finally, run `go mod tidy` and `go mod vendor` to update `go.mod`, `go.sum`, and `/vendor`.
1. Ping reviewers who you think have the best expertise on the matter.
See good example of a [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6487).
See a good example of a [pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6487).
## Merging Pull Request
The person who merges the Pull Request is responsible for satisfying requirements below:
The person who merges the Pull Request is responsible for satisfying the requirements below:
1. Make sure that PR satisfies [Pull Request checklist](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist),
it is approved by at least one reviewer, all CI checks are green.
@@ -97,9 +96,9 @@ The person who merges the Pull Request is responsible for satisfying requirement
1. If applicable, cherry-pick the change to [LTS release lines](https://docs.victoriametrics.com/victoriametrics/lts-releases/)
and mention in the PR comment what was or wasn't cherry-picked.
1. Update related issues with a meaningful message of what has changed and when it will be
released. _This helps users to understand the change without reading PR._
released. _This helps users to understand the change without reading the PR._
1. Add label `completed` to related issues.
1. Do not close related tickets until release is made. If ticket was auto-closed by GitHub or user - re-open it.
1. Do not close related tickets until the release is made. If the ticket was auto-closed by GitHub or a user - re-open it.
## KISS principle
@@ -115,9 +114,9 @@ We are open to third-party pull requests provided they follow [KISS design princ
- Minimize the number of moving parts in the distributed system.
- Avoid automated decisions, which may hurt cluster availability, consistency, performance or debuggability.
Adhering to `KISS` principle, simplifies the resulting code and architecture so it can be reviewed, understood and debugged by a wider audience.
Adhering to the `KISS` principle, simplifies the resulting code and architecture so it can be reviewed, understood and debugged by a wider audience.
Due to `KISS`, [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) has none of the following "features" popular in distributed computing world:
Due to `KISS`, [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/) has none of the following "features" popular in distributed computing:
- Fragile gossip protocols. See [failed attempt in Thanos](https://github.com/improbable-eng/thanos/blob/030bc345c12c446962225221795f4973848caab5/docs/proposals/completed/201809_gossip-removal.md).
- Hard-to-understand-and-implement-properly [Paxos protocols](https://www.quora.com/In-distributed-systems-what-is-a-simple-explanation-of-the-Paxos-algorithm).
@@ -126,3 +125,17 @@ Due to `KISS`, [cluster version of VictoriaMetrics](https://docs.victoriametrics
- Automatic cluster resizing, which may cost you a lot of money if improperly configured.
- Automatic discovering and addition of new nodes in the cluster, which may mix data between dev and prod clusters :)
- Automatic leader election, which may result in split brain disaster on network errors.
## Testing
We recommend running the following sequence of checks and tests before submitting a pull request:
```sh
# run static checks
make check-all
# run unit test
make test-full
# run integration tests
make apptest
```

View File

@@ -205,13 +205,15 @@ curl 'http://vmselect:8481/select/multitenant/prometheus/api/v1/query' \
The precedence for applying filters for tenants follows this order:
1. Filter tenants by `extra_label` and `extra_filters` filters.
1. Filter tenants by `extra_label`, `extra_filters` and `extra_filters[]` filters.
These filters have the highest priority and are applied first when provided through the query arguments.
Filters use `OR` logic - a tenant is selected if it matches any of the filters.
2. Filter tenants from labels selectors defined at metricsQL query expression.
> **Security considerations**
It is recommended restricting access to `multitenant` endpoints only to trusted sources,
since untrusted source may break per-tenant data by writing unwanted samples or get access to data of arbitrary tenants.
See also [vmauth security doc](https://docs.victoriametrics.com/victoriametrics/vmauth/#security).
## Binaries

View File

@@ -61,9 +61,9 @@ Download the newest available [VictoriaMetrics release](https://docs.victoriamet
from [DockerHub](https://hub.docker.com/r/victoriametrics/victoria-metrics) or [Quay](https://quay.io/repository/victoriametrics/victoria-metrics?tab=tags):
```sh
docker pull victoriametrics/victoria-metrics:v1.144.0
docker pull victoriametrics/victoria-metrics:v1.145.0
docker run -it --rm -v `pwd`/victoria-metrics-data:/victoria-metrics-data -p 8428:8428 \
victoriametrics/victoria-metrics:v1.144.0 --selfScrapeInterval=5s -storageDataPath=victoria-metrics-data
victoriametrics/victoria-metrics:v1.145.0 --selfScrapeInterval=5s -storageDataPath=victoria-metrics-data
```
_For Enterprise images, see [this link](https://docs.victoriametrics.com/victoriametrics/enterprise/#docker-images)._

View File

@@ -1136,6 +1136,8 @@ By default, the last point on the interval `[now - max_lookback ... now]` is scr
For instance, `/federate?match[]=up&max_lookback=1h` would return last points on the `[now - 1h ... now]` interval. This may be useful for time series federation
with scrape intervals exceeding `5m`.
VictoriaMetrics supports Prometheus v3.0 utf-8 content encoding with `Accept` header. If `Accept: allow-utf-8` HTTP header provided, `/federate` API response changes according to [Prometheus utf-8](https://prometheus.io/docs/guides/utf8/#querying) specification - `metric_name{tag="value"}` transforms into `{"metric_name","tag"="value"}`.
## Capacity planning
VictoriaMetrics uses lower amounts of CPU, RAM and storage space on production workloads compared to competing solutions (Prometheus, Thanos, Cortex, TimescaleDB, InfluxDB, QuestDB, M3DB) according to [our case studies](https://docs.victoriametrics.com/victoriametrics/casestudies/).

View File

@@ -26,10 +26,29 @@ See also [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-rel
## tip
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): add `-opentelemetry.promoteAllResourceAttributes` and `-opentelemetry.promoteScopeMetadata` command-line flags to allow managing label promotion for resource attributes and OTel scope metadata. See [OpenTelemetry](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) docs and [#10931](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10931).
## [v1.145.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.145.0)
Released at 2026-06-08
* SECURITY: upgrade Go builder from Go1.26.3 to Go1.26.4. See [the list of issues addressed in Go1.26.4](https://github.com/golang/go/issues?q=milestone%3AGo1.26.4%20label%3ACherryPickApproved).
* FEATURE: [enterprise](https://docs.victoriametrics.com/enterprise/) [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): add the new metrics `vm_downsampling_partitions_scheduled_rows` and `vm_retention_filters_partitions_scheduled_rows` for measuring background historical data merge completion time. See [#10960](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10960)
* FEATURE: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): support `match[]=<label_selector>` query parameters in `/api/v1/rules` and `/api/v1/alerts` APIs to return only the rules that have configured labels satisfying the provided label selectors. See [11020](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11020).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vminsert` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): add `-opentelemetry.promoteAllResourceAttributes` and `-opentelemetry.promoteScopeMetadata` command-line flags to allow managing label promotion for resource attributes and OTel scope metadata. See [OpenTelemetry](https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/) docs and [#10931](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10931).
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent/) : introduce `vmagent_remotewrite_kafka_outbuf_latency_seconds` and `vmagent_remotewrite_kafka_rtt_seconds` metrics for [kafka integration](https://docs.victoriametrics.com/victoriametrics/integrations/kafka/). The metrics could help identify throughput bottlenecks. See [#10730](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10730).
* FEATURE: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): properly log user information when a missing route error occurs. See [#11052](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11052).
* FEATURE: [vmctl](https://docs.victoriametrics.com/vmctl/): add the ability to migrate data from [Mimir](https://docs.victoriametrics.com/victoriametrics/vmctl/mimir/#) object storage to VictoriaMetrics. See [#7717](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/7717).
* FEATURE: [dashboards](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/dashboards): show the full `version` label in the `Version` panel when `short_version` label is empty (e.g. custom builds from feature branch). Previously, the panel could appear empty. See [#11047](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11047).
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): fix the `Notifiers` page in web UI appearing blank despite the API returning notifier data correctly. See [#11035](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11035).
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): reset the group evaluation timestamp if it exceeds the current host time. Previously, vmalert could use future timestamps for evaluations if the system clock was shifted backward. See [#10985](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10985).
* BUGFIX: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): properly parse [Prometheus Native Histograms](https://prometheus.io/docs/specs/native_histograms/), previously Protobuf parser could produce unexpected `vmrange` labels. See [#11041](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11041).
* BUGFIX: [vmauth](https://docs.victoriametrics.com/victoriametrics/vmauth/): properly calculate number of loaded users to be printed in startup log. Previously, it was only accounting for static users and skipped JWT configuration entries. See [#11050](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11050/).
* BUGFIX: [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/): `integrate()` no longer extrapolates the last sample's value past the end of the time series. Previously, querying `integrate(metric[1h])` at a timestamp where the series had already ended would keep accruing area as if the last value continued indefinitely, producing values much larger than the true integral. See [#9474](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9474). Thanks to @wtfashwin for contribution.
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): avoid returning HTTP 503 for queries with partial results when a storage group is unavailable and `-search.denyPartialResponse` is disabled. See [#11009](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11009). Thanks to @fxrlv for the contribution.
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly escape `utf-8` label names for [/federate](https://docs.victoriametrics.com/victoriametrics/#federation) API requests. See [#10968](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10968).
* BUGFIX: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): persist the `Disable deduplication` toggle under its own local storage key. Before this fix, the toggle state was lost after reload and could overwrite the `Compact view` table setting. See [#11004](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11004). Thanks to @immanuwell for the contribution.
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fix intermittent `write: connection timed out` errors caused by silently dropped TCP connections being reused from the connection pool. See [#10735](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10735#issuecomment-4535832301).
## [v1.144.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.144.0)
@@ -245,6 +264,24 @@ It enables back `Discovered targets` debug UI by default.
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly apply `extra_filters[]` filter when querying `vm_account_id` or `vm_project_id` labels via [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) request for `/api/v1/label/…/values` API. Before, `extra_filters` was ignored. See [#10503](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/10503).
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): revert the use of rollup result cache for [instant queries](https://docs.victoriametrics.com/keyConcepts.html#instant-query) that contain [`rate`](https://docs.victoriametrics.com/MetricsQL.html#rate) function with a lookbehind window larger than `-search.minWindowForInstantRollupOptimization`. The cache usage was removed since [v1.132.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.132.0). See [#10098](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10098#issuecomment-3895011084) for more details.
## [v1.136.11](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.136.11)
Released at 2026-06-05
**v1.136.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
The v1.136.x line will be supported for at least 12 months since [v1.136.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11360) release**
* SECURITY: upgrade Go builder from Go1.26.3 to Go1.26.4. See [the list of issues addressed in Go1.26.4](https://github.com/golang/go/issues?q=milestone%3AGo1.26.4%20label%3ACherryPickApproved).
* BUGFIX: [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/): `integrate()` no longer extrapolates the last sample's value past the end of the time series. Previously, querying `integrate(metric[1h])` at a timestamp where the series had already ended would keep accruing area as if the last value continued indefinitely, producing values much larger than the true integral. See [#9474](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9474). Thanks to @wtfashwin for contribution.
* BUGFIX: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): persist the `Disable deduplication` toggle under its own local storage key. Before this fix, the toggle state was lost after reload and could overwrite the `Compact view` table setting. See [#11004](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/11004). Thanks to @immanuwell for the contribution.
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): fix the `Notifiers` page in web UI appearing blank despite the API returning notifier data correctly. See [#11035](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11035).
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): reset the group evaluation timestamp if it exceeds the current host time. Previously, vmalert could use future timestamps for evaluations if the system clock was shifted backward. See [#10985](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10985).
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): avoid returning HTTP 503 for queries with partial results when a storage group is unavailable and `-search.denyPartialResponse` is disabled. See [#11009](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11009). Thanks to @fxrlv for the contribution.
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly escape `utf-8` label names for [/federate](https://docs.victoriametrics.com/victoriametrics/#federation) API requests. See [#10968](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10968).
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fix intermittent `write: connection timed out` errors caused by silently dropped TCP connections being reused from the connection pool. See [#10735](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10735#issuecomment-4535832301).
## [v1.136.10](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.136.10)
Released at 2026-05-22
@@ -588,6 +625,22 @@ See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/ch
See changes [here](https://docs.victoriametrics.com/victoriametrics/changelog/changelog_2025/#v11230)
## [v1.122.24](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.122.24)
Released at 2026-06-05
**v1.122.x is a line of [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/). It contains important up-to-date bugfixes for [VictoriaMetrics enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/).
All these fixes are also included in [the latest community release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest).
The v1.122.x line will be supported for at least 12 months since [v1.122.0](https://docs.victoriametrics.com/victoriametrics/changelog/#v11220) release**
* SECURITY: upgrade Go builder from Go1.26.3 to Go1.26.4. See [the list of issues addressed in Go1.26.4](https://github.com/golang/go/issues?q=milestone%3AGo1.26.4%20label%3ACherryPickApproved).
* BUGFIX: [MetricsQL](https://docs.victoriametrics.com/victoriametrics/metricsql/): `integrate()` no longer extrapolates the last sample's value past the end of the time series. Previously, querying `integrate(metric[1h])` at a timestamp where the series had already ended would keep accruing area as if the last value continued indefinitely, producing values much larger than the true integral. See [#9474](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9474). Thanks to @wtfashwin for contribution.
* BUGFIX: [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/): reset the group evaluation timestamp if it exceeds the current host time. Previously, vmalert could use future timestamps for evaluations if the system clock was shifted backward. See [#10985](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10985).
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): avoid returning HTTP 503 for queries with partial results when a storage group is unavailable and `-search.denyPartialResponse` is disabled. See [#11009](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/11009). Thanks to @fxrlv for the contribution.
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly escape `utf-8` label names for [/federate](https://docs.victoriametrics.com/victoriametrics/#federation) API requests. See [#10968](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10968).
* BUGFIX: `vmselect` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): fix intermittent `write: connection timed out` errors caused by silently dropped TCP connections being reused from the connection pool. See [#10735](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10735#issuecomment-4535832301).
## [v1.122.23](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.122.23)
Released at 2026-05-22

View File

@@ -121,7 +121,7 @@ It is allowed to run Enterprise components in [cases listed here](https://docs.v
Binary releases of Enterprise components are available at [the releases page for VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest),
[the releases page for VictoriaLogs](https://github.com/VictoriaMetrics/VictoriaLogs/releases/latest)
and [the releases page for VictoriaTraces](https://github.com/VictoriaMetrics/VictoriaTraces/releases/latest).
Enterprise binaries and packages have `enterprise` suffix in their names. For example, `victoria-metrics-linux-amd64-v1.144.0-enterprise.tar.gz`.
Enterprise binaries and packages have `enterprise` suffix in their names. For example, `victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz`.
In order to run binary release of Enterprise component, please download the `*-enterprise.tar.gz` archive for your OS and architecture
from the corresponding releases page and unpack it. Then run the unpacked binary.
@@ -139,8 +139,8 @@ For example, the following command runs VictoriaMetrics Enterprise binary with t
obtained at [this page](https://victoriametrics.com/products/enterprise/trial/):
```sh
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.144.0/victoria-metrics-linux-amd64-v1.144.0-enterprise.tar.gz
tar -xzf victoria-metrics-linux-amd64-v1.144.0-enterprise.tar.gz
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.145.0/victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz
tar -xzf victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz
./victoria-metrics-prod -license=BASE64_ENCODED_LICENSE_KEY
```
@@ -155,7 +155,7 @@ Alternatively, VictoriaMetrics Enterprise license can be stored in the file and
It is allowed to run Enterprise components in [cases listed here](https://docs.victoriametrics.com/victoriametrics/enterprise/#valid-cases-for-victoriametrics-enterprise).
Docker images for Enterprise components are available at [VictoriaMetrics Docker Hub](https://hub.docker.com/u/victoriametrics) and [VictoriaMetrics Quay](https://quay.io/organization/victoriametrics).
Enterprise docker images have `enterprise` suffix in their names. For example, `victoriametrics/victoria-metrics:v1.144.0-enterprise`.
Enterprise docker images have `enterprise` suffix in their names. For example, `victoriametrics/victoria-metrics:v1.145.0-enterprise`.
In order to run Docker image of VictoriaMetrics Enterprise component, it is required to provide the license key via the command-line
flag as described in the [binary-releases](https://docs.victoriametrics.com/victoriametrics/enterprise/#binary-releases) section.
@@ -165,13 +165,13 @@ Enterprise license key can be obtained at [this page](https://victoriametrics.co
For example, the following command runs VictoriaMetrics Enterprise Docker image with the specified license key:
```sh
docker run --name=victoria-metrics victoriametrics/victoria-metrics:v1.144.0-enterprise -license=BASE64_ENCODED_LICENSE_KEY
docker run --name=victoria-metrics victoriametrics/victoria-metrics:v1.145.0-enterprise -license=BASE64_ENCODED_LICENSE_KEY
```
Alternatively, the license code can be stored in the file and then referred via `-licenseFile` command-line flag:
```sh
docker run --name=victoria-metrics -v /vm-license:/vm-license victoriametrics/victoria-metrics:v1.144.0-enterprise -licenseFile=/path/to/vm-license
docker run --name=victoria-metrics -v /vm-license:/vm-license victoriametrics/victoria-metrics:v1.145.0-enterprise -licenseFile=/path/to/vm-license
```
Example docker-compose configuration:
@@ -181,7 +181,7 @@ version: "3.5"
services:
victoriametrics:
container_name: victoriametrics
image: victoriametrics/victoria-metrics:v1.144.0
image: victoriametrics/victoria-metrics:v1.145.0
ports:
- 8428:8428
volumes:
@@ -213,7 +213,7 @@ is used to provide the license key in plain-text:
```yaml
server:
image:
tag: v1.144.0-enterprise
tag: v1.145.0-enterprise
license:
key: {BASE64_ENCODED_LICENSE_KEY}
@@ -224,7 +224,7 @@ In order to provide the license key via existing secret, the following values fi
```yaml
server:
image:
tag: v1.144.0-enterprise
tag: v1.145.0-enterprise
license:
secret:
@@ -274,7 +274,7 @@ spec:
license:
key: {BASE64_ENCODED_LICENSE_KEY}
image:
tag: v1.144.0-enterprise
tag: v1.145.0-enterprise
```
In order to provide the license key via an existing secret, the following custom resource is used:
@@ -291,7 +291,7 @@ spec:
name: vm-license
key: license
image:
tag: v1.144.0-enterprise
tag: v1.145.0-enterprise
```
Example secret with license key:
@@ -342,7 +342,7 @@ Builds are available for amd64 and arm64 architectures.
Example archive:
`victoria-metrics-linux-amd64-v1.144.0-enterprise.tar.gz`
`victoria-metrics-linux-amd64-v1.145.0-enterprise.tar.gz`
Includes:
@@ -351,7 +351,7 @@ Includes:
Example Docker image:
`victoriametrics/victoria-metrics:v1.144.0-enterprise-fips` uses the FIPS-compatible binary and based on `scratch` image.
`victoriametrics/victoria-metrics:v1.145.0-enterprise-fips` uses the FIPS-compatible binary and based on `scratch` image.
## What Happens to Licensed Components When a License Expires

View File

@@ -35,8 +35,8 @@ scrape_configs:
After you created the `scrape.yaml` file, download and unpack [single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) to the same directory:
```sh
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.144.0/victoria-metrics-linux-amd64-v1.144.0.tar.gz
tar xzf victoria-metrics-linux-amd64-v1.144.0.tar.gz
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.145.0/victoria-metrics-linux-amd64-v1.145.0.tar.gz
tar xzf victoria-metrics-linux-amd64-v1.145.0.tar.gz
```
Then start VictoriaMetrics and instruct it to scrape targets defined in `scrape.yaml` and save scraped metrics
@@ -150,8 +150,8 @@ Then start [single-node VictoriaMetrics](https://docs.victoriametrics.com/victor
```yaml
# Download and unpack single-node VictoriaMetrics
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.144.0/victoria-metrics-linux-amd64-v1.144.0.tar.gz
tar xzf victoria-metrics-linux-amd64-v1.144.0.tar.gz
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.145.0/victoria-metrics-linux-amd64-v1.145.0.tar.gz
tar xzf victoria-metrics-linux-amd64-v1.145.0.tar.gz
# Run single-node VictoriaMetrics with the given scrape.yaml
./victoria-metrics-prod -promscrape.config=scrape.yaml

View File

@@ -219,17 +219,21 @@ See the docs at https://docs.victoriametrics.com/victoriametrics/
Whether to convert only metric names into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
-opentelemetry.ignoreResourceAttributes array
Control which resource attributes to ignore, can only be set when 'opentelemetry.promoteAllResourceAttributes' is true.
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-opentelemetry.labelNameUnderscoreSanitization
Whether to enable prepending of 'key' to labels starting with '_' when -opentelemetry.usePrometheusNaming is enabled. Reserved labels starting with '__' are not modified. See https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/ (default true)
-opentelemetry.maxRequestSize size
The maximum size in bytes of a single OpenTelemetry request
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
-opentelemetry.promoteAllResourceAttributes
Whether to promote all resource attributes to labels, except for the ones configured with 'opentelemetry.ignoreResourceAttributes'.
Whether to promote all resource attributes to labels, except for the ones configured with 'opentelemetry.ignoreResourceAttributes'. (default true)
-opentelemetry.promoteResourceAttributes array
Promote specific list of resource attributes to labels.
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-opentelemetry.promoteScopeMetadata
Whether to promote OTel scope metadata (i.e. name, version, schema URL, and attributes) to metric labels.
Whether to promote OTel scope metadata (i.e. name, version, schema URL, and attributes) to metric labels. (default true)
-opentelemetry.usePrometheusNaming
Whether to convert metric names and labels into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
-opentsdbHTTPListenAddr string

View File

@@ -186,17 +186,21 @@ See the docs at https://docs.victoriametrics.com/victoriametrics/vmagent/ .
Whether to convert only metric names into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
-opentelemetry.ignoreResourceAttributes array
Control which resource attributes to ignore, can only be set when 'opentelemetry.promoteAllResourceAttributes' is true.
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-opentelemetry.labelNameUnderscoreSanitization
Whether to enable prepending of 'key' to labels starting with '_' when -opentelemetry.usePrometheusNaming is enabled. Reserved labels starting with '__' are not modified. See https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/ (default true)
-opentelemetry.maxRequestSize size
The maximum size in bytes of a single OpenTelemetry request
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
-opentelemetry.promoteAllResourceAttributes
Whether to promote all resource attributes to labels, except for the ones configured with 'opentelemetry.ignoreResourceAttributes'.
Whether to promote all resource attributes to labels, except for the ones configured with 'opentelemetry.ignoreResourceAttributes'. (default true)
-opentelemetry.promoteResourceAttributes array
Promote specific list of resource attributes to labels.
Supports an array of values separated by comma or specified via multiple flags.
Each array item can contain comma inside single-quoted or double-quoted string, {}, [] and () braces.
-opentelemetry.promoteScopeMetadata
Whether to promote OTel scope metadata (i.e. name, version, schema URL, and attributes) to metric labels.
Whether to promote OTel scope metadata (i.e. name, version, schema URL, and attributes) to metric labels. (default true)
-opentelemetry.usePrometheusNaming
Whether to convert metric names and labels into Prometheus-compatible format for the metrics ingested via OpenTelemetry protocol; see https://docs.victoriametrics.com/victoriametrics/integrations/opentelemetry/
-opentsdbHTTPListenAddr string
@@ -502,7 +506,7 @@ See the docs at https://docs.victoriametrics.com/victoriametrics/vmagent/ .
Supports array of values separated by comma or specified via multiple flags.
Empty values are set to default value.
-remoteWrite.roundDigits array
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default, digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics (default 100)
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default, digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics. See also -remoteWrite.significantFigures (default 100)
Supports array of values separated by comma or specified via multiple flags.
Empty values are set to default value.
-remoteWrite.sendTimeout array

View File

@@ -801,17 +801,15 @@ Please refer to the [VictoriaMetrics Cloud documentation](https://docs.victoriam
`vmalert` runs a web-server (`-httpListenAddr`) for serving metrics and alerts endpoints:
* `http://<vmalert-addr>` - UI;
* `http://<vmalert-addr>/api/v1/rules` - list of all loaded groups and rules. Supports `search`, `group_limit`, and `page_num` parameters, as well as additional [filtering](https://prometheus.io/docs/prometheus/latest/querying/api/#rules);
* `http://<vmalert-addr>/api/v1/alerts` - list of all active alerts;
* `http://<vmalert-addr>/api/v1/notifiers` - list all available notifiers;
* `http://<vmalert-addr>/vmalert/api/v1/alert?group_id=<group_id>&alert_id=<alert_id>` - get alert status in JSON format.
* `http://<vmalert-addr>/vmalert/api/v1/rule?group_id=<group_id>&rule_id=<rule_id>` - get rule status in JSON format.
* `http://<vmalert-addr>/vmalert/api/v1/group?group_id=<group_id>` - get group status in JSON format.
Used as alert source in AlertManager.
* `http://<vmalert-addr>/vmalert/alert?group_id=<group_id>&alert_id=<alert_id>` - get alert status in web UI.
* `http://<vmalert-addr>/vmalert/rule?group_id=<group_id>&rule_id=<rule_id>` - get rule status in web UI.
* `http://<vmalert-addr>/vmalert/api/v1/rule?group_id=<group_id>&alert_id=<alert_id>` - get rule status in JSON format.
* `http://<vmalert-addr>/metrics` - application metrics.
* `http://<vmalert-addr>/api/v1/rules` - returns a list of all loaded groups and rules. Supports the `datasource_type`, `search`, `group_limit`, and `page_num` parameters, as well as additional [filtering](https://prometheus.io/docs/prometheus/latest/querying/api/#rules);
* `http://<vmalert-addr>/api/v1/alerts` - returns a list of all active alerts. Supports the `datasource_type`, `rule_group[]`, `file[]` and `match[]`(applied on templated alert labels) query parameters;
* `http://<vmalert-addr>/api/v1/notifiers` - returns a list of all available notifiers;
* `http://<vmalert-addr>/vmalert/api/v1/alert?group_id=<group_id>&alert_id=<alert_id>` - returns the alert status in JSON format;
* `http://<vmalert-addr>/vmalert/api/v1/rule?group_id=<group_id>&rule_id=<rule_id>` - returns the rule status in JSON format;
* `http://<vmalert-addr>/vmalert/api/v1/group?group_id=<group_id>` - returns the group status in JSON format. Used as the alert source in AlertManager;
* `http://<vmalert-addr>/vmalert/alert?group_id=<group_id>&alert_id=<alert_id>` - displays the alert status in the web UI;
* `http://<vmalert-addr>/vmalert/rule?group_id=<group_id>&rule_id=<rule_id>` - displays the rule status in the web UI;
* `http://<vmalert-addr>/metrics` - application metrics endpoint;
* `http://<vmalert-addr>/-/reload` - hot configuration reload.
`vmalert` web UI can be accessed from [single-node version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/)

View File

@@ -1536,6 +1536,16 @@ To enable TLS on the public listener while keeping the internal listener non-TLS
`vmauth` also supports restricting access by IP - see [these docs](#ip-filters). See also [concurrency limiting docs](#concurrency-limiting).
When `vmauth` performs tenant routing for [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenant-reads) requests, it is crucial to explicitly set `extra_label`, `extra_filters` and `extra_filters[]` in the url_prefix configuration:
```yaml
unauthorized_user:
url_prefix: http://vmselect/select/multitenant?extra_filters[]=&extra_filters=&extra_label=vm_account_id=10&extra_label=vm_project_id=100
```
This is required because `vmselect` uses `OR` logic for tenant filtering. If a client sets `extra_filters[]` or `extra_filters`, it could bypass the tenant restriction configured via `extra_label`.
## Automatic issuing of TLS certificates
`vmauth` [Enterprise](https://docs.victoriametrics.com/victoriametrics/enterprise/) supports automatic issuing of TLS certificates via [Let's Encrypt service](https://letsencrypt.org/).

View File

@@ -73,4 +73,70 @@ ou can define it via the flag `--remote-read-headers=X-Scope-OrgID:demo`.
See [remote-read mode](https://docs.victoriametrics.com/victoriametrics/vmctl/remoteread/) for more details.
See also general [vmctl migration tips](https://docs.victoriametrics.com/victoriametrics/vmctl/#migration-tips).
See also general [vmctl migration tips](https://docs.victoriametrics.com/victoriametrics/vmctl/#migration-tips).
### Read data from the remote storage like S3, GCS, Azure etc.
If you have data stored in remote storage like S3, GCS, Azure etc. you can use `vmctl` in `mimir` mode to read data from
the remote storage and import it into VictoriaMetrics. In this mode `vmctl` reads data from the remote storage or file system
and checks index file, define needed blocks to be processed. After it downloads blocks by defined filters and
use Prometheus converter to read and sent data to VictoriaMetrics.
The following example shows how to read data from the file system and import it into VictoriaMetrics:
```sh
./vmctl mimir --mimir-path="fs:///mimir/test_data/mimir-tsdb" \ ? ? orbstack
--mimir-tenant-id=anonymous \
--mimir-filter-time-start=2024-12-01T00:00:00 \
--mimir-filter-time-end=2024-12-18T23:59:59 \
--mimir-creds-file-path=creads \
--vm-concurrency=6 \
--mimir-concurrency=6 \
--vm-addr=http://localhost:8428/
```
This approach is useful when you have data stored on the local file system or you have a mounted volume,
download the data from the remote storage etc.
The following example shows how to read data from the remote storage and import it into VictoriaMetrics:
```sh
./vmctl mimir --mimir-path="s3:///mimir-tsdb/anonymous" \
--mimir-filter-time-start=2024-12-01T00:00:00 \
--mimir-filter-time-end=2024-12-17T23:59:59 \
--mimir-creds-file-path=creads \
--mimir-custom-s3-endpoint='http://localhost:9000' \
--vm-concurrency=6 \
--mimir-concurrency=6 \
--vm-addr=http://localhost:8428/
```
In the example above we are used `--mimir-custom-s3-endpoint` flag to specify the custom S3 endpoint if it is needed.
When the process finishes, you will see the following:
```sh
2025/01/18 13:01:59 Fetching blocks from remote storage
Found 204 blocks to import. Continue? [Y/n] y
VM worker 0:? 1589405 samples/s
VM worker 1:? 1911834 samples/s
VM worker 2:? 1849187 samples/s
VM worker 3:? 1648820 samples/s
VM worker 4:? 1539212 samples/s
VM worker 5:? 1411485 samples/s
Processing blocks: 204 / 204 [?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????] 100.00%
2025/01/18 13:02:18 Import finished!
2025/01/18 13:02:18 VictoriaMetrics importer stats:
idle duration: 18.485875611s;
time spent while importing: 16.40543875s;
total samples: 177961995;
samples/s: 10847743.71;
total bytes: 4.1 GB;
bytes/s: 248.2 MB;
import requests: 893;
import requests retries: 0;
2025/01/18 13:02:18 Total time: 18.867547083s
```
See `./vmctl mimir --help` for details and full list of flags:
{{% content "vmctl_mimir_flags.md" %}}

View File

@@ -260,3 +260,7 @@ Processing ranges: 8799 / 8799 [████████████████
See [remote-read mode](https://docs.victoriametrics.com/victoriametrics/vmctl/remoteread/) for more details.
See also general [vmctl migration tips](https://docs.victoriametrics.com/victoriametrics/vmctl/#migration-tips).
See `./vmctl thanos --help` for details and full list of flags:
{{% content "vmctl_thanos_flags.md" %}}

View File

@@ -34,9 +34,9 @@ vmctl command-line tool is available as:
Download and unpack vmctl:
```sh
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.144.0/vmutils-darwin-arm64-v1.144.0.tar.gz
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.145.0/vmutils-darwin-arm64-v1.145.0.tar.gz
tar xzf vmutils-darwin-arm64-v1.144.0.tar.gz
tar xzf vmutils-darwin-arm64-v1.145.0.tar.gz
```
Once binary is unpacked, see the full list of supported modes by running the following command:

View File

@@ -20,6 +20,7 @@ COMMANDS:
influx Migrate time series from InfluxDB
remote-read Migrate time series via Prometheus remote-read protocol
prometheus Migrate time series from Prometheus
mimir Migrate time series from Mimir object storage or local filesystem
thanos Migrate time series from Thanos blocks (supports raw and downsampled data)
vm-native Migrate time series between VictoriaMetrics installations
verify-block Verifies exported block with VictoriaMetrics Native format

View File

@@ -56,7 +56,7 @@ OPTIONS:
--vm-compress Whether to apply gzip compression to import requests (default: true)
--vm-batch-size value How many samples importer collects before sending the import request to VM (default: 200000)
--vm-significant-figures value The number of significant figures to leave in metric values before importing. See https://en.wikipedia.org/wiki/Significant_figures. Zero value saves all the significant figures. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-round-digits option (default: 0)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics (default: 100)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-significant-figures option (default: 100)
--vm-extra-label value [ --vm-extra-label value ] Extra labels, that will be added to imported timeseries. In case of collision, label value defined by flag will have priority. Flag can be set multiple times, to add few additional labels.
--vm-rate-limit value Optional data transfer rate limit in bytes per second.
By default, the rate limit is disabled. It can be useful for limiting load on configured via '--vm-addr' destination. (default: 0)

View File

@@ -0,0 +1,68 @@
---
build:
list: never
publishResources: false
render: never
sitemap:
disable: true
---
<!-- The file should not be updated manually. Run make docs-update-flags while preparing a new release to sync flags in docs from actual binaries. -->
```shellhelp
NAME:
vmctl mimir - Migrate time series from Mimir object storage or local filesystem
USAGE:
vmctl mimir [command options]
OPTIONS:
-s Whether to run in silent mode. If set to true no confirmation prompts will appear. (default: false)
--verbose Whether to enable verbosity in logs output. (default: false)
--disable-progress-bar Whether to disable progress bar during the import. (default: false)
--pushmetrics.url value [ --pushmetrics.url value ] Optional URL to push metrics. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#push-metrics
--pushmetrics.interval value Interval for pushing metrics to every -pushmetrics.url (default: 10s)
--pushmetrics.extraLabel value [ --pushmetrics.extraLabel value ] Extra labels to add to pushed metrics. In case of collision, label value defined by flag will have priority. Flag can be set multiple times, to add few additional labels. For example, -pushmetrics.extraLabel='instance="foo"' adds instance="foo" label to all the metrics pushed to every -pushmetrics.url
--pushmetrics.header value [ --pushmetrics.header value ] Optional HTTP headers to add to pushed metrics. Flag can be set multiple times, to add few additional headers.
--pushmetrics.disableCompression Whether to disable compression when pushing metrics. (default: false)
--mimir-path value Path to Mimir storage bucket or local folder.
--mimir-tenant-id value Tenant ID for Mimir storage
--mimir-concurrency value Number of concurrently running block readers (default: 1)
--mimir-filter-time-start value The time filter in RFC3339 format to select timeseries with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
--mimir-filter-time-end value The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
--mimir-filter-label value Mimir label name to filter timeseries by. E.g. '__name__' will filter timeseries by name.
--mimir-filter-label-value value Regular expression to filter label from "mimir-filter-label" flag. (default: ".*")
--mimir-creds-file-path value Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set. See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
--mimir-config-file-path value Path to file with S3 configs. Configs are loaded from default location if not set. See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
--mimir-config-profile value Profile name for S3 configs. If no set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both not set, DefaultSharedConfigProfile is used
--mimir-custom-s3-endpoint value Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set
--mimir-s3-force-path-style Prefixing endpoint with bucket name when set false, true by default. (default: true)
--mimir-s3-tls-insecure-skip-verify Whether to skip TLS verification when connecting to the S3 endpoint. (default: false)
--mimir-s3-sse-kms-key-id value SSE KMS Key ID for use with S3-compatible storages.
--mimir-s3-sse-algorithm value SSE algorithm for use with S3-compatible storages.
--vm-addr value VictoriaMetrics address to perform import requests.
Should be the same as --httpListenAddr value for single-node version or vminsert component.
When importing into the clustered version do not forget to set additionally --vm-account-id flag.
Please note, that vmctl performs initial readiness check for the given address by checking /health endpoint. (default: "http://localhost:8428")
--vm-user value VictoriaMetrics username for basic auth [$VM_USERNAME]
--vm-password value VictoriaMetrics password for basic auth [$VM_PASSWORD]
--vm-account-id value AccountID is an arbitrary 32-bit integer identifying namespace for data ingestion (aka tenant).
AccountID is required when importing into the clustered version of VictoriaMetrics.
It is possible to set it as accountID:projectID, where projectID is also arbitrary 32-bit integer.
If projectID isn't set, then it equals to 0
--vm-concurrency value Number of workers concurrently performing import requests to VM (default: 2)
--vm-compress Whether to apply gzip compression to import requests (default: true)
--vm-batch-size value How many samples importer collects before sending the import request to VM (default: 200000)
--vm-significant-figures value The number of significant figures to leave in metric values before importing. See https://en.wikipedia.org/wiki/Significant_figures. Zero value saves all the significant figures. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-round-digits option (default: 0)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-significant-figures option (default: 100)
--vm-extra-label value [ --vm-extra-label value ] Extra labels, that will be added to imported timeseries. In case of collision, label value defined by flag will have priority. Flag can be set multiple times, to add few additional labels.
--vm-rate-limit value Optional data transfer rate limit in bytes per second.
By default, the rate limit is disabled. It can be useful for limiting load on configured via '--vm-addr' destination. (default: 0)
--vm-cert-file value Optional path to client-side TLS certificate file to use when connecting to '--vm-addr'
--vm-key-file value Optional path to client-side TLS key to use when connecting to '--vm-addr'
--vm-CA-file value Optional path to TLS CA file to use for verifying connections to '--vm-addr'. By default, system CA is used
--vm-server-name value Optional TLS server name to use for connections to '--vm-addr'. By default, the server name from '--vm-addr' is used
--vm-insecure-skip-verify Whether to skip tls verification when connecting to '--vm-addr' (default: false)
--vm-backoff-retries value How many import retries to perform before giving up. (default: 10)
--vm-backoff-factor value Factor to multiply the base duration after each failed import retry. Must be greater than 1.0 (default: 1.8)
--vm-backoff-min-duration value Minimum duration to wait before the first import retry. Each subsequent import retry will be multiplied by the '--vm-backoff-factor'. (default: 2s)
--help, -h show help
```

View File

@@ -51,7 +51,7 @@ OPTIONS:
--vm-compress Whether to apply gzip compression to import requests (default: true)
--vm-batch-size value How many samples importer collects before sending the import request to VM (default: 200000)
--vm-significant-figures value The number of significant figures to leave in metric values before importing. See https://en.wikipedia.org/wiki/Significant_figures. Zero value saves all the significant figures. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-round-digits option (default: 0)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics (default: 100)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-significant-figures option (default: 100)
--vm-extra-label value [ --vm-extra-label value ] Extra labels, that will be added to imported timeseries. In case of collision, label value defined by flag will have priority. Flag can be set multiple times, to add few additional labels.
--vm-rate-limit value Optional data transfer rate limit in bytes per second.
By default, the rate limit is disabled. It can be useful for limiting load on configured via '--vm-addr' destination. (default: 0)

View File

@@ -44,7 +44,7 @@ OPTIONS:
--vm-compress Whether to apply gzip compression to import requests (default: true)
--vm-batch-size value How many samples importer collects before sending the import request to VM (default: 200000)
--vm-significant-figures value The number of significant figures to leave in metric values before importing. See https://en.wikipedia.org/wiki/Significant_figures. Zero value saves all the significant figures. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-round-digits option (default: 0)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics (default: 100)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-significant-figures option (default: 100)
--vm-extra-label value [ --vm-extra-label value ] Extra labels, that will be added to imported timeseries. In case of collision, label value defined by flag will have priority. Flag can be set multiple times, to add few additional labels.
--vm-rate-limit value Optional data transfer rate limit in bytes per second.
By default, the rate limit is disabled. It can be useful for limiting load on configured via '--vm-addr' destination. (default: 0)

View File

@@ -59,7 +59,7 @@ OPTIONS:
--vm-compress Whether to apply gzip compression to import requests (default: true)
--vm-batch-size value How many samples importer collects before sending the import request to VM (default: 200000)
--vm-significant-figures value The number of significant figures to leave in metric values before importing. See https://en.wikipedia.org/wiki/Significant_figures. Zero value saves all the significant figures. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-round-digits option (default: 0)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics (default: 100)
--vm-round-digits value Round metric values to the given number of decimal digits after the point. This option may be used for increasing on-disk compression level for the stored metrics. See also --vm-significant-figures option (default: 100)
--vm-extra-label value [ --vm-extra-label value ] Extra labels, that will be added to imported timeseries. In case of collision, label value defined by flag will have priority. Flag can be set multiple times, to add few additional labels.
--vm-rate-limit value Optional data transfer rate limit in bytes per second.
By default, the rate limit is disabled. It can be useful for limiting load on configured via '--vm-addr' destination. (default: 0)

Some files were not shown because too many files have changed in this diff Show More