Compare commits

...

152 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
0ea35215f6 app/vmalert-tool: add missing multiarch directory
This is needed for 'make publish-vmalert-tool'
2023-11-15 18:11:50 +01:00
Aliaksandr Valialkin
91f5c24f82 docs/CHANGELOG.md: cut v1.95.0 release 2023-11-15 17:45:52 +01:00
Aliaksandr Valialkin
8d9e365512 vendor: update github.com/klasuspost/compress from v1.17.2 to v1.17.3
See https://github.com/klauspost/compress/releases/tag/v1.17.3
2023-11-15 17:18:22 +01:00
Aliaksandr Valialkin
741013a33f docs/CHANGELOG.md: document v1.93.8 LTS release 2023-11-15 17:12:44 +01:00
Aliaksandr Valialkin
d9a7dea9a1 lib/querytracer: add missing blank comment line after 3121d76bee 2023-11-15 16:10:43 +01:00
Aliaksandr Valialkin
e837968e49 docs/stream-aggregation.md: clarify that stream aggregation is applied after all the configured relabeling
This is a follow-up after 68d2cb203d
2023-11-15 15:54:15 +01:00
Aliaksandr Valialkin
5bfa2a3e97 docs/CHANGELOG.md: document v1.87.11 LTS release 2023-11-15 15:53:05 +01:00
hagen1778
68d2cb203d docs/stream-aggr: specify the relabeling order during aggregation
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-15 14:28:03 +01:00
Aliaksandr Valialkin
02624c41eb app/vmctl/README.md: sync with docs/vmctl.md after 7b2e2a23c2 2023-11-15 12:25:33 +01:00
Aliaksandr Valialkin
ef2113e5df Makefile: remove package-base dependency from publish rule, since this dep is set inside all the publish-* dependencies
This is a follow-up for d4099a75be
2023-11-15 12:23:31 +01:00
John Belmonte
7b2e2a23c2 vmctl README.md typo (#5326) 2023-11-15 11:47:25 +04:00
John Belmonte
efd5d1f2dd relabeling.md: fix link (#5325) 2023-11-15 11:32:09 +04:00
Aliaksandr Valialkin
201fb6ec5c vendor: run make vendor-update 2023-11-14 22:45:07 +01:00
Aliaksandr Valialkin
3076c1f400 lib/ingestserver: properly log the number of closed connections
Previously there was off-by-one error, which resulted in logging len(conns-1) connections instead of len(conns)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922
2023-11-14 21:53:24 +01:00
Aliaksandr Valialkin
6a533023b1 docs/CHANGELOG.md: consistently prepend command-line flags with a single dash 2023-11-14 21:44:19 +01:00
Aliaksandr Valialkin
0bf48a5d2a docs/Cluster-VictoriaMetrics.md: clarify how -storage.vminsertConnsShutdownDuration command-line flag works 2023-11-14 21:41:47 +01:00
hagen1778
feff13851c docs: clarify vmalert flag changes
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 21:18:58 +01:00
Nikolay
3121d76bee lib/querytracer: makes package concurrent safe to use (#5322)
* lib/querytracer: makes package concurrent safe to use
it must fix various issues with concurrent code usage.
Especially, when it's not reasonable to wait for all goroutines to be finished

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 20:59:08 +01:00
Aliaksandr Valialkin
cb106bdf39 lib/logger: increase default -loggerMaxArgLen command-line flag value from 500 to 1000
The 500 chars limit for the maximum arg lengths during logging appeared to be too low for some cases
2023-11-14 19:52:27 +01:00
Artem Navoiev
5d61a7327d docs: update Grafana setup section, use more direct link and add noti… (#5287) 2023-11-14 14:34:01 +01:00
hagen1778
d3ae2b2f62 dashboards: update description for RSS and anonymous memory panels to be consistent for single-node, cluster and vmagent dashboards.
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 09:50:06 +01:00
hagen1778
d6ae082598 deployment/dashboards: respect job and instance filters for alerts annotation in cluster and single-node dashboards
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-14 09:38:15 +01:00
Aliaksandr Valialkin
fbc6289a21 app/vmselect/promql: typo fixes after 7cf7740d18 2023-11-14 03:34:37 +01:00
Aliaksandr Valialkin
f9bd265249 lib/ingestserver: typo fix after f7834767c1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922
2023-11-14 03:26:26 +01:00
Aliaksandr Valialkin
7cf7740d18 app/vmselect/promql: properly handle instant query optimization conrner cases for min_over_time() and max_over_time()
- If min_over_time(m[offset] @ timestamp) <= min_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.

- If max_over_time(m[offset] @ timestamp) >= max_over_time(m[offset] @ (timestamp-window)),
  then the optimization can be applied.
2023-11-14 02:58:08 +01:00
Yury Molodov
0116a333fb vmui: reduced the number of server requests (#5253)
* vmui: reduced the number of server requests

* run `make vmui-update vmui-logs-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 01:50:00 +01:00
Aliaksandr Valialkin
43e3302803 docs/CHANGELOG.md: document 0e056ddb2d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5203
2023-11-14 01:24:05 +01:00
Yury Molodov
0e056ddb2d vmui: fix trailing slash in serverURL (#5271)
* vmui: add function to autoremove slash at the end of serverURL (#5203)

* vmui: change removeTrailingSlash func
2023-11-14 01:21:16 +01:00
Noah Labrecque
12aefa8a4b fix: apply correct bounds to sf and tf (#5274) 2023-11-14 01:17:16 +01:00
Zakhar Bessarab
37997abd14 vmcluster: re-routing enhancement (#5293)
* app/vmstorage: close vminsert connections gradually before stopping storage

Implements graceful shutdown approach suggested here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1768146878

Test results for this can be found here - https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922#issuecomment-1790640274

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: update graceful shutdown logic

- close connections from vminsert in determenistic order
- update flag description
- lower default timeout to 25 seconds. 25 seconds value was chosen because the lowest default value used in default configuration deployments is 30s(default value in Kubernetes and ansible-playbooks).

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add information about re-routing enhancement during restart

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/changelog: add entry for new command-line flag

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* {app/vmstorage,lib/ingestserver}: address review feedback

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* docs/cluster: add note to update workload scheduler timeout

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-14 01:03:44 +01:00
Aliaksandr Valialkin
cef7a39ba3 lib/logstorage: always check the previous indexBlockHeader for blocks with matching tenantID and/or streamID
The previous indexBlockHeader may contain blocks for the matching tenantID and/or streamID,
so it must be scanned unconditionally during the search.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5295
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4856

This is a follow-up for 89dcbc2fe7
2023-11-13 23:13:53 +01:00
XLONG96
89dcbc2fe7 lib/logstorage: fix streamID and tenantID search (#4856) (#5295) 2023-11-13 23:09:39 +01:00
Aliaksandr Valialkin
8eed04b2c6 app/vmauth: add ability to drop the specified number of /-delimited prefix parts from request path
This can be done via `drop_src_path_prefix_parts` option at `url_map` and `user` levels.

See https://docs.victoriametrics.com/vmauth.html#dropping-request-path-prefix
2023-11-13 22:32:22 +01:00
Aliaksandr Valialkin
0feaeca3c1 lib/protoparser/promremotewrite: fall back to Snappy decoding if zstd decoding fails
This case is possible after the following steps:

1. vmagent tries to perform handshake with the -remoteWrite.url in order to determine whether
   the remote storage supports zstd-compressed data.
2. The remote storage is unavailable during the handshake. In this case vmagent falls back to Snappy compression
   for the data sent to the remote storage.
3. vmagent compresses the collected data into blocks with Snappy and puts these blocks to persistent queue on disk.
4. The remote storage becomes available.
5. vmagent restarts, performs the handshake with the remote storage and detects that it supports zstd-compressed data.
6. vmagent starts sending Snappy-compressed data from persistent queue to the remote storage,
   while falsely advertizing it sends zstd-compressed data.
7. The remote storage receives Snappy-compressed data and fails unpacking it with zstd.

The solution is to just fall back to Snappy decompression if zstd decompression fails.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5301
2023-11-13 21:19:08 +01:00
Aliaksandr Valialkin
8af56ea2ed lib/htmlcomponents: use relative links for the top page and for favicon.ico
This allows hiding VictoriaMetrics components behind proxies with arbitrary path prefixes.
For example, vmagent HTTP handlers can be served via /vmagent/ path prefix:

- http://proxy/vmagent/targets
- http://proxy/vmagent/service-discovery

The path prefix can be arbitrary. For example, below are vmagent urls
for /tenantID/vmagent/ path prefix:

- http://proxy/tenantID/vmagent/targets
- http://proxy/tenantID/vmagent/service-discovery

While at it, consistently serve favicon.ico from any path directory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5306
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5307
2023-11-13 20:29:05 +01:00
Aliaksandr Valialkin
cf23dc6480 all: cleanup: remove // +build ... lines, since they are no longer needed after Go1.17, and the minimum supported Go version for VictoriaMetrics source code is Go1.20 2023-11-13 19:12:51 +01:00
Aliaksandr Valialkin
b7fb7c5f77 vendor: run make vendor-update 2023-11-13 18:50:16 +01:00
Aliaksandr Valialkin
3e93fa61ad lib/regexutil: properly handle alternate regexps surrounded by .+ or .*
Previously the following regexps were improperly handled:

  .+foo|bar.+
  .*foo|bar.*

This could lead to unexpected regexp match results.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5297

Thanks to @Haleygo for the initial attempt to fix the issue at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5308
2023-11-13 18:23:38 +01:00
Aliaksandr Valialkin
c8d901424f docs/VictoriaLogs/CHANGELOG.md: follow-up for 66527c5981
Document the change

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5312
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5300
2023-11-13 10:39:48 +01:00
Yury Molodov
66527c5981 vmui: ui logs enhancements (#5312)
* vmui/logs: fix time sorting #5300

* vmui/logs: add base query validation

* vmui/logs: add a message for empty results
2023-11-13 10:36:52 +01:00
Aliaksandr Valialkin
6340911d38 lib/stringsutil: add tests for LimitStringLen() function 2023-11-13 10:32:33 +01:00
Dmytro Kozlov
4722b70c89 lib/stringsutil: fix failing test (#5313)
We have failed test on master branch.

```
--- FAIL: TestFormatLogMessage (0.00s)
    logger_test.go:24: unexpected result; got
        "foo: abcde, \"foo bar baz\", xx"
        want
        "foo: a..e, \"f..z\", xx"
```
if failed because maxArgs maxLen <= 4 in the  `LimitStringLen` in that case we always will return the income string
but in the test we limit the maxLen by value 4
```
f("foo: %s, %q, %s", []interface{}{"abcde", fmt.Errorf("foo bar baz"), "xx"}, 4, `foo: a..e, "f..z", xx`)
2023-11-13 09:51:49 +01:00
Aliaksandr Valialkin
ba058a4514 docs/CHANGELOG.md: remove trailing whitespace after bffd30b57a 2023-11-13 09:24:29 +01:00
Aliaksandr Valialkin
25ff811d78 docs/vmauth.md: add missing dashes in front of command-line flags at the Backend TLS setup section
Dashes must be consistently used in front of command-line flags across the documentation.

This is a follow up for 61594d2bd8
2023-11-13 09:12:57 +01:00
Aliaksandr Valialkin
eded218e8c app/vmauth: properly pass Host header to backends
Previously the `Host` header was remained unchanged when passing it in requests to backends.
This may improperly work if the backend uses host-based routing.

While at it, allows http/2.0 requests to backends. While VictoriaMetrics components
do not accept http/2.0 requests, other backends can require such requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
2023-11-13 09:05:39 +01:00
Aliaksandr Valialkin
61594d2bd8 app/vmauth: follow-up for 323f3720ed
- Re-use identically configured http.Transport across multiple users.
  This fixes handling of the limit on the number of connection, which can be established per each backend
  via -maxIdleConnsPerBackend command-line flag. This limit stopped working after 323f3720ed

- Add docs about backend TLS setup at https://docs.victoriametrics.com/vmauth.html#backend-tls-setup

- Add ability to disable backend TLS verification for all the users via -backend.tlsInsecureSkipVerify command-line flag.
  This flag may be useful when -auth.config contains big number of users, and every user must disable backend TLS verification.

- Add ability to specify TLS Root CA via tls_ca_file option at per-user basis and via -backend.tlsCAFile command-line flag
  across all the users.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
2023-11-13 08:33:10 +01:00
Aliaksandr Valialkin
ec72b750f2 docs/Articles.md: typo fix 2023-11-13 08:05:16 +01:00
Aliaksandr Valialkin
bfec8a3751 app/vmauth: improve docs a bit after 323f3720ed
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240
2023-11-11 12:49:28 +01:00
Aliaksandr Valialkin
0df8489e0a app/vmagent/README.md: sync with docs/vmagent.md after 930d26b2ff 2023-11-11 12:31:32 +01:00
Aliaksandr Valialkin
230230cf0b lib/logger: add -loggerMaxArgLen command-line flag for fine-tuning the maximum length of logged args 2023-11-11 12:30:08 +01:00
Aliaksandr Valialkin
80213f07fa app/vmselect/promql: optimize instant queries with min_over_time() and max_over_time() rollup functions
This is a follow-up for 41a0fdaf39
2023-11-11 12:10:03 +01:00
Aliaksandr Valialkin
2db1a664e1 deployment: update Go builder from Go1.21.3 to Go1.21.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.4+label%3ACherryPickApproved
2023-11-10 22:28:44 +01:00
Aliaksandr Valialkin
010dc15d16 lib/blockcache: do not cache entries, which were attempted to be accessed 1 or 2 times
Previously entries which were accessed only 1 time weren't cached.
It has been appeared that some rarely executed heavy queries may read indexdb block twice
in a row instead of once. There is no need in caching such a block then.
This change should eliminate cache size spikes for indexdb/dataBlocks when such heavy queries are executed.

Expose -blockcache.missesBeforeCaching command-line flag, which can be used for fine-tuning
the number of cache misses needed before storing the block in the caching.
2023-11-10 22:28:03 +01:00
Aliaksandr Valialkin
22498c5087 docs/Articles.md: sort third-party articles by importance 2023-11-10 21:26:51 +01:00
Aliaksandr Valialkin
dc668b8246 docs/Articles.md: add a link to https://blog.cloudflare.com/introducing-http-traffic-anomalies-notifications/ 2023-11-10 20:59:57 +01:00
Aliaksandr Valialkin
a9a26c20b5 docs/Single-server-VictoriaMetrics.md: make High availability section more clear 2023-11-10 20:58:32 +01:00
Aliaksandr Valialkin
d407d13e7b Makefile: update golangci-lint version from v1.54.2 to v1.55.1
See https://github.com/golangci/golangci-lint/releases/tag/v1.55.1
2023-11-10 20:23:48 +01:00
PhracturedBlue
2474281f1b Support building images via podman (#4978) 2023-11-09 00:50:21 -08:00
Zakhar Bessarab
73a1862182 docs/changelog: document vmbackupmanager bugfix (#5303)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-11-08 18:51:14 +01:00
Artem Navoiev
930d26b2ff docs: vmagent change the codeblock languages
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-11-08 18:12:29 +01:00
Github Actions
2b15f2fc8c Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@cc18249 (#5305) 2023-11-08 18:19:12 +04:00
Github Actions
746cc93e95 Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@52bdb4a (#5304) 2023-11-08 17:59:10 +04:00
Roman Khavronenko
bffd30b57a app/vmalert: update remote-write process (#5284)
* app/vmalert: update remote-write process

* automatically retry remote-write requests on closed connections. The change should reduce the amount of logs produced in environments with short-living connections or environments without support of keep-alive on network balancers.
* increment `vmalert_remotewrite_errors_total` metric if all retries to send remote-write request failed. Before, this metric was incremented only if remote-write client's buffer is overloaded.
* increment `vmalert_remotewrite_dropped_rows_total` amd `vmalert_remotewrite_dropped_bytes_total` metrics if remote-write client's buffer is overloaded. Before, these metrics were incremented only after unsuccessful HTTP calls.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2023-11-08 14:53:07 +08:00
Artem Navoiev
5afc6a5765 github actions: sync docs use the latest hugo version in CI
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-11-07 12:23:12 +01:00
Artem Navoiev
b51d16e74c docs: url example change the title h2->h3 h3->h4 for better indexing in search
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-11-07 09:54:32 +01:00
Artem Navoiev
e4f44bad91 docs: fix formatting in stream aggregation more
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-11-07 09:31:21 +01:00
Artem Navoiev
ea4ca70be1 docs: fix formatting in stream aggregation
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-11-07 09:27:44 +01:00
Github Actions
4d2b98b755 Automatic update operator docs from VictoriaMetrics/operator@b4b79da (#5291) 2023-11-06 16:04:12 +08:00
hagen1778
c07dc45786 app/vmalert: fix typo in remoteWrite.concurrency description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-03 22:04:50 +01:00
Yury Molodov
f90d2ec843 vmui: display query error on Explore metrics page (#5272)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5202
2023-11-03 16:23:19 +01:00
hagen1778
054367c421 docs: make docs-sync after 323f3720ed
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-03 16:22:23 +01:00
Zakhar Bessarab
323f3720ed app/vmauth: add option to skip TLS verification (#5256)
Add `tls_insecure_skip_verify` option on per-user basis which allows to disable TLS verification for all requests to backend on behalf of this user.

See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5240

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-11-03 12:04:17 +01:00
Aliaksandr Valialkin
da24c129db vendor: run make vendor-update 2023-11-02 21:01:21 +01:00
Aliaksandr Valialkin
f2b373dd06 go.mod: pin the latest working version of golang.org/x/exp 2023-11-02 20:55:24 +01:00
Aliaksandr Valialkin
815fda8995 docs: update -help output after recent changes to VictoriaMetrics components 2023-11-02 20:27:10 +01:00
Aliaksandr Valialkin
65db6609eb docs/CHANGELOG.md: update the description of the optimization for SLO/SLI-like queries according to latest changes
See commits 4497a08e3d and 92826b0b4a
2023-11-02 20:05:05 +01:00
Roman Khavronenko
b5254199c6 app/vmalert: add label file pointing to the group's filename to metrics (#5281)
The filename should help identifying alerting rules belonging to specific groups
with identical names but different filenames.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5267

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-02 16:01:31 +01:00
hagen1778
6eb205f8b0 app/vmalert: verify alert name correctness in restore test
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-11-02 15:28:39 +01:00
Hui Wang
90d45574bf vmalert: reduce restore query request for each alerting rule (#5265)
reduce the number of queries for restoring alerts state on start-up. 
The change should speed up the restore process and reduce pressure on `remoteRead.url`.
2023-11-02 15:22:13 +01:00
Aliaksandr Valialkin
eeb25bc4a5 app/vmselect/promql: add missing trace message in rollupResultCache.GetSeries() 2023-11-02 09:22:48 +01:00
Aliaksandr Valialkin
dd33fc0c76 docs/CHANGELOG.md: typo fix: tis -> this 2023-11-02 08:33:40 +01:00
Aliaksandr Valialkin
536c1301fd docs/Single-server-VictoriaMetrics.md: document why data inside <-storageDataPath>/snapshots directory should be manipulated only via snapshot API 2023-11-02 08:30:39 +01:00
Aliaksandr Valialkin
87a86ec9db docs/CHANGELOG.md: document v1.93.7 LTS release 2023-11-02 08:21:00 +01:00
Aliaksandr Valialkin
ed70a40669 app/vmagent/remotewrite: add -remoteWrite.shardByURL.labels command-line flag
This command-line flag can be used for specifying a list of labels used for sharding
among -remoteWrite.url entries when -remoteWrite.shardByURL command-line flag is set.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4942
2023-11-01 23:08:54 +01:00
Alexander Marshalov
828ddd4e4f vmauth: add browser authorization request for http requests without… (#5234)
* vmauth: add browser authorization request for http requests without credentials to a route that is not in the `unauthorized_user` section (when `unauthorized_user` is specified).

* add link to issue in CHANGELOG

* Extend vmauth docs

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-11-01 20:59:46 +01:00
Aliaksandr Valialkin
4497a08e3d app/vmselect/promql: reduce the minimum lookbehind window for enabling SLO/SLI optimizations from 24 hours to 6 hours
This reduction is based on production testing.

Also expose -search.minWindowForInstantRollupOptimization command-line flag, so users could fine-tune this arg for their needs
2023-11-01 20:18:44 +01:00
Aliaksandr Valialkin
f59dda3223 app/vmselect: run make quicktemplate-gen after b8739bc00b 2023-11-01 17:52:56 +01:00
Aliaksandr Valialkin
b8739bc00b app/vmselect: return stats.seriesFetched as string instead of number
vmalert expects string value for stats.seriesFetched, so it is impossible
switching to number without breaking compatibility with old vmalert releases :(

It is still unclear why stats.seriesFetched has string type in the first place...
2023-11-01 17:49:59 +01:00
Github Actions
b44b74d118 Automatic update operator docs from VictoriaMetrics/operator@49826be (#5270)
Co-authored-by: Alexander Marshalov <_@marshalov.org>
2023-11-01 17:45:56 +01:00
Artem Navoiev
324ecbeed6 add Try new Docs button in the current docs
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-11-01 17:22:49 +01:00
Aliaksandr Valialkin
da887b49e7 app/vmui: show query execution duration in the header of query input field
This should simplify the process of query optimization
2023-11-01 16:43:51 +01:00
Hui Wang
e482eeff58 vmalert: support specifying full http url in notifier static_configs target (#5261)
* vmalert: support specifying full http or https urls in notifier static_configs target address
* show right label results in ui
2023-11-01 19:53:50 +08:00
Aliaksandr Valialkin
92826b0b4a app/vmselect/promql: apply SLO-like optimization to all the count_*_over_time() functions
This is a follow-up for 41a0fdaf39
2023-11-01 09:58:16 +01:00
Aliaksandr Valialkin
0189081490 app/vmselect/promql: typo fix, which could lead to panic during range query execution
The panic is:

  BUG: unexpected values after merging new values

This is a follow-up for 41a0fdaf39
2023-11-01 09:58:15 +01:00
Github Actions
3e76c3bd50 Automatic update operator docs from VictoriaMetrics/operator@57c1bf6 (#5266) 2023-11-01 16:33:25 +08:00
Aliaksandr Valialkin
c4c6ee9485 app/vmui: fix non-working Disable cache checkbox at JSON and Table views 2023-10-31 22:58:06 +01:00
Aliaksandr Valialkin
a70818f72f app/vmselect/promql: properly calculate rollup result if lookbehind window isn't set
This is a follow-up for 41a0fdaf39
2023-10-31 22:22:37 +01:00
Aliaksandr Valialkin
ea81f6fc36 app/vmselect/promql: add outliers_iqr(q) and outlier_iqr_over_time(m[d]) functions
These functions allow detecting anomalies in series and samples using Interquartile range method.
See Outliers section at https://en.wikipedia.org/wiki/Interquartile_range for more details.
2023-10-31 22:10:31 +01:00
Aliaksandr Valialkin
fba93dbe0b vendor: run make vendor-update 2023-10-31 20:19:51 +01:00
Aliaksandr Valialkin
41a0fdaf39 app/vmselect/promql: optimize repeated SLI-like instant queries with lookbehind windows >= 1d
Repeated instant queries with long lookbehind windows, which contain one of the following rollup functions,
are optimized via partial result caching:

- sum_over_time()
- count_over_time()
- avg_over_time()
- increase()
- rate()

The basic idea of optimization is to calculate

  rf(m[d] @ t)

as

  rf(m[offset] @ t) + rf(m[d] @ (t-offset)) - rf(m[offset] @ (t-d))

where rf(m[d] @ (t-offset)) is cached query result, which was calculated previously

The offset may be in the range of up to 1 hour.
2023-10-31 19:25:23 +01:00
Aliaksandr Valialkin
c96fc05f3e docs/Cluster-VictoriaMetrics.md: sync with cluster branch after 9d8f93050c 2023-10-31 19:13:11 +01:00
Aliaksandr Valialkin
51aab7bb17 app/vmselect/promql: wrap too long line after a950873fff 2023-10-31 18:59:10 +01:00
Aliaksandr Valialkin
714af89b13 lib/httpserver: follow-up for 0638bbe69c
- Replace spaces with underscores in the `reason` label value for the vm_http_request_errors_total metric
  in order be consistent with Prometheus-like naming

- Clarify the description for the change at docs/CHANGELOG.md

Updates https://github.com/victoriaMetrics/victoriaMetrics/issues/4590
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5166
2023-10-31 18:52:39 +01:00
Aliaksandr Valialkin
98699f203b lib/persistentqueue: properly re-create flock.lock file inside directory if persistent queue is broken.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5249

Thanks to @Sniper91 for the bugreport and initial fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5233
2023-10-31 18:38:32 +01:00
Aliaksandr Valialkin
efb6ac27c2 lib/httpserver: call Request.Header() only once instead of calling it each time a new request header is set
This is a follow-up for ad839aa492
2023-10-31 18:38:32 +01:00
Artem Navoiev
68f82b1c06 github actions: fix typo in hugo version
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-10-31 17:52:35 +01:00
Artem Navoiev
1020faa7b4 github actions: use 0.119 hugo version as far latest contains bug
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-10-31 17:50:12 +01:00
Aliaksandr Valialkin
aade16f534 docs/Cluster-VictoriaMetrics.md: clarify the description on why -dosnwampling.period must be set at both vmstorage and vmselect
This is a follow-up for ca7457d906
2023-10-31 16:53:46 +01:00
Aliaksandr Valialkin
a71efce784 docs/Single-server-VictoriaMetrics.md: cosmetic fixes after 23369321f1 2023-10-31 16:42:29 +01:00
Aliaksandr Valialkin
4ac95b6f49 docs/CHANGELOG.md: move the description for -http.header.* command-line flags from SECURITY to FEATURE
The SECURITY label should be applied only to changes, which fix security issues.
The change at ad839aa492 adds new command-line flags, which can be used
for improving security in some cases. They do not fix any security issues.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5111
2023-10-31 16:23:08 +01:00
Aliaksandr Valialkin
7ac49162c6 lib/storage: follow-up for 29cebd82fb
Use atomic.CompareAndSwapUint32() instead of atomic.LoadUint32() followed by atomic.StoreUint32().
This makes the code more clear.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159
2023-10-31 16:08:54 +01:00
hagen1778
f6208965ce dashboards/cluster: fix description about max threshold for Concurrent selects panel.
Before, it was mistakenly implying that `max` is equal to the double of available CPUs.

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5214

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 16:05:33 +01:00
Roman Khavronenko
a950873fff app/vmselect: expose vm_memory_intensive_queries_total counter metric (#5208)
The new metric gets increased each time `-search.logQueryMemoryUsage` memory limit
is exceeded by a query. This metric should help to identify expensive and heavy queries
without inspecting the logs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 13:31:09 +01:00
hagen1778
a8051d48c4 docs: follow-up for 0638bbe69c
0638bbe69c
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 12:54:30 +01:00
venkatbvc
0638bbe69c vmauth: add counter metrics for auth successes and failures (#5166)
New labels `reason="wrong basic auth creds"` and `reason="wrong auth key"` were
added to metric `vm_http_request_errors_total`  to help identify auth errors.

https://github.com/victoriaMetrics/victoriaMetrics/issues/4590

Co-authored-by: Rao, B V Chalapathi <b_v_chalapathi.rao@nokia.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-10-31 12:48:02 +01:00
hagen1778
aaf9e3d526 dashboards/vmalert: add new panel Missed evaluations
The new panel supposed to indicate alerting groups that miss their evaluations.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 10:35:19 +01:00
hagen1778
9866974a53 deployment/alerts: add TooManyMissedIterations alerting rule
The new rule for vmalert supposed to detect groups that miss their
evaulations due to slow queries.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 10:35:18 +01:00
hagen1778
8874b525b7 dashboards: fix Errors rate to Alertmanager filter
The panel `Errors rate to Alertmanager` had `group` label filter
applied to the expression, while the metric `vmalert_alerts_send_errors_total`
doesn't have that label. This resulted into always empty results.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-31 10:16:45 +01:00
Roman Khavronenko
ca7457d906 docs: explain motivation behind having -downsampling.period on vmselect (#5205)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 19:03:36 +01:00
Roman Khavronenko
23369321f1 docs: mention information loss when downsampling gauges (#5204)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 15:29:06 +01:00
Hui Wang
abcb21aa5e vmalert: fix alert firing state in replay mode (#5192)
fix possible missing firing states for alerting rules in replay mode
Before if one firing stage is bigger than single query request range, like rule with a big `for`, alerting rule won't able to be detected as firing.

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 13:54:18 +01:00
hagen1778
e964df8039 docs/troubleshooting: mention issue with un-ordered labels
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5219#issuecomment-1773441711

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 13:53:14 +01:00
hagen1778
a64b37cf24 docs: rm mention of default values for security HTTP headers
The headers, their corresponding flags are mentioned at
https://docs.victoriametrics.com/#security

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 11:46:17 +01:00
Dima Lazerka
ad839aa492 lib/httpserver: add flags to specify HSTS / Frame-Options / CSP headers for httpserver (#5111)
support `Strict-Transport-Security`, `Content-Security-Policy` and `X-Frame-Options`
HTTP headers in all VictoriaMetrics components. 
The values for headers can be specified by users via the following flags: 
`-http.header.hsts`, `-http.header.csp` and `-http.header.frameOptions`.

Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 11:33:38 +01:00
Roman Khavronenko
29cebd82fb lib/storage: log warning about RO mode only on state change (#5191)
Before, vmstorage would log the same message each second producing excessive
amount of logs.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5159

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-30 10:52:57 +01:00
Aliaksandr Valialkin
9149353a36 app/vmui: change the order of tables at Top queries tab
Move the most interesting table - queries with the most summary time to execute - to the top
2023-10-28 11:56:16 +02:00
Aliaksandr Valialkin
613b545dfd lib/promscrape/discovery/kubernetes: propagate possible errors at newAPIWatcher() to the caller
This allows substituting FATAL panics with recoverable runtime errors such as missing or invalid TLS CA file
and/or missing/invalid /var/run/secrets/kubernetes.io/serviceaccount/namespace file.
Now these errors are logged instead of PANIC'ing, so they can be fixed by updating the corresponding files
without the need to restart vmagent.

This is a follow-up for 90427abc65
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5243
2023-10-27 20:24:46 +02:00
Hui Wang
90427abc65 lib/promscrape/discovery/kubernetes: avoid possible panic if given caFile under kubernetes.SDConfig.HTTPClientConfig is not exist (#5243)
follow up d5a599badc
2023-10-27 20:20:22 +02:00
Aliaksandr Valialkin
632d788b63 lib/promscrape/discovery/kubernetes: stop all the url watchers, which belong to a particular groupWatcher, at once
Previously url watchers for pod, service and node objects could be mistakenly closed
when service discovery was set up only for endpoints and endpointslice roles,
since watchers for these roles may start start pod, service and node url watchers
with nil apiWatcher passed to groupWatcher.startWatchersForRole().

Now all the url watchers, which belong to a particular groupWatcher, are stopped at once
when this groupWatcher has no apiWatcher subscribers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5216

The issue has been introduced in v1.93.5 when addressing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4850
2023-10-27 13:51:35 +02:00
Hui Wang
7c90ce39cb do not print redundant error logs when failed to scrape consul or no… (#5239)
* do not print redundant error logs when failed to scrape consul or nomad target
prometheus performs the same because it uses consul lib which just drops the error(1806bcb38c/api/api.go (L1134))
2023-10-27 13:31:55 +08:00
hagen1778
3aec7eb44f app/vmalert: remove unclear comment
The timestamp alignment should be applied as a last step
to keep the timestamp consistent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-26 15:41:35 +02:00
Daria Karavaieva
b60bb1d98a model list - isolation forest (#5235)
* model list - isolation forest

* curse of dimensionality

* isol forest definition change, minor fixes

* blank line fix
2023-10-26 12:25:54 +02:00
Aliaksandr Valialkin
68b1b3c4d4 Makefile: move build commands for vmalert-tool closer to vmalert
This should simplify maintenance of Makefile commands related to vmalert.

This is a follow-up for dc28196237
2023-10-26 08:07:50 +02:00
Aliaksandr Valialkin
cdbc06a639 lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4 2023-10-25 17:57:56 -07:00
Dima Lazerka
8b41b506c2 Revert "lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4"
It broke CI (lint)

This reverts commit 5464376d16.
2023-10-25 16:24:31 -07:00
Aliaksandr Valialkin
5464376d16 lib/promscrape: do not add a suggestion for enabling TCP6 in error message when the dial address is TCPv4 2023-10-26 00:29:51 +02:00
Aliaksandr Valialkin
ac933cc423 lib/promscrape: properly track the number of updated service discovery routines inside Config.mustRestart()
This is a follow-up for d5a599badc
2023-10-26 00:06:29 +02:00
Aliaksandr Valialkin
612dcf231a lib/promauth: typo fix in the error message after d5a599badc: obtaine -> obtain 2023-10-25 23:38:00 +02:00
Aliaksandr Valialkin
d5a599badc lib/promauth: follow-up for e16d3f5639
- Make sure that invalid/missing TLS CA file or TLS client certificate files at vmagent startup
  don't prevent from processing the corresponding scrape targets after the file becomes correct,
  without the need to restart vmagent.
  Previously scrape targets with invalid TLS CA file or TLS client certificate files
  were permanently dropped after the first attempt to initialize them, and they didn't
  appear until the next vmagent reload or the next change in other places of the loaded scrape configs.

- Make sure that TLS CA is properly re-loaded from file after it changes without the need to restart vmagent.
  Previously the old TLS CA was used until vmagent restart.

- Properly handle errors during http request creation for the second attempt to send data to remote system
  at vmagent and vmalert. Previously failed request creation could result in nil pointer dereferencing,
  since the returned request is nil on error.

- Add more context to the logged error during AWS sigv4 request signing before sending the data to -remoteWrite.url at vmagent.
  Previously it could miss details on the source of the request.

- Do not create a new HTTP client per second when generating OAuth2 token needed to put in Authorization header
  of every http request issued by vmagent during service discovery or target scraping.
  Re-use the HTTP client instead until the corresponding scrape config changes.

- Cache error at lib/promauth.Config.GetAuthHeader() in the same way as the auth header is cached,
  e.g. the error is cached for a second now. This should reduce load on CPU and OAuth2 server
  when auth header cannot be obtained because of temporary error.

- Share tls.Config.GetClientCertificate function among multiple scrape targets with the same tls_config.
  Cache the loaded certificate and the error for one second. This should significantly reduce CPU load
  when scraping big number of targets with the same tls_config.

- Allow loading TLS certificates from HTTP and HTTPs urls by specifying these urls at `tls_config->cert_file` and `tls_config->key_file`.

- Improve test coverage at lib/promauth

- Skip unreachable or invalid files specified at `scrape_config_files` during vmagent startup, since these files may become valid later.
  Previously vmagent was exitting in this case.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4959
2023-10-25 23:19:37 +02:00
Aliaksandr Valialkin
c22e3e7b1d lib/promscrape/discovery/kubernetes/kubeconfig_test.go: make TestParseKubeConfigSuccess test code easier to follow 2023-10-25 23:17:18 +02:00
Aliaksandr Valialkin
eed5206376 lib/promauth: properly parse string contents for ca, cert and key fields at tls_config
Previously yaml parser wasn't accepting string values for these fields,
because it was mistakenly expecting a list of uint8 values instead.
2023-10-25 23:12:21 +02:00
Aliaksandr Valialkin
4afcb2a689 lib/promscrape: move duplicate code from functions, which collect ScrapeWork lists for distinct SD types into Config.getScrapeWorkGeneric()
This removes more than 200 lines of duplicate code
2023-10-25 23:03:40 +02:00
Aliaksandr Valialkin
cb34d4440c app/vmalert/config: fix flacky test TestParseBad
It could return either `failed to read` or `failed to parse` errors depending
on whether the given url can be loaded or not under the current environment
2023-10-25 21:30:55 +02:00
Aliaksandr Valialkin
42dd71bb63 all: consistently use %w instead of %s in when error is passed to fmt.Errorf()
This allows consistently using errors.Is() for verifying whether the given error wraps some other known error.
2023-10-25 21:24:03 +02:00
Aliaksandr Valialkin
305c96e384 lib/workingsetcache: fix outdated comments for Load() and New() functions 2023-10-25 21:04:20 +02:00
hagen1778
c07909a20b app/vmalert: fix typo in tests
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-25 16:28:27 +02:00
hagen1778
eed0c3c6b0 app/vmalert: fix tests after a216fe6728
a216fe6728
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-25 16:25:26 +02:00
hagen1778
a216fe6728 app/vmalert: follow-up after c9375cac5e
c9375cac5e

Descriptions were updated in attempt to make it more clear for readers,
re-phrasing and linking missing docs.

`eval_delay` was added to tests to verify it can be unmarshalled.

`eval_delay` is now applied before timestamp alignment to make it more predictable.
Before, if delay < interval the timestamp won't be aligned.

`eval_delay` and `eval_offset` was added to API output.

`PreviouslySentSeriesToRW` converted to private `previouslySentSeriesToRW`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-25 13:07:13 +02:00
Hui Wang
c9375cac5e vmalert: add -rule.evalDelay flag and eval_delay as group attribute (#5185)
Also mark `-datasource.lookback` as will be deprecated, see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155.
2023-10-25 11:54:18 +02:00
hagen1778
4e0a779efe deployment/alerts: update TooHighMemoryUsage annotation
The memory usage isn't measured on 5m interval anymore.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-24 09:53:44 +02:00
hagen1778
003ef3a518 deployment/alerts: make TooHighMemoryUsage more tolerable to spikes
Using `min_over_time` should reduce the amount of false positives when
component is running in near-the-threshold state. Now it should trigger
only if all collected samples were above the threshold on 10m interval.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-24 09:39:46 +02:00
hagen1778
685f9c3c98 deployment/alerts: make RemoteWriteConnectionIsSaturated expr readable
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-10-24 09:31:57 +02:00
923 changed files with 20726 additions and 26757 deletions

View File

@@ -17,7 +17,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.21.3
go-version: 1.21.4
id: go
- name: Code checkout
uses: actions/checkout@master

View File

@@ -57,7 +57,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v4
with:
go-version: 1.21.3
go-version: 1.21.4
check-latest: true
cache: true
if: ${{ matrix.language == 'go' }}

View File

@@ -32,7 +32,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v4
with:
go-version: 1.21.3
go-version: 1.21.4
check-latest: true
cache: true
@@ -56,7 +56,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v4
with:
go-version: 1.21.3
go-version: 1.21.4
check-latest: true
cache: true
@@ -81,7 +81,7 @@ jobs:
id: go
uses: actions/setup-go@v4
with:
go-version: 1.21.3
go-version: 1.21.4
check-latest: true
cache: true

View File

@@ -8,6 +8,7 @@ on:
workflow_dispatch: {}
env:
PAGEFIND_VERSION: "1.0.3"
HUGO_VERSION: "latest"
permissions:
contents: read # This is required for actions/checkout and to commit back image update
deployments: write
@@ -28,7 +29,7 @@ jobs:
path: docs
- uses: peaceiris/actions-hugo@v2
with:
hugo-version: 'latest'
hugo-version: ${{env.HUGO_VERSION}}
extended: true
- name: Install PageFind #install the static search engine for index build
uses: supplypike/setup-bin@v3

View File

@@ -25,143 +25,143 @@ all: \
victoria-logs-prod \
vmagent-prod \
vmalert-prod \
vmalert-tool-prod \
vmauth-prod \
vmbackup-prod \
vmrestore-prod \
vmctl-prod \
vmalert-tool-prod
vmctl-prod
clean:
rm -rf bin/*
publish: package-base \
publish: \
publish-victoria-metrics \
publish-vmagent \
publish-vmalert \
publish-vmalert-tool \
publish-vmauth \
publish-vmbackup \
publish-vmrestore \
publish-vmctl \
publish-vmalert-tool
publish-vmctl
package: \
package-victoria-metrics \
package-victoria-logs \
package-vmagent \
package-vmalert \
package-vmalert-tool \
package-vmauth \
package-vmbackup \
package-vmrestore \
package-vmctl \
package-vmalert-tool
package-vmctl
vmutils: \
vmagent \
vmalert \
vmalert-tool \
vmauth \
vmbackup \
vmrestore \
vmctl \
vmalert-tool
vmctl
vmutils-pure: \
vmagent-pure \
vmalert-pure \
vmalert-tool-pure \
vmauth-pure \
vmbackup-pure \
vmrestore-pure \
vmctl-pure \
vmalert-tool-pure
vmctl-pure
vmutils-linux-amd64: \
vmagent-linux-amd64 \
vmalert-linux-amd64 \
vmalert-tool-linux-amd64 \
vmauth-linux-amd64 \
vmbackup-linux-amd64 \
vmrestore-linux-amd64 \
vmctl-linux-amd64 \
vmalert-tool-linux-amd64
vmctl-linux-amd64
vmutils-linux-arm64: \
vmagent-linux-arm64 \
vmalert-linux-arm64 \
vmalert-tool-linux-arm64 \
vmauth-linux-arm64 \
vmbackup-linux-arm64 \
vmrestore-linux-arm64 \
vmctl-linux-arm64 \
vmalert-tool-linux-arm64
vmctl-linux-arm64
vmutils-linux-arm: \
vmagent-linux-arm \
vmalert-linux-arm \
vmalert-tool-linux-arm \
vmauth-linux-arm \
vmbackup-linux-arm \
vmrestore-linux-arm \
vmctl-linux-arm \
vmalert-tool-linux-arm
vmctl-linux-arm
vmutils-linux-386: \
vmagent-linux-386 \
vmalert-linux-386 \
vmalert-tool-linux-386 \
vmauth-linux-386 \
vmbackup-linux-386 \
vmrestore-linux-386 \
vmctl-linux-386 \
vmalert-tool-linux-386
vmctl-linux-386
vmutils-linux-ppc64le: \
vmagent-linux-ppc64le \
vmalert-linux-ppc64le \
vmalert-tool-linux-ppc64le \
vmauth-linux-ppc64le \
vmbackup-linux-ppc64le \
vmrestore-linux-ppc64le \
vmctl-linux-ppc64le \
vmalert-tool-linux-ppc64le
vmctl-linux-ppc64le
vmutils-darwin-amd64: \
vmagent-darwin-amd64 \
vmalert-darwin-amd64 \
vmalert-tool-darwin-amd64 \
vmauth-darwin-amd64 \
vmbackup-darwin-amd64 \
vmrestore-darwin-amd64 \
vmctl-darwin-amd64 \
vmalert-tool-darwin-amd64
vmctl-darwin-amd64
vmutils-darwin-arm64: \
vmagent-darwin-arm64 \
vmalert-darwin-arm64 \
vmalert-tool-darwin-arm64 \
vmauth-darwin-arm64 \
vmbackup-darwin-arm64 \
vmrestore-darwin-arm64 \
vmctl-darwin-arm64 \
vmalert-tool-darwin-arm64
vmctl-darwin-arm64
vmutils-freebsd-amd64: \
vmagent-freebsd-amd64 \
vmalert-freebsd-amd64 \
vmalert-tool-freebsd-amd64 \
vmauth-freebsd-amd64 \
vmbackup-freebsd-amd64 \
vmrestore-freebsd-amd64 \
vmctl-freebsd-amd64 \
vmalert-tool-freebsd-amd64
vmctl-freebsd-amd64
vmutils-openbsd-amd64: \
vmagent-openbsd-amd64 \
vmalert-openbsd-amd64 \
vmalert-tool-openbsd-amd64 \
vmauth-openbsd-amd64 \
vmbackup-openbsd-amd64 \
vmrestore-openbsd-amd64 \
vmctl-openbsd-amd64 \
vmalert-tool-openbsd-amd64
vmctl-openbsd-amd64
vmutils-windows-amd64: \
vmagent-windows-amd64 \
vmalert-windows-amd64 \
vmalert-tool-windows-amd64 \
vmauth-windows-amd64 \
vmbackup-windows-amd64 \
vmrestore-windows-amd64 \
vmctl-windows-amd64 \
vmalert-tool-windows-amd64
vmctl-windows-amd64
victoria-metrics-crossbuild: \
victoria-metrics-linux-386 \
@@ -354,72 +354,72 @@ release-vmutils-windows-amd64:
release-vmutils-goos-goarch: \
vmagent-$(GOOS)-$(GOARCH)-prod \
vmalert-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod \
vmauth-$(GOOS)-$(GOARCH)-prod \
vmbackup-$(GOOS)-$(GOARCH)-prod \
vmrestore-$(GOOS)-$(GOARCH)-prod \
vmctl-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod
vmctl-$(GOOS)-$(GOARCH)-prod
cd bin && \
tar --transform="flags=r;s|-$(GOOS)-$(GOARCH)||" -czf vmutils-$(GOOS)-$(GOARCH)-$(PKG_TAG).tar.gz \
vmagent-$(GOOS)-$(GOARCH)-prod \
vmalert-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod \
vmauth-$(GOOS)-$(GOARCH)-prod \
vmbackup-$(GOOS)-$(GOARCH)-prod \
vmrestore-$(GOOS)-$(GOARCH)-prod \
vmctl-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod
&& sha256sum vmutils-$(GOOS)-$(GOARCH)-$(PKG_TAG).tar.gz \
vmagent-$(GOOS)-$(GOARCH)-prod \
vmalert-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod \
vmauth-$(GOOS)-$(GOARCH)-prod \
vmbackup-$(GOOS)-$(GOARCH)-prod \
vmrestore-$(GOOS)-$(GOARCH)-prod \
vmctl-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod \
| sed s/-$(GOOS)-$(GOARCH)-prod/-prod/ > vmutils-$(GOOS)-$(GOARCH)-$(PKG_TAG)_checksums.txt
cd bin && rm -rf \
vmagent-$(GOOS)-$(GOARCH)-prod \
vmalert-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod \
vmauth-$(GOOS)-$(GOARCH)-prod \
vmbackup-$(GOOS)-$(GOARCH)-prod \
vmrestore-$(GOOS)-$(GOARCH)-prod \
vmctl-$(GOOS)-$(GOARCH)-prod \
vmalert-tool-$(GOOS)-$(GOARCH)-prod
vmctl-$(GOOS)-$(GOARCH)-prod
release-vmutils-windows-goarch: \
vmagent-windows-$(GOARCH)-prod \
vmalert-windows-$(GOARCH)-prod \
vmalert-tool-windows-$(GOARCH)-prod \
vmauth-windows-$(GOARCH)-prod \
vmbackup-windows-$(GOARCH)-prod \
vmrestore-windows-$(GOARCH)-prod \
vmctl-windows-$(GOARCH)-prod \
vmalert-tool-windows-$(GOARCH)-prod
vmctl-windows-$(GOARCH)-prod
cd bin && \
zip vmutils-windows-$(GOARCH)-$(PKG_TAG).zip \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmalert-tool-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmbackup-windows-$(GOARCH)-prod.exe \
vmrestore-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe \
vmalert-tool-windows-$(GOARCH)-prod.exe \
&& sha256sum vmutils-windows-$(GOARCH)-$(PKG_TAG).zip \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmalert-tool-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmbackup-windows-$(GOARCH)-prod.exe \
vmrestore-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe \
vmalert-tool-windows-$(GOARCH)-prod.exe \
> vmutils-windows-$(GOARCH)-$(PKG_TAG)_checksums.txt
cd bin && rm -rf \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmalert-tool-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmbackup-windows-$(GOARCH)-prod.exe \
vmrestore-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe \
vmalert-tool-windows-$(GOARCH)-prod.exe
vmctl-windows-$(GOARCH)-prod.exe
pprof-cpu:
go tool pprof -trim_path=github.com/VictoriaMetrics/VictoriaMetrics@ $(PPROF_FILE)
@@ -486,7 +486,7 @@ golangci-lint: install-golangci-lint
golangci-lint run
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.54.2
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.55.1
govulncheck: install-govulncheck
govulncheck ./...

133
README.md
View File

@@ -338,7 +338,8 @@ which can be used as faster and less resource-hungry alternative to Prometheus.
## Grafana setup
Create [Prometheus datasource](http://docs.grafana.org/features/datasources/prometheus/) in Grafana with the following url:
Create [Prometheus datasource](https://grafana.com/docs/grafana/latest/datasources/prometheus/configure-prometheus-data-source/)
in Grafana with the following url:
```url
http://<victoriametrics-addr>:8428
@@ -352,6 +353,9 @@ or [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html).
Alternatively, use VictoriaMetrics [datasource plugin](https://github.com/VictoriaMetrics/grafana-datasource) with support of extra features.
See more in [description](https://github.com/VictoriaMetrics/grafana-datasource#victoriametrics-data-source-for-grafana).
Creating a datasource may require [specific permissions](https://grafana.com/docs/grafana/latest/administration/data-source-management/).
If you don't see an option to create a data source - try contacting system administrator.
## How to upgrade VictoriaMetrics
VictoriaMetrics is developed at a fast pace, so it is recommended periodically checking [the CHANGELOG page](https://docs.victoriametrics.com/CHANGELOG.html) and performing regular upgrades.
@@ -442,7 +446,7 @@ This information is obtained from the `/api/v1/status/active_queries` HTTP endpo
[VMUI](#vmui) provides an ability to explore metrics exported by a particular `job` / `instance` in the following way:
1. Open the `vmui` at `http://victoriametrics:8428/vmui/`.
1. Click the `Explore metrics` tab.
1. Click the `Explore Prometheus metrics` tab.
1. Select the `job` you want to explore.
1. Optionally select the `instance` for the selected job to explore.
1. Select metrics you want to explore and compare.
@@ -1126,6 +1130,18 @@ For example, the following command builds the image on top of [scratch](https://
ROOT_IMAGE=scratch make package-victoria-metrics
```
#### Building VictoriaMetrics with Podman
VictoriaMetrics can be built with Podman in either rootful or rootless mode.
When building via rootlful Podman, simply add `DOCKER=podman` to the relevant `make` commandline. To build
via rootless Podman, add `DOCKER=podman DOCKER_RUN="podman run --userns=keep-id"` to the `make`
commandline.
For example: `make victoria-metrics-pure DOCKER=podman DOCKER_RUN="podman run --userns=keep-id"`
Note that `production` builds are not supported via Podman becuase Podman does not support `buildx`.
## Start with docker-compose
[Docker-compose](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/docker-compose.yml)
@@ -1152,6 +1168,15 @@ Snapshots are created under `<-storageDataPath>/snapshots` directory, where `<-s
is the command-line flag value. Snapshots can be archived to backup storage at any time
with [vmbackup](https://docs.victoriametrics.com/vmbackup.html).
Snapshots consist of a mix of hard-links and soft-links to various files and directories inside `-storageDataPath`.
See [this article](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
for more details. This adds some restrictions on what can be done with the contents of `<-storageDataPath>/snapshots` directory:
- Do not delete subdirectories inside `<-storageDataPath>/snapshots` with `rm` or similar commands, since this will leave some snapshot data undeleted.
Prefer using the `/snapshot/delete` API for deleting snapshot. See below for more details about this API.
- Do not copy subdirectories inside `<-storageDataPath>/snapshot` with `cp`, `rsync` or similar commands, since there are high chances
that these commands won't copy some data stored in the snapshot. Prefer using [vmbackup](https://docs.victoriametrics.com/vmbackup.html) for making copies of snapshot data.
The `http://<victoriametrics-addr>:8428/snapshot/list` page contains the list of available snapshots.
Navigate to `http://<victoriametrics-addr>:8428/snapshot/delete?snapshot=<snapshot-name>` in order
@@ -1674,43 +1699,44 @@ See also [cardinality limiter](#cardinality-limiter) and [capacity planning docs
## High availability
* Install multiple VictoriaMetrics instances in distinct datacenters (availability zones).
* Pass addresses of these instances to [vmagent](https://docs.victoriametrics.com/vmagent.html) via `-remoteWrite.url` command-line flag:
The general approach for achieving high availability is the following:
- to run two identically configured VictoriaMetrics instances in distinct datacenters (availability zones)
- to store the collected data simultaneously into these instances via [vmagent](https://docs.victoriametrics.com/vmagent.html) or Prometheus
- to query the first VictoriaMetrics instance and to fail over to the second instance when the first instance becomes temporarily unavailable.
Such a setup guarantees that the collected data isn't lost when one of VictoriaMetrics instance becomes unavailable.
The collected data continues to be written to the available VictoriaMetrics instance, so it should be available for querying.
Both [vmagent](https://docs.victoriametrics.com/vmagent.html) and Prometheus buffer the collected data locally if they cannot send it
to the configured remote storage. So the collected data will be written to the temporarily unavailable VictoriaMetrics instance
after it becomes available.
If you use [vmagent](https://docs.victoriametrics.com/vmagent.html) for storing the data into VictoriaMetrics,
then it can be configured with multiple `-remoteWrite.url` command-line flags, where every flag points to the VictoriaMetrics
instance in a particular availability zone, in order to replicate the collected data to all the VictoriaMetrics instances.
For example, the following command instructs `vmagent` to replicate data to `vm-az1` and `vm-az2` instances of VictoriaMetrics:
```console
/path/to/vmagent -remoteWrite.url=http://<victoriametrics-addr-1>:8428/api/v1/write -remoteWrite.url=http://<victoriametrics-addr-2>:8428/api/v1/write
/path/to/vmagent \
-remoteWrite.url=http://<vm-az1>:8428/api/v1/write \
-remoteWrite.url=http://<vm-az2>:8428/api/v1/write
```
Alternatively these addresses may be passed to `remote_write` section in Prometheus config:
If you use Prometheus for collecting and writing the data to VictoriaMetrics,
then the following [`remote_write`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) section
in Prometheus config can be used for replicating the collected data to `vm-az1` and `vm-az2` VictoriaMetrics instances:
```yml
remote_write:
- url: http://<victoriametrics-addr-1>:8428/api/v1/write
queue_config:
max_samples_per_send: 10000
# ...
- url: http://<victoriametrics-addr-N>:8428/api/v1/write
queue_config:
max_samples_per_send: 10000
- url: http://<vm-az1>:8428/api/v1/write
- url: http://<vm-az2>:8428/api/v1/write
```
* Apply the updated config:
It is recommended to use [vmagent](https://docs.victoriametrics.com/vmagent.html) instead of Prometheus for highly loaded setups,
since it uses lower amounts of RAM, CPU and network bandwidth than Prometheus.
```console
kill -HUP `pidof prometheus`
```
It is recommended to use [vmagent](https://docs.victoriametrics.com/vmagent.html) instead of Prometheus for highly loaded setups.
* Now Prometheus should write data into all the configured `remote_write` urls in parallel.
* Set up [Promxy](https://github.com/jacksontj/promxy) in front of all the VictoriaMetrics replicas.
* Set up Prometheus datasource in Grafana that points to Promxy.
If you have Prometheus HA pairs with replicas `r1` and `r2` in each pair, then configure each `r1`
to write data to `victoriametrics-addr-1`, while each `r2` should write data to `victoriametrics-addr-2`.
Another option is to write data simultaneously from Prometheus HA pair to a pair of VictoriaMetrics instances
with the enabled de-duplication. See [this section](#deduplication) for details.
If you use identically configured [vmagent](https://docs.victoriametrics.com/vmagent.html) instances for collecting the same data
and sending it to VictoriaMetrics, then do not forget enabling [deduplication](#deduplication) at VictoriaMetrics side.
## Deduplication
@@ -1897,8 +1923,23 @@ See how to request a free trial license [here](https://victoriametrics.com/produ
* `-downsampling.period=30d:5m,180d:1h` instructs VictoriaMetrics to deduplicate samples older than 30 days with 5 minutes interval and to deduplicate samples older than 180 days with 1 hour interval.
Downsampling is applied independently per each time series. It can reduce disk space usage and improve query performance if it is applied to time series with big number of samples per each series. The downsampling doesn't improve query performance if the database contains big number of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)), since downsampling doesn't reduce the number of time series. So the majority of time is spent on searching for the matching time series.
It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in vmagent or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to [reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
Downsampling is applied independently per each time series and leaves a single [raw sample](https://docs.victoriametrics.com/keyConcepts.html#raw-samples)
with the biggest [timestamp](https://en.wikipedia.org/wiki/Unix_time) on the configured interval, in the same way as [deduplication](#deduplication) does.
It works the best for [counters](https://docs.victoriametrics.com/keyConcepts.html#counter) and [histograms](https://docs.victoriametrics.com/keyConcepts.html#histogram),
as their values are always increasing. But downsampling [gauges](https://docs.victoriametrics.com/keyConcepts.html#gauge)
and [summaries](https://docs.victoriametrics.com/keyConcepts.html#summary)
would mean losing the changes within the downsampling interval. Please note, you can use [recording rules](https://docs.victoriametrics.com/vmalert.html#rules)
or [steaming aggregation](https://docs.victoriametrics.com/stream-aggregation.html)
to apply custom aggregation functions, like min/max/avg etc., in order to make gauges more resilient to downsampling.
Downsampling can reduce disk space usage and improve query performance if it is applied to time series with big number
of samples per each series. The downsampling doesn't improve query performance if the database contains big number
of time series with small number of samples per each series (aka [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)),
since downsampling doesn't reduce the number of time series. In this case the majority of query time is spent on searching for the matching time series
instead of processing the found samples.
It is possible to use [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) in [vmagent](https://docs.victoriametrics.com/vmagent.html)
or recording rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) in order to
[reduce the number of time series](https://docs.victoriametrics.com/vmalert.html#downsampling-and-aggregation-via-vmalert).
Downsampling happens during [background merges](https://docs.victoriametrics.com/#storage)
and can't be performed if there is not enough of free disk space or if vmstorage
@@ -1956,6 +1997,8 @@ VictoriaMetrics provides the following security-related command-line flags:
* `-flagsAuthKey` for protecting `/flags` endpoint.
* `-pprofAuthKey` for protecting `/debug/pprof/*` endpoints, which can be used for [profiling](#profiling).
* `-denyQueryTracing` for disallowing [query tracing](#query-tracing).
* `-http.header.hsts`, `-http.header.csp`, and `-http.header.frameOptions` for serving `Strict-Transport-Security`, `Content-Security-Policy`
and `X-Frame-Options` HTTP response headers.
Explicitly set internal network interface for TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats.
For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<internal_iface_ip>:2003`. This protects from unexpected requests from untrusted network interfaces.
@@ -2487,6 +2530,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
```
-bigMergeConcurrency int
Deprecated: this flag does nothing. Please use -smallMergeConcurrency for controlling the concurrency of background merges. See https://docs.victoriametrics.com/#storage
-blockcache.missesBeforeCaching int
The number of cache misses before putting the block into cache. Higher values may reduce indexdb/dataBlocks cache size at the cost of higher CPU and disk read usage (default 2)
-cacheExpireDuration duration
Items are removed from in-memory caches after they aren't accessed for this duration. Lower values may reduce memory usage at the cost of higher CPU usage. See also -prevCacheRemovalPercent (default 30m0s)
-configAuthKey string
@@ -2507,7 +2552,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-denyQueryTracing
Whether to disable the ability to trace queries. See https://docs.victoriametrics.com/#query-tracing
-downsampling.period array
Comma-separated downsampling periods in the format 'offset:period'. For example, '30d:10m' instructs to leave a single sample per 10 minutes for samples older than 30 days. When setting multiple downsampling periods, it is necessary for the periods to be multiples of each other. See https://docs.victoriametrics.com/#downsampling for details. This flag is available only in Enterprise binaries. See https://docs.victoriametrics.com/enterprise.html
Comma-separated downsampling periods in the format 'offset:period'. For example, '30d:10m' instructs to leave a single sample per 10 minutes for samples older than 30 days. When setting multiple downsampling periods, it is necessary for the periods to be multiples of each other. See https://docs.victoriametrics.com/#downsampling for details. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
Supports an array of values separated by comma or specified via multiple flags.
-dryRun
Whether to check config files without running VictoriaMetrics. The following config files are checked: -promscrape.config, -relabelConfig and -streamAggr.config. Unknown config entries aren't allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse=false command-line flag
@@ -2541,6 +2586,12 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -2608,6 +2659,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerMaxArgLen int
The maximum length of a single logged argument. Longer arguments are replaced with 'arg_start..arg_end', where 'arg_start' and 'arg_end' is prefix and suffix of the arg with the length not exceeding -loggerMaxArgLen / 2 (default 500)
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
@@ -2630,6 +2683,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from the OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-newrelic.maxInsertRequestSize size
The maximum size in bytes of a single NewRelic request to /newrelic/infra/v2/metrics/events/bulk
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
-opentsdbHTTPListenAddr string
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
@@ -2670,7 +2726,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-promscrape.config.strictParse
Whether to deny unsupported fields in -promscrape.config . Set to false in order to silently skip unsupported fields (default true)
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes
Interval for checking for changes in -promscrape.config file. By default, the checking is disabled. See how to reload -promscrape.config file at https://docs.victoriametrics.com/vmagent.html#configuration-update
-promscrape.consul.waitTime duration
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
@@ -2753,11 +2809,11 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. The path can point either to local file or to http url. See https://docs.victoriametrics.com/#relabeling for details. The config is reloaded on SIGHUP signal
-retentionFilter array
Retention filter in the format 'filter:retention'. For example, '{env="dev"}:3d' configures the retention for time series with env="dev" label to 3 days. See https://docs.victoriametrics.com/#retention-filters for details. This flag is available only in Enterprise binaries. See https://docs.victoriametrics.com/enterprise.html
Retention filter in the format 'filter:retention'. For example, '{env="dev"}:3d' configures the retention for time series with env="dev" label to 3 days. See https://docs.victoriametrics.com/#retention-filters for details. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
Supports an array of values separated by comma or specified via multiple flags.
-retentionPeriod value
Data with timestamps outside the retentionPeriod is automatically deleted. The minimum retentionPeriod is 24h or 1d. See also -retentionFilter
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
The following optional suffixes are supported: s (second), m (minute), h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
-retentionTimezoneOffset duration
The offset for performing indexdb rotation. If set to 0, then the indexdb rotation is performed at 4am UTC time per each -retentionPeriod. If set to 2h, then the indexdb rotation is performed at 4am EET time (the timezone with +2h offset)
-search.cacheTimestampOffset duration
@@ -2773,7 +2829,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-search.latencyOffset duration
The time when data points become visible in query results after the collection. It can be overridden on per-query basis via latency_offset arg. Too small value can result in incomplete last points for query results (default 30s)
-search.logQueryMemoryUsage size
Log queries, which require more memory than specified by this flag. This may help detecting and optimizing heavy queries. Query logging is disabled by default. See also -search.logSlowQueryDuration and -search.maxMemoryPerQuery
Log query and increment vm_memory_intensive_queries_total metric each time the query requires more memory than specified by this flag. This may help detecting and optimizing heavy queries. Query logging is disabled by default. See also -search.logSlowQueryDuration and -search.maxMemoryPerQuery
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging. See also -search.logQueryMemoryUsage (default 5s)
@@ -2835,6 +2891,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The maximum number of CPU cores a single query can use. The default value should work good for most cases. The flag can be set to lower values for improving performance of big number of concurrently executed queries. The flag can be set to bigger values for improving performance of heavy queries, which scan big number of time series (>10K) and/or big number of samples (>100M). There is no sense in setting this flag to values bigger than the number of CPU cores available on the system (default 4)
-search.minStalenessInterval duration
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
-search.minWindowForInstantRollupOptimization value
Enable cache-based optimization for repeated queries to /api/v1/query (aka instant queries), which contain rollup functions with lookbehind window exceeding the given value
The following optional suffixes are supported: s (second), m (minute), h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 6h)
-search.noStaleMarkers
Set this flag to true if the database doesn't contain Prometheus stale markers, so there is no need in spending additional CPU time on its handling. Staleness markers may exist only in data obtained from Prometheus scrape targets
-search.queryStats.lastQueriesCount int
@@ -2861,7 +2920,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The timeout for creating new snapshot. If set, make sure that timeout is lower than backup period
-snapshotsMaxAge value
Automatically delete snapshots older than -snapshotsMaxAge if it is set to non-zero duration. Make sure that backup process has enough time to finish the backup before the corresponding snapshot is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 0)
The following optional suffixes are supported: s (second), m (minute), h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 0)
-sortLabels
Whether to sort labels for incoming samples before writing them to storage. This may be needed for reducing memory usage at storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}. Enabled sorting for labels can slow down ingestion performance a bit
-storage.cacheSizeIndexDBDataBlocks size

View File

@@ -120,10 +120,10 @@ func compressData(s string) string {
var bb bytes.Buffer
zw := gzip.NewWriter(&bb)
if _, err := zw.Write([]byte(s)); err != nil {
panic(fmt.Errorf("unexpected error when compressing data: %s", err))
panic(fmt.Errorf("unexpected error when compressing data: %w", err))
}
if err := zw.Close(); err != nil {
panic(fmt.Errorf("unexpected error when closing gzip writer: %s", err))
panic(fmt.Errorf("unexpected error when closing gzip writer: %w", err))
}
return bb.String()
}

View File

@@ -43,7 +43,7 @@ func benchmarkReadBulkRequest(b *testing.B, isGzip bool) {
r.Reset(dataBytes)
_, err := readBulkRequest(r, isGzip, timeField, msgField, processLogMessage)
if err != nil {
panic(fmt.Errorf("unexpected error: %s", err))
panic(fmt.Errorf("unexpected error: %w", err))
}
}
})

View File

@@ -29,7 +29,7 @@ func benchmarkParseJSONRequest(b *testing.B, streams, rows, labels int) {
for pb.Next() {
_, err := parseJSONRequest(data, func(timestamp int64, fields []logstorage.Field) {})
if err != nil {
panic(fmt.Errorf("unexpected error: %s", err))
panic(fmt.Errorf("unexpected error: %w", err))
}
}
})

View File

@@ -84,7 +84,7 @@ func parseProtobufRequest(data []byte, processLogMessage func(timestamp int64, f
err = req.Unmarshal(bb.B)
if err != nil {
return 0, fmt.Errorf("cannot parse request body: %s", err)
return 0, fmt.Errorf("cannot parse request body: %w", err)
}
var commonFields []logstorage.Field
@@ -97,7 +97,7 @@ func parseProtobufRequest(data []byte, processLogMessage func(timestamp int64, f
// Labels are same for all entries in the stream.
commonFields, err = parsePromLabels(commonFields[:0], stream.Labels)
if err != nil {
return rowsIngested, fmt.Errorf("cannot parse stream labels %q: %s", stream.Labels, err)
return rowsIngested, fmt.Errorf("cannot parse stream labels %q: %w", stream.Labels, err)
}
fields := commonFields

View File

@@ -31,7 +31,7 @@ func benchmarkParseProtobufRequest(b *testing.B, streams, rows, labels int) {
for pb.Next() {
_, err := parseProtobufRequest(body, func(timestamp int64, fields []logstorage.Field) {})
if err != nil {
panic(fmt.Errorf("unexpected error: %s", err))
panic(fmt.Errorf("unexpected error: %w", err))
}
}
})

View File

@@ -1,13 +1,13 @@
{
"files": {
"main.css": "./static/css/main.9a224445.css",
"main.js": "./static/js/main.02178f4b.js",
"static/js/522.b5ae4365.chunk.js": "./static/js/522.b5ae4365.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.957b90ab4cb4852eec26.md",
"main.css": "./static/css/main.d1313636.css",
"main.js": "./static/js/main.1919fefe.js",
"static/js/522.da77e7b3.chunk.js": "./static/js/522.da77e7b3.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.8644fd7c964802dd34a9.md",
"index.html": "./index.html"
},
"entrypoints": [
"static/css/main.9a224445.css",
"static/js/main.02178f4b.js"
"static/css/main.d1313636.css",
"static/js/main.1919fefe.js"
]
}

View File

@@ -1 +1 @@
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.02178f4b.js"></script><link href="./static/css/main.9a224445.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.1919fefe.js"></script><link href="./static/css/main.d1313636.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -7,7 +7,7 @@
/*! regenerator-runtime -- Copyright (c) 2014-present, Facebook, Inc. -- license (MIT): https://github.com/facebook/regenerator/blob/main/LICENSE */
/**
* @remix-run/router v1.7.2
* @remix-run/router v1.10.0
*
* Copyright (c) Remix Software Inc.
*
@@ -18,7 +18,7 @@
*/
/**
* React Router DOM v6.14.2
* React Router DOM v6.17.0
*
* Copyright (c) Remix Software Inc.
*
@@ -29,7 +29,7 @@
*/
/**
* React Router v6.14.2
* React Router v6.17.0
*
* Copyright (c) Remix Software Inc.
*

View File

@@ -1,11 +1,11 @@
---
sort: 14
weight: 14
sort: 23
weight: 23
title: MetricsQL
menu:
docs:
parent: "victoriametrics"
weight: 14
parent: 'victoriametrics'
weight: 23
aliases:
- /ExtendedPromQL.html
- /MetricsQL.html
@@ -21,7 +21,8 @@ However, there are some [intentional differences](https://medium.com/@romanhavro
[Standalone MetricsQL package](https://godoc.org/github.com/VictoriaMetrics/metricsql) can be used for parsing MetricsQL in external apps.
If you are unfamiliar with PromQL, then it is suggested reading [this tutorial for beginners](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085).
If you are unfamiliar with PromQL, then it is suggested reading [this tutorial for beginners](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085)
and introduction into [basic querying via MetricsQL](https://docs.victoriametrics.com/keyConcepts.html#metricsql).
The following functionality is implemented differently in MetricsQL compared to PromQL. This improves user experience:
@@ -109,7 +110,7 @@ The list of MetricsQL features on top of PromQL:
* [histogram_quantile](#histogram_quantile) accepts optional third arg - `boundsLabel`.
In this case it returns `lower` and `upper` bounds for the estimated percentile.
See [this issue for details](https://github.com/prometheus/prometheus/issues/5706).
* `default` binary operator. `q1 default q2` fills gaps in `q1` with the corresponding values from `q2`.
* `default` binary operator. `q1 default q2` fills gaps in `q1` with the corresponding values from `q2`. See also [drop_empty_series](#drop_empty_series).
* `if` binary operator. `q1 if q2` removes values from `q1` for missing values from `q2`.
* `ifnot` binary operator. `q1 ifnot q2` removes values from `q1` for existing values from `q2`.
* `WITH` templates. This feature simplifies writing and managing complex queries.
@@ -531,7 +532,7 @@ See also [duration_over_time](#duration_over_time) and [lag](#lag).
`mad_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which calculates [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation)
over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
See also [mad](#mad) and [range_mad](#range_mad).
See also [mad](#mad), [range_mad](#range_mad) and [outlier_iqr_over_time](#outlier_iqr_over_time).
#### max_over_time
@@ -561,6 +562,18 @@ This function is supported by PromQL. See also [tmin_over_time](#tmin_over_time)
for raw samples on the given lookbehind window `d`. It is calculated individually per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering). It is expected that raw sample values are discrete.
#### outlier_iqr_over_time
`outlier_iqr_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the last sample on the given lookbehind window `d`
if its value is either smaller than the `q25-1.5*iqr` or bigger than `q75+1.5*iqr` where:
- `iqr` is an [Interquartile range](https://en.wikipedia.org/wiki/Interquartile_range) over raw samples on the lookbehind window `d`
- `q25` and `q75` are 25th and 75th [percentiles](https://en.wikipedia.org/wiki/Percentile) over raw samples on the lookbehind window `d`.
The `outlier_iqr_over_time()` is useful for detecting anomalies in gauge values based on the previous history of values.
For example, `outlier_iqr_over_time(memory_usage_bytes[1h])` triggers when `memory_usage_bytes` suddenly goes outside the usual value range for the last 24 hours.
See also [outliers_iqr](#outliers_iqr).
#### predict_linear
`predict_linear(series_selector[d], t)` is a [rollup function](#rollup-functions), which calculates the value `t` seconds in the future using
@@ -865,7 +878,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [zscore](#zscore) and [range_trim_zscore](#range_trim_zscore).
See also [zscore](#zscore), [range_trim_zscore](#range_trim_zscore) and [outlier_iqr_over_time](#outlier_iqr_over_time).
### Transform functions
@@ -1055,6 +1068,17 @@ Metric names are stripped from the resulting series. Add [keep_metric_names](#ke
This function is supported by PromQL. See also [rad](#rad).
#### drop_empty_series
`drop_empty_series(q)` is a [transform function](#transform-functions), which drops empty series from `q`.
This function can be used when `default` operator should be applied only to non-empty series. For example,
`drop_empty_series(temperature < 30) default 42` returns series, which have at least a single sample smaller than 30 on the selected time range,
while filling gaps in the returned series with 42.
On the other hand `(temperature < 30) default 40` returns all the `temperature` series, even if they have no samples smaller than 30,
by replacing all the values bigger or equal to 30 with 40.
#### end
`end()` is a [transform function](#transform-functions), which returns the unix timestamp in seconds for the last point.
@@ -1591,7 +1615,7 @@ which maps `label` values from `src_*` to `dst*` for all the time series returne
which drops time series from `q` with `label` not matching the given `regexp`.
This function can be useful after [rollup](#rollup)-like functions, which may return multiple time series for every input series.
See also [label_mismatch](#label_mismatch).
See also [label_mismatch](#label_mismatch) and [labels_equal](#labels_equal).
#### label_mismatch
@@ -1599,7 +1623,7 @@ See also [label_mismatch](#label_mismatch).
which drops time series from `q` with `label` matching the given `regexp`.
This function can be useful after [rollup](#rollup)-like functions, which may return multiple time series for every input series.
See also [label_match](#label_match).
See also [label_match](#label_match) and [labels_equal](#labels_equal).
#### label_move
@@ -1642,23 +1666,30 @@ for the given `label` for every time series returned by `q`.
For example, if `label_value(foo, "bar")` is applied to `foo{bar="1.234"}`, then it will return a time series
`foo{bar="1.234"}` with `1.234` value. Function will return no data for non-numeric label values.
#### labels_equal
`labels_equal(q, "label1", "label2", ...)` is [label manipulation function](#label-manipulation-functions), which returns `q` series with identical values for the listed labels
"label1", "label2", etc.
See also [label_match](#label_match) and [label_mismatch](#label_mismatch).
#### sort_by_label
`sort_by_label(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels.
`sort_by_label(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels.
For example, `sort_by_label(foo, "bar")` would sort `foo` series by values of the label `bar` in these series.
See also [sort_by_label_desc](#sort_by_label_desc) and [sort_by_label_numeric](#sort_by_label_numeric).
#### sort_by_label_desc
`sort_by_label_desc(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order by the given set of labels.
`sort_by_label_desc(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order by the given set of labels.
For example, `sort_by_label(foo, "bar")` would sort `foo` series by values of the label `bar` in these series.
See also [sort_by_label](#sort_by_label) and [sort_by_label_numeric_desc](#sort_by_label_numeric_desc).
#### sort_by_label_numeric
`sort_by_label_numeric(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels
`sort_by_label_numeric(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels
using [numeric sort](https://www.gnu.org/software/coreutils/manual/html_node/Version-sort-is-not-the-same-as-numeric-sort.html).
For example, if `foo` series have `bar` label with values `1`, `101`, `15` and `2`, then `sort_by_label_numeric(foo, "bar")` would return series
in the following order of `bar` label values: `1`, `2`, `15` and `101`.
@@ -1667,7 +1698,7 @@ See also [sort_by_label_numeric_desc](#sort_by_label_numeric_desc) and [sort_by_
#### sort_by_label_numeric_desc
`sort_by_label_numeric_desc(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order
`sort_by_label_numeric_desc(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order
by the given set of labels using [numeric sort](https://www.gnu.org/software/coreutils/manual/html_node/Version-sort-is-not-the-same-as-numeric-sort.html).
For example, if `foo` series have `bar` label with values `1`, `101`, `15` and `2`, then `sort_by_label_numeric(foo, "bar")`
would return series in the following order of `bar` label values: `101`, `15`, `2` and `1`.
@@ -1839,20 +1870,33 @@ This function is supported by PromQL.
`mode(q) by (group_labels)` is [aggregate function](#aggregate-functions), which returns [mode](https://en.wikipedia.org/wiki/Mode_(statistics))
per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp.
#### outliers_iqr
`outliers_iqr(q)` is [aggregate function](#aggregate-functions), which returns time series from `q` with at least a single point
outside e.g. [Interquartile range outlier bounds](https://en.wikipedia.org/wiki/Interquartile_range) `[q25-1.5*iqr .. q75+1.5*iqr]`
comparing to other time series at the given point, where:
- `iqr` is an [Interquartile range](https://en.wikipedia.org/wiki/Interquartile_range) calculated independently per each point on the graph across `q` series.
- `q25` and `q75` are 25th and 75th [percentiles](https://en.wikipedia.org/wiki/Percentile) calculated independently per each point on the graph across `q` series.
The `outliers_iqr()` is useful for detecting anomalous series in the group of series. For example, `outliers_iqr(temperature) by (country)` returns
per-country series with anomalous outlier values comparing to the rest of per-country series.
See also [outliers_mad](#outliers_mad), [outliersk](#outliersk) and [outlier_iqr_over_time](#outlier_iqr_over_time).
#### outliers_mad
`outliers_mad(tolerance, q)` is [aggregate function](#aggregate-functions), which returns time series from `q` with at least
a single point outside [Median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) (aka MAD) multiplied by `tolerance`.
E.g. it returns time series with at least a single point below `median(q) - mad(q)` or a single point above `median(q) + mad(q)`.
See also [outliersk](#outliersk) and [mad](#mad).
See also [outliers_iqr](#outliers_iqr), [outliersk](#outliersk) and [mad](#mad).
#### outliersk
`outliersk(k, q)` is [aggregate function](#aggregate-functions), which returns up to `k` time series with the biggest standard deviation (aka outliers)
out of time series returned by `q`.
See also [outliers_mad](#outliers_mad).
See also [outliers_iqr](#outliers_iqr) and [outliers_mad](#outliers_mad).
#### quantile
@@ -1972,7 +2016,7 @@ See also [bottomk_min](#bottomk_min).
per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp.
This function is useful for detecting anomalies in the group of related time series.
See also [zscore_over_time](#zscore_over_time) and [range_trim_zscore](#range_trim_zscore).
See also [zscore_over_time](#zscore_over_time), [range_trim_zscore](#range_trim_zscore) and [outliers_iqr](#outliers_iqr).
## Subqueries

View File

@@ -60,7 +60,7 @@ and sending the data to the Prometheus-compatible remote storage:
Example command for writing the data received via [supported push-based protocols](#how-to-push-data-to-vmagent)
to [single-node VictoriaMetrics](https://docs.victoriametrics.com/) located at `victoria-metrics-host:8428`:
```console
```bash
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
@@ -69,7 +69,7 @@ the data to [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-V
Example command for scraping Prometheus targets and writing the data to single-node VictoriaMetrics:
```console
```bash
/path/to/vmagent -promscrape.config=/path/to/prometheus.yml -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
@@ -110,7 +110,7 @@ additionally to pull-based Prometheus-compatible targets' scraping:
* Sending `SIGHUP` signal to `vmagent` process:
```console
```bash
kill -SIGHUP `pidof vmagent`
```
@@ -173,6 +173,13 @@ by routing outgoing samples for the same time series of [counter](https://docs.v
and [histogram](https://docs.victoriametrics.com/keyConcepts.html#histogram) types from top-level `vmagent` instances
to the same second-level `vmagent` instance, so they are aggregated properly.
If `-remoteWrite.shardByURL` command-line flag is set, then all the metric labels are used for even sharding
among remote storage systems specified in `-remoteWrite.url`. Sometimes it may be needed to use only a particular
set of labels for sharding. For example, it may be needed to route all the metrics with the same `instance` label
to the same `-remoteWrite.url`. In this case you can specify comma-separated list of these labels in the `-remoteWrite.shardByURLLabels`
command-line flag. For example, `-remoteWrite.shardByURLLabels=instance,__name__` would shard metrics with the same name and `instance`
label to the same `-remoteWrite.url`.
See also [how to scrape big number of targets](#scraping-big-number-of-targets).
### Relabeling and filtering
@@ -318,7 +325,7 @@ in the `scrape_config_files` section of `-promscrape.config` file. For example,
loading scrape configs from all the `*.yml` files under `configs` directory, from `single_scrape_config.yml` local file
and from `https://config-server/scrape_config.yml` url:
```yml
```yaml
scrape_config_files:
- configs/*.yml
- single_scrape_config.yml
@@ -328,7 +335,7 @@ scrape_config_files:
Every referred file can contain arbitrary number of [supported scrape configs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs).
There is no need in specifying top-level `scrape_configs` section in these files. For example:
```yml
```yaml
- job_name: foo
static_configs:
- targets: ["vmagent:8429"]
@@ -368,7 +375,7 @@ Extra labels can be added to metrics collected by `vmagent` via the following me
For example, the following command starts `vmagent`, which adds `{datacenter="foobar"}` label to all the metrics pushed
to all the configured remote storage systems (all the `-remoteWrite.url` flag values):
```
```bash
/path/to/vmagent -remoteWrite.label=datacenter=foobar ...
```
@@ -496,13 +503,16 @@ with [additional enhancements](#relabeling-enhancements). The relabeling can be
This relabeling can be debugged via `http://vmagent:8429/metric-relabel-debug` page. See [these docs](#relabel-debug) for details.
* At the `-remoteWrite.urlRelabelConfig` files. This relabeling is used for modifying labels for metrics
and for dropping unneeded metrics before sending them to a particular `-remoteWrite.url`.
and for dropping unneeded metrics before sending them to the particular `-remoteWrite.url`.
This relabeling can be debugged via `http://vmagent:8429/metric-relabel-debug` page. See [these docs](#relabel-debug) for details.
All the files with relabeling configs can contain special placeholders in the form `%{ENV_VAR}`,
which are replaced by the corresponding environment variable values.
[Streaming aggregation](https://docs.victoriametrics.com/stream-aggregation.html), if configured,
is pefrormed after applying all the relabeling stages mentioned above.
The following articles contain useful information about Prometheus relabeling:
* [Cookbook for common relabeling tasks](https://docs.victoriametrics.com/relabeling.html)
@@ -733,7 +743,7 @@ stream parsing mode can be explicitly enabled in the following places:
Examples:
```yml
```yaml
scrape_configs:
- job_name: 'big-federate'
stream_parse: true
@@ -760,7 +770,7 @@ Each `vmagent` instance in the cluster must use identical `-promscrape.config` f
in the range `0 ... N-1`, where `N` is the number of `vmagent` instances in the cluster specified via `-promscrape.cluster.membersCount`.
For example, the following commands spread scrape targets among a cluster of two `vmagent` instances:
```
```text
/path/to/vmagent -promscrape.cluster.membersCount=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
```
@@ -772,7 +782,7 @@ By default, each scrape target is scraped only by a single `vmagent` instance in
then `-promscrape.cluster.replicationFactor` command-line flag must be set to the desired number of replicas. For example, the following commands
start a cluster of three `vmagent` instances, where each target is scraped by two `vmagent` instances:
```
```text
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...
@@ -786,7 +796,7 @@ The `-promscrape.cluster.memberLabel` command-line flag allows specifying a name
The value of the `member num` label is set to `-promscrape.cluster.memberNum`. For example, the following config instructs adding `vmagent_instance="0"` label
to all the metrics scraped by the given `vmagent` instance:
```
```text
/path/to/vmagent -promscrape.cluster.membersCount=2 -promscrape.cluster.memberNum=0 -promscrape.cluster.memberLabel=vmagent_instance
```
@@ -813,7 +823,7 @@ See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2679)
`vmagent` supports scraping targets via http, https and socks5 proxies. Proxy address must be specified in `proxy_url` option. For example, the following scrape config instructs
target scraping via https proxy at `https://proxy-addr:1234`:
```yml
```yaml
scrape_configs:
- job_name: foo
proxy_url: https://proxy-addr:1234
@@ -830,7 +840,7 @@ Proxy can be configured with the following optional settings:
For example:
```yml
```yaml
scrape_configs:
- job_name: foo
proxy_url: https://proxy-addr:1234
@@ -980,7 +990,7 @@ If you have suggestions for improvements or have found a bug - please open an is
* By default `vmagent` evenly spreads scrape load in time. If a particular scrape target must be scraped at the beginning of some interval,
then `scrape_align_interval` option must be used. For example, the following config aligns hourly scrapes to the beginning of hour:
```yml
```yaml
scrape_configs:
- job_name: foo
scrape_interval: 1h
@@ -990,7 +1000,7 @@ If you have suggestions for improvements or have found a bug - please open an is
* By default `vmagent` evenly spreads scrape load in time. If a particular scrape target must be scraped at specific offset, then `scrape_offset` option must be used.
For example, the following config instructs `vmagent` to scrape the target at 10 seconds of every minute:
```yml
```yaml
scrape_configs:
- job_name: foo
scrape_interval: 1m
@@ -1003,14 +1013,14 @@ If you have suggestions for improvements or have found a bug - please open an is
The following relabeling rule may be added to `relabel_configs` section in order to filter out pods with unneeded ports:
```yml
```yaml
- action: keep_if_equal
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
```
The following relabeling rule may be added to `relabel_configs` section in order to filter out init container pods:
```yml
```yaml
- action: drop
source_labels: [__meta_kubernetes_pod_container_init]
regex: true
@@ -1054,7 +1064,7 @@ For example, `-kafka.consumer.topic.brokers=host1:9092;host2:9092`.
The following command starts `vmagent`, which reads metrics in InfluxDB line protocol format from Kafka broker at `localhost:9092`
from the topic `metrics-by-telegraf` and sends them to remote storage at `http://localhost:8428/api/v1/write`:
```console
```bash
./bin/vmagent -remoteWrite.url=http://localhost:8428/api/v1/write \
-kafka.consumer.topic.brokers=localhost:9092 \
-kafka.consumer.topic.format=influx \
@@ -1077,7 +1087,7 @@ These command-line flags are available only in [enterprise](https://docs.victori
which can be downloaded for evaluation from [releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest) page
(see `vmutils-...-enterprise.tar.gz` archives) and from [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags) with tags containing `enterprise` suffix.
```
```text
-kafka.consumer.topic array
Kafka topic names for data consumption.
Supports an array of values separated by comma or specified via multiple flags.
@@ -1122,13 +1132,13 @@ Two types of auth are supported:
* sasl with username and password:
```console
```bash
./bin/vmagent -remoteWrite.url=kafka://localhost:9092/?topic=prom-rw&security.protocol=SASL_SSL&sasl.mechanisms=PLAIN -remoteWrite.basicAuth.username=user -remoteWrite.basicAuth.password=password
```
* tls certificates:
```console
```bash
./bin/vmagent -remoteWrite.url=kafka://localhost:9092/?topic=prom-rw&security.protocol=SSL -remoteWrite.tlsCAFile=/opt/ca.pem -remoteWrite.tlsCertFile=/opt/cert.pem -remoteWrite.tlsKeyFile=/opt/key.pem
```
@@ -1159,7 +1169,7 @@ The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmagent`.
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
```console
```bash
ROOT_IMAGE=scratch make package-vmagent
```
@@ -1187,7 +1197,7 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
<div class="with-copy" markdown="1">
```console
```bash
curl http://0.0.0.0:8429/debug/pprof/heap > mem.pprof
```
@@ -1197,7 +1207,7 @@ curl http://0.0.0.0:8429/debug/pprof/heap > mem.pprof
<div class="with-copy" markdown="1">
```console
```bash
curl http://0.0.0.0:8429/debug/pprof/profile > cpu.pprof
```
@@ -1213,7 +1223,7 @@ It is safe sharing the collected profiles from security point of view, since the
`vmagent` can be fine-tuned with various command-line flags. Run `./vmagent -help` in order to see the full list of these flags with their descriptions and default values:
```
```text
./vmagent -help
vmagent collects metrics data via popular data ingestion protocols and routes them to VictoriaMetrics.
@@ -1222,10 +1232,6 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-cacheExpireDuration duration
Items are removed from in-memory caches after they aren't accessed for this duration. Lower values may reduce memory usage at the cost of higher CPU usage. See also -prevCacheRemovalPercent (default 30m0s)
-clients.docker
Decides whether a docker container be brought up automatically
-clients.semaphore
Tells if the job is running on Semaphore
-configAuthKey string
Authorization key for accessing /config page. It must be passed via authKey query arg
-csvTrimTimestamp duration
@@ -1263,6 +1269,12 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -1373,6 +1385,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from the OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-newrelic.maxInsertRequestSize size
The maximum size in bytes of a single NewRelic request to /newrelic/infra/v2/metrics/events/bulk
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 67108864)
-opentsdbHTTPListenAddr string
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
@@ -1411,7 +1426,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-promscrape.config.strictParse
Whether to deny unsupported fields in -promscrape.config . Set to false in order to silently skip unsupported fields (default true)
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes
Interval for checking for changes in -promscrape.config file. By default, the checking is disabled. See how to reload -promscrape.config file at https://docs.victoriametrics.com/vmagent.html#configuration-update
-promscrape.consul.waitTime duration
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
@@ -1595,6 +1610,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.shardByURL
Whether to shard outgoing series across all the remote storage systems enumerated via -remoteWrite.url . By default the data is replicated across all the -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#sharding-among-remote-storages
-remoteWrite.shardByURL.labels array
Optional list of labels, which must be used for sharding outgoing samples among remote storage systems if -remoteWrite.shardByURL command-line flag is set. By default all the labels are used for sharding in order to gain even distribution of series over the specified -remoteWrite.url systems
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.showURL
Whether to show -remoteWrite.url in the exported metrics. It is hidden by default, since it can contain sensitive info such as auth key
-remoteWrite.significantFigures array

View File

@@ -106,12 +106,15 @@ type client struct {
func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int) *client {
authCfg, err := getAuthConfig(argIdx)
if err != nil {
logger.Panicf("FATAL: cannot initialize auth config for remoteWrite.url=%q: %s", remoteWriteURL, err)
logger.Fatalf("cannot initialize auth config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tlsCfg, err := authCfg.NewTLSConfig()
if err != nil {
logger.Fatalf("cannot initialize tls config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tlsCfg := authCfg.NewTLSConfig()
awsCfg, err := getAWSAPIConfig(argIdx)
if err != nil {
logger.Fatalf("FATAL: cannot initialize AWS Config for remoteWrite.url=%q: %s", remoteWriteURL, err)
logger.Fatalf("cannot initialize AWS Config for -remoteWrite.url=%q: %s", remoteWriteURL, err)
}
tr := &http.Transport{
DialContext: statDial,
@@ -328,15 +331,25 @@ func (c *client) doRequest(url string, body []byte) (*http.Response, error) {
return nil, err
}
resp, err := c.hc.Do(req)
if err != nil && errors.Is(err, io.EOF) {
// it is likely connection became stale.
// So we do one more attempt in hope request will succeed.
// If not, the error should be handled by the caller as usual.
// This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139
req, _ = c.newRequest(url, body)
resp, err = c.hc.Do(req)
if err == nil {
return resp, nil
}
return resp, err
if !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
return nil, err
}
// It is likely connection became stale or timed out during the first request.
// Make another attempt in hope request will succeed.
// If not, the error should be handled by the caller as usual.
// This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4139
req, err = c.newRequest(url, body)
if err != nil {
return nil, fmt.Errorf("second attempt: %w", err)
}
resp, err = c.hc.Do(req)
if err != nil {
return nil, fmt.Errorf("second attempt: %w", err)
}
return resp, nil
}
func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
@@ -362,8 +375,7 @@ func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
if c.awsCfg != nil {
sigv4Hash := awsapi.HashHex(body)
if err := c.awsCfg.SignRequest(req, sigv4Hash); err != nil {
// there is no need in retry, request will be rejected by client.Do and retried by code below
logger.Warnf("cannot sign remoteWrite request with AWS sigv4: %s", err)
return nil, fmt.Errorf("cannot sign remoteWrite request with AWS sigv4: %w", err)
}
}
return req, nil

View File

@@ -23,6 +23,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
@@ -40,6 +41,9 @@ var (
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
shardByURL = flag.Bool("remoteWrite.shardByURL", false, "Whether to shard outgoing series across all the remote storage systems enumerated via -remoteWrite.url . "+
"By default the data is replicated across all the -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#sharding-among-remote-storages")
shardByURLLabels = flagutil.NewArrayString("remoteWrite.shardByURL.labels", "Optional list of labels, which must be used for sharding outgoing samples "+
"among remote storage systems if -remoteWrite.shardByURL command-line flag is set. By default all the labels are used for sharding in order to gain "+
"even distribution of series over the specified -remoteWrite.url systems")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored. "+
"See also -remoteWrite.maxDiskUsagePerURL")
keepDanglingQueues = flag.Bool("remoteWrite.keepDanglingQueues", false, "Keep persistent queues contents at -remoteWrite.tmpDataPath in case there are no matching -remoteWrite.url. "+
@@ -116,6 +120,8 @@ func InitSecretFlags() {
}
}
var shardByURLLabelsMap map[string]struct{}
// Init initializes remotewrite.
//
// It must be called after flag.Parse().
@@ -152,6 +158,13 @@ func Init() {
if *queues <= 0 {
*queues = 1
}
if len(*shardByURLLabels) > 0 {
m := make(map[string]struct{}, len(*shardByURLLabels))
for _, label := range *shardByURLLabels {
m[label] = struct{}{}
}
shardByURLLabelsMap = m
}
initLabelsGlobal()
// Register SIGHUP handler for config reload before loadRelabelConfigs.
@@ -418,11 +431,23 @@ func pushBlockToRemoteStorages(rwctxs []*remoteWriteCtx, tssBlock []prompbmarsha
if *shardByURL {
// Shard the data among rwctxs
tssByURL := make([][]prompbmarshal.TimeSeries, len(rwctxs))
tmpLabels := promutils.GetLabels()
for _, ts := range tssBlock {
h := getLabelsHash(ts.Labels)
hashLabels := ts.Labels
if len(shardByURLLabelsMap) > 0 {
hashLabels = tmpLabels.Labels[:0]
for _, label := range ts.Labels {
if _, ok := shardByURLLabelsMap[label.Name]; ok {
hashLabels = append(hashLabels, label)
}
}
}
h := getLabelsHash(hashLabels)
idx := h % uint64(len(tssByURL))
tssByURL[idx] = append(tssByURL[idx], ts)
}
promutils.PutLabels(tmpLabels)
// Push sharded data to remote storages in parallel in order to reduce
// the time needed for sending the data to multiple remote storage systems.
var wg sync.WaitGroup

View File

@@ -0,0 +1,12 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 8429
ENTRYPOINT ["/vmalert-tool-prod"]
ARG TARGETARCH
COPY vmalert-tool-linux-${TARGETARCH}-prod ./vmalert-tool-prod

View File

@@ -119,6 +119,13 @@ name: <string>
# `eval_offset` can't be bigger than `interval`.
[ eval_offset: <duration> ]
# Optional
# Adjust the `time` parameter of group evaluation requests to compensate intentional query delay from the datasource.
# By default, the value is inherited from the `-rule.evalDelay` cmd-line flag - see its description for details.
# If group has `latency_offset` set in `params`, then it is recommended to set `eval_delay` equal to `latency_offset`.
# See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155 and https://docs.victoriametrics.com/keyConcepts.html#query-latency.
[ eval_delay: <duration> ]
# Limit the number of alerts an alerting rule and series a recording
# rule can produce. 0 is no limit.
[ limit: <int> | default = 0 ]
@@ -794,9 +801,7 @@ Try the following recommendations to reduce the chance of hitting the data delay
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`.
If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
* If you know in advance, that data in datasource is delayed - try changing vmalerts `-datasource.lookback`
command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
* Extend `[duration]` in expr to help tolerate the delay. For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
* If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
in datasource is inconsistent or `>=5min` - try changing vmalerts `-datasource.queryStep` command-line flag to specify
how far search query can lookback for the recent datapoint. The recommendation is to have the step
@@ -804,9 +809,12 @@ at least two times bigger than the resolution.
> Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring.
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
data on the moment of querying, since metrics collectors won't be able to deliver the data in time.
By default, recently written samples to VictoriaMetrics [aren't visible for queries](https://docs.victoriametrics.com/keyConcepts.html#query-latency)
for up to 30s (see `-search.latencyOffset` command-line flag at vmselect or VictoriaMetrics single-node). Such delay is needed to eliminate risk of
incomplete data on the moment of querying, due to chance that metrics collectors won't be able to deliver that data in time.
To compensate the latency in timestamps for produced evaluation results, `-rule.evalDelay` is also set to `30s` by default.
If you expect data to be delayed for longer intervals (it gets buffered, queued, or just network is slow sometimes)
- consider increasing the `-rule.evalDelay` value accordingly.
### Alerts state
@@ -969,7 +977,7 @@ The shortlist of configuration flags is the following:
-datasource.headers string
Optional HTTP extraHeaders to send with each request to the corresponding -datasource.url. For example, -datasource.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -datasource.url. Multiple headers must be delimited by '^^': -datasource.headers='header1:value1^^header2:value2'
-datasource.lookback duration
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
Will be deprecated soon, please adjust "-search.latencyOffset" at datasource side or specify "latency_offset" in rule group's params. Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
-datasource.maxIdleConnections int
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
-datasource.oauth2.clientID string
@@ -1037,6 +1045,12 @@ The shortlist of configuration flags is the following:
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -1217,7 +1231,7 @@ The shortlist of configuration flags is the following:
-remoteWrite.bearerTokenFile string
Optional path to bearer token file to use for -remoteWrite.url.
-remoteWrite.concurrency int
Defines number of writers for concurrent writing into remote querier (default 1)
Defines number of writers for concurrent writing into remote write endpoint (default 1)
-remoteWrite.disablePathAppend
Whether to disable automatic appending of '/api/v1/write' path to the configured -remoteWrite.url.
-remoteWrite.flushInterval duration
@@ -1287,13 +1301,14 @@ The shortlist of configuration flags is the following:
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
Supports an array of values separated by comma or specified via multiple flags.
-rule.evalDelay time
Adjustment of the time parameter for rule evaluation requests to compensate intentional data delay from the datasource.Normally, should be equal to `-search.latencyOffset` (cmd-line flag configured for VictoriaMetrics single-node or vmselect). (default 30s)
-rule.maxResolveDuration duration
Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
Limits the maxiMum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group
-rule.resendDelay duration
Minimum amount of time to wait before resending an alert to notifier
MiniMum amount of time to wait before resending an alert to notifier
-rule.templates array
Path or glob pattern to location with go template definitions
for rules annotations templating. Flag can be specified multiple times.
Path or glob pattern to location with go template definitions for rules annotations templating. Flag can be specified multiple times.
Examples:
-rule.templates="/path/to/file". Path to a single file with go templates
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
@@ -1382,8 +1397,11 @@ For example:
```yaml
static_configs:
- targets:
# support using full url
- 'http://alertmanager:9093/test/api/v2/alerts'
- 'https://alertmanager:9093/api/v2/alerts'
# the following target with only host:port will be used as <scheme>://localhost:9093/<path_prefix>/api/v2/alerts
- localhost:9093
- localhost:9095
consul_sd_configs:
- server: localhost:8500

View File

@@ -19,11 +19,14 @@ import (
// Group contains list of Rules grouped into
// entity with one name and evaluation interval
type Group struct {
Type Type `yaml:"type,omitempty"`
File string
Name string `yaml:"name"`
Interval *promutils.Duration `yaml:"interval,omitempty"`
EvalOffset *promutils.Duration `yaml:"eval_offset,omitempty"`
Type Type `yaml:"type,omitempty"`
File string
Name string `yaml:"name"`
Interval *promutils.Duration `yaml:"interval,omitempty"`
EvalOffset *promutils.Duration `yaml:"eval_offset,omitempty"`
// EvalDelay will adjust the `time` parameter of rule evaluation requests to compensate intentional query delay from datasource.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155
EvalDelay *promutils.Duration `yaml:"eval_delay,omitempty"`
Limit int `yaml:"limit,omitempty"`
Rules []Rule `yaml:"rules"`
Concurrency int `yaml:"concurrency"`
@@ -233,7 +236,7 @@ func ParseSilent(pathPatterns []string, validateTplFn ValidateTplFn, validateExp
files, err := readFromFS(pathPatterns)
if err != nil {
return nil, fmt.Errorf("failed to read from the config: %s", err)
return nil, fmt.Errorf("failed to read from the config: %w", err)
}
return parse(files, validateTplFn, validateExpressions)
}
@@ -242,11 +245,11 @@ func ParseSilent(pathPatterns []string, validateTplFn ValidateTplFn, validateExp
func Parse(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
files, err := readFromFS(pathPatterns)
if err != nil {
return nil, fmt.Errorf("failed to read from the config: %s", err)
return nil, fmt.Errorf("failed to read from the config: %w", err)
}
groups, err := parse(files, validateTplFn, validateExpressions)
if err != nil {
return nil, fmt.Errorf("failed to parse %s: %s", pathPatterns, err)
return nil, fmt.Errorf("failed to parse %s: %w", pathPatterns, err)
}
if len(groups) < 1 {
cLogger.Warnf("no groups found in %s", strings.Join(pathPatterns, ";"))

View File

@@ -106,7 +106,7 @@ func TestParseBad(t *testing.T) {
},
{
[]string{"http://unreachable-url"},
"failed to read",
"failed to",
},
}
for _, tc := range testCases {

View File

@@ -49,7 +49,7 @@ func (fs *FS) Read(files []string) (map[string][]byte, error) {
path, resp.StatusCode, http.StatusOK, data)
}
if err != nil {
return nil, fmt.Errorf("cannot read %q: %s", path, err)
return nil, fmt.Errorf("cannot read %q: %w", path, err)
}
result[path] = data
}

View File

@@ -15,6 +15,7 @@ groups:
interval: 2s
concurrency: 2
type: prometheus
eval_delay: 30s
rules:
- alert: Conns
expr: sum(vm_tcplistener_conns) by (instance) > 1

View File

@@ -43,7 +43,9 @@ var (
oauth2TokenURL = flag.String("datasource.oauth2.tokenUrl", "", "Optional OAuth2 tokenURL to use for -datasource.url.")
oauth2Scopes = flag.String("datasource.oauth2.scopes", "", "Optional OAuth2 scopes to use for -datasource.url. Scopes must be delimited by ';'")
lookBack = flag.Duration("datasource.lookback", 0, `Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
lookBack = flag.Duration("datasource.lookback", 0, `Will be deprecated soon, please adjust "-search.latencyOffset" at datasource side `+
`or specify "latency_offset" in rule group's params. Lookback defines how far into the past to look when evaluating queries. `+
`For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
queryStep = flag.Duration("datasource.queryStep", 5*time.Minute, "How far a value can fallback to when evaluating queries. "+
"For example, if -datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query. "+
"If set to 0, rule's evaluation interval will be used instead.")
@@ -83,7 +85,10 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
return nil, fmt.Errorf("datasource.url is empty")
}
if !*queryTimeAlignment {
logger.Warnf("flag `datasource.queryTimeAlignment` is deprecated and will be removed in next releases, please use `eval_alignment` in rule group instead")
logger.Warnf("flag `-datasource.queryTimeAlignment` is deprecated and will be removed in next releases. Please use `eval_alignment` in rule group instead.")
}
if *lookBack != 0 {
logger.Warnf("flag `-datasource.lookback` will be deprecated soon. Please use `-rule.evalDelay` command-line flag instead. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155 for details.")
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
@@ -113,7 +118,7 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
}
_, err = authCfg.GetAuthHeader()
if err != nil {
return nil, fmt.Errorf("failed to set request auth header to datasource %q: %s", *addr, err)
return nil, fmt.Errorf("failed to set request auth header to datasource %q: %w", *addr, err)
}
return &VMStorage{

View File

@@ -142,24 +142,30 @@ func (s *VMStorage) Query(ctx context.Context, query string, ts time.Time) (Resu
return Result{}, nil, err
}
resp, err := s.do(ctx, req)
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
// something in the middle between client and datasource might be closing
// the connection. So we do a one more attempt in hope request will succeed.
req, _ = s.newQueryRequest(query, ts)
resp, err = s.do(ctx, req)
}
if err != nil {
return Result{}, req, err
if !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
// Return unexpected error to the caller.
return Result{}, nil, err
}
// Something in the middle between client and datasource might be closing
// the connection. So we do a one more attempt in hope request will succeed.
req, err = s.newQueryRequest(query, ts)
if err != nil {
return Result{}, nil, fmt.Errorf("second attempt: %w", err)
}
resp, err = s.do(ctx, req)
if err != nil {
return Result{}, nil, fmt.Errorf("second attempt: %w", err)
}
}
defer func() {
_ = resp.Body.Close()
}()
// Process the received response.
parseFn := parsePrometheusResponse
if s.dataSourceType != datasourcePrometheus {
parseFn = parseGraphiteResponse
}
result, err := parseFn(req, resp)
_ = resp.Body.Close()
return result, req, err
}
@@ -177,23 +183,31 @@ func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end tim
return res, fmt.Errorf("end param is missing")
}
req, err := s.newQueryRangeRequest(query, start, end)
if err != nil {
return Result{}, err
}
resp, err := s.do(ctx, req)
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
// something in the middle between client and datasource might be closing
// the connection. So we do a one more attempt in hope request will succeed.
req, _ = s.newQueryRangeRequest(query, start, end)
resp, err = s.do(ctx, req)
}
if err != nil {
return res, err
}
defer func() {
_ = resp.Body.Close()
}()
return parsePrometheusResponse(req, resp)
resp, err := s.do(ctx, req)
if err != nil {
if !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
// Return unexpected error to the caller.
return res, err
}
// Something in the middle between client and datasource might be closing
// the connection. So we do a one more attempt in hope request will succeed.
req, err = s.newQueryRangeRequest(query, start, end)
if err != nil {
return res, fmt.Errorf("second attempt: %w", err)
}
resp, err = s.do(ctx, req)
if err != nil {
return res, fmt.Errorf("second attempt: %w", err)
}
}
// Process the received response.
res, err = parsePrometheusResponse(req, resp)
_ = resp.Body.Close()
return res, err
}
func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response, error) {
@@ -219,7 +233,7 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
func (s *VMStorage) newQueryRangeRequest(query string, start, end time.Time) (*http.Request, error) {
req, err := s.newRequest()
if err != nil {
return nil, fmt.Errorf("cannot create query_range request to datasource %q: %s", s.datasourceURL, err)
return nil, fmt.Errorf("cannot create query_range request to datasource %q: %w", s.datasourceURL, err)
}
s.setPrometheusRangeReqParams(req, query, start, end)
return req, nil
@@ -228,7 +242,7 @@ func (s *VMStorage) newQueryRangeRequest(query string, start, end time.Time) (*h
func (s *VMStorage) newQueryRequest(query string, ts time.Time) (*http.Request, error) {
req, err := s.newRequest()
if err != nil {
return nil, fmt.Errorf("cannot create query request to datasource %q: %s", s.datasourceURL, err)
return nil, fmt.Errorf("cannot create query request to datasource %q: %w", s.datasourceURL, err)
}
switch s.dataSourceType {
case "", datasourcePrometheus:

View File

@@ -112,14 +112,14 @@ func parsePrometheusResponse(req *http.Request, resp *http.Response) (res Result
return res, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL.Redacted(), r.ErrorType, r.Error)
}
if r.Status != statusSuccess {
return res, fmt.Errorf("unknown status: %s, Expected success or error ", r.Status)
return res, fmt.Errorf("unknown status: %s, Expected success or error", r.Status)
}
var parseFn func() ([]Metric, error)
switch r.Data.ResultType {
case rtVector:
var pi promInstant
if err := json.Unmarshal(r.Data.Result, &pi.Result); err != nil {
return res, fmt.Errorf("umarshal err %s; \n %#v", err, string(r.Data.Result))
return res, fmt.Errorf("unmarshal err %w; \n %#v", err, string(r.Data.Result))
}
parseFn = pi.metrics
case rtMatrix:

View File

@@ -47,8 +47,8 @@ all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
`)
ruleTemplatesPath = flagutil.NewArrayString("rule.templates", `Path or glob pattern to location with go template definitions
for rules annotations templating. Flag can be specified multiple times.
ruleTemplatesPath = flagutil.NewArrayString("rule.templates", `Path or glob pattern to location with go template definitions `+
`for rules annotations templating. Flag can be specified multiple times.
Examples:
-rule.templates="/path/to/file". Path to a single file with go templates
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
@@ -230,7 +230,9 @@ func newManager(ctx context.Context) (*manager, error) {
if err != nil {
return nil, fmt.Errorf("failed to init remoteWrite: %w", err)
}
manager.rw = rw
if rw != nil {
manager.rw = rw
}
rr, err := remoteread.Init()
if err != nil {

View File

@@ -3,7 +3,6 @@ package notifier
import (
"crypto/md5"
"fmt"
"gopkg.in/yaml.v2"
"net/url"
"os"
"path"
@@ -11,6 +10,8 @@ import (
"strings"
"time"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape/discovery/consul"
@@ -142,26 +143,23 @@ func parseLabels(target string, metaLabels *promutils.Labels, cfg *Config) (stri
if labels.Len() == 0 {
return "", nil, nil
}
schemeRelabeled := labels.Get("__scheme__")
if len(schemeRelabeled) == 0 {
schemeRelabeled = "http"
scheme := labels.Get("__scheme__")
if len(scheme) == 0 {
scheme = "http"
}
addressRelabeled := labels.Get("__address__")
if len(addressRelabeled) == 0 {
alertsPath := labels.Get("__alerts_path__")
if !strings.HasPrefix(alertsPath, "/") {
alertsPath = "/" + alertsPath
}
address := labels.Get("__address__")
if len(address) == 0 {
return "", nil, nil
}
if strings.Contains(addressRelabeled, "/") {
return "", nil, nil
}
addressRelabeled = addMissingPort(schemeRelabeled, addressRelabeled)
alertsPathRelabeled := labels.Get("__alerts_path__")
if !strings.HasPrefix(alertsPathRelabeled, "/") {
alertsPathRelabeled = "/" + alertsPathRelabeled
}
u := fmt.Sprintf("%s://%s%s", schemeRelabeled, addressRelabeled, alertsPathRelabeled)
address = addMissingPort(scheme, address)
u := fmt.Sprintf("%s://%s%s", scheme, address, alertsPath)
if _, err := url.Parse(u); err != nil {
return "", nil, fmt.Errorf("invalid url %q for scheme=%q (%q), target=%q, metrics_path=%q (%q): %w",
u, cfg.Scheme, schemeRelabeled, target, addressRelabeled, alertsPathRelabeled, err)
u, cfg.Scheme, scheme, target, address, alertsPath, err)
}
return u, labels, nil
}
@@ -181,9 +179,24 @@ func addMissingPort(scheme, target string) string {
func mergeLabels(target string, metaLabels *promutils.Labels, cfg *Config) *promutils.Labels {
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
m := promutils.NewLabels(3 + metaLabels.Len())
m.Add("__address__", target)
m.Add("__scheme__", cfg.Scheme)
m.Add("__alerts_path__", path.Join("/", cfg.PathPrefix, alertManagerPath))
address := target
scheme := cfg.Scheme
alertsPath := path.Join("/", cfg.PathPrefix, alertManagerPath)
// try to extract optional scheme and alertsPath from __address__.
if strings.HasPrefix(address, "http://") {
scheme = "http"
address = address[len("http://"):]
} else if strings.HasPrefix(address, "https://") {
scheme = "https"
address = address[len("https://"):]
}
if n := strings.IndexByte(address, '/'); n >= 0 {
alertsPath = address[n:]
address = address[:n]
}
m.Add("__address__", address)
m.Add("__scheme__", scheme)
m.Add("__alerts_path__", alertsPath)
m.AddFrom(metaLabels)
return m
}

View File

@@ -87,7 +87,7 @@ func (cw *configWatcher) reload(path string) error {
func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn getLabels) error {
targets, errors := targetsFromLabels(labelsFn, cw.cfg, cw.genFn)
for _, err := range errors {
return fmt.Errorf("failed to init notifier for %q: %s", typeK, err)
return fmt.Errorf("failed to init notifier for %q: %w", typeK, err)
}
cw.setTargets(typeK, targets)
@@ -107,7 +107,7 @@ func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn
}
updateTargets, errors := targetsFromLabels(labelsFn, cw.cfg, cw.genFn)
for _, err := range errors {
logger.Errorf("failed to init notifier for %q: %s", typeK, err)
logger.Errorf("failed to init notifier for %q: %w", typeK, err)
}
cw.setTargets(typeK, updateTargets)
}
@@ -118,7 +118,7 @@ func (cw *configWatcher) add(typeK TargetType, interval time.Duration, labelsFn
func targetsFromLabels(labelsFn getLabels, cfg *Config, genFn AlertURLGenerator) ([]Target, []error) {
metaLabels, err := labelsFn()
if err != nil {
return nil, []error{fmt.Errorf("failed to get labels: %s", err)}
return nil, []error{fmt.Errorf("failed to get labels: %w", err)}
}
var targets []Target
var errors []error
@@ -167,11 +167,11 @@ func (cw *configWatcher) start() error {
for _, target := range cfg.Targets {
address, labels, err := parseLabels(target, nil, cw.cfg)
if err != nil {
return fmt.Errorf("failed to parse labels for target %q: %s", target, err)
return fmt.Errorf("failed to parse labels for target %q: %w", target, err)
}
notifier, err := NewAlertManager(address, cw.genFn, httpCfg, cw.cfg.parsedAlertRelabelConfigs, cw.cfg.Timeout.Duration())
if err != nil {
return fmt.Errorf("failed to init alertmanager for addr %q: %s", address, err)
return fmt.Errorf("failed to init alertmanager for addr %q: %w", address, err)
}
targets = append(targets, Target{
Notifier: notifier,
@@ -189,14 +189,14 @@ func (cw *configWatcher) start() error {
sdc := &cw.cfg.ConsulSDConfigs[i]
targetLabels, err := sdc.GetLabels(cw.cfg.baseDir)
if err != nil {
return nil, fmt.Errorf("got labels err: %s", err)
return nil, fmt.Errorf("got labels err: %w", err)
}
labels = append(labels, targetLabels...)
}
return labels, nil
})
if err != nil {
return fmt.Errorf("failed to start consulSD discovery: %s", err)
return fmt.Errorf("failed to start consulSD discovery: %w", err)
}
}
@@ -207,14 +207,14 @@ func (cw *configWatcher) start() error {
sdc := &cw.cfg.DNSSDConfigs[i]
targetLabels, err := sdc.GetLabels(cw.cfg.baseDir)
if err != nil {
return nil, fmt.Errorf("got labels err: %s", err)
return nil, fmt.Errorf("got labels err: %w", err)
}
labels = append(labels, targetLabels...)
}
return labels, nil
})
if err != nil {
return fmt.Errorf("failed to start DNSSD discovery: %s", err)
return fmt.Errorf("failed to start DNSSD discovery: %w", err)
}
}
return nil

View File

@@ -318,3 +318,47 @@ func TestMergeHTTPClientConfigs(t *testing.T) {
t.Fatalf("expected BasicAuth tp be present")
}
}
func TestParseLabels(t *testing.T) {
testCases := []struct {
name string
target string
cfg *Config
expectedAddress string
expectedErr bool
}{
{
"invalid address",
"invalid:*//url",
&Config{},
"",
true,
},
{
"use some default params",
"alertmanager:9093",
&Config{PathPrefix: "test"},
"http://alertmanager:9093/test/api/v2/alerts",
false,
},
{
"use target address",
"https://alertmanager:9093/api/v1/alerts",
&Config{Scheme: "http", PathPrefix: "test"},
"https://alertmanager:9093/api/v1/alerts",
false,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
address, _, err := parseLabels(tc.target, nil, tc.cfg)
if err == nil == tc.expectedErr {
t.Fatalf("unexpected error; got %t; want %t", err != nil, tc.expectedErr)
}
if address != tc.expectedAddress {
t.Fatalf("unexpected address; got %q; want %q", address, tc.expectedAddress)
}
})
}
}

View File

@@ -90,7 +90,7 @@ func Init(gen AlertURLGenerator, extLabels map[string]string, extURL string) (fu
externalLabels = extLabels
eu, err := url.Parse(externalURL)
if err != nil {
return nil, fmt.Errorf("failed to parse external URL: %s", err)
return nil, fmt.Errorf("failed to parse external URL: %w", err)
}
templates.UpdateWithFuncs(templates.FuncsWithExternalURL(eu))
@@ -116,7 +116,7 @@ func Init(gen AlertURLGenerator, extLabels map[string]string, extURL string) (fu
if len(*addrs) > 0 {
notifiers, err := notifiersFromFlags(gen)
if err != nil {
return nil, fmt.Errorf("failed to create notifier from flag values: %s", err)
return nil, fmt.Errorf("failed to create notifier from flag values: %w", err)
}
staticNotifiersFn = func() []Notifier {
return notifiers
@@ -126,7 +126,7 @@ func Init(gen AlertURLGenerator, extLabels map[string]string, extURL string) (fu
cw, err = newWatcher(*configPath, gen)
if err != nil {
return nil, fmt.Errorf("failed to init config watcher: %s", err)
return nil, fmt.Errorf("failed to init config watcher: %w", err)
}
return cw.notifiers, nil
}

View File

@@ -5,6 +5,7 @@ static_configs:
- targets:
- localhost:9093
- localhost:9095
- https://localhost:9093/test/api/v2/alerts
basic_auth:
username: foo
password: bar

View File

@@ -3,6 +3,7 @@ package remotewrite
import (
"bytes"
"context"
"errors"
"flag"
"fmt"
"io"
@@ -117,12 +118,19 @@ func NewClient(ctx context.Context, cfg Config) (*Client, error) {
// Push adds timeseries into queue for writing into remote storage.
// Push returns and error if client is stopped or if queue is full.
func (c *Client) Push(s prompbmarshal.TimeSeries) error {
rwTotal.Inc()
select {
case <-c.doneCh:
rwErrors.Inc()
droppedRows.Add(len(s.Samples))
droppedBytes.Add(s.Size())
return fmt.Errorf("client is closed")
case c.input <- s:
return nil
default:
rwErrors.Inc()
droppedRows.Add(len(s.Samples))
droppedBytes.Add(s.Size())
return fmt.Errorf("failed to push timeseries - queue is full (%d entries). "+
"Queue size is controlled by -remoteWrite.maxQueueSize flag",
c.maxQueueSize)
@@ -181,11 +189,14 @@ func (c *Client) run(ctx context.Context) {
}
var (
rwErrors = metrics.NewCounter(`vmalert_remotewrite_errors_total`)
rwTotal = metrics.NewCounter(`vmalert_remotewrite_total`)
sentRows = metrics.NewCounter(`vmalert_remotewrite_sent_rows_total`)
sentBytes = metrics.NewCounter(`vmalert_remotewrite_sent_bytes_total`)
sendDuration = metrics.NewFloatCounter(`vmalert_remotewrite_send_duration_seconds_total`)
droppedRows = metrics.NewCounter(`vmalert_remotewrite_dropped_rows_total`)
droppedBytes = metrics.NewCounter(`vmalert_remotewrite_dropped_bytes_total`)
sendDuration = metrics.NewFloatCounter(`vmalert_remotewrite_send_duration_seconds_total`)
bufferFlushDuration = metrics.NewHistogram(`vmalert_remotewrite_flush_duration_seconds`)
_ = metrics.NewGauge(`vmalert_remotewrite_concurrency`, func() float64 {
@@ -222,6 +233,11 @@ func (c *Client) flush(ctx context.Context, wr *prompbmarshal.WriteRequest) {
L:
for attempts := 0; ; attempts++ {
err := c.send(ctx, b)
if errors.Is(err, io.EOF) {
// Something in the middle between client and destination might be closing
// the connection. So we do a one more attempt in hope request will succeed.
err = c.send(ctx, b)
}
if err == nil {
sentRows.Add(len(wr.Timeseries))
sentBytes.Add(len(b))
@@ -259,6 +275,7 @@ L:
}
rwErrors.Inc()
droppedRows.Add(len(wr.Timeseries))
droppedBytes.Add(len(b))
logger.Errorf("attempts to send remote-write request failed - dropping %d time series",
@@ -282,7 +299,9 @@ func (c *Client) send(ctx context.Context, data []byte) error {
if c.authCfg != nil {
err = c.authCfg.SetHeaders(req, true)
if err != nil {
return &nonRetriableError{err: err}
return &nonRetriableError{
err: err,
}
}
}
if !*disablePathAppend {
@@ -301,13 +320,14 @@ func (c *Client) send(ctx context.Context, data []byte) error {
// Prometheus remote Write compatible receivers MUST
switch resp.StatusCode / 100 {
case 2:
// respond with a HTTP 2xx status code when the write is successful.
// respond with HTTP 2xx status code when write is successful.
return nil
case 4:
if resp.StatusCode != http.StatusTooManyRequests {
// MUST NOT retry write requests on HTTP 4xx responses other than 429
return &nonRetriableError{fmt.Errorf("unexpected response code %d for %s. Response body %q",
resp.StatusCode, req.URL.Redacted(), body)}
return &nonRetriableError{
err: fmt.Errorf("unexpected response code %d for %s. Response body %q", resp.StatusCode, req.URL.Redacted(), body),
}
}
fallthrough
default:

View File

@@ -30,7 +30,7 @@ var (
maxQueueSize = flag.Int("remoteWrite.maxQueueSize", 1e5, "Defines the max number of pending datapoints to remote write endpoint")
maxBatchSize = flag.Int("remoteWrite.maxBatchSize", 1e3, "Defines max number of timeseries to be flushed at once")
concurrency = flag.Int("remoteWrite.concurrency", 1, "Defines number of writers for concurrent writing into remote querier")
concurrency = flag.Int("remoteWrite.concurrency", 1, "Defines number of writers for concurrent writing into remote write endpoint")
flushInterval = flag.Duration("remoteWrite.flushInterval", 5*time.Second, "Defines interval of flushes to remote write endpoint")
tlsInsecureSkipVerify = flag.Bool("remoteWrite.tlsInsecureSkipVerify", false, "Whether to skip tls verification when connecting to -remoteWrite.url")

View File

@@ -36,11 +36,11 @@ func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw remotewri
}
tFrom, err := time.Parse(time.RFC3339, *replayFrom)
if err != nil {
return fmt.Errorf("failed to parse %q: %s", *replayFrom, err)
return fmt.Errorf("failed to parse %q: %w", *replayFrom, err)
}
tTo, err := time.Parse(time.RFC3339, *replayTo)
if err != nil {
return fmt.Errorf("failed to parse %q: %s", *replayTo, err)
return fmt.Errorf("failed to parse %q: %w", *replayTo, err)
}
if !tTo.After(tFrom) {
return fmt.Errorf("replay.timeTo must be bigger than replay.timeFrom")

View File

@@ -91,7 +91,7 @@ func NewAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
entries: make([]StateEntry, entrySize),
}
labels := fmt.Sprintf(`alertname=%q, group=%q, id="%d"`, ar.Name, group.Name, ar.ID())
labels := fmt.Sprintf(`alertname=%q, group=%q, file=%q, id="%d"`, ar.Name, group.Name, group.File, ar.ID())
ar.metrics.pending = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_alerts_pending{%s}`, labels),
func() float64 {
ar.alertsMu.RLock()
@@ -269,7 +269,7 @@ func (ar *AlertingRule) toLabels(m datasource.Metric, qFn templates.QueryFn) (*l
Expr: ar.Expr,
})
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %s", err)
return nil, fmt.Errorf("failed to expand labels: %w", err)
}
for k, v := range extraLabels {
ls.processed[k] = v
@@ -295,24 +295,33 @@ func (ar *AlertingRule) toLabels(m datasource.Metric, qFn templates.QueryFn) (*l
}
// execRange executes alerting rule on the given time range similarly to exec.
// It doesn't update internal states of the Rule and meant to be used just
// to get time series for backfilling.
// It returns ALERT and ALERT_FOR_STATE time series as result.
// When making consecutive calls make sure to respect time linearity for start and end params,
// as this function modifies AlertingRule alerts state.
// It is not thread safe.
// It returns ALERT and ALERT_FOR_STATE time series as a result.
func (ar *AlertingRule) execRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error) {
res, err := ar.q.QueryRange(ctx, ar.Expr, start, end)
if err != nil {
return nil, err
}
var result []prompbmarshal.TimeSeries
holdAlertState := make(map[uint64]*notifier.Alert)
qFn := func(query string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported in replay mode")
}
for _, s := range res.Data {
ls, err := ar.toLabels(s, qFn)
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %s", err)
}
h := hash(ls.processed)
a, err := ar.newAlert(s, nil, time.Time{}, qFn) // initial alert
if err != nil {
return nil, fmt.Errorf("failed to create alert: %s", err)
return nil, fmt.Errorf("failed to create alert: %w", err)
}
if ar.For == 0 { // if alert is instant
// if alert is instant, For: 0
if ar.For == 0 {
a.State = notifier.StateFiring
for i := range s.Values {
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
@@ -324,18 +333,32 @@ func (ar *AlertingRule) execRange(ctx context.Context, start, end time.Time) ([]
prevT := time.Time{}
for i := range s.Values {
at := time.Unix(s.Timestamps[i], 0)
// try to restore alert's state on the first iteration
if at.Equal(start) {
if _, ok := ar.alerts[h]; ok {
a = ar.alerts[h]
prevT = at
}
}
if at.Sub(prevT) > ar.EvalInterval {
// reset to Pending if there are gaps > EvalInterval between DPs
a.State = notifier.StatePending
a.ActiveAt = at
} else if at.Sub(a.ActiveAt) >= ar.For {
a.Start = time.Time{}
} else if at.Sub(a.ActiveAt) >= ar.For && a.State != notifier.StateFiring {
a.State = notifier.StateFiring
a.Start = at
}
prevT = at
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
// save alert's state on last iteration, so it can be used on the next execRange call
if at.Equal(end) {
holdAlertState[h] = a
}
}
}
ar.alerts = holdAlertState
return result, nil
}
@@ -388,7 +411,7 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
for _, m := range res.Data {
ls, err := ar.toLabels(m, qFn)
if err != nil {
curState.Err = fmt.Errorf("failed to expand labels: %s", err)
curState.Err = fmt.Errorf("failed to expand labels: %w", err)
return nil, curState.Err
}
h := hash(ls.processed)
@@ -513,7 +536,7 @@ func (ar *AlertingRule) newAlert(m datasource.Metric, ls *labelSet, start time.T
if ls == nil {
ls, err = ar.toLabels(m, qFn)
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %s", err)
return nil, fmt.Errorf("failed to expand labels: %w", err)
}
}
a := &notifier.Alert{
@@ -591,44 +614,41 @@ func (ar *AlertingRule) restore(ctx context.Context, q datasource.Querier, ts ti
return nil
}
for _, a := range ar.alerts {
nameStr := fmt.Sprintf("%s=%q", alertNameLabel, ar.Name)
if !*disableAlertGroupLabel {
nameStr = fmt.Sprintf("%s=%q,%s=%q", alertGroupNameLabel, ar.GroupName, alertNameLabel, ar.Name)
}
var labelsFilter string
for k, v := range ar.Labels {
labelsFilter += fmt.Sprintf(",%s=%q", k, v)
}
expr := fmt.Sprintf("last_over_time(%s{%s%s}[%ds])",
alertForStateMetricName, nameStr, labelsFilter, int(lookback.Seconds()))
res, _, err := q.Query(ctx, expr, ts)
if err != nil {
return fmt.Errorf("failed to execute restore query %q: %w ", expr, err)
}
if len(res.Data) < 1 {
ar.logDebugf(ts, nil, "no response was received from restore query")
return nil
}
for _, series := range res.Data {
series.DelLabel("__name__")
labelSet := make(map[string]string, len(series.Labels))
for _, v := range series.Labels {
labelSet[v.Name] = v.Value
}
id := hash(labelSet)
a, ok := ar.alerts[id]
if !ok {
continue
}
if a.Restored || a.State != notifier.StatePending {
continue
}
var labelsFilter []string
for k, v := range a.Labels {
labelsFilter = append(labelsFilter, fmt.Sprintf("%s=%q", k, v))
}
sort.Strings(labelsFilter)
expr := fmt.Sprintf("last_over_time(%s{%s}[%ds])",
alertForStateMetricName, strings.Join(labelsFilter, ","), int(lookback.Seconds()))
ar.logDebugf(ts, nil, "restoring alert state via query %q", expr)
res, _, err := q.Query(ctx, expr, ts)
if err != nil {
return err
}
qMetrics := res.Data
if len(qMetrics) < 1 {
ar.logDebugf(ts, nil, "no response was received from restore query")
continue
}
// only one series expected in response
m := qMetrics[0]
// __name__ supposed to be alertForStateMetricName
m.DelLabel("__name__")
// we assume that restore query contains all label matchers,
// so all received labels will match anyway if their number is equal.
if len(m.Labels) != len(a.Labels) {
ar.logDebugf(ts, nil, "state restore query returned not expected label-set %v", m.Labels)
continue
}
a.ActiveAt = time.Unix(int64(m.Values[0]), 0)
a.ActiveAt = time.Unix(int64(series.Values[0]), 0)
a.Restored = true
logger.Infof("alert %q (%d) restored to state at %v", a.Name, a.ID, a.ActiveAt)
}

View File

@@ -346,15 +346,18 @@ func TestAlertingRule_Exec(t *testing.T) {
}
func TestAlertingRule_ExecRange(t *testing.T) {
fakeGroup := Group{Name: "TestRule_ExecRange"}
testCases := []struct {
rule *AlertingRule
data []datasource.Metric
expAlerts []*notifier.Alert
rule *AlertingRule
data []datasource.Metric
expAlerts []*notifier.Alert
expHoldAlertStateAlerts map[uint64]*notifier.Alert
}{
{
newTestAlertingRule("empty", 0),
[]datasource.Metric{},
nil,
nil,
},
{
newTestAlertingRule("empty labels", 0),
@@ -364,6 +367,7 @@ func TestAlertingRule_ExecRange(t *testing.T) {
[]*notifier.Alert{
{State: notifier.StateFiring},
},
nil,
},
{
newTestAlertingRule("single-firing", 0),
@@ -376,6 +380,7 @@ func TestAlertingRule_ExecRange(t *testing.T) {
State: notifier.StateFiring,
},
},
nil,
},
{
newTestAlertingRule("single-firing-on-range", 0),
@@ -387,6 +392,7 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{State: notifier.StateFiring},
{State: notifier.StateFiring},
},
nil,
},
{
newTestAlertingRule("for-pending", time.Second),
@@ -398,6 +404,16 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{State: notifier.StatePending, ActiveAt: time.Unix(3, 0)},
{State: notifier.StatePending, ActiveAt: time.Unix(5, 0)},
},
map[uint64]*notifier.Alert{hash(map[string]string{"alertname": "for-pending"}): {
GroupID: fakeGroup.ID(),
Name: "for-pending",
Labels: map[string]string{"alertname": "for-pending"},
Annotations: map[string]string{},
State: notifier.StatePending,
ActiveAt: time.Unix(5, 0),
Value: 1,
For: time.Second,
}},
},
{
newTestAlertingRule("for-firing", 3*time.Second),
@@ -409,6 +425,38 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{State: notifier.StatePending, ActiveAt: time.Unix(1, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0)},
},
map[uint64]*notifier.Alert{hash(map[string]string{"alertname": "for-firing"}): {
GroupID: fakeGroup.ID(),
Name: "for-firing",
Labels: map[string]string{"alertname": "for-firing"},
Annotations: map[string]string{},
State: notifier.StateFiring,
ActiveAt: time.Unix(1, 0),
Start: time.Unix(5, 0),
Value: 1,
For: 3 * time.Second,
}},
},
{
newTestAlertingRule("for-hold-pending", time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 2, 5}},
},
[]*notifier.Alert{
{State: notifier.StatePending, ActiveAt: time.Unix(1, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0)},
{State: notifier.StatePending, ActiveAt: time.Unix(5, 0)},
},
map[uint64]*notifier.Alert{hash(map[string]string{"alertname": "for-hold-pending"}): {
GroupID: fakeGroup.ID(),
Name: "for-hold-pending",
Labels: map[string]string{"alertname": "for-hold-pending"},
Annotations: map[string]string{},
State: notifier.StatePending,
ActiveAt: time.Unix(5, 0),
Value: 1,
For: time.Second,
}},
},
{
newTestAlertingRule("for=>pending=>firing=>pending=>firing=>pending", time.Second),
@@ -422,9 +470,10 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{State: notifier.StateFiring, ActiveAt: time.Unix(5, 0)},
{State: notifier.StatePending, ActiveAt: time.Unix(20, 0)},
},
nil,
},
{
newTestAlertingRule("multi-series-for=>pending=>pending=>firing", 3*time.Second),
newTestAlertingRule("multi-series", 3*time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 3, 5}},
{
@@ -436,7 +485,6 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{State: notifier.StatePending, ActiveAt: time.Unix(1, 0)},
{State: notifier.StatePending, ActiveAt: time.Unix(1, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0)},
//
{
State: notifier.StatePending, ActiveAt: time.Unix(1, 0),
Labels: map[string]string{
@@ -450,6 +498,29 @@ func TestAlertingRule_ExecRange(t *testing.T) {
},
},
},
map[uint64]*notifier.Alert{
hash(map[string]string{"alertname": "multi-series"}): {
GroupID: fakeGroup.ID(),
Name: "multi-series",
Labels: map[string]string{"alertname": "multi-series"},
Annotations: map[string]string{},
State: notifier.StateFiring,
ActiveAt: time.Unix(1, 0),
Start: time.Unix(5, 0),
Value: 1,
For: 3 * time.Second,
},
hash(map[string]string{"alertname": "multi-series", "foo": "bar"}): {
GroupID: fakeGroup.ID(),
Name: "multi-series",
Labels: map[string]string{"alertname": "multi-series", "foo": "bar"},
Annotations: map[string]string{},
State: notifier.StatePending,
ActiveAt: time.Unix(5, 0),
Value: 1,
For: 3 * time.Second,
},
},
},
{
newTestRuleWithLabels("multi-series-firing", "source", "vm"),
@@ -477,16 +548,16 @@ func TestAlertingRule_ExecRange(t *testing.T) {
"source": "vm",
}},
},
nil,
},
}
fakeGroup := Group{Name: "TestRule_ExecRange"}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &datasource.FakeQuerier{}
tc.rule.q = fq
tc.rule.GroupID = fakeGroup.ID()
fq.Add(tc.data...)
gotTS, err := tc.rule.execRange(context.TODO(), time.Now(), time.Now())
gotTS, err := tc.rule.execRange(context.TODO(), time.Unix(1, 0), time.Unix(5, 0))
if err != nil {
t.Fatalf("unexpected err: %s", err)
}
@@ -512,6 +583,11 @@ func TestAlertingRule_ExecRange(t *testing.T) {
t.Fatalf("%d: expected \n%v but got \n%v", i, exp, got)
}
}
if tc.expHoldAlertStateAlerts != nil {
if !reflect.DeepEqual(tc.expHoldAlertStateAlerts, tc.rule.alerts) {
t.Fatalf("expected hold alerts state: \n%v but got \n%v", tc.expHoldAlertStateAlerts, tc.rule.alerts)
}
}
})
}
}
@@ -564,6 +640,9 @@ func TestGroup_Restore(t *testing.T) {
if got.ActiveAt != exp.ActiveAt {
t.Fatalf("expected ActiveAt %v; got %v", exp.ActiveAt, got.ActiveAt)
}
if got.Name != exp.Name {
t.Fatalf("expected alertname %q; got %q", exp.Name, got.Name)
}
}
}
@@ -579,6 +658,7 @@ func TestGroup_Restore(t *testing.T) {
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
Name: "foo",
ActiveAt: defaultTS,
},
})
@@ -592,6 +672,7 @@ func TestGroup_Restore(t *testing.T) {
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
Name: "foo",
ActiveAt: ts,
},
})
@@ -599,7 +680,7 @@ func TestGroup_Restore(t *testing.T) {
// two rules, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.Set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("foo", ts))
stateMetric("bar", ts))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)},
@@ -607,9 +688,11 @@ func TestGroup_Restore(t *testing.T) {
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
Name: "foo",
ActiveAt: defaultTS,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
Name: "bar",
ActiveAt: ts,
},
})
@@ -627,9 +710,11 @@ func TestGroup_Restore(t *testing.T) {
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
Name: "foo",
ActiveAt: ts,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
Name: "bar",
ActiveAt: ts,
},
})
@@ -642,6 +727,7 @@ func TestGroup_Restore(t *testing.T) {
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
Name: "foo",
ActiveAt: defaultTS,
},
})
@@ -654,6 +740,7 @@ func TestGroup_Restore(t *testing.T) {
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
Name: "foo",
ActiveAt: ts,
},
})
@@ -666,6 +753,7 @@ func TestGroup_Restore(t *testing.T) {
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
Name: "foo",
ActiveAt: defaultTS,
},
})

View File

@@ -31,7 +31,9 @@ var (
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param.")
resendDelay = flag.Duration("rule.resendDelay", 0, "MiniMum amount of time to wait before resending an alert to notifier")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maxiMum duration for automatic alert expiration, "+
"which by default is 4 times evaluationInterval of the parent ")
"which by default is 4 times evaluationInterval of the parent group")
evalDelay = flag.Duration("rule.evalDelay", 30*time.Second, "Adjustment of the `time` parameter for rule evaluation requests to compensate intentional data delay from the datasource."+
"Normally, should be equal to `-search.latencyOffset` (cmd-line flag configured for VictoriaMetrics single-node or vmselect).")
disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's Name as label to generated alerts and time series.")
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
@@ -39,13 +41,16 @@ var (
// Group is an entity for grouping rules
type Group struct {
mu sync.RWMutex
Name string
File string
Rules []Rule
Type config.Type
Interval time.Duration
EvalOffset *time.Duration
mu sync.RWMutex
Name string
File string
Rules []Rule
Type config.Type
Interval time.Duration
EvalOffset *time.Duration
// EvalDelay will adjust timestamp for rule evaluation requests to compensate intentional query delay from datasource.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155
EvalDelay *time.Duration
Limit int
Concurrency int
Checksum string
@@ -139,6 +144,9 @@ func NewGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval ti
if cfg.EvalOffset != nil {
g.EvalOffset = &cfg.EvalOffset.D
}
if cfg.EvalDelay != nil {
g.EvalDelay = &cfg.EvalDelay.D
}
for _, h := range cfg.Headers {
g.Headers[h.Key] = h.Value
}
@@ -331,7 +339,7 @@ func (g *Group) Start(ctx context.Context, nts func() []notifier.Notifier, rw re
Rw: rw,
Notifiers: nts,
notifierHeaders: g.NotifierHeaders,
PreviouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
}
g.infof("started")
@@ -520,7 +528,7 @@ func (g *Group) ExecOnce(ctx context.Context, nts func() []notifier.Notifier, rw
Rw: rw,
Notifiers: nts,
notifierHeaders: g.NotifierHeaders,
PreviouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
}
if len(g.Rules) < 1 {
return nil
@@ -581,10 +589,14 @@ func (g *Group) adjustReqTimestamp(timestamp time.Time) time.Time {
// to 10:30, to the previous evaluationInterval.
return ts.Add(-g.Interval)
}
// EvalOffset shouldn't interfere with evalAlignment,
// so we return it immediately
// when `eval_offset` is using, ts shouldn't be effect by `eval_alignment` and `eval_delay`
// since it should be always aligned.
return ts
}
timestamp = timestamp.Add(-g.getEvalDelay())
// always apply the alignment as a last step
if g.evalAlignment == nil || *g.evalAlignment {
// align query time with interval to get similar result with grafana when plotting time series.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5049
@@ -594,6 +606,13 @@ func (g *Group) adjustReqTimestamp(timestamp time.Time) time.Time {
return timestamp
}
func (g *Group) getEvalDelay() time.Duration {
if g.EvalDelay != nil {
return *g.EvalDelay
}
return *evalDelay
}
// executor contains group's notify and rw configs
type executor struct {
Notifiers func() []notifier.Notifier
@@ -602,11 +621,11 @@ type executor struct {
Rw remotewrite.RWClient
previouslySentSeriesToRWMu sync.Mutex
// PreviouslySentSeriesToRW stores series sent to RW on previous iteration
// previouslySentSeriesToRW stores series sent to RW on previous iteration
// map[ruleID]map[ruleLabels][]prompb.Label
// where `ruleID` is ID of the Rule within a Group
// and `ruleLabels` is []prompb.Label marshalled to a string
PreviouslySentSeriesToRW map[uint64]map[string][]prompbmarshal.Label
previouslySentSeriesToRW map[uint64]map[string][]prompbmarshal.Label
}
// execConcurrently executes rules concurrently if concurrency>1
@@ -644,9 +663,6 @@ var (
execTotal = metrics.NewCounter(`vmalert_execution_total`)
execErrors = metrics.NewCounter(`vmalert_execution_errors_total`)
remoteWriteErrors = metrics.NewCounter(`vmalert_remotewrite_errors_total`)
remoteWriteTotal = metrics.NewCounter(`vmalert_remotewrite_total`)
)
func (e *executor) exec(ctx context.Context, r Rule, ts time.Time, resolveDuration time.Duration, limit int) error {
@@ -667,9 +683,7 @@ func (e *executor) exec(ctx context.Context, r Rule, ts time.Time, resolveDurati
pushToRW := func(tss []prompbmarshal.TimeSeries) error {
var lastErr error
for _, ts := range tss {
remoteWriteTotal.Inc()
if err := e.Rw.Push(ts); err != nil {
remoteWriteErrors.Inc()
lastErr = fmt.Errorf("rule %q: remote write failure: %w", r, err)
}
}
@@ -723,7 +737,7 @@ func (e *executor) getStaleSeries(r Rule, tss []prompbmarshal.TimeSeries, timest
var staleS []prompbmarshal.TimeSeries
// check whether there are series which disappeared and need to be marked as stale
e.previouslySentSeriesToRWMu.Lock()
for key, labels := range e.PreviouslySentSeriesToRW[rID] {
for key, labels := range e.previouslySentSeriesToRW[rID] {
if _, ok := ruleLabels[key]; ok {
continue
}
@@ -732,7 +746,7 @@ func (e *executor) getStaleSeries(r Rule, tss []prompbmarshal.TimeSeries, timest
staleS = append(staleS, ss)
}
// set previous series to current
e.PreviouslySentSeriesToRW[rID] = ruleLabels
e.previouslySentSeriesToRW[rID] = ruleLabels
e.previouslySentSeriesToRWMu.Unlock()
return staleS
@@ -750,14 +764,14 @@ func (e *executor) purgeStaleSeries(activeRules []Rule) {
for _, rule := range activeRules {
id := rule.ID()
prev, ok := e.PreviouslySentSeriesToRW[id]
prev, ok := e.previouslySentSeriesToRW[id]
if ok {
// keep previous series for staleness detection
newPreviouslySentSeriesToRW[id] = prev
}
}
e.PreviouslySentSeriesToRW = nil
e.PreviouslySentSeriesToRW = newPreviouslySentSeriesToRW
e.previouslySentSeriesToRW = nil
e.previouslySentSeriesToRW = newPreviouslySentSeriesToRW
e.previouslySentSeriesToRWMu.Unlock()
}

View File

@@ -321,7 +321,7 @@ func TestResolveDuration(t *testing.T) {
func TestGetStaleSeries(t *testing.T) {
ts := time.Now()
e := &executor{
PreviouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
}
f := func(r Rule, labels, expLabels [][]prompbmarshal.Label) {
t.Helper()
@@ -414,7 +414,7 @@ func TestPurgeStaleSeries(t *testing.T) {
f := func(curRules, newRules, expStaleRules []Rule) {
t.Helper()
e := &executor{
PreviouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
}
// seed executor with series for
// current rules
@@ -424,13 +424,13 @@ func TestPurgeStaleSeries(t *testing.T) {
e.purgeStaleSeries(newRules)
if len(e.PreviouslySentSeriesToRW) != len(expStaleRules) {
if len(e.previouslySentSeriesToRW) != len(expStaleRules) {
t.Fatalf("expected to get %d stale series, got %d",
len(expStaleRules), len(e.PreviouslySentSeriesToRW))
len(expStaleRules), len(e.previouslySentSeriesToRW))
}
for _, exp := range expStaleRules {
if _, ok := e.PreviouslySentSeriesToRW[exp.ID()]; !ok {
if _, ok := e.previouslySentSeriesToRW[exp.ID()]; !ok {
t.Fatalf("expected to have rule %d; got nil instead", exp.ID())
}
}
@@ -515,7 +515,7 @@ func TestFaultyRW(t *testing.T) {
e := &executor{
Rw: &remotewrite.Client{},
PreviouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
}
err := e.exec(context.Background(), r, time.Now(), 0, 10)
@@ -628,6 +628,7 @@ func TestGroupStartDelay(t *testing.T) {
func TestGetPrometheusReqTimestamp(t *testing.T) {
offset := 30 * time.Minute
evalDelay := 1 * time.Minute
disableAlign := false
testCases := []struct {
name string
@@ -635,7 +636,7 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
originTS, expTS string
}{
{
"with query align",
"with query align + default evalDelay",
&Group{
Interval: time.Hour,
},
@@ -643,16 +644,16 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
"2023-08-28T11:00:00+00:00",
},
{
"without query align",
"without query align + default evalDelay",
&Group{
Interval: time.Hour,
evalAlignment: &disableAlign,
},
"2023-08-28T11:11:00+00:00",
"2023-08-28T11:11:00+00:00",
"2023-08-28T11:10:30+00:00",
},
{
"with eval_offset, find previous offset point",
"with eval_offset, find previous offset point + default evalDelay",
&Group{
EvalOffset: &offset,
Interval: time.Hour,
@@ -661,7 +662,7 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
"2023-08-28T10:30:00+00:00",
},
{
"with eval_offset",
"with eval_offset + default evalDelay",
&Group{
EvalOffset: &offset,
Interval: time.Hour,
@@ -669,14 +670,44 @@ func TestGetPrometheusReqTimestamp(t *testing.T) {
"2023-08-28T11:41:00+00:00",
"2023-08-28T11:30:00+00:00",
},
{
"1h interval with eval_delay",
&Group{
EvalDelay: &evalDelay,
Interval: time.Hour,
},
"2023-08-28T11:41:00+00:00",
"2023-08-28T11:00:00+00:00",
},
{
"1m interval with eval_delay",
&Group{
EvalDelay: &evalDelay,
Interval: time.Minute,
},
"2023-08-28T11:41:13+00:00",
"2023-08-28T11:40:00+00:00",
},
{
"disable alignment with eval_delay",
&Group{
EvalDelay: &evalDelay,
Interval: time.Hour,
evalAlignment: &disableAlign,
},
"2023-08-28T11:41:00+00:00",
"2023-08-28T11:40:00+00:00",
},
}
for _, tc := range testCases {
originT, _ := time.Parse(time.RFC3339, tc.originTS)
expT, _ := time.Parse(time.RFC3339, tc.expTS)
gotTS := tc.g.adjustReqTimestamp(originT)
if !gotTS.Equal(expT) {
t.Fatalf("get wrong prometheus request timestamp, expect %s, got %s", expT, gotTS)
}
t.Run(tc.name, func(t *testing.T) {
originT, _ := time.Parse(time.RFC3339, tc.originTS)
expT, _ := time.Parse(time.RFC3339, tc.expTS)
gotTS := tc.g.adjustReqTimestamp(originT)
if !gotTS.Equal(expT) {
t.Fatalf("get wrong prometheus request timestamp, expect %s, got %s", expT, gotTS)
}
})
}
}

View File

@@ -78,7 +78,7 @@ func NewRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
entries: make([]StateEntry, entrySize),
}
labels := fmt.Sprintf(`recording=%q, group=%q, id="%d"`, rr.Name, group.Name, rr.ID())
labels := fmt.Sprintf(`recording=%q, group=%q, file=%q, id="%d"`, rr.Name, group.Name, group.File, rr.ID())
rr.metrics.errors = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_error{%s}`, labels),
func() float64 {
e := rr.state.getLast()

View File

@@ -166,7 +166,7 @@ func replayRule(r Rule, start, end time.Time, rw remotewrite.RWClient, replayRul
var n int
for _, ts := range tss {
if err := rw.Push(ts); err != nil {
return n, fmt.Errorf("remote write failure: %s", err)
return n, fmt.Errorf("remote write failure: %w", err)
}
n += len(ts.Samples)
}

View File

@@ -147,11 +147,11 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
func (rh *requestHandler) getRule(r *http.Request) (apiRule, error) {
groupID, err := strconv.ParseUint(r.FormValue(paramGroupID), 10, 64)
if err != nil {
return apiRule{}, fmt.Errorf("failed to read %q param: %s", paramGroupID, err)
return apiRule{}, fmt.Errorf("failed to read %q param: %w", paramGroupID, err)
}
ruleID, err := strconv.ParseUint(r.FormValue(paramRuleID), 10, 64)
if err != nil {
return apiRule{}, fmt.Errorf("failed to read %q param: %s", paramRuleID, err)
return apiRule{}, fmt.Errorf("failed to read %q param: %w", paramRuleID, err)
}
obj, err := rh.m.ruleAPI(groupID, ruleID)
if err != nil {
@@ -163,11 +163,11 @@ func (rh *requestHandler) getRule(r *http.Request) (apiRule, error) {
func (rh *requestHandler) getAlert(r *http.Request) (*apiAlert, error) {
groupID, err := strconv.ParseUint(r.FormValue(paramGroupID), 10, 64)
if err != nil {
return nil, fmt.Errorf("failed to read %q param: %s", paramGroupID, err)
return nil, fmt.Errorf("failed to read %q param: %w", paramGroupID, err)
}
alertID, err := strconv.ParseUint(r.FormValue(paramAlertID), 10, 64)
if err != nil {
return nil, fmt.Errorf("failed to read %q param: %s", paramAlertID, err)
return nil, fmt.Errorf("failed to read %q param: %w", paramAlertID, err)
}
a, err := rh.m.alertAPI(groupID, alertID)
if err != nil {

View File

@@ -94,6 +94,10 @@ type apiGroup struct {
NotifierHeaders []string `json:"notifier_headers,omitempty"`
// Labels is a set of label value pairs, that will be added to every rule.
Labels map[string]string `json:"labels,omitempty"`
// EvalOffset Group will be evaluated at the exact time offset on the range of [0...evaluationInterval]
EvalOffset float64 `json:"eval_offset,omitempty"`
// EvalDelay will adjust the `time` parameter of rule evaluation requests to compensate intentional query delay from datasource.
EvalDelay float64 `json:"eval_delay,omitempty"`
}
// groupAlerts represents a group of alerts for WEB view
@@ -309,6 +313,12 @@ func groupToAPI(g *rule.Group) apiGroup {
Labels: g.Labels,
}
if g.EvalOffset != nil {
ag.EvalOffset = g.EvalOffset.Seconds()
}
if g.EvalDelay != nil {
ag.EvalDelay = g.EvalDelay.Seconds()
}
ag.Rules = make([]apiRule, 0)
for _, r := range g.Rules {
ag.Rules = append(ag.Rules, ruleToAPI(r))

View File

@@ -32,6 +32,38 @@ Pass `-help` to `vmauth` in order to see all the supported command-line flags wi
Feel free [contacting us](mailto:info@victoriametrics.com) if you need customized auth proxy for VictoriaMetrics with the support of LDAP, SSO, RBAC, SAML,
accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.com/vmgateway.html).
## Dropping request path prefix
By default `vmauth` doesn't drop the path prefix from the original request when proxying the request to the matching backend.
Sometimes it is needed to drop path prefix before routing the request to the backend. This can be done by specifying the number of `/`-delimited
prefix parts to drop from the request path via `drop_src_path_prefix_parts` option at `url_map` level or at `user` level.
For example, if you need to serve requests to [vmalert](https://docs.victoriametrics.com/vmalert.html) at `/vmalert/` path prefix,
while serving requests to [vmagent](https://docs.victoriametrics.com/vmagent.html) at `/vmagent/` path prefix for a particular user,
then the following [-auth.config](#auth-config) can be used:
```yml
users:
- username: foo
url_map:
# proxy all the requests, which start with `/vmagent/`, to vmagent backend
- src_paths:
- "/vmagent/.+"
# drop /vmagent/ path prefix from the original request before proxying it to url_prefix.
drop_src_path_prefix_parts: 1
url_prefix: "http://vmagent-backend:8429/"
# proxy all the requests, which start with `/vmalert`, to vmalert backend
- src_paths:
- "/vmalert/.+"
# drop /vmalert/ path prefix from the original request before proxying it to url_prefix.
drop_src_path_prefix_parts: 1
url_prefix: "http://vmalert-backend:8880/"
```
## Load balancing
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls.
@@ -101,6 +133,31 @@ The following [metrics](#monitoring) related to concurrency limits are exposed b
- `vmauth_unauthorized_user_concurrent_requests_limit_reached_total` - the number of requests rejected with `429 Too Many Requests` error
because of the concurrency limit has been reached for unauthorized users (if `unauthorized_user` section is used).
## Backend TLS setup
By default `vmauth` uses system settings when performing requests to HTTPS backends specified via `url_prefix` option
in the [`-auth.config`](https://docs.victoriametrics.com/vmauth.html#auth-config). These settings can be overridden with the following command-line flags:
- `-backend.tlsInsecureSkipVerify` allows skipping TLS verification when connecting to HTTPS backends.
This global setting can be overridden at per-user level inside [`-auth.config`](https://docs.victoriametrics.com/vmauth.html#auth-config)
via `tls_insecure_skip_verify` option. For example:
```yml
- username: "foo"
url_prefix: "https://localhost"
tls_insecure_skip_verify: true
```
- `-backend.tlsCAFile` allows specifying the path to TLS Root CA, which will be used for TLS verification when connecting to HTTPS backends.
The `-backend.tlsCAFile` may point either to local file or to `http` / `https` url.
This global setting can be overridden at per-user level inside [`-auth.config`](https://docs.victoriametrics.com/vmauth.html#auth-config)
via `tls_ca_file` option. For example:
```yml
- username: "foo"
url_prefix: "https://localhost"
tls_ca_file: "/path/to/tls/root/ca"
```
## IP filters
@@ -181,6 +238,15 @@ users:
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# are proxied to https://localhost:8428.
# For example, http://vmauth:8427/api/v1/query is routed to https://localhost/api/v1/query
# TLS verification is skipped for https://localhost.
- username: "local-single-node-with-tls"
password: "***"
url_prefix: "https://localhost"
tls_insecure_skip_verify: true
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
@@ -222,6 +288,8 @@ users:
# For example, request to http://vmauth:8427/non/existing/path are proxied:
# - to http://default1:8888/unsupported_url_handler?request_path=/non/existing/path
# - or http://default2:8888/unsupported_url_handler?request_path=/non/existing/path
#
# Regular expressions are allowed in `src_paths` entries.
- username: "foobar"
url_map:
- src_paths:
@@ -248,6 +316,8 @@ users:
# Requests are routed in round-robin fashion between `url_prefix` backends.
# The deny_partial_response query arg is added to all the routed requests.
# The requests are re-tried if url_prefix backends send 500 or 503 response status codes.
# Note that the unauthorized_user section takes precedence when processing a route without credentials,
# even if such a route also exists in the users section (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5236).
unauthorized_user:
url_prefix:
- http://vmselect-az1/?deny_partial_response=1
@@ -408,6 +478,12 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration

View File

@@ -5,6 +5,7 @@ import (
"encoding/base64"
"flag"
"fmt"
"net/http"
"net/url"
"os"
"regexp"
@@ -14,13 +15,14 @@ import (
"sync/atomic"
"time"
"github.com/VictoriaMetrics/metrics"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/metrics"
"gopkg.in/yaml.v2"
)
var (
@@ -38,20 +40,25 @@ type AuthConfig struct {
// UserInfo is user information read from authConfigPath
type UserInfo struct {
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMaps []URLMap `yaml:"url_map,omitempty"`
HeadersConf HeadersConf `yaml:",inline"`
MaxConcurrentRequests int `yaml:"max_concurrent_requests,omitempty"`
DefaultURL *URLPrefix `yaml:"default_url,omitempty"`
RetryStatusCodes []int `yaml:"retry_status_codes,omitempty"`
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMaps []URLMap `yaml:"url_map,omitempty"`
HeadersConf HeadersConf `yaml:",inline"`
MaxConcurrentRequests int `yaml:"max_concurrent_requests,omitempty"`
DefaultURL *URLPrefix `yaml:"default_url,omitempty"`
RetryStatusCodes []int `yaml:"retry_status_codes,omitempty"`
DropSrcPathPrefixParts int `yaml:"drop_src_path_prefix_parts,omitempty"`
TLSInsecureSkipVerify *bool `yaml:"tls_insecure_skip_verify,omitempty"`
TLSCAFile string `yaml:"tls_ca_file,omitempty"`
concurrencyLimitCh chan struct{}
concurrencyLimitReached *metrics.Counter
httpTransport *http.Transport
requests *metrics.Counter
requestsDuration *metrics.Summary
}
@@ -113,10 +120,11 @@ func (h *Header) MarshalYAML() (interface{}, error) {
// URLMap is a mapping from source paths to target urls.
type URLMap struct {
SrcPaths []*SrcPath `yaml:"src_paths,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
HeadersConf HeadersConf `yaml:",inline"`
RetryStatusCodes []int `yaml:"retry_status_codes,omitempty"`
SrcPaths []*SrcPath `yaml:"src_paths,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
HeadersConf HeadersConf `yaml:",inline"`
RetryStatusCodes []int `yaml:"retry_status_codes,omitempty"`
DropSrcPathPrefixParts int `yaml:"drop_src_path_prefix_parts,omitempty"`
}
// SrcPath represents an src path
@@ -420,6 +428,18 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
}
ui := ac.UnauthorizedUser
if ui != nil {
if ui.Username != "" {
return nil, fmt.Errorf("field username can't be specified for unauthorized_user section")
}
if ui.Password != "" {
return nil, fmt.Errorf("field password can't be specified for unauthorized_user section")
}
if ui.BearerToken != "" {
return nil, fmt.Errorf("field bearer_token can't be specified for unauthorized_user section")
}
if ui.Name != "" {
return nil, fmt.Errorf("field name can't be specified for unauthorized_user section")
}
ui.requests = metrics.GetOrCreateCounter(`vmauth_unauthorized_user_requests_total`)
ui.requestsDuration = metrics.GetOrCreateSummary(`vmauth_unauthorized_user_request_duration_seconds`)
ui.concurrencyLimitCh = make(chan struct{}, ui.getMaxConcurrentRequests())
@@ -430,6 +450,11 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
_ = metrics.GetOrCreateGauge(`vmauth_unauthorized_user_concurrent_requests_current`, func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
tr, err := getTransport(ui.TLSInsecureSkipVerify, ui.TLSCAFile)
if err != nil {
return nil, fmt.Errorf("cannot initialize HTTP transport: %w", err)
}
ui.httpTransport = tr
}
return &ac, nil
}
@@ -500,6 +525,12 @@ func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_current{username=%q}`, name), func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
tr, err := getTransport(ui.TLSInsecureSkipVerify, ui.TLSCAFile)
if err != nil {
return nil, fmt.Errorf("cannot initialize HTTP transport: %w", err)
}
ui.httpTransport = tr
byAuthToken[at1] = ui
byAuthToken[at2] = ui
}

View File

@@ -221,22 +221,26 @@ func TestParseAuthConfigSuccess(t *testing.T) {
}
// Single user
insecureSkipVerifyTrue := true
f(`
users:
- username: foo
password: bar
url_prefix: http://aaa:343/bbb
max_concurrent_requests: 5
tls_insecure_skip_verify: true
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
MaxConcurrentRequests: 5,
TLSInsecureSkipVerify: &insecureSkipVerifyTrue,
},
})
// Multiple url_prefix entries
insecureSkipVerifyFalse := false
f(`
users:
- username: foo
@@ -244,6 +248,9 @@ users:
url_prefix:
- http://node1:343/bbb
- http://node2:343/bbb
tls_insecure_skip_verify: false
retry_status_codes: [500, 501]
drop_src_path_prefix_parts: 1
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
Username: "foo",
@@ -252,6 +259,9 @@ users:
"http://node1:343/bbb",
"http://node2:343/bbb",
}),
TLSInsecureSkipVerify: &insecureSkipVerifyFalse,
RetryStatusCodes: []int{500, 501},
DropSrcPathPrefixParts: 1,
},
})
@@ -448,6 +458,47 @@ users:
}
func TestParseAuthConfigPassesTLSVerificationConfig(t *testing.T) {
c := `
users:
- username: foo
password: bar
url_prefix: https://aaa/bbb
max_concurrent_requests: 5
tls_insecure_skip_verify: true
unauthorized_user:
url_prefix: http://aaa:343/bbb
max_concurrent_requests: 5
tls_insecure_skip_verify: false
`
ac, err := parseAuthConfig([]byte(c))
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
m, err := parseAuthConfigUsers(ac)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
ui := m[getAuthToken("", "foo", "bar")]
if !isSetBool(ui.TLSInsecureSkipVerify, true) || !ui.httpTransport.TLSClientConfig.InsecureSkipVerify {
t.Fatalf("unexpected TLSInsecureSkipVerify value for user foo")
}
if !isSetBool(ac.UnauthorizedUser.TLSInsecureSkipVerify, false) || ac.UnauthorizedUser.httpTransport.TLSClientConfig.InsecureSkipVerify {
t.Fatalf("unexpected TLSInsecureSkipVerify value for unauthorized_user")
}
}
func isSetBool(boolP *bool, expectedValue bool) bool {
if boolP == nil {
return false
}
return *boolP == expectedValue
}
func getSrcPaths(paths []string) []*SrcPath {
var sps []*SrcPath
for _, path := range paths {

View File

@@ -42,6 +42,15 @@ users:
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# are proxied to https://localhost:8428
# For example, http://vmauth:8427/api/v1/query is routed to https://localhost/api/v1/query
# TLS verification is ignored for https://localhost.
- username: "local-single-node-with-tls"
password: "***"
url_prefix: "https://localhost"
tls_insecure_skip_verify: true
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
@@ -82,6 +91,8 @@ users:
# For example, request to http://vmauth:8427/non/existing/path are proxied:
# - to http://default1:8888/unsupported_url_handler?request_path=/non/existing/path
# - or http://default2:8888/unsupported_url_handler?request_path=/non/existing/path
#
# Regular expressions are allowed in `src_paths` entries.
- username: "foobar"
url_map:
- src_paths:

View File

@@ -20,6 +20,8 @@ users:
# For example, request to http://vmauth:8427/non/existing/path are proxied:
# - to http://default1:8888/unsupported_url_handler?request_path=/non/existing/path
# - or http://default2:8888/unsupported_url_handler?request_path=/non/existing/path
#
# Regular expressions are allowed in `src_paths` entries.
- username: "foobar"
url_map:
- src_paths:

View File

@@ -2,6 +2,8 @@ package main
import (
"context"
"crypto/tls"
"crypto/x509"
"errors"
"flag"
"fmt"
@@ -15,16 +17,19 @@ import (
"sync"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
@@ -46,6 +51,10 @@ var (
failTimeout = flag.Duration("failTimeout", 3*time.Second, "Sets a delay period for load balancing to skip a malfunctioning backend")
maxRequestBodySizeToRetry = flagutil.NewBytes("maxRequestBodySizeToRetry", 16*1024, "The maximum request body size, which can be cached and re-tried at other backends. "+
"Bigger values may require more memory")
backendTLSInsecureSkipVerify = flag.Bool("backend.tlsInsecureSkipVerify", false, "Whether to skip TLS verification when connecting to backends over HTTPS. "+
"See https://docs.victoriametrics.com/vmauth.html#backend-tls-setup")
backendTLSCAFile = flag.String("backend.TLSCAFile", "", "Optional path to TLS root CA file, which is used for TLS verification when connecting to backends over HTTPS. "+
"See https://docs.victoriametrics.com/vmauth.html#backend-tls-setup")
)
func main() {
@@ -155,11 +164,19 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
u := normalizeURL(r.URL)
up, hc, retryStatusCodes := ui.getURLPrefixAndHeaders(u)
up, hc, retryStatusCodes, dropSrcPathPrefixParts := ui.getURLPrefixAndHeaders(u)
isDefault := false
if up == nil {
missingRouteRequests.Inc()
if ui.DefaultURL == nil {
// Authorization should be requested for http requests without credentials
// to a route that is not in the configuration for unauthorized user.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5236
if ui.BearerToken == "" && ui.Username == "" && len(*authUsers.Load()) > 0 {
w.Header().Set("WWW-Authenticate", `Basic realm="Restricted"`)
http.Error(w, "missing `Authorization` request header", http.StatusUnauthorized)
return
}
missingRouteRequests.Inc()
httpserver.Errorf(w, r, "missing route for %q", u.String())
return
}
@@ -181,9 +198,9 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
query.Set("request_path", u.Path)
targetURL.RawQuery = query.Encode()
} else { // Update path for regular routes.
targetURL = mergeURLs(targetURL, u)
targetURL = mergeURLs(targetURL, u, dropSrcPathPrefixParts)
}
ok := tryProcessingRequest(w, r, targetURL, hc, retryStatusCodes)
ok := tryProcessingRequest(w, r, targetURL, hc, retryStatusCodes, ui.httpTransport)
bu.put()
if ok {
return
@@ -197,12 +214,12 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
httpserver.Errorf(w, r, "%s", err)
}
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, hc HeadersConf, retryStatusCodes []int) bool {
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, hc HeadersConf, retryStatusCodes []int, transport *http.Transport) bool {
// This code has been copied from net/http/httputil/reverseproxy.go
req := sanitizeRequestHeaders(r)
req.URL = targetURL
req.Host = targetURL.Host
updateHeadersByConfig(req.Header, hc.RequestHeaders)
transportOnce.Do(transportInit)
res, err := transport.RoundTrip(req)
rtb, rtbOK := req.Body.(*readTrackingBody)
if err != nil {
@@ -345,23 +362,77 @@ var (
missingRouteRequests = metrics.NewCounter(`vmauth_http_request_errors_total{reason="missing_route"}`)
)
var (
transport *http.Transport
transportOnce sync.Once
)
func getTransport(insecureSkipVerifyP *bool, caFile string) (*http.Transport, error) {
if insecureSkipVerifyP == nil {
insecureSkipVerifyP = backendTLSInsecureSkipVerify
}
insecureSkipVerify := *insecureSkipVerifyP
if caFile == "" {
caFile = *backendTLSCAFile
}
func transportInit() {
bb := bbPool.Get()
defer bbPool.Put(bb)
bb.B = appendTransportKey(bb.B[:0], insecureSkipVerify, caFile)
transportMapLock.Lock()
defer transportMapLock.Unlock()
tr := transportMap[string(bb.B)]
if tr == nil {
trLocal, err := newTransport(insecureSkipVerify, caFile)
if err != nil {
return nil, err
}
transportMap[string(bb.B)] = trLocal
tr = trLocal
}
return tr, nil
}
var transportMap = make(map[string]*http.Transport)
var transportMapLock sync.Mutex
func appendTransportKey(dst []byte, insecureSkipVerify bool, caFile string) []byte {
dst = encoding.MarshalBool(dst, insecureSkipVerify)
dst = encoding.MarshalBytes(dst, bytesutil.ToUnsafeBytes(caFile))
return dst
}
var bbPool bytesutil.ByteBufferPool
func newTransport(insecureSkipVerify bool, caFile string) (*http.Transport, error) {
tr := http.DefaultTransport.(*http.Transport).Clone()
tr.ResponseHeaderTimeout = *responseTimeout
// Automatic compression must be disabled in order to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {
tr.MaxIdleConns = tr.MaxIdleConnsPerHost
}
transport = tr
tlsCfg := tr.TLSClientConfig
if tlsCfg == nil {
tlsCfg = &tls.Config{}
tr.TLSClientConfig = tlsCfg
}
if insecureSkipVerify || caFile != "" {
tlsCfg.ClientSessionCache = tls.NewLRUClientSessionCache(0)
tlsCfg.InsecureSkipVerify = insecureSkipVerify
if caFile != "" {
data, err := fs.ReadFileOrHTTP(caFile)
if err != nil {
return nil, fmt.Errorf("cannot read tls_ca_file: %w", err)
}
rootCA := x509.NewCertPool()
if !rootCA.AppendCertsFromPEM(data) {
return nil, fmt.Errorf("cannot parse data read from tls_ca_file %q", caFile)
}
tlsCfg.RootCAs = rootCA
}
}
return tr, nil
}
var (

View File

@@ -6,12 +6,13 @@ import (
"strings"
)
func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
func mergeURLs(uiURL, requestURI *url.URL, dropSrcPathPrefixParts int) *url.URL {
targetURL := *uiURL
if strings.HasPrefix(requestURI.Path, "/") {
srcPath := dropPrefixParts(requestURI.Path, dropSrcPathPrefixParts)
if strings.HasPrefix(srcPath, "/") {
targetURL.Path = strings.TrimSuffix(targetURL.Path, "/")
}
targetURL.Path += requestURI.Path
targetURL.Path += srcPath
requestParams := requestURI.Query()
// fast path
if len(requestParams) == 0 {
@@ -32,18 +33,34 @@ func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
return &targetURL
}
func (ui *UserInfo) getURLPrefixAndHeaders(u *url.URL) (*URLPrefix, HeadersConf, []int) {
func dropPrefixParts(path string, parts int) string {
if parts <= 0 {
return path
}
for parts > 0 {
path = strings.TrimPrefix(path, "/")
n := strings.IndexByte(path, '/')
if n < 0 {
return ""
}
path = path[n:]
parts--
}
return path
}
func (ui *UserInfo) getURLPrefixAndHeaders(u *url.URL) (*URLPrefix, HeadersConf, []int, int) {
for _, e := range ui.URLMaps {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix, e.HeadersConf, e.RetryStatusCodes
return e.URLPrefix, e.HeadersConf, e.RetryStatusCodes, e.DropSrcPathPrefixParts
}
}
}
if ui.URLPrefix != nil {
return ui.URLPrefix, ui.HeadersConf, ui.RetryStatusCodes
return ui.URLPrefix, ui.HeadersConf, ui.RetryStatusCodes, ui.DropSrcPathPrefixParts
}
return nil, HeadersConf{}, nil
return nil, HeadersConf{}, nil, 0
}
func normalizeURL(uOrig *url.URL) *url.URL {

View File

@@ -7,20 +7,91 @@ import (
"testing"
)
func TestDropPrefixParts(t *testing.T) {
f := func(path string, parts int, expectedResult string) {
t.Helper()
result := dropPrefixParts(path, parts)
if result != expectedResult {
t.Fatalf("unexpected result; got %q; want %q", result, expectedResult)
}
}
f("", 0, "")
f("", 1, "")
f("", 10, "")
f("foo", 0, "foo")
f("foo", -1, "foo")
f("foo", 1, "")
f("/foo", 0, "/foo")
f("/foo/bar", 0, "/foo/bar")
f("/foo/bar/baz", 0, "/foo/bar/baz")
f("foo", 0, "foo")
f("foo/bar", 0, "foo/bar")
f("foo/bar/baz", 0, "foo/bar/baz")
f("/foo/", 0, "/foo/")
f("/foo/bar/", 0, "/foo/bar/")
f("/foo/bar/baz/", 0, "/foo/bar/baz/")
f("/foo", 1, "")
f("/foo/bar", 1, "/bar")
f("/foo/bar/baz", 1, "/bar/baz")
f("foo", 1, "")
f("foo/bar", 1, "/bar")
f("foo/bar/baz", 1, "/bar/baz")
f("/foo/", 1, "/")
f("/foo/bar/", 1, "/bar/")
f("/foo/bar/baz/", 1, "/bar/baz/")
f("/foo", 2, "")
f("/foo/bar", 2, "")
f("/foo/bar/baz", 2, "/baz")
f("foo", 2, "")
f("foo/bar", 2, "")
f("foo/bar/baz", 2, "/baz")
f("/foo/", 2, "")
f("/foo/bar/", 2, "/")
f("/foo/bar/baz/", 2, "/baz/")
f("/foo", 3, "")
f("/foo/bar", 3, "")
f("/foo/bar/baz", 3, "")
f("foo", 3, "")
f("foo/bar", 3, "")
f("foo/bar/baz", 3, "")
f("/foo/", 3, "")
f("/foo/bar/", 3, "")
f("/foo/bar/baz/", 3, "/")
f("/foo/", 4, "")
f("/foo/bar/", 4, "")
f("/foo/bar/baz/", 4, "")
}
func TestCreateTargetURLSuccess(t *testing.T) {
f := func(ui *UserInfo, requestURI, expectedTarget, expectedRequestHeaders, expectedResponseHeaders string, expectedRetryStatusCodes []int) {
f := func(ui *UserInfo, requestURI, expectedTarget, expectedRequestHeaders, expectedResponseHeaders string,
expectedRetryStatusCodes []int, expectedDropSrcPathPrefixParts int) {
t.Helper()
u, err := url.Parse(requestURI)
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
u = normalizeURL(u)
up, hc, retryStatusCodes := ui.getURLPrefixAndHeaders(u)
up, hc, retryStatusCodes, dropSrcPathPrefixParts := ui.getURLPrefixAndHeaders(u)
if up == nil {
t.Fatalf("cannot determie backend: %s", err)
}
bu := up.getLeastLoadedBackendURL()
target := mergeURLs(bu.url, u)
target := mergeURLs(bu.url, u, dropSrcPathPrefixParts)
bu.put()
if target.String() != expectedTarget {
t.Fatalf("unexpected target; got %q; want %q", target, expectedTarget)
@@ -32,11 +103,14 @@ func TestCreateTargetURLSuccess(t *testing.T) {
if !reflect.DeepEqual(retryStatusCodes, expectedRetryStatusCodes) {
t.Fatalf("unexpected retryStatusCodes; got %d; want %d", retryStatusCodes, expectedRetryStatusCodes)
}
if dropSrcPathPrefixParts != expectedDropSrcPathPrefixParts {
t.Fatalf("unexpected dropSrcPathPrefixParts; got %d; want %d", dropSrcPathPrefixParts, expectedDropSrcPathPrefixParts)
}
}
// Simple routing with `url_prefix`
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "", "http://foo.bar/.", "[]", "[]", nil)
}, "", "http://foo.bar/.", "[]", "[]", nil, 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
HeadersConf: HeadersConf{
@@ -45,29 +119,30 @@ func TestCreateTargetURLSuccess(t *testing.T) {
Value: "aaa",
}},
},
RetryStatusCodes: []int{503, 501},
}, "/", "http://foo.bar", `[{"bb" "aaa"}]`, `[]`, []int{503, 501})
RetryStatusCodes: []int{503, 501},
DropSrcPathPrefixParts: 2,
}, "/a/b/c", "http://foo.bar/c", `[{"bb" "aaa"}]`, `[]`, []int{503, 501}, 2)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar/federate"),
}, "/", "http://foo.bar/federate", "[]", "[]", nil)
}, "/", "http://foo.bar/federate", "[]", "[]", nil, 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "a/b?c=d", "http://foo.bar/a/b?c=d", "[]", "[]", nil)
}, "a/b?c=d", "http://foo.bar/a/b?c=d", "[]", "[]", nil, 0)
f(&UserInfo{
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/z", "https://sss:3894/x/y/z", "[]", "[]", nil)
}, "/z", "https://sss:3894/x/y/z", "[]", "[]", nil, 0)
f(&UserInfo{
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/../../aaa", "https://sss:3894/x/y/aaa", "[]", "[]", nil)
}, "/../../aaa", "https://sss:3894/x/y/aaa", "[]", "[]", nil, 0)
f(&UserInfo{
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/./asd/../../aaa?a=d&s=s/../d", "https://sss:3894/x/y/aaa?a=d&s=s%2F..%2Fd", "[]", "[]", nil)
}, "/./asd/../../aaa?a=d&s=s/../d", "https://sss:3894/x/y/aaa?a=d&s=s%2F..%2Fd", "[]", "[]", nil, 0)
// Complex routing with `url_map`
ui := &UserInfo{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
SrcPaths: getSrcPaths([]string{"/vmsingle/api/v1/query"}),
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
HeadersConf: HeadersConf{
RequestHeaders: []Header{
@@ -87,7 +162,8 @@ func TestCreateTargetURLSuccess(t *testing.T) {
},
},
},
RetryStatusCodes: []int{503, 500, 501},
RetryStatusCodes: []int{503, 500, 501},
DropSrcPathPrefixParts: 1,
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
@@ -105,11 +181,12 @@ func TestCreateTargetURLSuccess(t *testing.T) {
Value: "y",
}},
},
RetryStatusCodes: []int{502},
RetryStatusCodes: []int{502},
DropSrcPathPrefixParts: 2,
}
f(ui, "/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up", `[{"xx" "aa"} {"yy" "asdf"}]`, `[{"qwe" "rty"}]`, []int{503, 500, 501})
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write", "[]", "[]", nil)
f(ui, "/api/v1/query_range", "http://default-server/api/v1/query_range", `[{"bb" "aaa"}]`, `[{"x" "y"}]`, []int{502})
f(ui, "/vmsingle/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up", `[{"xx" "aa"} {"yy" "asdf"}]`, `[{"qwe" "rty"}]`, []int{503, 500, 501}, 1)
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write", "[]", "[]", nil, 0)
f(ui, "/foo/bar/api/v1/query_range", "http://default-server/api/v1/query_range", `[{"bb" "aaa"}]`, `[{"x" "y"}]`, []int{502}, 2)
// Complex routing regexp paths in `url_map`
ui = &UserInfo{
@@ -125,17 +202,17 @@ func TestCreateTargetURLSuccess(t *testing.T) {
},
URLPrefix: mustParseURL("http://default-server"),
}
f(ui, "/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up", "[]", "[]", nil)
f(ui, "/api/v1/query_range?query=up", "http://vmselect/0/prometheus/api/v1/query_range?query=up", "[]", "[]", nil)
f(ui, "/api/v1/label/foo/values", "http://vmselect/0/prometheus/api/v1/label/foo/values", "[]", "[]", nil)
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write", "[]", "[]", nil)
f(ui, "/api/v1/foo/bar", "http://default-server/api/v1/foo/bar", "[]", "[]", nil)
f(ui, "/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up", "[]", "[]", nil, 0)
f(ui, "/api/v1/query_range?query=up", "http://vmselect/0/prometheus/api/v1/query_range?query=up", "[]", "[]", nil, 0)
f(ui, "/api/v1/label/foo/values", "http://vmselect/0/prometheus/api/v1/label/foo/values", "[]", "[]", nil, 0)
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write", "[]", "[]", nil, 0)
f(ui, "/api/v1/foo/bar", "http://default-server/api/v1/foo/bar", "[]", "[]", nil, 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar?extra_label=team=dev"),
}, "/api/v1/query", "http://foo.bar/api/v1/query?extra_label=team=dev", "[]", "[]", nil)
}, "/api/v1/query", "http://foo.bar/api/v1/query?extra_label=team=dev", "[]", "[]", nil, 0)
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar?extra_label=team=mobile"),
}, "/api/v1/query?extra_label=team=dev", "http://foo.bar/api/v1/query?extra_label=team%3Dmobile", "[]", "[]", nil)
}, "/api/v1/query?extra_label=team=dev", "http://foo.bar/api/v1/query?extra_label=team%3Dmobile", "[]", "[]", nil, 0)
}
func TestCreateTargetURLFailure(t *testing.T) {
@@ -146,7 +223,7 @@ func TestCreateTargetURLFailure(t *testing.T) {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
u = normalizeURL(u)
up, hc, retryStatusCodes := ui.getURLPrefixAndHeaders(u)
up, hc, retryStatusCodes, dropSrcPathPrefixParts := ui.getURLPrefixAndHeaders(u)
if up != nil {
t.Fatalf("unexpected non-empty up=%#v", up)
}
@@ -159,6 +236,9 @@ func TestCreateTargetURLFailure(t *testing.T) {
if retryStatusCodes != nil {
t.Fatalf("unexpected non-empty retryStatusCodes=%d", retryStatusCodes)
}
if dropSrcPathPrefixParts != 0 {
t.Fatalf("unexpected non-zero dropSrcPathPrefixParts=%d", dropSrcPathPrefixParts)
}
}
f(&UserInfo{}, "/foo/bar")
f(&UserInfo{

View File

@@ -316,6 +316,12 @@ Run `vmbackup -help` in order to see all the available options:
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration

View File

@@ -168,7 +168,7 @@ See the docs at https://docs.victoriametrics.com/vmbackup.html .
func newSrcFS() (*fslocal.FS, error) {
if err := snapshot.Validate(*snapshotName); err != nil {
return nil, fmt.Errorf("invalid -snapshotName=%q: %s", *snapshotName, err)
return nil, fmt.Errorf("invalid -snapshotName=%q: %w", *snapshotName, err)
}
snapshotPath := filepath.Join(*storageDataPath, "snapshots", *snapshotName)

View File

@@ -450,6 +450,12 @@ command-line flags:
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration

View File

@@ -1,5 +1,4 @@
//go:build aix || linux || solaris || zos
// +build aix linux solaris zos
package terminal

View File

@@ -1,5 +1,4 @@
//go:build darwin || freebsd || openbsd
// +build darwin freebsd openbsd
package terminal

View File

@@ -1,5 +1,4 @@
//go:build windows
// +build windows
package terminal

View File

@@ -353,6 +353,12 @@ The shortlist of configuration flags include the following:
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration

View File

@@ -113,6 +113,12 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.header.csp string
Value for 'Content-Security-Policy' header
-http.header.frameOptions string
Value for 'X-Frame-Options' header
-http.header.hsts string
Value for 'Strict-Transport-Security' header
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration

View File

@@ -81,7 +81,7 @@ var funcs = func() map[string]*funcInfo {
var m map[string]*funcInfo
if err := json.Unmarshal(funcsJSON, &m); err != nil {
// Do not use logger.Panicf, since it isn't ready yet.
panic(fmt.Errorf("cannot parse funcsJSON: %s", err))
panic(fmt.Errorf("cannot parse funcsJSON: %w", err))
}
return m
}()

View File

@@ -197,7 +197,7 @@ func ExportCSVHandler(startTime time.Time, w http.ResponseWriter, r *http.Reques
return err
}
if err := b.UnmarshalData(); err != nil {
return fmt.Errorf("cannot unmarshal block during export: %s", err)
return fmt.Errorf("cannot unmarshal block during export: %w", err)
}
xb := exportBlockPool.Get().(*exportBlock)
xb.mn = mn
@@ -407,7 +407,7 @@ func exportHandler(qt *querytracer.Tracer, w http.ResponseWriter, cp *commonPara
return err
}
if err := b.UnmarshalData(); err != nil {
return fmt.Errorf("cannot unmarshal block during export: %s", err)
return fmt.Errorf("cannot unmarshal block during export: %w", err)
}
xb := exportBlockPool.Get().(*exportBlock)
xb.mn = mn

View File

@@ -29,7 +29,12 @@ See https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries
]
},
"stats":{
"seriesFetched": "{%d qs.SeriesFetched %}"
{% code
// seriesFetched is string instead of int because of historical reasons.
// It cannot be converted to int without breaking backwards compatibility at vmalert :(
%}
"seriesFetched": "{%dl qs.SeriesFetched %}",
"executionTimeMsec": {%dl qs.ExecutionTimeMsec %}
}
{% code
qt.Printf("generate /api/v1/query_range response for series=%d, points=%d", seriesCount, pointsCount)

View File

@@ -60,85 +60,95 @@ func StreamQueryRangeResponse(qw422016 *qt422016.Writer, rs []netstorage.Result,
//line app/vmselect/prometheus/query_range_response.qtpl:28
}
//line app/vmselect/prometheus/query_range_response.qtpl:28
qw422016.N().S(`]},"stats":{"seriesFetched": "`)
//line app/vmselect/prometheus/query_range_response.qtpl:32
qw422016.N().D(qs.SeriesFetched)
//line app/vmselect/prometheus/query_range_response.qtpl:32
qw422016.N().S(`"}`)
qw422016.N().S(`]},"stats":{`)
//line app/vmselect/prometheus/query_range_response.qtpl:33
// seriesFetched is string instead of int because of historical reasons.
// It cannot be converted to int without breaking backwards compatibility at vmalert :(
//line app/vmselect/prometheus/query_range_response.qtpl:35
qw422016.N().S(`"seriesFetched": "`)
//line app/vmselect/prometheus/query_range_response.qtpl:36
qw422016.N().DL(qs.SeriesFetched)
//line app/vmselect/prometheus/query_range_response.qtpl:36
qw422016.N().S(`","executionTimeMsec":`)
//line app/vmselect/prometheus/query_range_response.qtpl:37
qw422016.N().DL(qs.ExecutionTimeMsec)
//line app/vmselect/prometheus/query_range_response.qtpl:37
qw422016.N().S(`}`)
//line app/vmselect/prometheus/query_range_response.qtpl:40
qt.Printf("generate /api/v1/query_range response for series=%d, points=%d", seriesCount, pointsCount)
qtDone()
//line app/vmselect/prometheus/query_range_response.qtpl:38
//line app/vmselect/prometheus/query_range_response.qtpl:43
streamdumpQueryTrace(qw422016, qt)
//line app/vmselect/prometheus/query_range_response.qtpl:38
//line app/vmselect/prometheus/query_range_response.qtpl:43
qw422016.N().S(`}`)
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
}
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
func WriteQueryRangeResponse(qq422016 qtio422016.Writer, rs []netstorage.Result, qt *querytracer.Tracer, qtDone func(), qs *promql.QueryStats) {
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
StreamQueryRangeResponse(qw422016, rs, qt, qtDone, qs)
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
}
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
func QueryRangeResponse(rs []netstorage.Result, qt *querytracer.Tracer, qtDone func(), qs *promql.QueryStats) string {
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
WriteQueryRangeResponse(qb422016, rs, qt, qtDone, qs)
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
return qs422016
//line app/vmselect/prometheus/query_range_response.qtpl:40
//line app/vmselect/prometheus/query_range_response.qtpl:45
}
//line app/vmselect/prometheus/query_range_response.qtpl:42
//line app/vmselect/prometheus/query_range_response.qtpl:47
func streamqueryRangeLine(qw422016 *qt422016.Writer, r *netstorage.Result) {
//line app/vmselect/prometheus/query_range_response.qtpl:42
//line app/vmselect/prometheus/query_range_response.qtpl:47
qw422016.N().S(`{"metric":`)
//line app/vmselect/prometheus/query_range_response.qtpl:44
//line app/vmselect/prometheus/query_range_response.qtpl:49
streammetricNameObject(qw422016, &r.MetricName)
//line app/vmselect/prometheus/query_range_response.qtpl:44
//line app/vmselect/prometheus/query_range_response.qtpl:49
qw422016.N().S(`,"values":`)
//line app/vmselect/prometheus/query_range_response.qtpl:45
//line app/vmselect/prometheus/query_range_response.qtpl:50
streamvaluesWithTimestamps(qw422016, r.Values, r.Timestamps)
//line app/vmselect/prometheus/query_range_response.qtpl:45
//line app/vmselect/prometheus/query_range_response.qtpl:50
qw422016.N().S(`}`)
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
}
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
func writequeryRangeLine(qq422016 qtio422016.Writer, r *netstorage.Result) {
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
streamqueryRangeLine(qw422016, r)
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
}
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
func queryRangeLine(r *netstorage.Result) string {
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
writequeryRangeLine(qb422016, r)
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
return qs422016
//line app/vmselect/prometheus/query_range_response.qtpl:47
//line app/vmselect/prometheus/query_range_response.qtpl:52
}

View File

@@ -31,7 +31,12 @@ See https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries
]
},
"stats":{
"seriesFetched": "{%d qs.SeriesFetched %}"
{% code
// seriesFetched is string instead of int because of historical reasons.
// It cannot be converted to int without breaking backwards compatibility at vmalert :(
%}
"seriesFetched": "{%dl qs.SeriesFetched %}",
"executionTimeMsec": {%dl qs.ExecutionTimeMsec %}
}
{% code
qt.Printf("generate /api/v1/query response for series=%d", seriesCount)

View File

@@ -70,44 +70,54 @@ func StreamQueryResponse(qw422016 *qt422016.Writer, rs []netstorage.Result, qt *
//line app/vmselect/prometheus/query_response.qtpl:30
}
//line app/vmselect/prometheus/query_response.qtpl:30
qw422016.N().S(`]},"stats":{"seriesFetched": "`)
//line app/vmselect/prometheus/query_response.qtpl:34
qw422016.N().D(qs.SeriesFetched)
//line app/vmselect/prometheus/query_response.qtpl:34
qw422016.N().S(`"}`)
qw422016.N().S(`]},"stats":{`)
//line app/vmselect/prometheus/query_response.qtpl:35
// seriesFetched is string instead of int because of historical reasons.
// It cannot be converted to int without breaking backwards compatibility at vmalert :(
//line app/vmselect/prometheus/query_response.qtpl:37
qw422016.N().S(`"seriesFetched": "`)
//line app/vmselect/prometheus/query_response.qtpl:38
qw422016.N().DL(qs.SeriesFetched)
//line app/vmselect/prometheus/query_response.qtpl:38
qw422016.N().S(`","executionTimeMsec":`)
//line app/vmselect/prometheus/query_response.qtpl:39
qw422016.N().DL(qs.ExecutionTimeMsec)
//line app/vmselect/prometheus/query_response.qtpl:39
qw422016.N().S(`}`)
//line app/vmselect/prometheus/query_response.qtpl:42
qt.Printf("generate /api/v1/query response for series=%d", seriesCount)
qtDone()
//line app/vmselect/prometheus/query_response.qtpl:40
//line app/vmselect/prometheus/query_response.qtpl:45
streamdumpQueryTrace(qw422016, qt)
//line app/vmselect/prometheus/query_response.qtpl:40
//line app/vmselect/prometheus/query_response.qtpl:45
qw422016.N().S(`}`)
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
}
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
func WriteQueryResponse(qq422016 qtio422016.Writer, rs []netstorage.Result, qt *querytracer.Tracer, qtDone func(), qs *promql.QueryStats) {
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
StreamQueryResponse(qw422016, rs, qt, qtDone, qs)
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
qt422016.ReleaseWriter(qw422016)
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
}
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
func QueryResponse(rs []netstorage.Result, qt *querytracer.Tracer, qtDone func(), qs *promql.QueryStats) string {
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
WriteQueryResponse(qb422016, rs, qt, qtDone, qs)
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
qs422016 := string(qb422016.B)
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
return qs422016
//line app/vmselect/prometheus/query_response.qtpl:42
//line app/vmselect/prometheus/query_response.qtpl:47
}

View File

@@ -38,6 +38,7 @@ var aggrFuncs = map[string]aggrFunc{
"median": aggrFuncMedian,
"min": newAggrFunc(aggrFuncMin),
"mode": newAggrFunc(aggrFuncMode),
"outliers_iqr": aggrFuncOutliersIQR,
"outliers_mad": aggrFuncOutliersMAD,
"outliersk": aggrFuncOutliersK,
"quantile": aggrFuncQuantile,
@@ -944,6 +945,58 @@ func aggrFuncMAD(tss []*timeseries) []*timeseries {
return tss[:1]
}
func aggrFuncOutliersIQR(afa *aggrFuncArg) ([]*timeseries, error) {
args := afa.args
if err := expectTransformArgsNum(args, 1); err != nil {
return nil, err
}
afe := func(tss []*timeseries, modifier *metricsql.ModifierExpr) []*timeseries {
// Calculate lower and upper bounds for interquartile range per each point across tss
// according to Outliers section at https://en.wikipedia.org/wiki/Interquartile_range
lower, upper := getPerPointIQRBounds(tss)
// Leave only time series with outliers above upper bound or below lower bound
tssDst := tss[:0]
for _, ts := range tss {
values := ts.Values
for i, v := range values {
if v > upper[i] || v < lower[i] {
tssDst = append(tssDst, ts)
break
}
}
}
return tssDst
}
return aggrFuncExt(afe, args[0], &afa.ae.Modifier, afa.ae.Limit, true)
}
func getPerPointIQRBounds(tss []*timeseries) ([]float64, []float64) {
if len(tss) == 0 {
return nil, nil
}
pointsLen := len(tss[0].Values)
values := make([]float64, 0, len(tss))
var qs []float64
lower := make([]float64, pointsLen)
upper := make([]float64, pointsLen)
for i := 0; i < pointsLen; i++ {
values = values[:0]
for _, ts := range tss {
v := ts.Values[i]
if !math.IsNaN(v) {
values = append(values, v)
}
}
qs := quantiles(qs[:0], iqrPhis, values)
iqr := 1.5 * (qs[1] - qs[0])
lower[i] = qs[0] - iqr
upper[i] = qs[1] + iqr
}
return lower, upper
}
var iqrPhis = []float64{0.25, 0.75}
func aggrFuncOutliersMAD(afa *aggrFuncArg) ([]*timeseries, error) {
args := afa.args
if err := expectTransformArgsNum(args, 2); err != nil {

View File

@@ -79,6 +79,13 @@ type incrementalAggrFuncContext struct {
callbacks *incrementalAggrFuncCallbacks
}
func (iafc *incrementalAggrFuncContext) resetState() {
byWorkerID := iafc.byWorkerID
for i := range byWorkerID {
byWorkerID[i].m = make(map[string]*incrementalAggrContext, len(byWorkerID[i].m))
}
}
func newIncrementalAggrFuncContext(ae *metricsql.AggrFuncExpr, callbacks *incrementalAggrFuncCallbacks) *incrementalAggrFuncContext {
return &incrementalAggrFuncContext{
ae: ae,
@@ -154,6 +161,8 @@ func (iafc *incrementalAggrFuncContext) finalizeTimeseries() []*timeseries {
finalizeAggrFunc(iac)
tss = append(tss, iac.ts)
}
// reset iafc state, so it could be re-used
iafc.resetState()
return tss
}

View File

@@ -9,6 +9,7 @@ import (
"strings"
"sync"
"sync/atomic"
"time"
"unsafe"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
@@ -16,11 +17,13 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/stringsutil"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/metricsql"
)
@@ -33,11 +36,14 @@ var (
"Queries requiring more memory are rejected. The total memory limit for concurrently executed queries can be estimated "+
"as -search.maxMemoryPerQuery multiplied by -search.maxConcurrentRequests . "+
"See also -search.logQueryMemoryUsage")
logQueryMemoryUsage = flagutil.NewBytes("search.logQueryMemoryUsage", 0, "Log queries, which require more memory than specified by this flag. "+
logQueryMemoryUsage = flagutil.NewBytes("search.logQueryMemoryUsage", 0, "Log query and increment vm_memory_intensive_queries_total metric each time "+
"the query requires more memory than specified by this flag. "+
"This may help detecting and optimizing heavy queries. Query logging is disabled by default. "+
"See also -search.logSlowQueryDuration and -search.maxMemoryPerQuery")
noStaleMarkers = flag.Bool("search.noStaleMarkers", false, "Set this flag to true if the database doesn't contain Prometheus stale markers, "+
"so there is no need in spending additional CPU time on its handling. Staleness markers may exist only in data obtained from Prometheus scrape targets")
minWindowForInstantRollupOptimization = flagutil.NewDuration("search.minWindowForInstantRollupOptimization", "3h", "Enable cache-based optimization for repeated queries "+
"to /api/v1/query (aka instant queries), which contain rollup functions with lookbehind window exceeding the given value")
)
// The minimum number of points per timeseries for enabling time rounding.
@@ -60,7 +66,7 @@ func ValidateMaxPointsPerSeries(start, end, step int64, maxPoints int) error {
// AdjustStartEnd adjusts start and end values, so response caching may be enabled.
//
// See EvalConfig.mayCache for details.
// See EvalConfig.mayCache() for details.
func AdjustStartEnd(start, end, step int64) (int64, int64) {
if *disableCache {
// Do not adjust start and end values when cache is disabled.
@@ -134,8 +140,7 @@ type EvalConfig struct {
// QueryStats contains various stats for the currently executed query.
//
// The caller must initialize the QueryStats if it needs the stats.
// Otherwise the stats isn't collected.
// The caller must initialize QueryStats, otherwise it isn't collected.
QueryStats *QueryStats
timestamps []int64
@@ -165,13 +170,24 @@ func copyEvalConfig(src *EvalConfig) *EvalConfig {
// QueryStats contains various stats for the query.
type QueryStats struct {
// SeriesFetched contains the number of series fetched from storage during the query evaluation.
SeriesFetched int
SeriesFetched int64
// ExecutionTimeMsec contains the number of milliseconds the query took to execute.
ExecutionTimeMsec int64
}
func (qs *QueryStats) addSeriesFetched(n int) {
if qs != nil {
qs.SeriesFetched += n
if qs == nil {
return
}
atomic.AddInt64(&qs.SeriesFetched, int64(n))
}
func (qs *QueryStats) addExecutionTimeMsec(startTime time.Time) {
if qs == nil {
return
}
d := time.Since(startTime).Milliseconds()
atomic.AddInt64(&qs.ExecutionTimeMsec, d)
}
func (ec *EvalConfig) validate() {
@@ -190,6 +206,11 @@ func (ec *EvalConfig) mayCache() bool {
if !ec.MayCache {
return false
}
if ec.Start == ec.End {
// There is no need in aligning start and end to step for instant query
// in order to cache its results.
return true
}
if ec.Start%ec.Step != 0 {
return false
}
@@ -239,7 +260,7 @@ func getTimestamps(start, end, step int64, maxPointsPerSeries int) []int64 {
func evalExpr(qt *querytracer.Tracer, ec *EvalConfig, e metricsql.Expr) ([]*timeseries, error) {
if qt.Enabled() {
query := string(e.AppendString(nil))
query = bytesutil.LimitStringLen(query, 300)
query = stringsutil.LimitStringLen(query, 300)
mayCache := ec.mayCache()
qt = qt.NewChild("eval: query=%s, timeRange=%s, step=%d, mayCache=%v", query, ec.timeRangeString(), ec.Step, mayCache)
}
@@ -1038,46 +1059,595 @@ func removeNanValues(dstValues []float64, dstTimestamps []int64, values []float6
return dstValues, dstTimestamps
}
// evalInstantRollup evaluates instant rollup where ec.Start == ec.End.
func evalInstantRollup(qt *querytracer.Tracer, ec *EvalConfig, funcName string, rf rollupFunc,
expr metricsql.Expr, me *metricsql.MetricExpr, iafc *incrementalAggrFuncContext, window int64) ([]*timeseries, error) {
if ec.Start != ec.End {
logger.Panicf("BUG: evalInstantRollup cannot be called on non-empty time range; got %s", ec.timeRangeString())
}
timestamp := ec.Start
if qt.Enabled() {
qt = qt.NewChild("instant rollup %s; time=%s, window=%d", expr.AppendString(nil), storage.TimestampToHumanReadableFormat(timestamp), window)
defer qt.Done()
}
evalAt := func(qt *querytracer.Tracer, timestamp, window int64) ([]*timeseries, error) {
ecCopy := copyEvalConfig(ec)
ecCopy.Start = timestamp
ecCopy.End = timestamp
pointsPerSeries := int64(1)
return evalRollupFuncNoCache(qt, ecCopy, funcName, rf, expr, me, iafc, window, pointsPerSeries)
}
tooBigOffset := func(offset int64) bool {
maxOffset := window / 2
if maxOffset > 1800*1000 {
maxOffset = 1800 * 1000
}
return offset >= maxOffset
}
deleteCachedSeries := func(qt *querytracer.Tracer) {
rollupResultCacheV.DeleteInstantValues(qt, expr, window, ec.Step, ec.EnforcedTagFilterss)
}
getCachedSeries := func(qt *querytracer.Tracer) ([]*timeseries, int64, error) {
again:
offset := int64(0)
tssCached := rollupResultCacheV.GetInstantValues(qt, expr, window, ec.Step, ec.EnforcedTagFilterss)
ec.QueryStats.addSeriesFetched(len(tssCached))
if len(tssCached) == 0 {
// Cache miss. Re-populate the missing data.
start := int64(fasttime.UnixTimestamp()*1000) - cacheTimestampOffset.Milliseconds()
offset = timestamp - start
if offset < 0 {
start = timestamp
offset = 0
}
if tooBigOffset(offset) {
qt.Printf("cannot apply instant rollup optimization because the -search.cacheTimestampOffset=%s is too big "+
"for the requested time=%s and window=%d", cacheTimestampOffset, storage.TimestampToHumanReadableFormat(timestamp), window)
tss, err := evalAt(qt, timestamp, window)
return tss, 0, err
}
qt.Printf("calculating the rollup at time=%s, because it is missing in the cache", storage.TimestampToHumanReadableFormat(start))
tss, err := evalAt(qt, start, window)
if err != nil {
return nil, 0, err
}
if hasDuplicateSeries(tss) {
qt.Printf("cannot apply instant rollup optimization because the result contains duplicate series")
tss, err := evalAt(qt, timestamp, window)
return tss, 0, err
}
rollupResultCacheV.PutInstantValues(qt, expr, window, ec.Step, ec.EnforcedTagFilterss, tss)
return tss, offset, nil
}
// Cache hit. Verify whether it is OK to use the cached data.
offset = timestamp - tssCached[0].Timestamps[0]
if offset < 0 {
qt.Printf("do not apply instant rollup optimization because the cached values have bigger timestamp=%s than the requested one=%s",
storage.TimestampToHumanReadableFormat(tssCached[0].Timestamps[0]), storage.TimestampToHumanReadableFormat(timestamp))
// Delete the outdated cached values, so the cache could be re-populated with newer values.
deleteCachedSeries(qt)
goto again
}
if tooBigOffset(offset) {
qt.Printf("do not apply instant rollup optimization because the offset=%d between the requested timestamp "+
"and the cached values is too big comparing to window=%d", offset, window)
// Delete the outdated cached values, so the cache could be re-populated with newer values.
deleteCachedSeries(qt)
goto again
}
return tssCached, offset, nil
}
if !ec.mayCache() {
qt.Printf("do not apply instant rollup optimization because of disabled cache")
return evalAt(qt, timestamp, window)
}
if window < minWindowForInstantRollupOptimization.Milliseconds() {
qt.Printf("do not apply instant rollup optimization because of too small window=%d; must be equal or bigger than %d",
window, minWindowForInstantRollupOptimization.Milliseconds())
return evalAt(qt, timestamp, window)
}
switch funcName {
case "avg_over_time":
if iafc != nil {
qt.Printf("do not apply instant rollup optimization for incremental aggregate %s()", iafc.ae.Name)
return evalAt(qt, timestamp, window)
}
qt.Printf("optimized calculation for instant rollup avg_over_time(m[d]) as (sum_over_time(m[d]) / count_over_time(m[d]))")
fe := expr.(*metricsql.FuncExpr)
feSum := *fe
feSum.Name = "sum_over_time"
feCount := *fe
feCount.Name = "count_over_time"
be := &metricsql.BinaryOpExpr{
Op: "/",
KeepMetricNames: fe.KeepMetricNames,
Left: &feSum,
Right: &feCount,
}
return evalExpr(qt, ec, be)
case "rate":
if iafc != nil {
if strings.ToLower(iafc.ae.Name) != "sum" {
qt.Printf("do not apply instant rollup optimization for incremental aggregate %s()", iafc.ae.Name)
return evalAt(qt, timestamp, window)
}
qt.Printf("optimized calculation for sum(rate(m[d])) as (sum(increase(m[d])) / d)")
afe := expr.(*metricsql.AggrFuncExpr)
fe := afe.Args[0].(*metricsql.FuncExpr)
feIncrease := *fe
feIncrease.Name = "increase"
re := fe.Args[0].(*metricsql.RollupExpr)
d := re.Window.Duration(ec.Step)
if d == 0 {
d = ec.Step
}
afeIncrease := *afe
afeIncrease.Args = []metricsql.Expr{&feIncrease}
be := &metricsql.BinaryOpExpr{
Op: "/",
KeepMetricNames: true,
Left: &afeIncrease,
Right: &metricsql.NumberExpr{
N: float64(d) / 1000,
},
}
return evalExpr(qt, ec, be)
}
qt.Printf("optimized calculation for instant rollup rate(m[d]) as (increase(m[d]) / d)")
fe := expr.(*metricsql.FuncExpr)
feIncrease := *fe
feIncrease.Name = "increase"
re := fe.Args[0].(*metricsql.RollupExpr)
d := re.Window.Duration(ec.Step)
if d == 0 {
d = ec.Step
}
be := &metricsql.BinaryOpExpr{
Op: "/",
KeepMetricNames: fe.KeepMetricNames,
Left: &feIncrease,
Right: &metricsql.NumberExpr{
N: float64(d) / 1000,
},
}
return evalExpr(qt, ec, be)
case "max_over_time":
if iafc != nil {
if strings.ToLower(iafc.ae.Name) != "max" {
qt.Printf("do not apply instant rollup optimization for non-max incremental aggregate %s()", iafc.ae.Name)
return evalAt(qt, timestamp, window)
}
}
// Calculate
//
// max_over_time(m[window] @ timestamp)
//
// as the maximum of
//
// - max_over_time(m[window] @ (timestamp-offset))
// - max_over_time(m[offset] @ timestamp)
//
// if max_over_time(m[offset] @ (timestamp-window)) < max_over_time(m[window] @ (timestamp-offset))
// otherwise do not apply the optimization
//
// where
//
// - max_over_time(m[window] @ (timestamp-offset)) is obtained from cache
// - max_over_time(m[offset] @ timestamp) and max_over_time(m[offset] @ (timestamp-window)) are calculated from the storage
// These rollups are calculated faster than max_over_time(m[window]) because offset is smaller than window.
qtChild := qt.NewChild("optimized calculation for instant rollup %s at time=%s with lookbehind window=%d",
expr.AppendString(nil), storage.TimestampToHumanReadableFormat(timestamp), window)
defer qtChild.Done()
tssCached, offset, err := getCachedSeries(qtChild)
if err != nil {
return nil, err
}
if offset == 0 {
return tssCached, nil
}
// Calculate max_over_time(m[offset] @ timestamp)
tssStart, err := evalAt(qtChild, timestamp, offset)
if err != nil {
return nil, err
}
if hasDuplicateSeries(tssStart) {
qtChild.Printf("cannot apply instant rollup optimization, since tssStart contains duplicate series")
return evalAt(qtChild, timestamp, window)
}
// Calculate max_over_time(m[offset] @ (timestamp - window))
tssEnd, err := evalAt(qtChild, timestamp-window, offset)
if err != nil {
return nil, err
}
if hasDuplicateSeries(tssEnd) {
qtChild.Printf("cannot apply instant rollup optimization, since tssEnd contains duplicate series")
return evalAt(qtChild, timestamp, window)
}
// Calculate the result
tss, ok := getMaxInstantValues(qtChild, tssCached, tssStart, tssEnd)
if !ok {
qtChild.Printf("cannot apply instant rollup optimization, since tssEnd contains bigger values than tssCached")
deleteCachedSeries(qtChild)
return evalAt(qt, timestamp, window)
}
return tss, nil
case "min_over_time":
if iafc != nil {
if strings.ToLower(iafc.ae.Name) != "min" {
qt.Printf("do not apply instant rollup optimization for non-min incremental aggregate %s()", iafc.ae.Name)
return evalAt(qt, timestamp, window)
}
}
// Calculate
//
// min_over_time(m[window] @ timestamp)
//
// as the minimum of
//
// - min_over_time(m[window] @ (timestamp-offset))
// - min_over_time(m[offset] @ timestamp)
//
// if min_over_time(m[offset] @ (timestamp-window)) > min_over_time(m[window] @ (timestamp-offset))
// otherwise do not apply the optimization
//
// where
//
// - min_over_time(m[window] @ (timestamp-offset)) is obtained from cache
// - min_over_time(m[offset] @ timestamp) and min_over_time(m[offset] @ (timestamp-window)) are calculated from the storage
// These rollups are calculated faster than min_over_time(m[window]) because offset is smaller than window.
qtChild := qt.NewChild("optimized calculation for instant rollup %s at time=%s with lookbehind window=%d",
expr.AppendString(nil), storage.TimestampToHumanReadableFormat(timestamp), window)
defer qtChild.Done()
tssCached, offset, err := getCachedSeries(qtChild)
if err != nil {
return nil, err
}
if offset == 0 {
return tssCached, nil
}
// Calculate min_over_time(m[offset] @ timestamp)
tssStart, err := evalAt(qtChild, timestamp, offset)
if err != nil {
return nil, err
}
if hasDuplicateSeries(tssStart) {
qtChild.Printf("cannot apply instant rollup optimization, since tssStart contains duplicate series")
return evalAt(qtChild, timestamp, window)
}
// Calculate min_over_time(m[offset] @ (timestamp - window))
tssEnd, err := evalAt(qtChild, timestamp-window, offset)
if err != nil {
return nil, err
}
if hasDuplicateSeries(tssEnd) {
qtChild.Printf("cannot apply instant rollup optimization, since tssEnd contains duplicate series")
return evalAt(qtChild, timestamp, window)
}
// Calculate the result
tss, ok := getMinInstantValues(qtChild, tssCached, tssStart, tssEnd)
if !ok {
qtChild.Printf("cannot apply instant rollup optimization, since tssEnd contains smaller values than tssCached")
deleteCachedSeries(qtChild)
return evalAt(qt, timestamp, window)
}
return tss, nil
case
"count_eq_over_time",
"count_gt_over_time",
"count_le_over_time",
"count_ne_over_time",
"count_over_time",
"increase",
"increase_pure",
"sum_over_time":
if iafc != nil && strings.ToLower(iafc.ae.Name) != "sum" {
qt.Printf("do not apply instant rollup optimization for non-sum incremental aggregate %s()", iafc.ae.Name)
return evalAt(qt, timestamp, window)
}
// Calculate
//
// rf(m[window] @ timestamp)
//
// as
//
// rf(m[window] @ (timestamp-offset)) + rf(m[offset] @ timestamp) - rf(m[offset] @ (timestamp-window))
//
// where
//
// - rf is count_over_time, sum_over_time or increase
// - rf(m[window] @ (timestamp-offset)) is obtained from cache
// - rf(m[offset] @ timestamp) and rf(m[offset] @ (timestamp-window)) are calculated from the storage
// These rollups are calculated faster than rf(m[window]) because offset is smaller than window.
qtChild := qt.NewChild("optimized calculation for instant rollup %s at time=%s with lookbehind window=%d",
expr.AppendString(nil), storage.TimestampToHumanReadableFormat(timestamp), window)
defer qtChild.Done()
tssCached, offset, err := getCachedSeries(qtChild)
if err != nil {
return nil, err
}
if offset == 0 {
return tssCached, nil
}
// Calculate rf(m[offset] @ timestamp)
tssStart, err := evalAt(qtChild, timestamp, offset)
if err != nil {
return nil, err
}
if hasDuplicateSeries(tssStart) {
qtChild.Printf("cannot apply instant rollup optimization, since tssStart contains duplicate series")
return evalAt(qtChild, timestamp, window)
}
// Calculate rf(m[offset] @ (timestamp - window))
tssEnd, err := evalAt(qtChild, timestamp-window, offset)
if err != nil {
return nil, err
}
if hasDuplicateSeries(tssEnd) {
qtChild.Printf("cannot apply instant rollup optimization, since tssEnd contains duplicate series")
return evalAt(qtChild, timestamp, window)
}
// Calculate the result
tss := getSumInstantValues(qtChild, tssCached, tssStart, tssEnd)
return tss, nil
default:
qt.Printf("instant rollup optimization isn't implemented for %s()", funcName)
return evalAt(qt, timestamp, window)
}
}
func hasDuplicateSeries(tss []*timeseries) bool {
if len(tss) <= 1 {
return false
}
m := make(map[string]struct{}, len(tss))
bb := bbPool.Get()
defer bbPool.Put(bb)
for _, ts := range tss {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
if _, ok := m[string(bb.B)]; ok {
return true
}
m[string(bb.B)] = struct{}{}
}
return false
}
func getMinInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries) ([]*timeseries, bool) {
qt = qt.NewChild("calculate the minimum for instant values across series; cached=%d, start=%d, end=%d", len(tssCached), len(tssStart), len(tssEnd))
defer qt.Done()
getMin := func(a, b float64) float64 {
if a < b {
return a
}
return b
}
tss, ok := getMinMaxInstantValues(tssCached, tssStart, tssEnd, getMin)
qt.Printf("resulting series=%d; ok=%v", len(tss), ok)
return tss, ok
}
func getMaxInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries) ([]*timeseries, bool) {
qt = qt.NewChild("calculate the maximum for instant values across series; cached=%d, start=%d, end=%d", len(tssCached), len(tssStart), len(tssEnd))
defer qt.Done()
getMax := func(a, b float64) float64 {
if a > b {
return a
}
return b
}
tss, ok := getMinMaxInstantValues(tssCached, tssStart, tssEnd, getMax)
qt.Printf("resulting series=%d", len(tss))
return tss, ok
}
func getMinMaxInstantValues(tssCached, tssStart, tssEnd []*timeseries, f func(a, b float64) float64) ([]*timeseries, bool) {
assertInstantValues(tssCached)
assertInstantValues(tssStart)
assertInstantValues(tssEnd)
bb := bbPool.Get()
defer bbPool.Put(bb)
m := make(map[string]*timeseries, len(tssCached))
for _, ts := range tssCached {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
if _, ok := m[string(bb.B)]; ok {
logger.Panicf("BUG: duplicate series found: %s", &ts.MetricName)
}
m[string(bb.B)] = ts
}
mStart := make(map[string]*timeseries, len(tssStart))
for _, ts := range tssStart {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
if _, ok := mStart[string(bb.B)]; ok {
logger.Panicf("BUG: duplicate series found: %s", &ts.MetricName)
}
mStart[string(bb.B)] = ts
tsCached := m[string(bb.B)]
if tsCached != nil && !math.IsNaN(tsCached.Values[0]) {
if !math.IsNaN(ts.Values[0]) {
tsCached.Values[0] = f(ts.Values[0], tsCached.Values[0])
}
} else {
m[string(bb.B)] = ts
}
}
for _, ts := range tssEnd {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
tsCached := m[string(bb.B)]
if tsCached != nil && !math.IsNaN(tsCached.Values[0]) && !math.IsNaN(ts.Values[0]) {
if ts.Values[0] == f(ts.Values[0], tsCached.Values[0]) {
tsStart := mStart[string(bb.B)]
if tsStart == nil || math.IsNaN(tsStart.Values[0]) || tsStart.Values[0] != f(ts.Values[0], tsStart.Values[0]) {
return nil, false
}
}
}
}
rvs := make([]*timeseries, 0, len(m))
for _, ts := range m {
rvs = append(rvs, ts)
}
return rvs, true
}
// getSumInstantValues calculates tssCached + tssStart - tssEnd
func getSumInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries) []*timeseries {
qt = qt.NewChild("calculate the sum for instant values across series; cached=%d, start=%d, end=%d", len(tssCached), len(tssStart), len(tssEnd))
defer qt.Done()
assertInstantValues(tssCached)
assertInstantValues(tssStart)
assertInstantValues(tssEnd)
m := make(map[string]*timeseries, len(tssCached))
bb := bbPool.Get()
defer bbPool.Put(bb)
for _, ts := range tssCached {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
if _, ok := m[string(bb.B)]; ok {
logger.Panicf("BUG: duplicate series found: %s", &ts.MetricName)
}
m[string(bb.B)] = ts
}
for _, ts := range tssStart {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
tsCached := m[string(bb.B)]
if tsCached != nil && !math.IsNaN(tsCached.Values[0]) {
if !math.IsNaN(ts.Values[0]) {
tsCached.Values[0] += ts.Values[0]
}
} else {
m[string(bb.B)] = ts
}
}
for _, ts := range tssEnd {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
tsCached := m[string(bb.B)]
if tsCached != nil && !math.IsNaN(tsCached.Values[0]) {
if !math.IsNaN(ts.Values[0]) {
tsCached.Values[0] -= ts.Values[0]
}
}
}
rvs := make([]*timeseries, 0, len(m))
for _, ts := range m {
rvs = append(rvs, ts)
}
qt.Printf("resulting series=%d", len(rvs))
return rvs
}
func assertInstantValues(tss []*timeseries) {
for _, ts := range tss {
if len(ts.Values) != 1 {
logger.Panicf("BUG: instant series must contain a single value; got %d values", len(ts.Values))
}
if len(ts.Timestamps) != 1 {
logger.Panicf("BUG: instant series must contain a single timestamp; got %d timestamps", len(ts.Timestamps))
}
}
}
var (
rollupResultCacheFullHits = metrics.NewCounter(`vm_rollup_result_cache_full_hits_total`)
rollupResultCachePartialHits = metrics.NewCounter(`vm_rollup_result_cache_partial_hits_total`)
rollupResultCacheMiss = metrics.NewCounter(`vm_rollup_result_cache_miss_total`)
memoryIntensiveQueries = metrics.NewCounter(`vm_memory_intensive_queries_total`)
)
func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcName string, rf rollupFunc,
expr metricsql.Expr, me *metricsql.MetricExpr, iafc *incrementalAggrFuncContext, windowExpr *metricsql.DurationExpr) ([]*timeseries, error) {
var rollupMemorySize int64
window, err := windowExpr.NonNegativeDuration(ec.Step)
if err != nil {
return nil, fmt.Errorf("cannot parse lookbehind window in square brackets at %s: %w", expr.AppendString(nil), err)
}
if qt.Enabled() {
qt = qt.NewChild("rollup %s(): timeRange=%s, step=%d, window=%d", funcName, ec.timeRangeString(), ec.Step, window)
defer func() {
qt.Donef("neededMemoryBytes=%d", rollupMemorySize)
}()
}
if me.IsEmpty() {
return evalNumber(ec, nan), nil
}
if ec.Start == ec.End {
rvs, err := evalInstantRollup(qt, ec, funcName, rf, expr, me, iafc, window)
if err != nil {
err = &UserReadableError{
Err: err,
}
return nil, err
}
return rvs, nil
}
// Search for partial results in cache.
tssCached, start := rollupResultCacheV.Get(qt, ec, expr, window)
tssCached, start := rollupResultCacheV.GetSeries(qt, ec, expr, window)
ec.QueryStats.addSeriesFetched(len(tssCached))
if start > ec.End {
// The result is fully cached.
qt.Printf("the result is fully cached")
rollupResultCacheFullHits.Inc()
return tssCached, nil
}
if start > ec.Start {
qt.Printf("partial cache hit")
rollupResultCachePartialHits.Inc()
} else {
qt.Printf("cache miss")
rollupResultCacheMiss.Inc()
}
// Obtain rollup configs before fetching data from db,
// so type errors can be caught earlier.
sharedTimestamps := getTimestamps(start, ec.End, ec.Step, ec.MaxPointsPerSeries)
preFunc, rcs, err := getRollupConfigs(funcName, rf, expr, start, ec.End, ec.Step, ec.MaxPointsPerSeries, window, ec.LookbackDelta, sharedTimestamps)
ecCopy := copyEvalConfig(ec)
ecCopy.Start = start
pointsPerSeries := 1 + (ec.End-ec.Start)/ec.Step
tss, err := evalRollupFuncNoCache(qt, ecCopy, funcName, rf, expr, me, iafc, window, pointsPerSeries)
if err != nil {
err = &UserReadableError{
Err: err,
}
return nil, err
}
rvs, err := mergeTimeseries(qt, tssCached, tss, start, ec)
if err != nil {
return nil, fmt.Errorf("cannot merge series: %w", err)
}
if tss != nil {
rollupResultCacheV.PutSeries(qt, ec, expr, window, tss)
}
return rvs, nil
}
// evalRollupFuncNoCache calculates the given rf with the given lookbehind window.
//
// pointsPerSeries is used only for estimating the needed memory for query processing
func evalRollupFuncNoCache(qt *querytracer.Tracer, ec *EvalConfig, funcName string, rf rollupFunc,
expr metricsql.Expr, me *metricsql.MetricExpr, iafc *incrementalAggrFuncContext, window, pointsPerSeries int64) ([]*timeseries, error) {
if qt.Enabled() {
qt = qt.NewChild("rollup %s: timeRange=%s, step=%d, window=%d", expr.AppendString(nil), ec.timeRangeString(), ec.Step, window)
defer qt.Done()
}
if window < 0 {
return nil, nil
}
// Obtain rollup configs before fetching data from db, so type errors could be caught earlier.
sharedTimestamps := getTimestamps(ec.Start, ec.End, ec.Step, ec.MaxPointsPerSeries)
preFunc, rcs, err := getRollupConfigs(funcName, rf, expr, ec.Start, ec.End, ec.Step, ec.MaxPointsPerSeries, window, ec.LookbackDelta, sharedTimestamps)
if err != nil {
return nil, err
}
@@ -1085,7 +1655,10 @@ func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcNa
// Fetch the remaining part of the result.
tfss := searchutils.ToTagFilterss(me.LabelFilterss)
tfss = searchutils.JoinTagFilterss(tfss, ec.EnforcedTagFilterss)
minTimestamp := start - maxSilenceInterval
minTimestamp := ec.Start
if needSilenceIntervalForRollupFunc(funcName) {
minTimestamp -= maxSilenceInterval
}
if window > ec.Step {
minTimestamp -= window
} else {
@@ -1097,21 +1670,16 @@ func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcNa
sq := storage.NewSearchQuery(minTimestamp, ec.End, tfss, ec.MaxSeries)
rss, err := netstorage.ProcessSearchQuery(qt, sq, ec.Deadline)
if err != nil {
return nil, &UserReadableError{
Err: err,
}
return nil, err
}
rssLen := rss.Len()
if rssLen == 0 {
rss.Cancel()
tss := mergeTimeseries(tssCached, nil, start, ec)
return tss, nil
return nil, nil
}
ec.QueryStats.addSeriesFetched(rssLen)
// Verify timeseries fit available memory after the rollup.
// Take into account points from tssCached.
pointsPerTimeseries := 1 + (ec.End-ec.Start)/ec.Step
// Verify timeseries fit available memory during rollup calculations.
timeseriesLen := rssLen
if iafc != nil {
// Incremental aggregates require holding only GOMAXPROCS timeseries in memory.
@@ -1131,9 +1699,10 @@ func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcNa
timeseriesLen = rssLen
}
}
rollupPoints := mulNoOverflow(pointsPerTimeseries, int64(timeseriesLen*len(rcs)))
rollupMemorySize = sumNoOverflow(mulNoOverflow(int64(rssLen), 1000), mulNoOverflow(rollupPoints, 16))
rollupPoints := mulNoOverflow(pointsPerSeries, int64(timeseriesLen*len(rcs)))
rollupMemorySize := sumNoOverflow(mulNoOverflow(int64(rssLen), 1000), mulNoOverflow(rollupPoints, 16))
if maxMemory := int64(logQueryMemoryUsage.N); maxMemory > 0 && rollupMemorySize > maxMemory {
memoryIntensiveQueries.Inc()
requestURI := ec.GetRequestURI()
logger.Warnf("remoteAddr=%s, requestURI=%s: the %s requires %d bytes of memory for processing; "+
"logging this query, since it exceeds the -search.logQueryMemoryUsage=%d; "+
@@ -1142,44 +1711,33 @@ func evalRollupFuncWithMetricExpr(qt *querytracer.Tracer, ec *EvalConfig, funcNa
}
if maxMemory := int64(maxMemoryPerQuery.N); maxMemory > 0 && rollupMemorySize > maxMemory {
rss.Cancel()
return nil, &UserReadableError{
Err: fmt.Errorf("not enough memory for processing %s, which returns %d data points across %d time series with %d points in each time series "+
"according to -search.maxMemoryPerQuery=%d; requested memory: %d bytes; "+
"possible solutions are: reducing the number of matching time series; increasing `step` query arg (step=%gs); "+
"increasing -search.maxMemoryPerQuery",
expr.AppendString(nil), rollupPoints, timeseriesLen*len(rcs), pointsPerTimeseries, maxMemory, rollupMemorySize, float64(ec.Step)/1e3),
}
err := fmt.Errorf("not enough memory for processing %s, which returns %d data points across %d time series with %d points in each time series "+
"according to -search.maxMemoryPerQuery=%d; requested memory: %d bytes; "+
"possible solutions are: reducing the number of matching time series; increasing `step` query arg (step=%gs); "+
"increasing -search.maxMemoryPerQuery",
expr.AppendString(nil), rollupPoints, timeseriesLen*len(rcs), pointsPerSeries, maxMemory, rollupMemorySize, float64(ec.Step)/1e3)
return nil, err
}
rml := getRollupMemoryLimiter()
if !rml.Get(uint64(rollupMemorySize)) {
rss.Cancel()
return nil, &UserReadableError{
Err: fmt.Errorf("not enough memory for processing %s, which returns %d data points across %d time series with %d points in each time series; "+
"total available memory for concurrent requests: %d bytes; "+
"requested memory: %d bytes; "+
"possible solutions are: reducing the number of matching time series; increasing `step` query arg (step=%gs); "+
"switching to node with more RAM; increasing -memory.allowedPercent",
expr.AppendString(nil), rollupPoints, timeseriesLen*len(rcs), pointsPerTimeseries, rml.MaxSize, uint64(rollupMemorySize), float64(ec.Step)/1e3),
}
err := fmt.Errorf("not enough memory for processing %s, which returns %d data points across %d time series with %d points in each time series; "+
"total available memory for concurrent requests: %d bytes; requested memory: %d bytes; "+
"possible solutions are: reducing the number of matching time series; increasing `step` query arg (step=%gs); "+
"switching to node with more RAM; increasing -memory.allowedPercent",
expr.AppendString(nil), rollupPoints, timeseriesLen*len(rcs), pointsPerSeries, rml.MaxSize, uint64(rollupMemorySize), float64(ec.Step)/1e3)
return nil, err
}
defer rml.Put(uint64(rollupMemorySize))
qt.Printf("the rollup evaluation needs an estimated %d bytes of RAM for %d series and %d points per series (summary %d points)",
rollupMemorySize, timeseriesLen, pointsPerSeries, rollupPoints)
// Evaluate rollup
keepMetricNames := getKeepMetricNames(expr)
var tss []*timeseries
if iafc != nil {
tss, err = evalRollupWithIncrementalAggregate(qt, funcName, keepMetricNames, iafc, rss, rcs, preFunc, sharedTimestamps)
} else {
tss, err = evalRollupNoIncrementalAggregate(qt, funcName, keepMetricNames, rss, rcs, preFunc, sharedTimestamps)
return evalRollupWithIncrementalAggregate(qt, funcName, keepMetricNames, iafc, rss, rcs, preFunc, sharedTimestamps)
}
if err != nil {
return nil, &UserReadableError{
Err: err,
}
}
tss = mergeTimeseries(tssCached, tss, start, ec)
rollupResultCacheV.Put(qt, ec, expr, window, tss)
return tss, nil
return evalRollupNoIncrementalAggregate(qt, funcName, keepMetricNames, rss, rcs, preFunc, sharedTimestamps)
}
var (
@@ -1194,6 +1752,53 @@ func getRollupMemoryLimiter() *memoryLimiter {
return &rollupMemoryLimiter
}
func needSilenceIntervalForRollupFunc(funcName string) bool {
// All rollup the functions, which do not rely on the previous sample
// before the lookbehind window (aka prevValue), do not need silence interval.
switch strings.ToLower(funcName) {
case
"absent_over_time",
"avg_over_time",
"count_eq_over_time",
"count_gt_over_time",
"count_le_over_time",
"count_ne_over_time",
"count_over_time",
"default_rollup",
"first_over_time",
"histogram_over_time",
"hoeffding_bound_lower",
"hoeffding_bound_upper",
"last_over_time",
"mad_over_time",
"max_over_time",
"median_over_time",
"min_over_time",
"predict_linear",
"present_over_time",
"quantile_over_time",
"quantiles_over_time",
"range_over_time",
"share_gt_over_time",
"share_le_over_time",
"share_eq_over_time",
"stale_samples_over_time",
"stddev_over_time",
"stdvar_over_time",
"sum_over_time",
"tfirst_over_time",
"timestamp",
"timestamp_with_name",
"tlast_over_time",
"tmax_over_time",
"tmin_over_time",
"zscore_over_time":
return false
default:
return true
}
}
func evalRollupWithIncrementalAggregate(qt *querytracer.Tracer, funcName string, keepMetricNames bool,
iafc *incrementalAggrFuncContext, rss *netstorage.Results, rcs []*rollupConfig,
preFunc func(values []float64, timestamps []int64), sharedTimestamps []int64) ([]*timeseries, error) {

View File

@@ -48,7 +48,10 @@ func (ure *UserReadableError) Error() string {
func Exec(qt *querytracer.Tracer, ec *EvalConfig, q string, isFirstPointOnly bool) ([]netstorage.Result, error) {
if querystats.Enabled() {
startTime := time.Now()
defer querystats.RegisterQuery(q, ec.End-ec.Start, startTime)
defer func() {
querystats.RegisterQuery(q, ec.End-ec.Start, startTime)
ec.QueryStats.addExecutionTimeMsec(startTime)
}()
}
ec.validate()

View File

@@ -6910,6 +6910,30 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`outliers_iqr()`, func(t *testing.T) {
t.Parallel()
q := `sort(outliers_iqr((
alias(time(), "m1"),
alias(time()*1.5, "m2"),
alias(time()*10, "m3"),
alias(time()*1.2, "m4"),
alias(time()*0.1, "m5"),
)))`
r1 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{100, 120, 140, 160, 180, 200},
Timestamps: timestampsExpected,
}
r1.MetricName.MetricGroup = []byte("m5")
r2 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{10000, 12000, 14000, 16000, 18000, 20000},
Timestamps: timestampsExpected,
}
r2.MetricName.MetricGroup = []byte("m3")
resultExpected := []netstorage.Result{r1, r2}
f(q, resultExpected)
})
t.Run(`outliers_mad(1)`, func(t *testing.T) {
t.Parallel()
q := `outliers_mad(1, (

View File

@@ -62,6 +62,7 @@ var rollupFuncs = map[string]newRollupFunc{
"median_over_time": newRollupFuncOneArg(rollupMedian),
"min_over_time": newRollupFuncOneArg(rollupMin),
"mode_over_time": newRollupFuncOneArg(rollupModeOverTime),
"outlier_iqr_over_time": newRollupFuncOneArg(rollupOutlierIQR),
"predict_linear": newRollupPredictLinear,
"present_over_time": newRollupFuncOneArg(rollupPresent),
"quantile_over_time": newRollupQuantile,
@@ -122,6 +123,7 @@ var rollupAggrFuncs = map[string]rollupFunc{
"increases_over_time": rollupIncreases,
"integrate": rollupIntegrate,
"irate": rollupIderiv,
"iqr_over_time": rollupOutlierIQR,
"lag": rollupLag,
"last_over_time": rollupLast,
"lifetime": rollupLifetime,
@@ -225,6 +227,7 @@ var rollupFuncsKeepMetricName = map[string]bool{
"hoeffding_bound_lower": true,
"hoeffding_bound_upper": true,
"holt_winters": true,
"iqr_over_time": true,
"last_over_time": true,
"max_over_time": true,
"median_over_time": true,
@@ -952,11 +955,11 @@ func newRollupHoltWinters(args []interface{}) (rollupFunc, error) {
return rfa.prevValue
}
sf := sfs[rfa.idx]
if sf <= 0 || sf >= 1 {
if sf < 0 || sf > 1 {
return nan
}
tf := tfs[rfa.idx]
if tf <= 0 || tf >= 1 {
if tf < 0 || tf > 1 {
return nan
}
@@ -1287,6 +1290,29 @@ func newRollupQuantiles(args []interface{}) (rollupFunc, error) {
return rf, nil
}
func rollupOutlierIQR(rfa *rollupFuncArg) float64 {
// There is no need in handling NaNs here, since they must be cleaned up
// before calling rollup funcs.
// See Outliers section at https://en.wikipedia.org/wiki/Interquartile_range
values := rfa.values
if len(values) < 2 {
return nan
}
qs := getFloat64s()
qs.A = quantiles(qs.A[:0], iqrPhis, values)
q25 := qs.A[0]
q75 := qs.A[1]
iqr := 1.5 * (q75 - q25)
putFloat64s(qs)
v := values[len(values)-1]
if v > q75+iqr || v < q25-iqr {
return v
}
return nan
}
func newRollupQuantile(args []interface{}) (rollupFunc, error) {
if err := expectRollupArgsNum(args, 2); err != nil {
return nil, err

View File

@@ -17,6 +17,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/stringsutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/workingsetcache"
"github.com/VictoriaMetrics/fastcache"
"github.com/VictoriaMetrics/metrics"
@@ -202,11 +203,74 @@ func ResetRollupResultCache() {
logger.Infof("rollupResult cache has been cleared")
}
func (rrc *rollupResultCache) Get(qt *querytracer.Tracer, ec *EvalConfig, expr metricsql.Expr, window int64) (tss []*timeseries, newStart int64) {
func (rrc *rollupResultCache) GetInstantValues(qt *querytracer.Tracer, expr metricsql.Expr, window, step int64, etfss [][]storage.TagFilter) []*timeseries {
if qt.Enabled() {
query := string(expr.AppendString(nil))
query = bytesutil.LimitStringLen(query, 300)
qt = qt.NewChild("rollup cache get: query=%s, timeRange=%s, step=%d, window=%d", query, ec.timeRangeString(), ec.Step, window)
query = stringsutil.LimitStringLen(query, 300)
qt = qt.NewChild("rollup cache get instant values: query=%s, window=%d, step=%d", query, window, step)
defer qt.Done()
}
// Obtain instant values from the cache
bb := bbPool.Get()
defer bbPool.Put(bb)
bb.B = marshalRollupResultCacheKeyForInstantValues(bb.B[:0], expr, window, step, etfss)
tss, ok := rrc.getSeriesFromCache(qt, bb.B)
if !ok || len(tss) == 0 {
return nil
}
assertInstantValues(tss)
qt.Printf("found %d series for time=%s", len(tss), storage.TimestampToHumanReadableFormat(tss[0].Timestamps[0]))
return tss
}
func (rrc *rollupResultCache) PutInstantValues(qt *querytracer.Tracer, expr metricsql.Expr, window, step int64, etfss [][]storage.TagFilter, tss []*timeseries) {
if qt.Enabled() {
query := string(expr.AppendString(nil))
query = stringsutil.LimitStringLen(query, 300)
startStr := ""
if len(tss) > 0 {
startStr = storage.TimestampToHumanReadableFormat(tss[0].Timestamps[0])
}
qt = qt.NewChild("rollup cache put instant values: query=%s, window=%d, step=%d, series=%d, time=%s", query, window, step, len(tss), startStr)
defer qt.Done()
}
if len(tss) == 0 {
qt.Printf("do not cache empty series list")
return
}
assertInstantValues(tss)
bb := bbPool.Get()
defer bbPool.Put(bb)
bb.B = marshalRollupResultCacheKeyForInstantValues(bb.B[:0], expr, window, step, etfss)
_ = rrc.putSeriesToCache(qt, bb.B, step, tss)
}
func (rrc *rollupResultCache) DeleteInstantValues(qt *querytracer.Tracer, expr metricsql.Expr, window, step int64, etfss [][]storage.TagFilter) {
bb := bbPool.Get()
defer bbPool.Put(bb)
bb.B = marshalRollupResultCacheKeyForInstantValues(bb.B[:0], expr, window, step, etfss)
if !rrc.putSeriesToCache(qt, bb.B, step, nil) {
logger.Panicf("BUG: cannot store zero series to cache")
}
if qt.Enabled() {
query := string(expr.AppendString(nil))
query = stringsutil.LimitStringLen(query, 300)
qt.Printf("rollup result cache delete instant values: query=%s, window=%d, step=%d", query, window, step)
}
}
func (rrc *rollupResultCache) GetSeries(qt *querytracer.Tracer, ec *EvalConfig, expr metricsql.Expr, window int64) (tss []*timeseries, newStart int64) {
if qt.Enabled() {
query := string(expr.AppendString(nil))
query = stringsutil.LimitStringLen(query, 300)
qt = qt.NewChild("rollup cache get series: query=%s, timeRange=%s, window=%d, step=%d", query, ec.timeRangeString(), window, ec.Step)
defer qt.Done()
}
if !ec.mayCache() {
@@ -218,7 +282,7 @@ func (rrc *rollupResultCache) Get(qt *querytracer.Tracer, ec *EvalConfig, expr m
bb := bbPool.Get()
defer bbPool.Put(bb)
bb.B = marshalRollupResultCacheKey(bb.B[:0], expr, window, ec.Step, ec.EnforcedTagFilterss)
bb.B = marshalRollupResultCacheKeyForSeries(bb.B[:0], expr, window, ec.Step, ec.EnforcedTagFilterss)
metainfoBuf := rrc.c.Get(nil, bb.B)
if len(metainfoBuf) == 0 {
qt.Printf("nothing found")
@@ -233,31 +297,17 @@ func (rrc *rollupResultCache) Get(qt *querytracer.Tracer, ec *EvalConfig, expr m
qt.Printf("nothing found on the timeRange")
return nil, ec.Start
}
var ok bool
bb.B = key.Marshal(bb.B[:0])
compressedResultBuf := resultBufPool.Get()
defer resultBufPool.Put(compressedResultBuf)
compressedResultBuf.B = rrc.c.GetBig(compressedResultBuf.B[:0], bb.B)
if len(compressedResultBuf.B) == 0 {
tss, ok = rrc.getSeriesFromCache(qt, bb.B)
if !ok {
mi.RemoveKey(key)
metainfoBuf = mi.Marshal(metainfoBuf[:0])
bb.B = marshalRollupResultCacheKey(bb.B[:0], expr, window, ec.Step, ec.EnforcedTagFilterss)
bb.B = marshalRollupResultCacheKeyForSeries(bb.B[:0], expr, window, ec.Step, ec.EnforcedTagFilterss)
rrc.c.Set(bb.B, metainfoBuf)
qt.Printf("missing cache entry")
return nil, ec.Start
}
// Decompress into newly allocated byte slice, since tss returned from unmarshalTimeseriesFast
// refers to the byte slice, so it cannot be returned to the resultBufPool.
qt.Printf("load compressed entry from cache with size %d bytes", len(compressedResultBuf.B))
resultBuf, err := encoding.DecompressZSTD(nil, compressedResultBuf.B)
if err != nil {
logger.Panicf("BUG: cannot decompress resultBuf from rollupResultCache: %s; it looks like it was improperly saved", err)
}
qt.Printf("unpack the entry into %d bytes", len(resultBuf))
tss, err = unmarshalTimeseriesFast(resultBuf)
if err != nil {
logger.Panicf("BUG: cannot unmarshal timeseries from rollupResultCache: %s; it looks like it was improperly saved", err)
}
qt.Printf("unmarshal %d series", len(tss))
// Extract values for the matching timestamps
timestamps := tss[0].Timestamps
@@ -266,12 +316,10 @@ func (rrc *rollupResultCache) Get(qt *querytracer.Tracer, ec *EvalConfig, expr m
i++
}
if i == len(timestamps) {
// no matches.
qt.Printf("no datapoints found in the cached series on the given timeRange")
return nil, ec.Start
}
if timestamps[i] != ec.Start {
// The cached range doesn't cover the requested range.
qt.Printf("cached series don't cover the given timeRange")
return nil, ec.Start
}
@@ -282,7 +330,7 @@ func (rrc *rollupResultCache) Get(qt *querytracer.Tracer, ec *EvalConfig, expr m
}
j++
if j <= i {
// no matches.
qt.Printf("no matching samples for the given timeRange")
return nil, ec.Start
}
@@ -303,17 +351,21 @@ func (rrc *rollupResultCache) Get(qt *querytracer.Tracer, ec *EvalConfig, expr m
var resultBufPool bytesutil.ByteBufferPool
func (rrc *rollupResultCache) Put(qt *querytracer.Tracer, ec *EvalConfig, expr metricsql.Expr, window int64, tss []*timeseries) {
func (rrc *rollupResultCache) PutSeries(qt *querytracer.Tracer, ec *EvalConfig, expr metricsql.Expr, window int64, tss []*timeseries) {
if qt.Enabled() {
query := string(expr.AppendString(nil))
query = bytesutil.LimitStringLen(query, 300)
qt = qt.NewChild("rollup cache put: query=%s, timeRange=%s, step=%d, window=%d, series=%d", query, ec.timeRangeString(), ec.Step, window, len(tss))
query = stringsutil.LimitStringLen(query, 300)
qt = qt.NewChild("rollup cache put series: query=%s, timeRange=%s, step=%d, window=%d, series=%d", query, ec.timeRangeString(), ec.Step, window, len(tss))
defer qt.Done()
}
if len(tss) == 0 || !ec.mayCache() {
if !ec.mayCache() {
qt.Printf("do not store series to cache, since it is disabled in the current context")
return
}
if len(tss) == 0 {
qt.Printf("do not store empty series list")
return
}
// Remove values up to currentTime - step - cacheTimestampOffset,
// since these values may be added later.
@@ -346,7 +398,7 @@ func (rrc *rollupResultCache) Put(qt *querytracer.Tracer, ec *EvalConfig, expr m
metainfoBuf := bbPool.Get()
defer bbPool.Put(metainfoBuf)
metainfoKey.B = marshalRollupResultCacheKey(metainfoKey.B[:0], expr, window, ec.Step, ec.EnforcedTagFilterss)
metainfoKey.B = marshalRollupResultCacheKeyForSeries(metainfoKey.B[:0], expr, window, ec.Step, ec.EnforcedTagFilterss)
metainfoBuf.B = rrc.c.Get(metainfoBuf.B[:0], metainfoKey.B)
var mi rollupResultCacheMetainfo
if len(metainfoBuf.B) > 0 {
@@ -365,31 +417,17 @@ func (rrc *rollupResultCache) Put(qt *querytracer.Tracer, ec *EvalConfig, expr m
return
}
maxMarshaledSize := getRollupResultCacheSize() / 4
resultBuf := resultBufPool.Get()
defer resultBufPool.Put(resultBuf)
resultBuf.B = marshalTimeseriesFast(resultBuf.B[:0], tss, maxMarshaledSize, ec.Step)
if len(resultBuf.B) == 0 {
tooBigRollupResults.Inc()
qt.Printf("cannot store series in the cache, since they would occupy more than %d bytes", maxMarshaledSize)
return
}
if qt.Enabled() {
startString := storage.TimestampToHumanReadableFormat(start)
endString := storage.TimestampToHumanReadableFormat(end)
qt.Printf("marshal %d series on a timeRange=[%s..%s] into %d bytes", len(tss), startString, endString, len(resultBuf.B))
}
compressedResultBuf := resultBufPool.Get()
defer resultBufPool.Put(compressedResultBuf)
compressedResultBuf.B = encoding.CompressZSTDLevel(compressedResultBuf.B[:0], resultBuf.B, 1)
qt.Printf("compress %d bytes into %d bytes", len(resultBuf.B), len(compressedResultBuf.B))
var key rollupResultCacheKey
key.prefix = rollupResultCacheKeyPrefix
key.suffix = atomic.AddUint64(&rollupResultCacheKeySuffix, 1)
rollupResultKey := key.Marshal(nil)
rrc.c.SetBig(rollupResultKey, compressedResultBuf.B)
qt.Printf("store %d bytes in the cache", len(compressedResultBuf.B))
bb := bbPool.Get()
bb.B = key.Marshal(bb.B[:0])
ok := rrc.putSeriesToCache(qt, bb.B, ec.Step, tss)
bbPool.Put(bb)
if !ok {
return
}
mi.AddKey(key, timestamps[0], timestamps[len(timestamps)-1])
metainfoBuf.B = mi.Marshal(metainfoBuf.B[:0])
@@ -401,6 +439,52 @@ var (
rollupResultCacheKeySuffix = uint64(time.Now().UnixNano())
)
func (rrc *rollupResultCache) getSeriesFromCache(qt *querytracer.Tracer, key []byte) ([]*timeseries, bool) {
compressedResultBuf := resultBufPool.Get()
compressedResultBuf.B = rrc.c.GetBig(compressedResultBuf.B[:0], key)
if len(compressedResultBuf.B) == 0 {
qt.Printf("nothing found in the cache")
resultBufPool.Put(compressedResultBuf)
return nil, false
}
qt.Printf("load compressed entry from cache with size %d bytes", len(compressedResultBuf.B))
// Decompress into newly allocated byte slice, since tss returned from unmarshalTimeseriesFast
// refers to the byte slice, so it cannot be re-used.
resultBuf, err := encoding.DecompressZSTD(nil, compressedResultBuf.B)
if err != nil {
logger.Panicf("BUG: cannot decompress resultBuf from rollupResultCache: %s; it looks like it was improperly saved", err)
}
resultBufPool.Put(compressedResultBuf)
qt.Printf("unpack the entry into %d bytes", len(resultBuf))
tss, err := unmarshalTimeseriesFast(resultBuf)
if err != nil {
logger.Panicf("BUG: cannot unmarshal timeseries from rollupResultCache: %s; it looks like it was improperly saved", err)
}
qt.Printf("unmarshal %d series", len(tss))
return tss, true
}
func (rrc *rollupResultCache) putSeriesToCache(qt *querytracer.Tracer, key []byte, step int64, tss []*timeseries) bool {
maxMarshaledSize := getRollupResultCacheSize() / 4
resultBuf := resultBufPool.Get()
defer resultBufPool.Put(resultBuf)
resultBuf.B = marshalTimeseriesFast(resultBuf.B[:0], tss, maxMarshaledSize, step)
if len(resultBuf.B) == 0 {
tooBigRollupResults.Inc()
qt.Printf("cannot store %d series in the cache, since they would occupy more than %d bytes", len(tss), maxMarshaledSize)
return false
}
qt.Printf("marshal %d series into %d bytes", len(tss), len(resultBuf.B))
compressedResultBuf := resultBufPool.Get()
defer resultBufPool.Put(compressedResultBuf)
compressedResultBuf.B = encoding.CompressZSTDLevel(compressedResultBuf.B[:0], resultBuf.B, 1)
qt.Printf("compress %d bytes into %d bytes", len(resultBuf.B), len(compressedResultBuf.B))
rrc.c.SetBig(key, compressedResultBuf.B)
qt.Printf("store %d bytes in the cache", len(compressedResultBuf.B))
return true
}
func newRollupResultCacheKeyPrefix() uint64 {
var buf [8]byte
if _, err := rand.Read(buf[:]); err != nil {
@@ -439,14 +523,36 @@ func mustSaveRollupResultCacheKeyPrefix(path string) {
var tooBigRollupResults = metrics.NewCounter("vm_too_big_rollup_results_total")
// Increment this value every time the format of the cache changes.
const rollupResultCacheVersion = 9
const rollupResultCacheVersion = 10
func marshalRollupResultCacheKey(dst []byte, expr metricsql.Expr, window, step int64, etfs [][]storage.TagFilter) []byte {
const (
rollupResultCacheTypeSeries = 0
rollupResultCacheTypeInstantValues = 1
)
func marshalRollupResultCacheKeyForSeries(dst []byte, expr metricsql.Expr, window, step int64, etfs [][]storage.TagFilter) []byte {
dst = append(dst, rollupResultCacheVersion)
dst = encoding.MarshalUint64(dst, rollupResultCacheKeyPrefix)
dst = append(dst, rollupResultCacheTypeSeries)
dst = encoding.MarshalInt64(dst, window)
dst = encoding.MarshalInt64(dst, step)
dst = marshalTagFiltersForRollupResultCacheKey(dst, etfs)
dst = expr.AppendString(dst)
return dst
}
func marshalRollupResultCacheKeyForInstantValues(dst []byte, expr metricsql.Expr, window, step int64, etfs [][]storage.TagFilter) []byte {
dst = append(dst, rollupResultCacheVersion)
dst = encoding.MarshalUint64(dst, rollupResultCacheKeyPrefix)
dst = append(dst, rollupResultCacheTypeInstantValues)
dst = encoding.MarshalInt64(dst, window)
dst = encoding.MarshalInt64(dst, step)
dst = marshalTagFiltersForRollupResultCacheKey(dst, etfs)
dst = expr.AppendString(dst)
return dst
}
func marshalTagFiltersForRollupResultCacheKey(dst []byte, etfs [][]storage.TagFilter) []byte {
for i, etf := range etfs {
for _, f := range etf {
dst = f.Marshal(dst)
@@ -461,12 +567,15 @@ func marshalRollupResultCacheKey(dst []byte, expr metricsql.Expr, window, step i
// mergeTimeseries concatenates b with a and returns the result.
//
// Preconditions:
// - a mustn't intersect with b.
// - a mustn't intersect with b by timestamps.
// - a timestamps must be smaller than b timestamps.
//
// Postconditions:
// - a and b cannot be used after returning from the call.
func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timeseries {
func mergeTimeseries(qt *querytracer.Tracer, a, b []*timeseries, bStart int64, ec *EvalConfig) ([]*timeseries, error) {
qt = qt.NewChild("merge series len(a)=%d, len(b)=%d", len(a), len(b))
defer qt.Done()
sharedTimestamps := ec.getSharedTimestamps()
if bStart == ec.Start {
// Nothing to merge - b covers all the time range.
@@ -478,7 +587,7 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
logger.Panicf("BUG: unexpected number of values in b; got %d; want %d", len(tsB.Values), len(tsB.Timestamps))
}
}
return b
return b, nil
}
m := make(map[string]*timeseries, len(a))
@@ -486,6 +595,9 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
defer bbPool.Put(bb)
for _, ts := range a {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
if _, ok := m[string(bb.B)]; ok {
return nil, fmt.Errorf("duplicate series found: %s", &ts.MetricName)
}
m[string(bb.B)] = ts
}
@@ -512,7 +624,8 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
}
tmp.Values = append(tmp.Values, tsB.Values...)
if len(tmp.Values) != len(tmp.Timestamps) {
logger.Panicf("BUG: unexpected values after merging new values; got %d; want %d", len(tmp.Values), len(tmp.Timestamps))
logger.Panicf("BUG: unexpected values after merging new values; got %d; want %d; len(a.Values)=%d; len(b.Values)=%d",
len(tmp.Values), len(tmp.Timestamps), len(tsA.Values), len(tsB.Values))
}
rvs = append(rvs, &tmp)
}
@@ -536,7 +649,8 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
}
rvs = append(rvs, &tmp)
}
return rvs
qt.Printf("resulting series=%d", len(rvs))
return rvs, nil
}
type rollupResultCacheMetainfo struct {

View File

@@ -61,7 +61,7 @@ func TestRollupResultCache(t *testing.T) {
// Try obtaining an empty value.
t.Run("empty", func(t *testing.T) {
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != ec.Start {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, ec.Start)
}
@@ -79,8 +79,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{0, 1, 2},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 1400 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 1400)
}
@@ -100,8 +100,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{0, 1, 2},
},
}
rollupResultCacheV.Put(nil, ec, ae, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, ae, window)
rollupResultCacheV.PutSeries(nil, ec, ae, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, ae, window)
if newStart != 1400 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 1400)
}
@@ -123,8 +123,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{333, 0, 1, 2},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 1000 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 1000)
}
@@ -142,8 +142,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{0, 1, 2},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 1000 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 1000)
}
@@ -161,8 +161,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{0, 1, 2},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 1000 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 1000)
}
@@ -180,8 +180,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{0, 1, 2},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 1000 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 1000)
}
@@ -199,8 +199,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{0, 1, 2, 3, 4, 5, 6, 7},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 2200 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 2200)
}
@@ -222,8 +222,8 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{1, 2, 3, 4, 5, 6},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 2200 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 2200)
}
@@ -247,8 +247,8 @@ func TestRollupResultCache(t *testing.T) {
}
tss = append(tss, ts)
}
rollupResultCacheV.Put(nil, ec, fe, window, tss)
tssResult, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss)
tssResult, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 2200 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 2200)
}
@@ -276,10 +276,10 @@ func TestRollupResultCache(t *testing.T) {
Values: []float64{0, 1, 2},
},
}
rollupResultCacheV.Put(nil, ec, fe, window, tss1)
rollupResultCacheV.Put(nil, ec, fe, window, tss2)
rollupResultCacheV.Put(nil, ec, fe, window, tss3)
tss, newStart := rollupResultCacheV.Get(nil, ec, fe, window)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss1)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss2)
rollupResultCacheV.PutSeries(nil, ec, fe, window, tss3)
tss, newStart := rollupResultCacheV.GetSeries(nil, ec, fe, window)
if newStart != 1400 {
t.Fatalf("unexpected newStart; got %d; want %d", newStart, 1400)
}
@@ -311,7 +311,10 @@ func TestMergeTimeseries(t *testing.T) {
Values: []float64{1, 2, 3, 4, 5, 6},
},
}
tss := mergeTimeseries(a, b, 1000, ec)
tss, err := mergeTimeseries(nil, a, b, 1000, ec)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
tssExpected := []*timeseries{
{
Timestamps: []int64{1000, 1200, 1400, 1600, 1800, 2000},
@@ -328,7 +331,10 @@ func TestMergeTimeseries(t *testing.T) {
Values: []float64{3, 4, 5, 6},
},
}
tss := mergeTimeseries(a, b, bStart, ec)
tss, err := mergeTimeseries(nil, a, b, bStart, ec)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
tssExpected := []*timeseries{
{
Timestamps: []int64{1000, 1200, 1400, 1600, 1800, 2000},
@@ -345,7 +351,10 @@ func TestMergeTimeseries(t *testing.T) {
},
}
b := []*timeseries{}
tss := mergeTimeseries(a, b, bStart, ec)
tss, err := mergeTimeseries(nil, a, b, bStart, ec)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
tssExpected := []*timeseries{
{
Timestamps: []int64{1000, 1200, 1400, 1600, 1800, 2000},
@@ -367,7 +376,10 @@ func TestMergeTimeseries(t *testing.T) {
Values: []float64{3, 4, 5, 6},
},
}
tss := mergeTimeseries(a, b, bStart, ec)
tss, err := mergeTimeseries(nil, a, b, bStart, ec)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
tssExpected := []*timeseries{
{
Timestamps: []int64{1000, 1200, 1400, 1600, 1800, 2000},
@@ -391,7 +403,10 @@ func TestMergeTimeseries(t *testing.T) {
},
}
b[0].MetricName.MetricGroup = []byte("foo")
tss := mergeTimeseries(a, b, bStart, ec)
tss, err := mergeTimeseries(nil, a, b, bStart, ec)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
tssExpected := []*timeseries{
{
MetricName: storage.MetricName{

View File

@@ -12,6 +12,35 @@ var (
testTimestamps = []int64{5, 15, 24, 36, 49, 60, 78, 80, 97, 115, 120, 130}
)
func TestRollupOutlierIQR(t *testing.T) {
f := func(values []float64, resultExpected float64) {
t.Helper()
rfa := &rollupFuncArg{
values: values,
timestamps: nil,
}
result := rollupOutlierIQR(rfa)
if math.IsNaN(result) {
if !math.IsNaN(resultExpected) {
t.Fatalf("unexpected value; got %v; want %v", result, resultExpected)
}
} else {
if math.IsNaN(resultExpected) {
t.Fatalf("unexpected value; got %v; want %v", result, resultExpected)
}
if result != resultExpected {
t.Fatalf("unexpected value; got %v; want %v", result, resultExpected)
}
}
}
f([]float64{1, 2, 3, 4, 5}, nan)
f([]float64{1, 2, 3, 4, 7}, nan)
f([]float64{1, 2, 3, 4, 8}, 8)
f([]float64{1, 2, 3, 4, -2}, nan)
f([]float64{1, 2, 3, 4, -3}, -3)
}
func TestRollupIderivDuplicateTimestamps(t *testing.T) {
rfa := &rollupFuncArg{
values: []float64{1, 2, 3, 4, 5},
@@ -186,6 +215,9 @@ func testRollupFunc(t *testing.T, funcName string, args []interface{}, vExpected
t.Fatalf("unexpected value; got %v; want %v", v, vExpected)
}
} else {
if math.IsNaN(v) {
t.Fatalf("unexpected value; got %v want %v", v, vExpected)
}
eps := math.Abs(v - vExpected)
if eps > 1e-14 {
t.Fatalf("unexpected value; got %v; want %v", v, vExpected)
@@ -438,12 +470,12 @@ func TestRollupHoltWinters(t *testing.T) {
}
f(-1, 0.5, nan)
f(0, 0.5, nan)
f(1, 0.5, nan)
f(0, 0.5, -856)
f(1, 0.5, 34)
f(2, 0.5, nan)
f(0.5, -1, nan)
f(0.5, 0, nan)
f(0.5, 1, nan)
f(0.5, 0, -54.1474609375)
f(0.5, 1, 25.25)
f(0.5, 2, nan)
f(0.5, 0.5, 34.97794532775879)
f(0.1, 0.5, -131.30529492371622)
@@ -514,6 +546,7 @@ func TestRollupNewRollupFuncSuccess(t *testing.T) {
f("increase", 398)
f("increase_prometheus", 275)
f("irate", 0)
f("outlier_iqr_over_time", nan)
f("rate", 2200)
f("resets", 5)
f("range_over_time", 111)

View File

@@ -79,7 +79,10 @@ var timeseriesPool sync.Pool
func marshalTimeseriesFast(dst []byte, tss []*timeseries, maxSize int, step int64) []byte {
if len(tss) == 0 {
logger.Panicf("BUG: tss cannot be empty")
// marshal zero timeseries and zero timestamps
dst = encoding.MarshalUint64(dst, 0)
dst = encoding.MarshalUint64(dst, 0)
return dst
}
// timestamps are stored only once for all the tss, since they must be identical

View File

@@ -1,5 +1,4 @@
//go:build go1.15
// +build go1.15
package promql

View File

@@ -1,13 +1,13 @@
{
"files": {
"main.css": "./static/css/main.d9ac05de.css",
"main.js": "./static/js/main.70434a4f.js",
"static/js/522.b5ae4365.chunk.js": "./static/js/522.b5ae4365.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.957b90ab4cb4852eec26.md",
"main.css": "./static/css/main.349e6522.css",
"main.js": "./static/js/main.c93073e5.js",
"static/js/522.da77e7b3.chunk.js": "./static/js/522.da77e7b3.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.8644fd7c964802dd34a9.md",
"index.html": "./index.html"
},
"entrypoints": [
"static/css/main.d9ac05de.css",
"static/js/main.70434a4f.js"
"static/css/main.349e6522.css",
"static/js/main.c93073e5.js"
]
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 15 KiB

View File

@@ -1 +1 @@
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.70434a4f.js"></script><link href="./static/css/main.d9ac05de.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.c93073e5.js"></script><link href="./static/css/main.349e6522.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -7,7 +7,7 @@
/*! regenerator-runtime -- Copyright (c) 2014-present, Facebook, Inc. -- license (MIT): https://github.com/facebook/regenerator/blob/main/LICENSE */
/**
* @remix-run/router v1.7.2
* @remix-run/router v1.10.0
*
* Copyright (c) Remix Software Inc.
*
@@ -18,7 +18,7 @@
*/
/**
* React Router DOM v6.14.2
* React Router DOM v6.17.0
*
* Copyright (c) Remix Software Inc.
*
@@ -29,7 +29,7 @@
*/
/**
* React Router v6.14.2
* React Router v6.17.0
*
* Copyright (c) Remix Software Inc.
*

View File

@@ -1,11 +1,11 @@
---
sort: 14
weight: 14
sort: 23
weight: 23
title: MetricsQL
menu:
docs:
parent: "victoriametrics"
weight: 14
parent: 'victoriametrics'
weight: 23
aliases:
- /ExtendedPromQL.html
- /MetricsQL.html
@@ -21,7 +21,8 @@ However, there are some [intentional differences](https://medium.com/@romanhavro
[Standalone MetricsQL package](https://godoc.org/github.com/VictoriaMetrics/metricsql) can be used for parsing MetricsQL in external apps.
If you are unfamiliar with PromQL, then it is suggested reading [this tutorial for beginners](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085).
If you are unfamiliar with PromQL, then it is suggested reading [this tutorial for beginners](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085)
and introduction into [basic querying via MetricsQL](https://docs.victoriametrics.com/keyConcepts.html#metricsql).
The following functionality is implemented differently in MetricsQL compared to PromQL. This improves user experience:
@@ -109,7 +110,7 @@ The list of MetricsQL features on top of PromQL:
* [histogram_quantile](#histogram_quantile) accepts optional third arg - `boundsLabel`.
In this case it returns `lower` and `upper` bounds for the estimated percentile.
See [this issue for details](https://github.com/prometheus/prometheus/issues/5706).
* `default` binary operator. `q1 default q2` fills gaps in `q1` with the corresponding values from `q2`.
* `default` binary operator. `q1 default q2` fills gaps in `q1` with the corresponding values from `q2`. See also [drop_empty_series](#drop_empty_series).
* `if` binary operator. `q1 if q2` removes values from `q1` for missing values from `q2`.
* `ifnot` binary operator. `q1 ifnot q2` removes values from `q1` for existing values from `q2`.
* `WITH` templates. This feature simplifies writing and managing complex queries.
@@ -531,7 +532,7 @@ See also [duration_over_time](#duration_over_time) and [lag](#lag).
`mad_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which calculates [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation)
over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
See also [mad](#mad) and [range_mad](#range_mad).
See also [mad](#mad), [range_mad](#range_mad) and [outlier_iqr_over_time](#outlier_iqr_over_time).
#### max_over_time
@@ -561,6 +562,18 @@ This function is supported by PromQL. See also [tmin_over_time](#tmin_over_time)
for raw samples on the given lookbehind window `d`. It is calculated individually per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering). It is expected that raw sample values are discrete.
#### outlier_iqr_over_time
`outlier_iqr_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the last sample on the given lookbehind window `d`
if its value is either smaller than the `q25-1.5*iqr` or bigger than `q75+1.5*iqr` where:
- `iqr` is an [Interquartile range](https://en.wikipedia.org/wiki/Interquartile_range) over raw samples on the lookbehind window `d`
- `q25` and `q75` are 25th and 75th [percentiles](https://en.wikipedia.org/wiki/Percentile) over raw samples on the lookbehind window `d`.
The `outlier_iqr_over_time()` is useful for detecting anomalies in gauge values based on the previous history of values.
For example, `outlier_iqr_over_time(memory_usage_bytes[1h])` triggers when `memory_usage_bytes` suddenly goes outside the usual value range for the last 24 hours.
See also [outliers_iqr](#outliers_iqr).
#### predict_linear
`predict_linear(series_selector[d], t)` is a [rollup function](#rollup-functions), which calculates the value `t` seconds in the future using
@@ -865,7 +878,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [zscore](#zscore) and [range_trim_zscore](#range_trim_zscore).
See also [zscore](#zscore), [range_trim_zscore](#range_trim_zscore) and [outlier_iqr_over_time](#outlier_iqr_over_time).
### Transform functions
@@ -1055,6 +1068,17 @@ Metric names are stripped from the resulting series. Add [keep_metric_names](#ke
This function is supported by PromQL. See also [rad](#rad).
#### drop_empty_series
`drop_empty_series(q)` is a [transform function](#transform-functions), which drops empty series from `q`.
This function can be used when `default` operator should be applied only to non-empty series. For example,
`drop_empty_series(temperature < 30) default 42` returns series, which have at least a single sample smaller than 30 on the selected time range,
while filling gaps in the returned series with 42.
On the other hand `(temperature < 30) default 40` returns all the `temperature` series, even if they have no samples smaller than 30,
by replacing all the values bigger or equal to 30 with 40.
#### end
`end()` is a [transform function](#transform-functions), which returns the unix timestamp in seconds for the last point.
@@ -1591,7 +1615,7 @@ which maps `label` values from `src_*` to `dst*` for all the time series returne
which drops time series from `q` with `label` not matching the given `regexp`.
This function can be useful after [rollup](#rollup)-like functions, which may return multiple time series for every input series.
See also [label_mismatch](#label_mismatch).
See also [label_mismatch](#label_mismatch) and [labels_equal](#labels_equal).
#### label_mismatch
@@ -1599,7 +1623,7 @@ See also [label_mismatch](#label_mismatch).
which drops time series from `q` with `label` matching the given `regexp`.
This function can be useful after [rollup](#rollup)-like functions, which may return multiple time series for every input series.
See also [label_match](#label_match).
See also [label_match](#label_match) and [labels_equal](#labels_equal).
#### label_move
@@ -1642,23 +1666,30 @@ for the given `label` for every time series returned by `q`.
For example, if `label_value(foo, "bar")` is applied to `foo{bar="1.234"}`, then it will return a time series
`foo{bar="1.234"}` with `1.234` value. Function will return no data for non-numeric label values.
#### labels_equal
`labels_equal(q, "label1", "label2", ...)` is [label manipulation function](#label-manipulation-functions), which returns `q` series with identical values for the listed labels
"label1", "label2", etc.
See also [label_match](#label_match) and [label_mismatch](#label_mismatch).
#### sort_by_label
`sort_by_label(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels.
`sort_by_label(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels.
For example, `sort_by_label(foo, "bar")` would sort `foo` series by values of the label `bar` in these series.
See also [sort_by_label_desc](#sort_by_label_desc) and [sort_by_label_numeric](#sort_by_label_numeric).
#### sort_by_label_desc
`sort_by_label_desc(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order by the given set of labels.
`sort_by_label_desc(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order by the given set of labels.
For example, `sort_by_label(foo, "bar")` would sort `foo` series by values of the label `bar` in these series.
See also [sort_by_label](#sort_by_label) and [sort_by_label_numeric_desc](#sort_by_label_numeric_desc).
#### sort_by_label_numeric
`sort_by_label_numeric(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels
`sort_by_label_numeric(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in ascending order by the given set of labels
using [numeric sort](https://www.gnu.org/software/coreutils/manual/html_node/Version-sort-is-not-the-same-as-numeric-sort.html).
For example, if `foo` series have `bar` label with values `1`, `101`, `15` and `2`, then `sort_by_label_numeric(foo, "bar")` would return series
in the following order of `bar` label values: `1`, `2`, `15` and `101`.
@@ -1667,7 +1698,7 @@ See also [sort_by_label_numeric_desc](#sort_by_label_numeric_desc) and [sort_by_
#### sort_by_label_numeric_desc
`sort_by_label_numeric_desc(q, label1, ... labelN)` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order
`sort_by_label_numeric_desc(q, "label1", ... "labelN")` is [label manipulation function](#label-manipulation-functions), which sorts series in descending order
by the given set of labels using [numeric sort](https://www.gnu.org/software/coreutils/manual/html_node/Version-sort-is-not-the-same-as-numeric-sort.html).
For example, if `foo` series have `bar` label with values `1`, `101`, `15` and `2`, then `sort_by_label_numeric(foo, "bar")`
would return series in the following order of `bar` label values: `101`, `15`, `2` and `1`.
@@ -1839,20 +1870,33 @@ This function is supported by PromQL.
`mode(q) by (group_labels)` is [aggregate function](#aggregate-functions), which returns [mode](https://en.wikipedia.org/wiki/Mode_(statistics))
per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp.
#### outliers_iqr
`outliers_iqr(q)` is [aggregate function](#aggregate-functions), which returns time series from `q` with at least a single point
outside e.g. [Interquartile range outlier bounds](https://en.wikipedia.org/wiki/Interquartile_range) `[q25-1.5*iqr .. q75+1.5*iqr]`
comparing to other time series at the given point, where:
- `iqr` is an [Interquartile range](https://en.wikipedia.org/wiki/Interquartile_range) calculated independently per each point on the graph across `q` series.
- `q25` and `q75` are 25th and 75th [percentiles](https://en.wikipedia.org/wiki/Percentile) calculated independently per each point on the graph across `q` series.
The `outliers_iqr()` is useful for detecting anomalous series in the group of series. For example, `outliers_iqr(temperature) by (country)` returns
per-country series with anomalous outlier values comparing to the rest of per-country series.
See also [outliers_mad](#outliers_mad), [outliersk](#outliersk) and [outlier_iqr_over_time](#outlier_iqr_over_time).
#### outliers_mad
`outliers_mad(tolerance, q)` is [aggregate function](#aggregate-functions), which returns time series from `q` with at least
a single point outside [Median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) (aka MAD) multiplied by `tolerance`.
E.g. it returns time series with at least a single point below `median(q) - mad(q)` or a single point above `median(q) + mad(q)`.
See also [outliersk](#outliersk) and [mad](#mad).
See also [outliers_iqr](#outliers_iqr), [outliersk](#outliersk) and [mad](#mad).
#### outliersk
`outliersk(k, q)` is [aggregate function](#aggregate-functions), which returns up to `k` time series with the biggest standard deviation (aka outliers)
out of time series returned by `q`.
See also [outliers_mad](#outliers_mad).
See also [outliers_iqr](#outliers_iqr) and [outliers_mad](#outliers_mad).
#### quantile
@@ -1972,7 +2016,7 @@ See also [bottomk_min](#bottomk_min).
per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp.
This function is useful for detecting anomalies in the group of related time series.
See also [zscore_over_time](#zscore_over_time) and [range_trim_zscore](#range_trim_zscore).
See also [zscore_over_time](#zscore_over_time), [range_trim_zscore](#range_trim_zscore) and [outliers_iqr](#outliers_iqr).
## Subqueries

2
app/vmui/.gitignore vendored
View File

@@ -105,5 +105,3 @@ dist
# WebStorm etc
.idea/
MetricsQL.md

View File

@@ -1,4 +1,4 @@
FROM golang:1.21.3 as build-web-stage
FROM golang:1.21.4 as build-web-stage
COPY build /build
WORKDIR /build

File diff suppressed because it is too large Load Diff

Binary file not shown.

Before

Width:  |  Height:  |  Size: 15 KiB

View File

@@ -3,5 +3,5 @@ import { TimeParams } from "../types";
export const getQueryRangeUrl = (server: string, query: string, period: TimeParams, nocache: boolean, queryTracing: boolean): string =>
`${server}/api/v1/query_range?query=${encodeURIComponent(query)}&start=${period.start}&end=${period.end}&step=${period.step}${nocache ? "&nocache=1" : ""}${queryTracing ? "&trace=1" : ""}`;
export const getQueryUrl = (server: string, query: string, period: TimeParams, queryTracing: boolean): string =>
`${server}/api/v1/query?query=${encodeURIComponent(query)}&time=${period.end}${queryTracing ? "&trace=1" : ""}`;
export const getQueryUrl = (server: string, query: string, period: TimeParams, nocache: boolean, queryTracing: boolean): string =>
`${server}/api/v1/query?query=${encodeURIComponent(query)}&time=${period.end}${nocache ? "&nocache=1" : ""}${queryTracing ? "&trace=1" : ""}`;

View File

@@ -23,6 +23,7 @@ export interface TracingData {
export interface QueryStats {
seriesFetched?: string;
executionTimeMsec?: number;
resultLength?: number;
isPartial?: boolean;
}

File diff suppressed because it is too large Load Diff

View File

@@ -9,6 +9,9 @@ import { TuneIcon } from "../../Main/Icons";
import Button from "../../Main/Button/Button";
import classNames from "classnames";
import useBoolean from "../../../hooks/useBoolean";
import useEventListener from "../../../hooks/useEventListener";
import Tooltip from "../../Main/Tooltip/Tooltip";
import { AUTOCOMPLETE_KEY } from "../../Main/ShortcutKeys/constants/keyList";
const AdditionalSettingsControls: FC<{isMobile?: boolean}> = ({ isMobile }) => {
const { autocomplete } = useQueryState();
@@ -29,6 +32,16 @@ const AdditionalSettingsControls: FC<{isMobile?: boolean}> = ({ isMobile }) => {
queryDispatch({ type: "TOGGLE_AUTOCOMPLETE" });
};
const handleKeyDown = (e: KeyboardEvent) => {
const { key, ctrlKey, metaKey, shiftKey } = e;
if (key === "a" && shiftKey && (ctrlKey || metaKey)) {
e.preventDefault();
onChangeAutocomplete();
}
};
useEventListener("keydown", handleKeyDown);
return (
<div
className={classNames({
@@ -36,12 +49,14 @@ const AdditionalSettingsControls: FC<{isMobile?: boolean}> = ({ isMobile }) => {
"vm-additional-settings_mobile": isMobile
})}
>
<Switch
label={"Autocomplete"}
value={autocomplete}
onChange={onChangeAutocomplete}
fullWidth={isMobile}
/>
<Tooltip title={AUTOCOMPLETE_KEY}>
<Switch
label={"Autocomplete"}
value={autocomplete}
onChange={onChangeAutocomplete}
fullWidth={isMobile}
/>
</Tooltip>
<Switch
label={"Disable cache"}
value={nocache}

Some files were not shown because too many files have changed in this diff Show More