Compare commits

..

318 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
911bab4f6a docs/CHANGELOG.md: cut v1.89.0 2023-03-12 17:29:44 -07:00
Aliaksandr Valialkin
1428aa2c22 app/vmselect/vmui: make vmui-update after 00a0816ab1 2023-03-12 17:19:19 -07:00
Yury Molodov
00a0816ab1 vmui: predefined dashboards docs (#3895)
* fix: correct display predefined panels

* docs: update the documentation for predefined dashboards
2023-03-12 17:16:26 -07:00
Aliaksandr Valialkin
be68a6a1ee Makefile: update golangci-lint from v1.51.1 to v1.51.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.51.2
2023-03-12 17:08:19 -07:00
Aliaksandr Valialkin
468de76e9a app/vmselect: remove data race on updating EvalConfig.IsPartialResponse from concurrently running goroutines
This properly returns `is_partial: true` for partial responses.
2023-03-12 16:54:08 -07:00
Aliaksandr Valialkin
0af9e2b693 app/vmselect/promql: prevent from SIGBUS crash on architecures, which deny unaligned access to 8-byte words (e.g. ARM)
Thanks to @oliverpool for nailing down the root cause of the issue and for the initial attempt to fix it
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3927
2023-03-12 16:32:08 -07:00
Artem Navoiev
7257a2a97f fix typos on image
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-12 17:12:57 +01:00
Aliaksandr Valialkin
c28c25ed2e vendor: make vendor-update 2023-03-12 03:13:53 -07:00
Yury Molodov
01367faa39 vmui: remove send step param for instant queries (#3931)
* fix: remove step param for instant queries (#3896)

* vmui: remove send step param for instant queries

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-12 03:09:56 -07:00
Aliaksandr Valialkin
a52413ce0a docs/CHANGELOG.md: document 113a89904d 2023-03-12 01:58:18 -08:00
Aliaksandr Valialkin
b19de3fa12 docs/CHANGELOG.md: yet another typo fix 2023-03-12 01:06:40 -08:00
Aliaksandr Valialkin
2f1d24fccf docs/CHANGELOG.md: typo fix 2023-03-12 01:04:14 -08:00
Aliaksandr Valialkin
b5db69fe05 app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage
It has been appeared that this interning may lead to increased memory usage and increased CPU usage
when vmselect performs queries, which select big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
2023-03-12 00:52:35 -08:00
Aliaksandr Valialkin
babc9e9815 docs/CHANGELOG.md: document 927d9da270 2023-03-12 00:25:00 -08:00
Aliaksandr Valialkin
d95037a175 app/vmctl/README.md: remove trailing space from the line added at 4c3bc04efa 2023-03-12 00:11:51 -08:00
Aliaksandr Valialkin
e3488c6cbc docs/CHANGELOG.md: typo fixes 2023-03-12 00:09:26 -08:00
Aliaksandr Valialkin
48e32b325e docs/CHANGELOG.md: document c9f44daaee8f4282d9ed41e3ba799c7a33841313 2023-03-11 23:55:13 -08:00
Roman Khavronenko
856c2db144 vmalert: support concurrent reading from object storage (#532)
* vmalert: support concurrent reading from object storage

Config reading from GCS or S3 can be slow if object storage
contains a big number of files. Object storages are usually
fast for downloading and are slow for individual operations.
If there would be thousands of files to read, vmalert could
spend significant time for retrieving those because it is
done sequentially.

The change introduces ability to read configs from object
storage concurrently. By default, both GCS and S3 are now
read with 50 concurrent readers. This significantly reduces
the load time:
* loading 500 files with concurrency=1 takes 27s
* loading 500 files with concurrency=50 takes <1s

* vmalert: add note to Changelog

* vmalert: cleanup

* vmalert: use ticker properly

* app/vmalert: improve status reporting during config loading

* vmalert: support concurrent reading from object storage

Config reading from GCS or S3 can be slow if object storage
contains a big number of files. Object storages are usually
fast for downloading and are slow for individual operations.
If there would be thousands of files to read, vmalert could
spend significant time for retrieving those because it is
done sequentially.

The change introduces ability to read configs from object
storage concurrently. By default, both GCS and S3 are now
read with 50 concurrent readers. This significantly reduces
the load time:
* loading 500 files with concurrency=1 takes 27s
* loading 500 files with concurrency=50 takes <1s

* app/vmalert: make linter happy
2023-03-11 23:51:23 -08:00
Nikolay
927d9da270 lib/storage: correctly handle io.EOF error for pre-fetched metrics (#3946)
io.EOF shouldn't be returned from this function. It breaks all search
API logic and may result in empty query results.
2023-03-11 23:29:43 -08:00
Alexander Marshalov
c0c3dc02cf Stream aggregation doc improvements based on users feedback (#3934)
docs: stream aggregation doc improvements based on users feedback
2023-03-10 21:39:58 +01:00
Roman Khavronenko
3eebe52a06 Dashboards upd (#3942)
* dashboards/cluser: use `quantile` since `median` isn't supported by PromQL

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/*: add `restarts` annotation to show when there were restarts

The cluster's annotation query is aggregated `by job`,
while vmagent/vmalert are aggregated `by job, instance`.
This is because cluster dashboard can contains too many instances
and annotation could become too noisy.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/*: support instance filter in Version annotation

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-10 17:13:19 +01:00
Zakhar Bessarab
f7a7eb8f3e docs: add a note about cache reset for vmalert backfilling docs (#3940)
docs: add a note about cache reset for vmalert backfilling docs
2023-03-10 13:45:11 +01:00
Dmytro Kozlov
4c3bc04efa app/vmctl: update importing tips when migrating data with overlapping time range (#3941)
app/vmctl: update importing tips when migrating data with overlapping time range
2023-03-10 11:29:08 +01:00
Dmytro Kozlov
3c9058c168 app/vmctl: add support of basic auth and barer token (#3921)
app/vmctl: add support of basic auth and bearer token
2023-03-09 14:53:29 +01:00
Roman Khavronenko
d66bae212b app/vmalert: log number of configration files found for each specified -rule (#3936)
The change also introduces `List` method to `FS` interface.
The `List` method can be used for wildcard support in object storage FS.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-03-09 14:46:19 +01:00
Dmytro Kozlov
7f54c181bb app/vmctl: follow up after 09e3742a82 (#3937)
app/vmctl: follow up after 09e3742a82
2023-03-09 13:28:55 +01:00
Roman Khavronenko
3de7fc5c71 security: bump go version to 1.20.2 (#3935)
upgrade Go builder from Go1.20.1 to Go1.20.2
See the list of issues addressed in Go1.20.2 here (https://github.com/golang/go/issues?q=milestone%3AGo1.20.2+label%3ACherryPickApproved).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-09 13:20:54 +01:00
Gowtam Lal
09e3742a82 app/vmctl: Allow vmnative exports to skip HTTP keepalive. (#3909)
app/vmctl: support HTTP keepalive disabling for vm-native mode
2023-03-09 09:47:46 +01:00
Denys Holius
cbba6bd3db Adds snap badge to README.md (#3930)
docs: adds snap badge to README.md
2023-03-08 16:31:58 +01:00
Aliaksandr Valialkin
1b5dc9f91d all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-08 01:26:55 -08:00
Aliaksandr Valialkin
05709bdfae app/vmselect/vmui: make vmui-update after bbf8e459a0 2023-03-08 01:15:52 -08:00
Aliaksandr Valialkin
ed73317622 docs/CHANGELOG.md: improve description for 4b136abff8 2023-03-08 01:05:33 -08:00
Aliaksandr Valialkin
70b831e684 docs/CHANGELOG.md: improve the description of the bugfix at 62beea23f7
- Make the description easier to read by humans :)
- Add a link to VictoriaMetrics datasource plugin for Grafana, so users could easily discover it
2023-03-08 00:59:42 -08:00
Aliaksandr Valialkin
6e8e64f695 docs/CHANGELOG.md: clarify the description for 6bfe9cc733
- Add the panic message to the description, so it is easier to google
- Add a link to the corresponding bugreport

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
2023-03-08 00:39:34 -08:00
Aliaksandr Valialkin
138757629b app/vmctl/README.md: remove trailing space after cc5b916237 2023-03-08 00:32:00 -08:00
Aliaksandr Valialkin
9e71462cee all: typo fixes of the same type as in the d056be710b 2023-03-08 00:30:21 -08:00
Aliaksandr Valialkin
884e58d58d docs/CHANGELOG.md: clarify the description for the change at 8bab50dc29
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3600
2023-03-08 00:23:35 -08:00
Aliaksandr Valialkin
1bb529e23e app/vmagent/remotewrite: follow-up for e3a756d82869f8c357b072f6e635ebfc7d65dd2c
- Document the fix
- Move the detection of VictoriaMetrics remoteWrite protocol from client.init() to newHTTPClient()
  This simplifies the fix to the following diff:

diff --git a/app/vmagent/remotewrite/client.go b/app/vmagent/remotewrite/client.go
index 099899c19..70b904af4 100644
--- a/app/vmagent/remotewrite/client.go
+++ b/app/vmagent/remotewrite/client.go
@@ -151,10 +151,6 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
        }
        c.sendBlock = c.sendBlockHTTP

-       return c
-}
-
-func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
        useVMProto := forceVMProto.GetOptionalArg(argIdx)
        usePromProto := forcePromProto.GetOptionalArg(argIdx)
        if useVMProto && usePromProto {
@@ -173,6 +169,10 @@ func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
        }
        c.useVMProto = useVMProto

+       return c
+}
+
+func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
2023-03-07 23:54:24 -08:00
Dmytro Kozlov
66bf1987bf app/vmagent: fix panic if auth config not defined (#530) 2023-03-07 23:51:30 -08:00
Aliaksandr Valialkin
202083f38c docs/CHANGELOG.md: document ec2abf9b69 2023-03-07 23:33:19 -08:00
Alexander Marshalov
ccb08fc28b added documentation about new templates field of vmalertmanager specification in operator (https://github.com/VictoriaMetrics/operator/issues/592) (#3924)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-07 17:51:26 +01:00
Nikolay
7a3e16e774 lib/netutil: fixes panic at proxy protocol (#3905)
it may occur if non proxy protocol message received by tcp server.
Listener Accept method must return only non-recoverable errors.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-07 08:50:18 -08:00
Yury Molodov
bbf8e459a0 vmui: fix display of selected value in the selector (#3919)
vmui: fix selected value in dropdowns for Explore page
2023-03-07 16:23:02 +01:00
Nikolay
02f13d5681 docs: updates operator api.md (#3922) 2023-03-07 09:09:05 +01:00
Roman Khavronenko
2472baa934 app/vmalert: do not wait for group start on removal (#3891)
Each group in vmalert starts with an artifical delay to avoid
thundering herd problem. For some groups with high evaluation
intervals, the delay could be significant.
If during this delay user will remove the group from the config
and hot-reload it - vmalert will have to wait until the delay
ends. This results into slow config reloading and UI hang.

The change moves the start-delay logic back to the group's
`start` method. Now, group can immediately exit from the
delay when `group.close()` method is called.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-06 14:04:43 +01:00
Craig Rodrigues
1b194bc6de docs: move installation methods further up in README (#3904) 2023-03-06 12:44:33 +01:00
Dmytro Kozlov
cc5b916237 docs: follow up after 4b136abff8 (#3918)
docs: follow up after 4b136abff8
2023-03-06 12:41:48 +01:00
Alexander Marshalov
75977a4d02 added doc for placeholder support in vmagent specification for operator (https://github.com/VictoriaMetrics/operator/issues/592) (#3916) 2023-03-06 11:58:31 +01:00
Gowtam Lal
4b136abff8 app/vmctl: Add ability to set headers for vm-native HTTP requests. (#3906)
app/vmctl: Add ability to set headers for vm-native HTTP requests
2023-03-06 11:22:31 +01:00
Roman Khavronenko
95dc65e7b3 docs: follow-up after 62beea23f7 (#3907)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 22:45:19 +01:00
Roman Khavronenko
a0cbef1c46 github: fix validation errors (#3903)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 16:28:47 +01:00
Roman Khavronenko
4e8de26fec docs: follow-up e781e22c9c (#3902)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 16:15:15 +01:00
Roman Khavronenko
0d8f80ce90 github: add a Question issue type (#3901)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 16:11:06 +01:00
Yury Molodov
62beea23f7 vmui: show query error (#3890)
* add links support with old query params
* show error after execute query
2023-03-03 16:07:47 +01:00
Artem Makhortov
943243321a doc: vmctl vm-native-step-interval supported values (#3899) 2023-03-03 16:05:33 +01:00
Nikolay
6bfe9cc733 lib{mergset,storage}: prevent possible race condition with logging st… (#3900)
lib{mergset,storage}: prevent possible race condition with logging stats for merges

Previously partwrapper could be release by background process and reference for part may be invalid 
during logging stats. It will lead to panic at vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
2023-03-03 12:33:42 +01:00
Haleygo
d056be710b fix some typo (#3898) 2023-03-03 11:02:13 +01:00
Artem Navoiev
de621c0cb7 remove image width for vmalert managed vm guide
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-02 21:51:34 +01:00
Artem Navoiev
14ef5311f0 change title
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-02 18:06:43 +01:00
Artem Navoiev
5872b5711d add vmalert managed vm integration guide
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-02 18:04:17 +01:00
Dmytro Kozlov
8bab50dc29 app/vmctl: add backoff retries to native protocol (#3859)
app/vmctl: vm-native - split migration on per-metric basis

`vm-native` mode now splits the migration process on per-metric basis. 
This allows to migrate metrics one-by-one according to the specified filter. 
This change allows to retry export/import requests for a specific metric and provides a better 
understanding of the migration progress.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-03-02 13:19:45 +01:00
Roman Khavronenko
d6fa4da712 vmalert: cancel in-flight requests on group's update or close (#3886)
When group's update() or close() method is called, the group
still need to wait for its current evaluation to finish.
Sometimes, evaluation could take a significant amount of time
which slows configuration update or vmalert's graceful shutdown.

The change interrupts current evaluation in order to speed up
the graceful shutdown or config update procedures.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-01 15:48:20 +01:00
Roman Khavronenko
2e153b68cd dashboards: account for indexdb size in Bytes-per-Point panel (#3884)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-28 17:47:52 +01:00
Roman Khavronenko
3814747e94 deployment/docker: fix typo (#3883)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-28 14:29:28 +01:00
Dmytro Kozlov
ec2abf9b69 app/vmctl: Increase http request timeout made by remote read client, add importing tips (#3879)
app/vmctl: Increase default http request timeout made by remote read client
2023-02-28 09:50:33 +01:00
Aliaksandr Valialkin
06854738b6 .github/workflows/check-licenses.yml: use the correct version of Go - 1.20.1 - instead of 1.21.0 2023-02-27 19:25:13 -08:00
Aliaksandr Valialkin
95e1173423 docs/CHANGELOG.md: document v1.79.10 release 2023-02-27 17:35:48 -08:00
Aliaksandr Valialkin
8937de5f99 vendor: make vendor-update 2023-02-27 15:32:45 -08:00
Aliaksandr Valialkin
8288e327ee docs/CHANGELOG.md: cut v1.88.1 2023-02-27 15:28:18 -08:00
Aliaksandr Valialkin
dfe3939665 docs/CHANGELOG.md: link to the issue, which may benefit from -internStringDisableCache command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
2023-02-27 14:55:40 -08:00
Aliaksandr Valialkin
46127b432d lib/bytesutil: add -internStringDisableCache and -internStringCacheExpireDuration command-line flags
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3872
2023-02-27 14:16:49 -08:00
Aliaksandr Valialkin
0d3f31f60e lib/storage: follow-up for 39cdc546dd
- Use flag.Duration instead of flagutil.Duration for -snapshotCreateTimeout,
  since the flagutil.Duration is intended mostly for big durations, e.g. days, months and years,
  while the -snapshotCreateTimeout is usually smaller than one hour.
- Add links to https://docs.victoriametrics.com/#how-to-work-with-snapshots in docs/CHANGELOG.md,
  so readers could easily find the corresponding docs when reading the changelog.
- Properly remove all the created directories on unsuccessful attempt to create
  snapshot in Storage.CreateSnapshot().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
2023-02-27 13:07:38 -08:00
Zakhar Bessarab
39cdc546dd lib/storage: enhancements for snapshots process (#3873)
* lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858)

* lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage

* docs: fix formatting

* app/vmstorage: add metrics to track status of snapshots

* app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: update flag name in docs

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: reflect new metrics names change in docs

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 12:12:03 -08:00
Zakhar Bessarab
5fadd58cf6 lib/promscrape: correctly register vm_promscrape_config_* metrics (#3876)
* lib/promscrape: set `vm_promscrape_config_last_reload_successful` to 1 if there was no promscrape config provided

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: register `vm_promscrape_config_*` metrics only in case promscrape config is used

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 11:53:53 -08:00
Zakhar Bessarab
ac6d937372 doc: add changelog reference for vmgateway OpenID discovery (#3877)
* doc: add changelog reference for vmgateway OpenID discovery

* doc: add vmgateway docs

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 11:50:22 -08:00
Aliaksandr Valialkin
b3bb18d674 app/vmselect/promql: fix panic when calculating aggr_func(rollup*())
The panic has been introduced in dac21d874b
2023-02-27 11:48:27 -08:00
Zakhar Bessarab
ac4c7adec6 app/vmgateway: add new flag doc
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-27 11:18:34 -08:00
Zakhar Bessarab
5d7da8479f app/vmgateway: fix typo in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-27 11:18:08 -08:00
Zakhar Bessarab
4ee73f54a6 app/vmgateway: add OpenID discovery of JWKS endpoints
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-27 11:17:39 -08:00
Aliaksandr Valialkin
23871fb0bf app/vmagent: add -remoteWrite.vmProtoCompressLevel command-line flag for tuning the compression level for VictoriaMetrics remote write protocol 2023-02-27 11:03:49 -08:00
Aliaksandr Valialkin
1a6f2f07fd lib/httpserver: use github.com/klauspost/compress/gzhttp for compressing http responses
This allows removing gzip-related code from lib/httpserver.
2023-02-27 10:33:43 -08:00
Dmytro Kozlov
27c9446520 app/vmctl: skip series if measurement not found (#3869)
app/vmctl: skip measurements with no fields for influxdb mode
2023-02-27 14:28:47 +01:00
Dmytro Kozlov
bcbf73225e app/vmctl: enable version flag (#3868) 2023-02-27 10:30:53 +01:00
Aliaksandr Valialkin
255a0cf635 all: add makefile rules for GOARCH=s390x for all the VictoriaMetrics components
This is a follow-up for 007530f882
2023-02-26 12:36:51 -08:00
v1gnesh
007530f882 Option to build for s390x (Linux on IBM Z) (#3870)
Added `victoria-metrics-linux-s390x` to allow single node builds for `s390x` platform.

Leaving other packaging options at the moment, as on this platform, they're mostly going to be built from/with container images hosted within the company as a base, and not alpine.
2023-02-26 12:29:50 -08:00
Aliaksandr Valialkin
f7ef80aaad .golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:18:59 -08:00
Aliaksandr Valialkin
ffa327d6d1 app/vmagent: use the provided auth options when checking whether the remote storage supports VictoriaMetrics remote write protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-26 12:07:47 -08:00
Aliaksandr Valialkin
eece61a611 deployment/marketplace: update VictoriaMetrics release from v1.87.1 to v1.88.0 2023-02-24 18:58:39 -08:00
Aliaksandr Valialkin
0c016279d5 deployment/docker: update VictoriaMetrics docker tag from v1.87.1 to v1.88.0 2023-02-24 18:57:31 -08:00
Aliaksandr Valialkin
2d36dbcfa9 docs/CHANGELOG.md: cut v1.88.0 2023-02-24 17:54:21 -08:00
Aliaksandr Valialkin
8cfe4064b5 vendor: make vendor-update 2023-02-24 17:26:51 -08:00
Aliaksandr Valialkin
b4bf8dc2f0 docs: update -help output after e1c3267e34 2023-02-24 17:16:13 -08:00
Roman Khavronenko
e1c3267e34 vmselect/promql: check for deadline in count_values fn (#3806)
* vmselect/promql: check for deadline in `count_values` fn

`count_values` could be very slow during the data processing.
Checking for deadline between iterations supposed to reduce
probability of exceeding `search.maxQueryDuration`.

The change also adds a new trace record, which captures the time
spent in aggregation function. Before that, the trace for aggr funcs
could be confusing since it doesn't account for all the places where
time was spent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 16:59:26 -08:00
Aliaksandr Valialkin
c33cc4322c docs/CHANGELOG.md: document v1.87.2 release 2023-02-24 16:14:28 -08:00
Aliaksandr Valialkin
6a88d5402d docs/CHANGELOG.md: document v1.79.9 release 2023-02-24 15:10:48 -08:00
Roman Khavronenko
dac21d874b metricsql: support optional 2nd argument for rollup functions (#3841)
* metricsql: support optional 2nd argument for rollup functions

Support optional 2nd argument `min`, `max` or `avg` for rollup functions:
 * rollup
 * rollup_delta
 * rollup_deriv
 * rollup_increase
 * rollup_rate
 * rollup_scrape_interval

 If second argument is passed, then rollup function will return only the selected aggregation type.
 This change can be useful for situations where only one type of rollup calculation is needed.
 For example, `rollup_rate(requests_total[5m], "max")`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 13:47:52 -08:00
Aliaksandr Valialkin
87aeeec3e8 docs/CHANGELOG.md: document d8eaa511b0 2023-02-24 12:42:02 -08:00
Zakhar Bessarab
d8eaa511b0 lib/{fs,mergeset,storage}: skip .must-remove. dirs when creating snapshot (#3858) (#3867) 2023-02-24 12:38:42 -08:00
Aliaksandr Valialkin
a2340e6c95 docs/CHANGELOG.md: typo fix: scrape scrape -> scrape 2023-02-24 12:33:29 -08:00
Aliaksandr Valialkin
c6ad3692ad lib/promscrape: follow-up for 43e104a83f
- Return immediately on context cancel during the backoff sleep.
  This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747

- Add a comment describing why the second attempt to obtain the response from remote side
  is perfromed immediately after the first attempt.

- Remove fasthttp dependency from lib/promscrape/discoveryutils

- Set context deadline before calling doRequestWithPossibleRetry().
  This simplifies the doRequestWithPossibleRetry() a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3293
2023-02-24 12:20:42 -08:00
Zakhar Bessarab
43e104a83f fix: do not use exponential backoff for first retry of scrape request (#3824)
* fix: do not use exponential backoff for first retry of scrape request (#3293)

* lib/promscrape: refactor `doRequestWithPossibleRetry` backoff to simplify logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update lib/promscrape/client.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* lib/promscrape: refactor `doRequestWithPossibleRetry` to make it more straightforward

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-24 11:39:56 -08:00
Aliaksandr Valialkin
c87c7d1e29 app/vmselect/promql: measure the time required for calculating the aggregate function from the prepared source time series 2023-02-23 20:05:14 -08:00
Aliaksandr Valialkin
8b7a828c65 app/vmselect/vmui: make vmui-update after d4fc0ed874 2023-02-23 19:25:52 -08:00
Yury Molodov
d4fc0ed874 vmui: improve mobile ui (#3848)
* feat: improve mobile ui

* feat: improve mobile ui

* fix: change style server url

* fix: improve ExploreMetrics mobile

* fix: display global settings on all pages
2023-02-23 19:18:49 -08:00
Aliaksandr Valialkin
a02cf92fd1 docs: update --help descriptions after recent changes 2023-02-23 19:02:27 -08:00
Aliaksandr Valialkin
c734416f86 lib/protoparser: fix golangci-lint warning after f579cac297 2023-02-23 18:50:34 -08:00
Aliaksandr Valialkin
b285207aa7 app/vmselect: add -search.logQueryMemoryUsage command-line flag for logging queries, which take big amounts of memory
Thanks to @michal-kralik for initial attempts for this feature:

- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3651
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3715

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3553
2023-02-23 18:47:08 -08:00
Aliaksandr Valialkin
c080443fef app/vmagent: automatically detect whether the remote storage supports VictoriaMetrics remote write protocol
Substitute -remoteWrite.useVMProto with -remoteWrite.forcePromProto command-line flag,
which can be used for forcing Prometheus remote write protocol in cases when the remote storage
supports VictoriaMetrics remote write protocol.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-23 17:36:55 -08:00
Aliaksandr Valialkin
e688121de8 lib/promscrape/discovery/kuma: substitute blocking HTTP call with non-blocking HTTP call at discoveryutils.Client 2023-02-23 15:13:08 -08:00
Denys Holius
24915fd4bc Fix some typos and adds improvements for packer builds (#3825)
* update helper scripts to latest versions

* added missed command for initialisation variables from Makefile

* deployment/marketplace/vultr/helper-scripts/vultr-helper.sh: update helper script to latest version

* fixed typo for using VM_VERSION variable

* added an example of specifying the VM_VERSION and tokens for API's

* set packer logging to STDOUT by default
2023-02-23 12:42:55 +04:00
Aliaksandr Valialkin
637043a40e docs/CHANGELOG.md: document 6d019a3c37
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3830
2023-02-22 19:24:00 -08:00
Mattias Ängehov
6d019a3c37 Azure Service Discovery - Fix token fetch for Container Apps/App Services (#3832)
* Modify API version when running in Container App

* Handle expires on from token response

Response from IMDS does not always contain expires in value which is
currently used to get the token expiry time. An example resources that
doesn't provide it are Container Apps and App Service.

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>

* Fix client id parameter for user assigned identity

* Apply suggestions from code review

---------

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2023-02-22 19:19:53 -08:00
Aliaksandr Valialkin
510f78a96b all: consistently use http.Method{Get,Post,Put} across the codebase
This is a follow-up after 9dec3c8f80
2023-02-22 18:58:46 -08:00
my-git9
9dec3c8f80 chore: Use http constants to replace numbers (#3846)
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-02-22 18:53:05 -08:00
Alexander Marshalov
aaef0bac00 fix interpolate function for filling only intermediate gaps (#3816) (#3857)
* fix interpolate function for filling only intermediate gaps (#3816)

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-22 18:38:43 -08:00
Yury Molodov
fc720a5a78 fix: change query params update (#3860) 2023-02-22 18:25:32 -08:00
Aliaksandr Valialkin
383bf9689e docs/CHANGELOG.md: document d2b92d3264
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747
2023-02-22 17:51:52 -08:00
Aliaksandr Valialkin
9fbd45a22f lib/promscrape/discovery/kuma: follow-up for 317fef95f9
- Do not generate __meta_server label, since it is unavailable in Prometheus.
- Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md,
  so users could click it and read the docs without the need to search the corresponding docs.
- Remove kumaTarget struct, since it is easier generating labels for discovered targets
  directly from the response returned by Kuma. This simplifies the code.
- Store the generated labels for discovered targets inside atomic.Value. This allows reading them
  from concurrent goroutines without the need to use mutex.
- Use synchronouse requests to Kuma instead of long polling, since there is a little sense
  in the long polling when the Kuma server may return 304 Not Modified response every -promscrape.kumaSDCheckInterval.
- Remove -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used.
- Set default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus.
- Remove unnecessary indirections for string literals, which are used only once, in order to improve code readability.
- Remove unused fields from discoveryRequest and discoveryResponse.
- Update tests.
- Document why fetch_timeout and refresh_interval options are missing in kuma_sd_config.
- Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback,
  since these are public types.

Side notes: it is weird that Prometheus implementation for kuma_sd_configs sets `instance` label,
since usually this label is set by the Prometheus itself to __address__ after the relabeling phase.
See https://www.robustperception.io/life-of-a-label/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3389

See https://github.com/prometheus/prometheus/issues/7919
and https://github.com/prometheus/prometheus/pull/8844
as a reference implementation in Prometheus
2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
eb08579452 lib/promscrape/discovery: add a comment explaining why duplicates are removed from the generated target labels 2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
4f649a0573 docs/CHANGELOG.md: document 110c3896e7
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3600
2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
f8811411fd app/vmagent/remotewrite: removed unneeded code in testPushWriteRequest after 57801660ab 2023-02-22 17:51:50 -08:00
Alexander Marshalov
b6845951a5 fixed typo in dns+srv documentation (#3861) 2023-02-23 02:40:00 +01:00
Zakhar Bessarab
d2b92d3264 lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (#3853)
* lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (see #3747)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: fix order of params for `doRequestWithPossibleRetry` to follow codestyle

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: accept deadline explicitly and extend passed context for local use

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-22 17:05:16 +01:00
Artem Navoiev
84c82e988d Add Dig Security case study
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-02-22 16:38:57 +01:00
Alexander Marshalov
317fef95f9 add kuma_sd_config for Kuma Control Plane targets discovery (#3389) (#3840) 2023-02-22 13:59:56 +01:00
Dmytro Kozlov
110c3896e7 app/vmctl: add retry backoff policy (#3844)
app/vmctl: move retries logic into a separate pkg
2023-02-22 13:06:55 +01:00
Alexander Marshalov
57801660ab Fix TestPushWriteRequest for remotewriteprotocol for pure-test (with native zstd) (#3850)
app/vmagent: fix TestPushWriteRequest for pure-test (with native zstd)

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-22 11:26:19 +01:00
Artem Navoiev
6b4ccc17c2 docs: fix typo OpentTSDB -> OpenTSDB (#3854)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-02-22 11:24:26 +01:00
Aliaksandr Valialkin
10cf6c9781 app/vmselect: allow zero value for -search.latencyOffset command-line flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2061#issuecomment-1299109836

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/218
2023-02-21 18:06:55 -08:00
Aliaksandr Valialkin
836d56876a vendor: make vendor-update 2023-02-21 18:06:20 -08:00
Aliaksandr Valialkin
d59dc7616d go.mod: update github.com/VictoriaMetrics/fastcache from v1.12.0 to v1.12.1 2023-02-21 17:51:41 -08:00
Aliaksandr Valialkin
ffebc20f6d vendor: update github.com/VictoriaMetrics/fasthttp from v1.1.0 to v1.2.0
The v1.2.0 adds HostClient.DoCtx() function, which is needed by https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747
for implementing fast canceling of pending requests to scrape targets on config update
2023-02-21 17:49:30 -08:00
Aliaksandr Valialkin
f04ec714c2 docs/Articles.md: mention rules backfilling via vmalert article
This is a follow-up for 5446ce0018
2023-02-21 17:44:08 -08:00
Roman Khavronenko
5446ce0018 docs: mention rules replay blogpost in vmalert docs (#3851)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-21 22:16:15 +01:00
panguicai
21546e6922 docs: update operator release name to be consistent with the following (#3845)
Signed-off-by: panguicai008 <1121906548@qq.com>
2023-02-21 16:57:54 +01:00
Aliaksandr Valialkin
7be8a8a37b docs/CHANGELOG.md: fix a link to VictoriaMetrics remote write protocol 2023-02-20 19:59:44 -08:00
Aliaksandr Valialkin
6c38702212 docs/Single-server-VictoriaMetrics.md: remove + from start and end query args, since curl substitutes them with whitespace and breaks the query 2023-02-20 19:32:09 -08:00
Aliaksandr Valialkin
ef1ef0598b docs/guides/migrate-from-influx.md: remove misleading doublequotes from the query example 2023-02-20 19:25:51 -08:00
Zakhar Bessarab
9d91d8fc91 vmgateway: add support of JWKS endpoint usage for JWT keys verification (#521) 2023-02-20 19:22:55 -08:00
Aliaksandr Valialkin
f4f1f2f976 docs/vmagent.md: remove the claim that VictoriaMetrics remote write protocol reduces the network bandwidth usage by up to 10x comparing to Prometheus remote write protocol
The 10x savings are reproduced only on artificial data.
The savings on production data are usually in the range 2x-4x.
2023-02-20 19:11:31 -08:00
Aliaksandr Valialkin
a678cbe62e docs/vmagent.md: mention that Mimir doesnt support backfilling 2023-02-20 19:11:30 -08:00
Aliaksandr Valialkin
76f2c70be3 app/vmagent: add support for VictoriaMetrics remote write protocol, which allows saving up to 10x on network bandwidth costs under high load
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-20 19:11:30 -08:00
Roman Khavronenko
74f122294d docs: update vmalert docs (#3843)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-20 16:39:31 +01:00
Corporte Gadfly
d5171c155c docs: typo fix (#3839) 2023-02-20 10:06:19 +01:00
Aliaksandr Valialkin
2cea923ff7 app/vmselect/promql: add share(q) aggregate function for normalizing results across multiple time series in [0..1] value range per each timestamp and aggregation group 2023-02-18 22:42:01 -08:00
Aliaksandr Valialkin
c86f1f1d1b app/vmselect/promql: add range_zscore(q) and range_trim_zscore(z, q) functions
These functions may be useful for dropping outliers at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3759
2023-02-18 22:41:59 -08:00
Aliaksandr Valialkin
1030be91ae vendor: make vendor-update 2023-02-18 15:36:41 -08:00
Aliaksandr Valialkin
8bd2eee8e0 app/vmalert/README.md: sync with docs/vmalert.md after 6ef6f3a771 2023-02-18 15:21:24 -08:00
Aliaksandr Valialkin
406822a16c app/vmselect/promql: add range_mad(q) and range_trim_outliers(k, q) functions
These functions may help trimming outliers during query time
for the use case described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3759
2023-02-18 15:19:10 -08:00
Aliaksandr Valialkin
c8467f37a9 vendor: update github.com/valyala/gozstd from v1.17.0 to v1.18.0 2023-02-18 15:19:10 -08:00
Haleygo
6ef6f3a771 vmalert: fix maxResolveDuration flag note (#3827)
Signed-off-by: Haleygo <hui.wang@daocloud.io>
2023-02-16 19:26:17 +01:00
Duc Tran
b58107c85a docs: fix links to alerts and alertmanager (#3829) 2023-02-16 19:25:10 +01:00
Aliaksandr Valialkin
87f1ed5d87 app/vmui: tooltip formatting enhancements according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706#issuecomment-1429980038 2023-02-14 23:37:26 -08:00
Aliaksandr Valialkin
11ce30820b all: update Go builder from Go1.20.0 to Go1.20.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.20.1+label%3ACherryPickApproved
2023-02-14 23:05:16 -08:00
Aliaksandr Valialkin
5123a61be9 vendor: make vendor-update 2023-02-13 11:14:17 -08:00
Aliaksandr Valialkin
5c4f5b83fc all: rename ParseStream -> stream.Parse
This is a follow-up for 057698f7fb
2023-02-13 10:52:05 -08:00
Aliaksandr Valialkin
ccdddf7996 lib/protoparser/promremotewrite: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:46:54 -08:00
Aliaksandr Valialkin
9be1398b92 lib/protoparser/native: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:43:05 -08:00
Aliaksandr Valialkin
8830607021 lib/protoparser/graphite: extract stream parsing code into a separate stream package 2023-02-13 10:32:36 -08:00
Aliaksandr Valialkin
a646841c07 lib/protoparser/csvimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:25:46 -08:00
Aliaksandr Valialkin
7568658c19 lib/protoparser/vmimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:20:19 -08:00
Aliaksandr Valialkin
af37717108 lib/protoparser/opentsdbhttp: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:16:03 -08:00
Aliaksandr Valialkin
7720d403c0 lib/protoparser/opentsdb: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:03:16 -08:00
Aliaksandr Valialkin
fe196e0b7a lib/protoparser/influx: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:58:52 -08:00
Aliaksandr Valialkin
f83d6d69b2 lib/protoparser/datadog: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:51:47 -08:00
Aliaksandr Valialkin
f3be9483f4 docs/CHANGELOG.md: improve the docs for 8ea02eaa8e 2023-02-13 09:41:46 -08:00
Roman Khavronenko
057698f7fb lib/protoparser/prometheus: move streamparser to subpackage (#3814)
`lib/protoparser/prometheus` is used by various applications,
such as `app/vmalert`. The recent change to the
`lib/protoparser/prometheus` package introduced a new dependency
of `lib/writeconcurrencylimiter` which exposes some metrics.
Because of the dependency, now all applications which have this
dependency also expose these metrics.

Creating a new `lib/protoparser/prometheus/stream` package helps
to remove these metrics from apps which use `lib/protoparser/prometheus`
as dependency.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 09:26:07 -08:00
Roman Khavronenko
a645a95bd6 docs: improve troubleshooting docs for vmalert (#3812)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 17:29:30 +01:00
Roman Khavronenko
934646ccf7 follow-up after d1cbc35cf6 (#3813)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 16:23:50 +01:00
Droxenator
8ea02eaa8e fixed opentsdbListenAddr timestamp conversion (#3810)
Co-authored-by: Andrei Ivanov <a.ivanov@corp.mail.ru>
2023-02-13 16:07:53 +01:00
Artem Navoiev
2e2b1ba87e change docs for VictoriaMetrics Managed (#3803)
* change docs for VictoriaMetrics Managedd

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* Update docs/managed-victoriametrics/user-managment.md

Co-authored-by: Max Golionko <8kirk8@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Max Golionko <8kirk8@gmail.com>
2023-02-13 13:29:03 +01:00
Oleksandr Redko
9fff48c3e3 app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
438b2e11bd app/vmauth: allow specifying max_concurrent_requests value on a per-user basis bigger than the -maxConcurrentPerUserRequests value 2023-02-11 20:53:08 -08:00
Aliaksandr Valialkin
e5070c0bcd docs/sd_configs.md: properly escape __address__ string 2023-02-11 14:45:54 -08:00
Aliaksandr Valialkin
c5b9f7f751 docs/sd_configs.md: document how the __address__ label is generated per each discovered target 2023-02-11 14:41:59 -08:00
Aliaksandr Valialkin
f9b3409ee3 lib/promscrape/discovery/openstack: use port 80 for the discovered target by default if it isnt specified in the config 2023-02-11 14:41:58 -08:00
Aliaksandr Valialkin
25a9017a72 app/vmui: show median instead of avg on graph tooltip and line legend
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706
2023-02-11 12:51:12 -08:00
Aliaksandr Valialkin
3ec8a4dc80 lib/{mergeset,storage}: allow at least 3 concurrent flushes during background merges on systems with 1 or 2 CPU cores
This should prevent from data ingestion slowdown and query performance degradation
on systems with small number of CPU cores (1 or 2), when big merge is performed.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
2023-02-11 12:08:52 -08:00
Aliaksandr Valialkin
4156e78e11 docs/Articles.md: add a link to https://www.techetio.com/2022/08/21/evaluating-backend-options-for-prometheus-metrics/ 2023-02-11 12:08:52 -08:00
Roman Khavronenko
b209d4ace0 dashboards: use median instead of avg (#3800)
`avg` can be affected by just one outlier, which may lead
to false conclusions. `median` is supposed to reflect
reality better by leveling outliers out.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-11 10:01:30 -08:00
Aliaksandr Valialkin
b11bdc46be app/vmselect/promql: add mad_over_time(m[d]) function
See https://github.com/prometheus/prometheus/issues/5514
2023-02-11 01:06:20 -08:00
Aliaksandr Valialkin
ed4492ddd5 all: update alpine base docker image from 1.17.1 to 1.17.2
See https://alpinelinux.org/posts/Alpine-3.17.2-released.html
2023-02-11 00:37:17 -08:00
Aliaksandr Valialkin
776391917f app/vmauth: improve load balancing by sending incoming requests to backends with the lowest number of concurrent requests
While at it, stop sending requests to unavailable backend for 3 seconds
before the next attempt. This should reduce the amounts of useless work
and the number of useless network packets when the backend is temporarily unavailable.
2023-02-11 00:30:31 -08:00
Aliaksandr Valialkin
f3625e4f3f app/vmauth: add -maxConcurrentPerUserRequests command-line option for limiting the number of concurrent requests on a per-user basis
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346
2023-02-10 21:58:21 -08:00
Aliaksandr Valialkin
70f8911ca7 app/vmauth: automatically retry failing GET requests on the remaining backends 2023-02-09 21:05:55 -08:00
Dmytro Kozlov
f582f9e8ab app/vmauth: add concurrent requests limit per auth record (#3749)
* app/vmauth: add concurent requests limit per auth record

* app/vmauth: added clarification comment

* app/vmauth: remove unused code

* app/vmauth: move read from limiter

* app/vmauth: fix text

* app/vmauth: fix comments

* - Clarify the docs for the max_concurrent_requests option at docs/vmauth.md
- Clarify the description of the change at docs/CHANGELOG.md
- Make sure that the -maxConcurrentRequests takes precedence over per-user max_concurrent_requests
- Update tests for verifying that the max_concurrent_requests option is parsed properly

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 20:03:01 -08:00
Aliaksandr Valialkin
73358571ee app/vmalert: follow-up after d3c64aae8768d58781ee7e358bd7f3d8e0eb836d
- Document the change at docs/CHANGELOG.md
- Add `Reading rules from object storage` section to docs/vmalert.md
- Add `s3` prefix to command-line flags related to the configuration of s3 and gcs clients
- Explicitly mention that reading rules from object storage is supported only in enterprise version
2023-02-09 18:52:00 -08:00
Roman Khavronenko
14c20c1843 vmalert: support object storage for rules (#519)
* vmalert: support object storage for rules

Support loading of alerting and recording rules from object
storages `gcs://`, `gs://`, `s3://`.

* review fixes
2023-02-09 18:50:48 -08:00
Aliaksandr Valialkin
c12eb2cc7f deployment/docker: update VictoriaMetrics Docker images from v1.87.0 to v1.87.1 2023-02-09 15:53:08 -08:00
Aliaksandr Valialkin
291c41978e vendor: make vendor-update 2023-02-09 14:48:16 -08:00
Aliaksandr Valialkin
d4b97b69bf docs/CHANGELOG.md: document d621d50d4fb3b43a0bcb4419bee979f0192d38fe 2023-02-09 14:40:07 -08:00
Aliaksandr Valialkin
035a2b5ed5 all: skip issues with low severity at docker scan 2023-02-09 14:25:13 -08:00
Aliaksandr Valialkin
0e0095d350 all: run apk update && apk upgrade in base Alpine Docker image in order to get all the recent security fixes 2023-02-09 14:01:32 -08:00
Aliaksandr Valialkin
a42e3e8dfb docs/CHANGELOG.md: cut v1.87.1 and mark 1.87.x as LTS release 2023-02-09 11:20:57 -08:00
Zakhar Bessarab
f13a255918 lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (#3791)
* lib/promscrape: fix cancelling in-flight scrape requests during configuration reload when using `streamParse` mode (see #3747)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 11:13:06 -08:00
Aliaksandr Valialkin
513707a8c7 app/vmui: UX enhancements for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706
- Display `min` value additionally to `avg`, `max` and `last`
- Allow copy-n-pasting metric name with its labels from both legend and tooltup
2023-02-09 11:04:51 -08:00
Aliaksandr Valialkin
f40661e7b7 docs/vmagent.md: clarify that automatically generated metrics contain all the target-specific labels, including instance and job 2023-02-09 11:04:51 -08:00
Air
a1432e6b0a Possibly spelling in the Quick start 2023-02-09 15:50:20 +02:00
Yury Molodov
bff18cb5dd vmui: lazy loading predefined panels (#3795)
* fix: change logic lazy loading predefined panels

* app/vmselect/vmui: `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 00:11:55 -08:00
Yury Molodov
e1063ce3c1 vmui: improve tenant selector (#3794)
* fix: change styles tenant selector (#3792)

* docs/CHANGELOG.md: document the change

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3792

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 00:08:59 -08:00
Aliaksandr Valialkin
46a521191f docs/CHANGELOG.md: document changes at v1.79.8 LTS release 2023-02-08 23:38:46 -08:00
Yury Molodov
8afc0aef8d vmui: add last/max/avg values (#3789)
* feat: add last/max/avg values (#3706)

* fix: change filter exclude values

* app/vmui: wip

- improve the visualization for avg/max/last values
- make getAvgFromArray() function resilient against inf/undefined/nil
- export getLastFromArray() function, which is resilient against inf/undefined/nil
- run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-08 22:41:20 -08:00
Aliaksandr Valialkin
114c14febf docs/CHANGELOG.md: document 75bcf86a31
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3740
2023-02-08 11:24:07 -08:00
Yury Molodov
75bcf86a31 fix: turn off the local dashboards(#3740) (#3793) 2023-02-08 11:13:15 -08:00
Aliaksandr Valialkin
a1ee679042 docs/CHANGELOG.md: add more context to the bugfix description in Nomad service discovery
See 146fd2eca3
2023-02-08 09:24:48 -08:00
Aliaksandr Valialkin
a8e88e74cc lib/backup/azremote: fix after upgrading github.com/Azure/azure-sdk-for-go/sdk/storage/azblob from v0.6.1 to v1.0.0 2023-02-08 09:18:23 -08:00
Aliaksandr Valialkin
c9d2934bb4 vendor: make vendor-update 2023-02-08 08:55:14 -08:00
Aliaksandr Valialkin
6c21b6ec09 docs/CHANGELOG.md: document the change at 67b01329a0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761
2023-02-08 08:42:19 -08:00
Roman Khavronenko
e83f14210d Vmalert fixes (#3788)
* vmalert: use group's ID in UI to avoid collisions

Identical group names are allowed. So we should used IDs
for various groupings and aggregations in UI.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: prevent disabling state updates tracking

The minimum number of update states to track is now set to 1.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly update `debug` and `update_entries_limit` params on hot-reload

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: display `debug` field for rule in UI

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: exclude `updates` field from json marhsaling

This field isn't correctly marshaled right now.
And implementing the correct marshaling for it doesn't
seem right, since json representation is mostly used
by systems like Grafana. And Grafana doesn't expect this
field to be present.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 14:34:03 +01:00
Max Golionko
6495b62866 bump go to 1.20 in ci jobs (#3787) 2023-02-08 14:32:42 +01:00
Roman Khavronenko
86e47177dc docs: follow-up after 2e4bfcce63 (#3785)
2e4bfcce63

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 09:48:05 +01:00
Karan Sharma
146fd2eca3 sd/nomad: panic in nomad watcher because of nil map (#3784)
properly initialize url.Values
2023-02-08 09:43:29 +01:00
Aliaksandr Valialkin
67b01329a0 lib/writeconcurrencylimiter: initialize concurrencyLimitCh before exporting vm_concurrent_insert_capacity and vm_concurrent_insert_current metrics
This will result in proper calculations for the the alerting rule:

 avg_over_time(vm_concurrent_insert_current[1m]) >= vm_concurrent_insert_capacity

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761
2023-02-07 11:08:17 -08:00
Aliaksandr Valialkin
f2be447270 Makefile: update golangci-lint from v1.50.1 to v1.51.1 2023-02-07 11:08:11 -08:00
Aliaksandr Valialkin
1901fbf19b docs/CHANGELOG.md: fix formatting for the change from 6fd10e8871 2023-02-07 09:34:57 -08:00
earthgecko
3a51a3bc42 Clarifications between standalone/cluster ingestion endpoints (#3771)
docs: clarifications between standalone/cluster ingestion endpoints

This is an attempt to make it a bit clearer to the user that the cluster version ingestion URLs are different from the standalone ones.  I have also changed the order of the list items to make it a bit clearer and hopefully stop the user simply inferring that `/prometheus/api/v1` is only related to Prometheus data.
2023-02-07 15:17:12 +01:00
Max Golionko
6f24fa2055 CI: speedup build by 2.4x. restore nightly build (#3772)
* setup docker buildx
* add snyk integration
* add go cache for docker build
* cancel redundant job if there is new commit into same PR or branch
2023-02-07 10:12:16 +08:00
Max Golionko
977c642934 docs: update formatiing for k8s monitoring with Managed VictoriaMetrics (#3768)
* jekyll formatting madness
2023-02-06 18:33:40 +08:00
Roman Khavronenko
c32d8ea29e vmalert: update docs (#3770)
vmalert: update flags description

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-06 09:51:30 +01:00
Denys Holius
8aa7559462 fixed wrong vmstorage port number (#3769) 2023-02-06 09:08:50 +01:00
Roman Khavronenko
6fd10e8871 vmalert: speed up state restore procedure on start (#3758)
* vmalert: speed up state restore procedure on start

Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.

While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.

This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.

See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* make lint happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-03 19:46:13 -08:00
Aliaksandr Valialkin
0a824d9490 app/vmselect/vmui: make vmui-update after e4c04b6dbe 2023-02-03 19:34:01 -08:00
Yury Molodov
e4c04b6dbe vmui: set light theme for app mode (#3748)
* fix: set light theme for app mode

* fix: check inputTenantID flag

* fix: rename inputTenantID to useTenantID
2023-02-03 19:31:37 -08:00
Aliaksandr Valialkin
98dc968920 docs/CHANGELOG.md: document f63f487787
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3707
2023-02-03 19:30:12 -08:00
Yury Molodov
f63f487787 vmui: mobile view (#3742)
* feat: add detect the system theme

* fix: change logic fetch tenants

* feat: add docs and info to cardinality page

* feat: add mobile view #3707
2023-02-03 19:27:57 -08:00
Aliaksandr Valialkin
88fed0232c dashboards: typo fix Datapoints scanned per series -> Datapoints scanned per query 2023-02-03 19:12:33 -08:00
Aliaksandr Valialkin
af1a9c5eda deployment/docker: update Go builder from Go1.19.5 to Go1.20.0
See https://go.dev/blog/go1.20
2023-02-03 14:17:33 -08:00
Aliaksandr Valialkin
7440f971ab docs/MetricsQL.md: add links to "rollup results" explanation 2023-02-03 11:09:42 -08:00
Aliaksandr Valialkin
12cf8d9f69 docs/managed-victoriametrics/how-to-monitor-k8s.md: rename image files according to docs/assets/README.md 2023-02-03 10:43:02 -08:00
Max Golionko
e18f8e9413 docs: move managed victoria metics guide into right folder (#3750)
* move guide folder
* image width control
2023-02-03 03:13:36 +08:00
Max Golionko
07b7fe83c4 Update docs/managed_victoriametrics/how-to-monitor-k8s.md
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-02 15:43:55 +02:00
Max Golionko
a2ba1f09e4 add guide to list of guides 2023-02-02 15:43:55 +02:00
Max Golionko
79527441ec added k8s guide for managed VM 2023-02-02 15:43:55 +02:00
Max Golionko
df1e545c0e disable codeql for docs. merge build and test back to one job (#3746) 2023-02-02 20:59:08 +08:00
Aliaksandr Valialkin
dc142867b8 docs/CHANGELOG.md: typo fixes 2023-02-01 20:40:44 -08:00
Aliaksandr Valialkin
da539bc286 deployment/docker: update VictoriaMetrics docker image tag from v1.86.2 to v1.87.0 2023-02-01 20:03:04 -08:00
Aliaksandr Valialkin
fe736c5388 docs/CHANGELOG.md: cut v1.87.0 2023-02-01 13:03:11 -08:00
Aliaksandr Valialkin
607b542222 vendor: make vendor-update 2023-02-01 12:23:23 -08:00
Aliaksandr Valialkin
8b9ebf625a lib/promscrape: add a comment explaining the logic behind adding exported_ perfix to metric names
This is a follow-up for 7b87fac8e7

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3557
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3406
2023-02-01 12:00:52 -08:00
Dmytro Kozlov
7b87fac8e7 lib/promscrape: fix honor_labels behavior (#3739)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-01 11:21:44 -08:00
Aliaksandr Valialkin
7b1caf1db3 docs/CHANGELOG.md: document 9254e494f9 2023-02-01 09:56:52 -08:00
Nikolay
9254e494f9 lib/storage: fixes finalDedup for backfilled data (#3737)
previously historical data backfilling may trigger force merge for previous month every hour
it consumes cpu, disk io and decrease cluster performance.
Following commit fixes it by applying deduplication for InMemoryParts
2023-02-01 09:54:21 -08:00
Zakhar Bessarab
68985455f1 fix: vmselect multi-level setup panic (#3738)
* app/vmselect/netstorage: fix panic for multi-level cluster setup when `replicationFactor` was set and request contained `trace` parameter (#3734)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmselect/netstorage: use correct context for retry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-01 08:59:30 -08:00
Zakhar Bessarab
4cf37c5e70 app/vmbackup: fix deleting snapshot after backup completion (#3735) (#3736)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-01 11:23:58 +01:00
Aliaksandr Valialkin
3d331e4c5d app/vmselect/vmui: make vmui-update after dcc5616126 2023-01-31 13:24:43 -08:00
Aliaksandr Valialkin
4ecb61e247 docs/CHANGELOG.md: document 442a9f16b4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3661
2023-01-31 13:03:46 -08:00
Denys Holius
442a9f16b4 Makefile: adds i386 architecture (#3725)
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3661
2023-01-31 12:58:53 -08:00
Yury Molodov
dcc5616126 vmui: improvement the theme (#3731)
* feat: add detect the system theme

* fix: change logic fetch tenants

* feat: add docs and info to cardinality page

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-31 12:54:59 -08:00
Aliaksandr Valialkin
080a3e2396 vendor: make vendor-update 2023-01-31 11:03:20 -08:00
Aliaksandr Valialkin
ac8bc77688 lib/bytesutil/internstring.go: increase the limit on the maximum string lengths, which can be interned
The limit has been increased from 300 bytes to 500 bytes according to the collected production stats.
This allows reducing CPU usage without significant increase of RAM usage in most practical cases.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
2023-01-31 10:56:55 -08:00
Roman Khavronenko
1cbdcd391c docs: mention -vmalert.proxyURL in vmalert docs (#3730)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-30 16:28:33 +01:00
Aliaksandr Valialkin
0788be35eb lib/promscrape/discovery/azure: add __meta_azure_machine_size label in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/11650
2023-01-27 17:07:12 -08:00
Aliaksandr Valialkin
ab57b92932 lib/promscrape/discovery/kubernetes: add support for __meta_kubernetes_pod_container_id
See https://github.com/prometheus/prometheus/issues/11843
and https://github.com/prometheus/prometheus/pull/11844
2023-01-27 16:34:06 -08:00
Aliaksandr Valialkin
6a7faf9f22 vendor: make vendor-update 2023-01-27 15:57:38 -08:00
Yury Molodov
ac14d50c18 vmui: add select of Tenant ID (#3673)
* feat: add select of tenantID

* feat: replace tenantID to default url

* fix: move the tenantID selector to the top header

* fix: hide tenantID selector by condition

* fix: correct z-index

* app/vmselect/vmui: `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-27 15:53:14 -08:00
Aliaksandr Valialkin
51ad94677c docs: update security chapters after bd716d1b0c 2023-01-27 15:45:35 -08:00
Denys Holius
bd716d1b0c Improving docs by adding additional security sections (#3713)
* docs/Cluster-VictoriaMetrics.md: adds security section

* docs/Quick-Start.md: adds Security recommendation section
2023-01-27 15:22:21 -08:00
Aliaksandr Valialkin
06f6b76521 app/vmagent: properly return 200 response code when importing data via Prometheus PushGateway protocol
This is the same fix as has been already applied to app/vminsert at cdb6d651e9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1415
2023-01-27 14:40:02 -08:00
Aliaksandr Valialkin
a0c8b86eab docs/vmauth.md: update docs after ff39a91147
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346
2023-01-27 14:10:19 -08:00
Aliaksandr Valialkin
ff39a91147 app/vmauth: limit the number of concurrent requests served by vmauth with the -maxConcurrentRequests command-line flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346

This commit is based on the https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3486
2023-01-27 14:07:30 -08:00
Aliaksandr Valialkin
372b1688d7 app/vmauth: do not use net/http/httputil.ReverseProxy
This allows better controlling requests to backends and providing better error logging.
For example, if the backend was unavailable, then the ReverseProxy was logging the error
message without client ip and the initial request uri. This could harden debugging.

This is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3486
2023-01-27 13:40:05 -08:00
Aliaksandr Valialkin
1b81d8f542 lib/netutil: move IsTrivialNetworkError() function there, since it is used in multiple places across the code 2023-01-27 13:24:30 -08:00
Aliaksandr Valialkin
7e355080ce app/vmauth: pass the target url to reverse proxy via context.Value instead of request header
This is less hacky way, since it doesn't clash with request headers
2023-01-27 12:15:52 -08:00
Aliaksandr Valialkin
2afa4ae00a docs/managed-victoriametrics: typo fix in links to images 2023-01-27 11:36:20 -08:00
Aliaksandr Valialkin
6342eb5523 docs/assets/README.md: mention that locally placed doc-specific images simplify referring them from various views without the need to deal with folder prefixes 2023-01-27 11:29:23 -08:00
Aliaksandr Valialkin
54a0ccbaca docs/managed-victoriametrics/user-management.md: move the associated images to docs/managed-victoriametrics/ folder with user-management_ prefix according to docs/assets/README.md 2023-01-27 11:25:22 -08:00
Aliaksandr Valialkin
b28cf0faa8 docs/managed-victoriametrics/quickstart.md: move the associated images to docs/managed-victoriametrics/ folder with quickstart_ prefix according to docs/assets/README.md 2023-01-27 11:16:03 -08:00
Aliaksandr Valialkin
3251d392b5 docs/assets: add README.md with the explanation on which files can be put into the docs/assets folder 2023-01-27 11:02:16 -08:00
Aliaksandr Valialkin
119010f7f2 docs/Cluster-VictoriaMetrics.md: move Naive_cluster_scheme.png from the docs/assets/images/ folder into docs/ folder and add Cluster-VictoriaMetrics_ prefix to the image name
The docs/assets folder should be used only for assets specific to docs generation at https://docs.victoriametrics.com, e.g. css, js and images.

All the other assets related to specific docs should be placed in the same folder as the corresponding *.md file.
These assets should have the same name prefix as the corresponding doc file name. This simplifies tracking the lifetime of these assets.
For example, if the doc is removed, it is very easy to remove all assets associated with it with a simple `rm -rf docs/doc-name*` command.

This also simplifies generating correct urls for doc-specific assets from both https://docs.victoriametrics.com
and from https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/ - just refer to the asset name without any directory prefixes.
2023-01-27 10:54:27 -08:00
Aliaksandr Valialkin
eedb294754 lib/netutil: typo fix in the error message 2023-01-27 10:38:38 -08:00
dmitryk-dk
4250b67fe8 docs: move how to register from dbaas to docs 2023-01-27 14:07:09 +02:00
dmitryk-dk
465c889f7f docs: use absolute path 2023-01-27 14:07:09 +02:00
dmitryk-dk
834c18a346 docs: add documentation of user management on managed-vm 2023-01-27 14:07:09 +02:00
Aliaksandr Valialkin
36941d6d75 app/vmauth: consistency renaming: UserInfo.URLMap -> UserInfo.URLMaps
This is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3486
2023-01-27 00:19:02 -08:00
Aliaksandr Valialkin
558165521b docs/Cluster-VictoriaMetrics.md: update command-line descriptions after ebebaecd94 2023-01-27 00:04:41 -08:00
Aliaksandr Valialkin
0890adde67 docs: update command-line descriptions after 73256fe438 2023-01-27 00:00:37 -08:00
Aliaksandr Valialkin
28d92a2f31 lib/netutil: limit the time needed for reading proxy protocol headers
This should prevent from misconfigured proxies and from possible Slowloris-type DoS attacks
(see https://en.wikipedia.org/wiki/Slowloris_(computer_security) )

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-01-26 23:46:51 -08:00
Aliaksandr Valialkin
cb374677a9 app/vmagent/prometheusimport: delete the temporary directory created by vmagent after the test is complete
This is a follow-up for 1cfa183c2b
2023-01-26 23:21:24 -08:00
Nikolay
73256fe438 lib/netutil: init implimentation of proxy protocol (#3687)
* lib/netutil: init implimentation of proxy protocol
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-26 23:08:35 -08:00
Aliaksandr Valialkin
2c1419c687 docs/CHANGELOG.md: make the description for the bugfix from 465a285324 more reader-friendly 2023-01-26 10:08:13 -08:00
Nikolay
465a285324 lib/storage: properly release parts inMerge lock (#3711)
if storage doesn't have enough disk space, finalDedupWatcher holds inMerge lock for all parts and never release it until storage restart
2023-01-26 08:05:20 -08:00
Roman Khavronenko
95ee86b600 docs: specify the time window for series_limit (#3708)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-25 09:30:20 -08:00
Aliaksandr Valialkin
28f66f0079 docs: update the list of command-line flags according to the latest changes 2023-01-25 09:20:24 -08:00
Aliaksandr Valialkin
d655d6b047 lib/streamaggr: add ability to de-duplicate input samples before aggregation 2023-01-25 09:14:49 -08:00
Yury Molodov
99d49e3ceb vmui: include fonts in its bundle (#3705)
* feat: include fonts in the build

* fix: reduce size fonts

* wip

- Document the change at docs/CHANGELOG.md
- Run `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-24 09:30:56 -08:00
Yury Molodov
20ad848c5d vmui: improvements to the UI styles (#3704)
* feat: add dark theme

* update packages

* feat: add multilevel menu (#3678)

* fix: correct styles

* fix: update link to cardinality-explorer

* fix: remove unused scss variables

* docs/CHANGELOG.md: document the changes

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-24 09:20:31 -08:00
Roman Khavronenko
1a8875b417 discover/ec2: follow-up after e2b4ab8384 (#3703)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-24 11:02:18 +01:00
Roman Khavronenko
c7c4786f3f discover/ec2: bump API version (#3702)
Switch to the actual API version `2016-11-15`,
since the old version doesn't provide access to all
the fields which implementation expects.
For example, old API missing `zone_id` field
in `DescribeAvailabilityZonesResponse` response.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3700

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-24 10:42:55 +01:00
Aliaksandr Valialkin
a971bcc3fe lib/bytesutil: do not intern long strings, since they may need big amounts of additional memory for the cache
Allow users fine-tuning the maximum string length for interning via -internStringMaxLen command-line flag.
This may be used for fine-tuning RAM vs CPU usage for certain workloads.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
2023-01-23 23:36:22 -08:00
Aliaksandr Valialkin
0fd440cdb4 deployment: sync with cluster branch 2023-01-23 22:59:00 -08:00
Aliaksandr Valialkin
c496f06ca3 app/vmagent/{promremotewrite,vmimport}: remove unused functions InsertHandlerForReader()
Thanks to 1cfa183c2b , where the first such function has been removed
2023-01-23 22:31:29 -08:00
Aliaksandr Valialkin
f7acdb13db app/{vmagent,vminsert}: follow-up for 1cfa183c2b
- Call httpserver.GetQuotedRemoteAddr() and httpserver.GetRequestURI() only when the error occurs.
  This saves CPU time on fast path when there are no parsing errors.
- Create a helper function - httpserver.LogError() - for logging the error with the request uri and remote addr context.
2023-01-23 22:26:53 -08:00
Artem Navoiev
1cfa183c2b add error handler for parsing prometheus text format to vmagent and v… (#3693)
* add error handler for parsing prometheus text format to vmagent and vminsert

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix typo

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* typo

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix variables naming and error message

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-23 22:14:34 -08:00
Yury Molodov
3536bef36e vmui: add open graph and twitter card tags (#3697)
* feat: add open graph and twitter card tags

* app/vmui: spelling fixes

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-23 22:04:46 -08:00
Aliaksandr Valialkin
babecd8363 lib/promscrape: follow-up for 393876e52a
- Document the change in docs/CHANGELOG.md
- Reduce memory usage when sending stale markers even more by parsing the response in stream parsing mode
- Update the TestSendStaleSeries

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675
2023-01-23 21:52:59 -08:00
Roman Khavronenko
393876e52a lib/promscrape: limit number of sent stale series at once (#3686)
Stale series are sent when there is a difference between current
and previous scrapes. Those series which disappeared in the current scrape
are marked as stale and sent to the remote storage.

Sending stale series requires memory allocation and in case when too many
series disappear in the same it could result in noticeable memory spike.
For example, re-deploy of a big fleet of service can result into
excessive memory usage for vmagent, because all the series with old
pod name will be marked as stale and sent to the remote write storage.

This change limits the number of stale series which can be sent at once,
so memory usage remains steady.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-23 21:15:59 -08:00
Aliaksandr Valialkin
2c4e384f07 lib/promscrape: properly log the actual response size after c4229a1bba 2023-01-23 21:04:50 -08:00
Aliaksandr Valialkin
ba5a6c851c lib/storage: use deterministic random generator in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 20:10:32 -08:00
Aliaksandr Valialkin
1a3a6ef907 lib/mergeset: use deterministic random generator in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:43:49 -08:00
Aliaksandr Valialkin
7030429958 lib/mergeset: fix data race in BenchmarkInmemoryBlockMarshal 2023-01-23 19:43:18 -08:00
Aliaksandr Valialkin
30e968df6d app/vmselect: use consistent randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:27:25 -08:00
Aliaksandr Valialkin
f2b40dbe9a app/vmalert: use consistent randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:25:10 -08:00
Aliaksandr Valialkin
a11dc6689a lib/decimal: use consistent randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:23:39 -08:00
Aliaksandr Valialkin
0a4d8dc777 lib/uint64set: use repeatable randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:22:58 -08:00
Aliaksandr Valialkin
3d1cb011b6 lib/encoding: make deterministic tests which rely on math/rand
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 18:41:09 -08:00
Aliaksandr Valialkin
a7f8ce5e3d vendor: make vendor-update 2023-01-23 08:05:54 -08:00
Artem Navoiev
7c1daade15 tests: use DebugFlush instead of vmstorage stop. This simplifies the logic and allows to remove test-only methodds (#3694)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-23 14:45:59 +01:00
Denys Holius
f3cb412508 Fix/remove vmanomaly from release guide (#3699)
* docs/Release-Guide.md: remove vmanomaly from release guide because it has own release cycle

* fixed a typo
2023-01-23 14:32:26 +01:00
Aliaksandr Valialkin
edc51b3119 docs/Articles.md: added missing third-party articles about VictoriaMetrics 2023-01-22 14:23:45 -08:00
Aliaksandr Valialkin
cb5d073bc9 docs/Single-server-VictoriaMetrics.md: make it clear that VictoriaMetrics supports both pull and push protocols at how to import time series data chapter 2023-01-22 14:04:29 -08:00
Aliaksandr Valialkin
864c651c03 docs/Articles.md: add https://dev.to/aws-builders/ultra-monitoring-with-victoria-metrics-1p2 2023-01-22 13:51:53 -08:00
Aliaksandr Valialkin
4e25bc2087 lib/vmselectapi: propagate timeout errors from vmselect to vmstorage instead of closing the connection established from vmselect to vmstorage
This is a follow-up for 20e9598254
2023-01-20 19:33:42 -08:00
Aliaksandr Valialkin
74df30456b app/vmselect: make vmui-update after df7b81b44d 2023-01-20 12:07:13 -08:00
Yury Molodov
df7b81b44d vmui: add support for time zone selection for older versions of browsers (#3680)
* fix: add check for support of getting time zones

* vmui: add support for time zone selection for older versions of browsers
2023-01-20 11:47:53 -08:00
Denys Holius
d513230d43 Adds some improvements to release guide docs (#3679)
* docs/Release-Guide.md: fixed a typo

* Release-Guide.md: adds missed steps for updating vmanomaly and vmgateway helm charts
2023-01-19 16:03:42 +01:00
Max Golionko
e8554cd1cb ci: checkout correct branch for build step (#3676) 2023-01-19 08:34:20 +01:00
Aliaksandr Valialkin
59b67f1cfa snap: update Go builder from v1.19.4 to v1.19.5 2023-01-18 14:05:18 -08:00
Aliaksandr Valialkin
adeec6e369 deployment/docker: update VictoriaMetrics components in docker-compose from v1.86.0 to v1.86.2 2023-01-18 12:57:39 -08:00
1064 changed files with 53433 additions and 23949 deletions

32
.github/ISSUE_TEMPLATE/question.yml vendored Normal file
View File

@@ -0,0 +1,32 @@
name: Question
description: Ask a question regarding VictoriaMetrics or its components
labels: [question]
body:
- type: textarea
id: describe-the-component
attributes:
label: Is your question request related to a specific component?
placeholder: |
VictoriaMetrics, vmagent, vmalert, vmui, etc...
validations:
required: false
- type: textarea
id: describe-the-question
attributes:
label: Describe the question in detail
description: |
A clear and concise description of the issue and the question.
validations:
required: true
- type: checkboxes
id: troubleshooting
attributes:
label: Troubleshooting docs
description: I am familiar with the following troubleshooting docs
options:
- label: General - https://docs.victoriametrics.com/Troubleshooting.html
required: false
- label: vmagent - https://docs.victoriametrics.com/vmagent.html#troubleshooting
required: false
- label: vmalert - https://docs.victoriametrics.com/vmalert.html#troubleshooting
required: false

View File

@@ -17,7 +17,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.5
go-version: 1.20.2
id: go
- name: Code checkout
uses: actions/checkout@master

View File

@@ -13,6 +13,10 @@ on:
schedule:
- cron: "30 18 * * 2"
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze

View File

@@ -15,6 +15,7 @@ on:
push:
branches: [master, cluster]
paths-ignore:
- "docs/**"
- "**.md"
- "**.txt"
- "**.js"
@@ -22,12 +23,17 @@ on:
# The branches below must be a subset of the branches above
branches: [master, cluster]
paths-ignore:
- "docs/**"
- "**.md"
- "**.txt"
- "**.js"
schedule:
- cron: "30 18 * * 2"
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze
@@ -51,7 +57,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
go-version: 1.20.2
check-latest: true
cache: true
if: ${{ matrix.language == 'go' }}

View File

@@ -1,66 +0,0 @@
name: main - test
on:
push:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
pull_request:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
permissions:
contents: read
jobs:
lint:
name: lint
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
- name: Dependencies
run: |
make install-golangci-lint
make check-all
git diff --exit-code
test:
needs: lint
strategy:
matrix:
scenario: ["test-full", "test-pure", "test-full-386"]
name: test
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
- name: run tests
run: |
make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.txt

View File

@@ -1,16 +1,29 @@
name: main
on:
workflow_run:
workflows: ["main - test"]
types:
- completed
push:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
pull_request:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build:
name: Build
lint:
name: lint
runs-on: ubuntu-latest
steps:
- name: Code checkout
@@ -19,10 +32,64 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
go-version: 1.20.2
check-latest: true
cache: true
- name: Dependencies
run: |
make install-golangci-lint
make check-all
git diff --exit-code
test:
needs: lint
strategy:
matrix:
scenario: ["test-full", "test-pure", "test-full-386"]
name: test
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.20.2
check-latest: true
cache: true
- name: run tests
run: |
make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.txt
build:
needs: test
name: build
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
id: go
uses: actions/setup-go@v3
with:
go-version: 1.20.2
check-latest: true
cache: true
- uses: actions/cache@v3
with:
path: gocache-for-docker
key: gocache-docker-${{ runner.os }}-${{ steps.go.outputs.go-version }}-${{ hashFiles('go.mod') }}
- name: Build
run: |
make victoria-metrics-crossbuild

View File

@@ -12,28 +12,37 @@ jobs:
name: Build
runs-on: ubuntu-latest
steps:
-
name: Login to Docker Hub
- name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Setup Go
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.5
go-version: 1.20.2
id: go
-
name: Setup docker scan
- name: Setup docker scan
run: |
mkdir -p ~/.docker/cli-plugins && \
curl https://github.com/docker/scan-cli-plugin/releases/latest/download/docker-scan_linux_amd64 -L -s -S -o ~/.docker/cli-plugins/docker-scan &&\
chmod +x ~/.docker/cli-plugins/docker-scan
-
name: Code checkout
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Code checkout
uses: actions/checkout@master
-
name: Publish
- uses: actions/cache@v3
with:
path: gocache-for-docker
key: gocache-docker-${{ runner.os }}-${{ steps.go.outputs.go-version }}-${{ hashFiles('go.mod') }}
- name: build & publish
run: |
docker scan --severity=medium --login --token "$SNYK_TOKEN" --accept-license
LATEST_TAG=nightly PKG_TAG=nightly make publish
env:
SNYK_TOKEN: ${{ secrets.SNYK_AUTH_TOKEN }}

View File

@@ -1,14 +1,18 @@
run:
timeout: 2m
enable:
linters:
enable:
- revive
issues:
exclude-rules:
- linters:
- staticcheck
text: "SA(4003|1019|5011):"
- linters:
- staticcheck
text: "SA(4003|1019|5011):"
include:
- EXC0012
- EXC0014
linters-settings:
errcheck:

View File

@@ -144,6 +144,7 @@ vmutils-windows-amd64: \
vmctl-windows-amd64
victoria-metrics-crossbuild: \
victoria-metrics-linux-386 \
victoria-metrics-linux-amd64 \
victoria-metrics-linux-arm64 \
victoria-metrics-linux-arm \
@@ -155,6 +156,7 @@ victoria-metrics-crossbuild: \
victoria-metrics-openbsd-amd64
vmutils-crossbuild: \
vmutils-linux-386 \
vmutils-linux-amd64 \
vmutils-linux-arm64 \
vmutils-linux-arm \
@@ -177,6 +179,7 @@ release: \
release-vmutils
release-victoria-metrics: \
release-victoria-metrics-linux-386 \
release-victoria-metrics-linux-amd64 \
release-victoria-metrics-linux-arm \
release-victoria-metrics-linux-arm64 \
@@ -185,6 +188,10 @@ release-victoria-metrics: \
release-victoria-metrics-freebsd-amd64 \
release-victoria-metrics-openbsd-amd64
# adds i386 arch
release-victoria-metrics-linux-386:
GOOS=linux GOARCH=386 $(MAKE) release-victoria-metrics-goos-goarch
release-victoria-metrics-linux-amd64:
GOOS=linux GOARCH=amd64 $(MAKE) release-victoria-metrics-goos-goarch
@@ -216,6 +223,7 @@ release-victoria-metrics-goos-goarch: victoria-metrics-$(GOOS)-$(GOARCH)-prod
cd bin && rm -rf victoria-metrics-$(GOOS)-$(GOARCH)-prod
release-vmutils: \
release-vmutils-linux-386 \
release-vmutils-linux-amd64 \
release-vmutils-linux-arm64 \
release-vmutils-linux-arm \
@@ -225,6 +233,9 @@ release-vmutils: \
release-vmutils-openbsd-amd64 \
release-vmutils-windows-amd64
release-vmutils-linux-386:
GOOS=linux GOARCH=386 $(MAKE) release-vmutils-goos-goarch
release-vmutils-linux-amd64:
GOOS=linux GOARCH=amd64 $(MAKE) release-vmutils-goos-goarch
@@ -369,7 +380,7 @@ golangci-lint: install-golangci-lint
golangci-lint run
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.50.1
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.51.2
govulncheck: install-govulncheck
govulncheck ./...

View File

@@ -2,6 +2,7 @@
[![Latest Release](https://img.shields.io/github/release/VictoriaMetrics/VictoriaMetrics.svg?style=flat-square)](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
[![Docker Pulls](https://img.shields.io/docker/pulls/victoriametrics/victoria-metrics.svg?maxAge=604800)](https://hub.docker.com/r/victoriametrics/victoria-metrics)
[![victoriametrics](https://snapcraft.io/victoriametrics/badge.svg)](https://snapcraft.io/victoriametrics)
[![Slack](https://img.shields.io/badge/join%20slack-%23victoriametrics-brightgreen.svg)](https://slack.victoriametrics.com/)
[![GitHub license](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics.svg)](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/LICENSE)
[![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics)](https://goreportcard.com/report/github.com/VictoriaMetrics/VictoriaMetrics)
@@ -105,6 +106,7 @@ Case studies:
* [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch)
* [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern)
* [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl)
* [Dig Security](https://docs.victoriametrics.com/CaseStudies.html#dig-security)
* [Fly.io](https://docs.victoriametrics.com/CaseStudies.html#flyio)
* [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence)
* [Grammarly](https://docs.victoriametrics.com/CaseStudies.html#grammarly)
@@ -125,11 +127,22 @@ See also [articles and slides about VictoriaMetrics from our users](https://docs
## Operation
### How to start VictoriaMetrics
### Install
Just download [VictoriaMetrics executable](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) or [Docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) and start it with the desired command-line flags.
To quickly try VictoriaMetrics, just download [VictoriaMetrics executable](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) or [Docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) and start it with the desired command-line flags.
See also [QuickStart guide](https://docs.victoriametrics.com/Quick-Start.html) for additional information.
VictoriaMetrics can also be installed via these installation methods:
* [Helm charts for single-node and cluster versions of VictoriaMetrics](https://github.com/VictoriaMetrics/helm-charts).
* [Kubernetes operator for VictoriaMetrics](https://github.com/VictoriaMetrics/operator).
* [Ansible role for installing cluster VictoriaMetrics (by VictoriaMetrics)](https://github.com/VictoriaMetrics/ansible-playbooks).
* [Ansible role for installing cluster VictoriaMetrics (by community)](https://github.com/Slapper/ansible-victoriametrics-cluster-role).
* [Ansible role for installing single-node VictoriaMetrics (by community)](https://github.com/dreamteam-gg/ansible-victoriametrics-role).
* [Snap package for VictoriaMetrics](https://snapcraft.io/victoriametrics).
### How to start VictoriaMetrics
The following command-line flags are used the most:
* `-storageDataPath` - VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
@@ -997,7 +1010,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
Optional `max_rows_per_line` arg may be added to the request for limiting the maximum number of rows exported per each JSON line.
@@ -1046,7 +1059,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
The exported CSV data can be imported to VictoriaMetrics via [/api/v1/import/csv](#how-to-import-csv-data).
@@ -1074,7 +1087,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
The exported data can be imported to VictoriaMetrics via [/api/v1/import/native](#how-to-import-data-in-native-format).
@@ -1085,7 +1098,9 @@ The [deduplication](#deduplication) isn't applied for the data exported in nativ
## How to import time series data
Time series data can be imported into VictoriaMetrics via any supported data ingestion protocol:
VictoriaMetrics can discover and scrape metrics from Prometheus-compatible targets (aka "pull" protocol) -
see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
Additionally, VictoriaMetrics can accept metrics via the following popular data ingestion protocols (aka "push" protocols):
* [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). See [these docs](#prometheus-setup) for details.
* DataDog `submit metrics` API. See [these docs](#how-to-send-data-from-datadog-agent) for details.
@@ -1311,7 +1326,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
By default, the last point on the interval `[now - max_lookback ... now]` is scraped for each time series. The default value for `max_lookback` is `5m` (5 minutes), but it can be overridden with `max_lookback` query arg.
For instance, `/federate?match[]=up&max_lookback=1h` would return last points on the `[now - 1h ... now]` interval. This may be useful for time series federation
@@ -1354,6 +1369,7 @@ By default VictoriaMetrics is tuned for an optimal resource usage under typical
- `-search.maxSamplesPerQuery` limits the number of raw samples a single query can process. This allows limiting CPU usage for heavy queries.
- `-search.maxPointsPerTimeseries` limits the number of calculated points, which can be returned per each matching time series from [range query](https://docs.victoriametrics.com/keyConcepts.html#range-query).
- `-search.maxPointsSubqueryPerTimeseries` limits the number of calculated points, which can be generated per each matching time series during [subquery](https://docs.victoriametrics.com/MetricsQL.html#subqueries) evaluation.
- `-search.maxSeriesPerAggrFunc` limits the number of time series, which can be generated by [MetricsQL aggregate functions](https://docs.victoriametrics.com/MetricsQL.html#aggregate-functions) in a single query.
- `-search.maxSeries` limits the number of time series, which may be returned from [/api/v1/series](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers). This endpoint is used mostly by Grafana for auto-completion of metric names, label names and label values. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxSeries` to quite low value in order limit CPU and memory usage.
- `-search.maxTagKeys` limits the number of items, which may be returned from [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names). This endpoint is used mostly by Grafana for auto-completion of label names. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxTagKeys` to quite low value in order to limit CPU and memory usage.
- `-search.maxTagValues` limits the number of items, which may be returned from [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values). This endpoint is used mostly by Grafana for auto-completion of label values. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxTagValues` to quite low value in order to limit CPU and memory usage.
@@ -1614,7 +1630,9 @@ VictoriaMetrics provides the following security-related command-line flags:
Explicitly set internal network interface for TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats.
For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<internal_iface_ip>:2003`. This protects from unexpected requests from untrusted network interfaces.
VictoriaMetrics has achieved security certifications for Database Software Development and Software-Based Monitoring Services. We apply strict security measures in everything we do. See our [Security page](https://victoriametrics.com/security/) for more details.
See also [security recommendation for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#security)
and [the general security page at VictoriaMetrics website](https://victoriametrics.com/security/).
## Tuning
@@ -2033,17 +2051,10 @@ It is safe sharing the collected profiles from security point of view, since the
## Integrations
* [Helm charts for single-node and cluster versions of VictoriaMetrics](https://github.com/VictoriaMetrics/helm-charts).
* [Kubernetes operator for VictoriaMetrics](https://github.com/VictoriaMetrics/operator).
* [netdata](https://github.com/netdata/netdata) can push data into VictoriaMetrics via `Prometheus remote_write API`.
See [these docs](https://github.com/netdata/netdata#integrations).
* [go-graphite/carbonapi](https://github.com/go-graphite/carbonapi) can use VictoriaMetrics as time series backend.
See [this example](https://github.com/go-graphite/carbonapi/blob/main/cmd/carbonapi/carbonapi.example.victoriametrics.yaml).
* [Ansible role for installing cluster VictoriaMetrics (by VictoriaMetrics)](https://github.com/VictoriaMetrics/ansible-playbooks).
* [Ansible role for installing cluster VictoriaMetrics (by community)](https://github.com/Slapper/ansible-victoriametrics-cluster-role).
* [Ansible role for installing single-node VictoriaMetrics (by community)](https://github.com/dreamteam-gg/ansible-victoriametrics-role).
* [Snap package for VictoriaMetrics](https://snapcraft.io/victoriametrics).
* [netdata](https://github.com/netdata/netdata) can push data into VictoriaMetrics via `Prometheus remote_write API`.
See [these docs](https://github.com/netdata/netdata#integrations).
* [vmalert-cli](https://github.com/aorfanos/vmalert-cli) - a CLI application for managing [vmalert](https://docs.victoriametrics.com/vmalert.html).
## Third-party contributions
@@ -2170,7 +2181,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
-graphiteListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -graphiteListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
@@ -2190,7 +2203,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol (default ":8428")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 104857600)
@@ -2203,7 +2218,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-influxDBLabel string
Default label for the DB name sent over '?db={db_name}' query parameter (default "db")
-influxListenAddr string
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write . See also -influxListenAddr.useProxyProtocol
-influxListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -influxListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol (default "_")
-influxSkipMeasurement
@@ -2215,7 +2232,13 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-inmemoryDataFlushInterval duration
The interval for guaranteed saving of in-memory data to disk. The saved data survives unclean shutdown such as OOM crash, hardware reset, SIGKILL, etc. Bigger intervals may help increasing lifetime of flash storage with limited write cycles (e.g. Raspberry PI). Smaller intervals increase disk IO load. Minimum supported value is 1s (default 5s)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringCacheExpireDuration duration
The expire duration for caches for interned strings. See https://en.wikipedia.org/wiki/String_interning . See also -internStringMaxLen and -internStringDisableCache (default 6m0s)
-internStringDisableCache
Whether to disable caches for interned strings. This may reduce memory usage at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringCacheExpireDuration and -internStringMaxLen
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringDisableCache and -internStringCacheExpireDuration (default 500)
-logNewSeries
Whether to log new series. This option is for debug purposes only. It can lead to performance issues when big number of new series are ingested into VictoriaMetrics
-loggerDisableTimestamps
@@ -2251,9 +2274,13 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
-opentsdbListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
@@ -2321,6 +2348,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kubernetes_sd_configs for details (default 30s)
-promscrape.kumaSDCheckInterval duration
Interval for checking for changes in kuma service discovery. This works only if kuma_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxResponseHeadersSize size
@@ -2332,18 +2361,18 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-promscrape.minResponseSizeForStreamParse size
The minimum target response size for automatic switching to stream parsing mode, which can reduce memory usage. See https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 1000000)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.nomad.waitTime duration
Wait time used by Nomad service discovery. Default value is used if not set
-promscrape.nomadSDCheckInterval duration
Interval for checking for changes in Nomad. This works only if nomad_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#nomad_sd_configs for details (default 30s)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#openstack_sd_configs for details (default 30s)
-promscrape.seriesLimitPerTarget int
Optional limit on the number of unique time series a single scrape target can expose. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter for more info
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
@@ -2382,8 +2411,11 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The interval between datapoints stored in the database. It is used at Graphite Render API handler for normalizing the interval between datapoints in case it isn't normalized. It can be overridden by sending 'storage_step' query arg to /render API or by sending the desired interval via 'Storage-Step' http header during querying /render API. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default 10s)
-search.latencyOffset duration
The time when data points become visible in query results after the collection. It can be overridden on per-query basis via latency_offset arg. Too small value can result in incomplete last points for query results (default 30s)
-search.logQueryMemoryUsage size
Log queries, which require more memory than specified by this flag. This may help detecting and optimizing heavy queries. Query logging is disabled by default. See also -search.logSlowQueryDuration and -search.maxMemoryPerQuery
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
Log queries with execution time exceeding this value. Zero disables slow query logging. See also -search.logQueryMemoryUsage (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores, while many concurrently executed requests may require high amounts of memory. See also -search.maxQueueDuration and -search.maxMemoryPerQuery (default 8)
-search.maxExportDuration duration
@@ -2397,7 +2429,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-search.maxLookback duration
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaining due to historical reasons
-search.maxMemoryPerQuery size
The maximum amounts of memory a single query may consume. Queries requiring more memory are rejected. The total memory limit for concurrently executed queries can be estimated as -search.maxMemoryPerQuery multiplied by -search.maxConcurrentRequests
The maximum amounts of memory a single query may consume. Queries requiring more memory are rejected. The total memory limit for concurrently executed queries can be estimated as -search.maxMemoryPerQuery multiplied by -search.maxConcurrentRequests . See also -search.logQueryMemoryUsage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as VMUI or Grafana. There is no sense in setting this limit to values bigger than the horizontal resolution of the graph (default 30000)
@@ -2416,6 +2448,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage (default 30000000)
-search.maxSeries int
The maximum number of time series, which can be returned from /api/v1/series. This option allows limiting memory usage (default 30000)
-search.maxSeriesPerAggrFunc int
The maximum number of time series an aggregate MetricsQL function can generate (default 1000000)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.setLookbackToStep' flag
-search.maxStatusRequestDuration duration
@@ -2456,6 +2490,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-snapshotCreateTimeout duration
The timeout for creating new snapshot. If set, make sure that timeout is lower than backup period
-snapshotsMaxAge value
Automatically delete snapshots older than -snapshotsMaxAge if it is set to non-zero duration. Make sure that backup process has enough time to finish the backup before the corresponding snapshot is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 0)
@@ -2483,9 +2519,11 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-streamAggr.config string
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html .See also -remoteWrite.streamAggr.keepInput
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html . See also -remoteWrite.streamAggr.keepInput and -streamAggr.dedupInterval
-streamAggr.dedupInterval duration
Input samples are de-duplicated with this interval before being aggregated. Only the last sample per each time series per each interval is aggregated if the interval is greater than zero
-streamAggr.keepInput
Whether to keep input samples after the aggregation with -streamAggr.config .By default the input is dropped after the aggregation, so only the aggregate data is stored. See https://docs.victoriametrics.com/stream-aggregation.html
Whether to keep input samples after the aggregation with -streamAggr.config. By default the input is dropped after the aggregation, so only the aggregate data is stored. See https://docs.victoriametrics.com/stream-aggregation.html
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string

View File

@@ -82,6 +82,9 @@ victoria-metrics-linux-arm64:
victoria-metrics-linux-ppc64le:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
victoria-metrics-linux-s390x:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
victoria-metrics-linux-386:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -24,7 +24,10 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections")
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Leave only the last sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication and https://docs.victoriametrics.com/#downsampling")
dryRun = flag.Bool("dryRun", false, "Whether to check only -promscrape.config and then exit. "+
@@ -51,7 +54,7 @@ func main() {
if err := promscrape.CheckConfig(); err != nil {
logger.Fatalf("error when checking -promscrape.config: %s", err)
}
logger.Infof("-promscrape.config is ok; exitting with 0 status code")
logger.Infof("-promscrape.config is ok; exiting with 0 status code")
return
}
@@ -64,7 +67,7 @@ func main() {
vminsert.Init()
startSelfScraper()
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
logger.Infof("started VictoriaMetrics in %.3f seconds", time.Since(startTime).Seconds())
sig := procutil.WaitForSigterm()
@@ -90,7 +93,7 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != "GET" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")

View File

@@ -129,10 +129,10 @@ func setUp() {
storagePath = filepath.Join(os.TempDir(), testStorageSuffix)
processFlags()
logger.Init()
vmstorage.InitWithoutMetrics(promql.ResetRollupResultCacheIfNeeded)
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
vmselect.Init()
vminsert.Init()
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, false, requestHandler)
readyStorageCheckFunc := func() bool {
resp, err := http.Get(testHealthHTTPPath)
if err != nil {
@@ -189,10 +189,8 @@ func tearDown() {
func TestWriteRead(t *testing.T) {
t.Run("write", testWrite)
vmstorage.Storage.DebugFlush()
time.Sleep(1 * time.Second)
vmstorage.Stop()
// open storage after stop in write
vmstorage.InitWithoutMetrics(promql.ResetRollupResultCacheIfNeeded)
t.Run("read", testRead)
}

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -85,6 +85,9 @@ vmagent-linux-arm64:
vmagent-linux-ppc64le:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmagent-linux-s390x:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmagent-linux-386:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -25,7 +25,9 @@ additionally to [discovering Prometheus-compatible targets and scraping metrics
* Can add, remove and modify labels (aka tags) via Prometheus relabeling. Can filter data before sending it to remote storage. See [these docs](#relabeling) for details.
* Can accept data via all the ingestion protocols supported by VictoriaMetrics - see [these docs](#how-to-push-data-to-vmagent).
* Can aggregate incoming samples by time and by labels before sending them to remote storage - see [these docs](https://docs.victoriametrics.com/stream-aggregation.html).
* Can replicate collected metrics simultaneously to multiple remote storage systems - see [these docs](#replication-and-high-availability).
* Can replicate collected metrics simultaneously to multiple Prometheus-compatible remote storage systems - see [these docs](#replication-and-high-availability).
* Can save egress network bandwidth usage costs when [VictoriaMetrics remote write protocol](#victoriametrics-remote-write-protocol)
is used for sending the data to VictoriaMetrics.
* Works smoothly in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
@@ -45,14 +47,15 @@ additionally to [discovering Prometheus-compatible targets and scraping metrics
Please download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) (
`vmagent` is also available in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags)),
unpack it and pass the following flags to the `vmagent` binary in order to start scraping Prometheus-compatible targets:
unpack it and pass the following flags to the `vmagent` binary in order to start scraping Prometheus-compatible targets
and sending the data to the Prometheus-compatible remote storage:
* `-promscrape.config` with the path to Prometheus config file (usually located at `/etc/prometheus/prometheus.yml`).
* `-promscrape.config` with the path to [Prometheus config file](https://docs.victoriametrics.com/sd_configs.html) (usually located at `/etc/prometheus/prometheus.yml`).
The path can point either to local file or to http url. `vmagent` doesn't support some sections of Prometheus config file,
so you may need either to delete these sections or to run `vmagent` with `-promscrape.config.strictParse=false` command-line flag.
In this case `vmagent` ignores unsupported sections. See [the list of unsupported sections](#unsupported-prometheus-config-sections).
* `-remoteWrite.url` with the remote storage endpoint such as VictoriaMetrics, the `-remoteWrite.url` argument can be specified
multiple times to replicate data concurrently to an arbitrary number of remote storage systems. See [various use cases](#use-cases).
* `-remoteWrite.url` with Prometheus-compatible remote storage endpoint such as VictoriaMetrics, the `-remoteWrite.url` argument can be specified
multiple times to replicate data concurrently to multiple remote storage systems. See [various use cases](#use-cases).
Example command line:
@@ -60,8 +63,6 @@ Example command line:
/path/to/vmagent -promscrape.config=/path/to/prometheus.yml -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
Example of scrape configuration for `-promscrape.config` argument you can find [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/prometheus.yml)
See [how to scrape Prometheus-compatible targets](#how-to-collect-metrics-in-prometheus-format) for more details.
If you use single-node VictoriaMetrics, then you can discover and scrape Prometheus-compatible targets directly from VictoriaMetrics
@@ -75,6 +76,8 @@ and sending it to the provided `-remoteWrite.url`:
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
`vmagent` can save network bandwidth usage costs under high load when [VictoriaMetrics remote write protocol is enabled](#victoriametrics-remote-write-protocol).
See [troubleshooting docs](#troubleshooting) if you encounter common issues with `vmagent`.
Pass `-help` to `vmagent` in order to see [the full list of supported command-line flags with their descriptions](#advanced-usage).
@@ -120,7 +123,8 @@ data to the remote storage. It re-tries sending the data to remote storage until
The maximum on-disk size for the buffered metrics can be limited with `-remoteWrite.maxDiskUsagePerURL`.
`vmagent` works on various architectures from the IoT world - 32-bit arm, 64-bit arm, ppc64, 386, amd64.
See [the corresponding Makefile rules](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmagent/Makefile) for details.
The `vmagent` can save network bandwidth usage costs by using [VictoriaMetrics remote write protocol](#victoriametrics-remote-write-protocol).
### Drop-in replacement for Prometheus
@@ -175,6 +179,32 @@ the `-remoteWrite.url` command-line flag should be configured as `<schema>://<vm
according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format).
There is also support for multitenant writes. See [these docs](#multitenancy).
## VictoriaMetrics remote write protocol
`vmagent` supports sending data to the configured `-remoteWrite.url` either via Prometheus remote write protocol
or via VictoriaMetrics remote write protocol.
VictoriaMetrics remote write protocol provides the following benefits comparing to Prometheus remote write protocol:
- Reduced network bandwidth usage by 2x-5x. This allows saving network bandwidth usage costs when `vmagent` and
the configured remote storage systems are located in different datacenters, availability zones or regions.
- Reduced disk read/write IO and disk space usage at `vmagent` when the remote storage is temporarily unavailable.
In this case `vmagent` buffers the incoming data to disk using the VictoriaMetrics remote write format.
This reduces disk read/write IO and disk space usage by 2x-5x comparing to Prometheus remote write format.
`vmagent` automatically switches to VictoriaMetrics remote write protocol when it sends data to VictoriaMetrics components such as other `vmagent` instances,
[single-node VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html)
or `vminsert` at [cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
It is possible to force switch to VictoriaMetrics remote write protocol by specifying `-remoteWrite.forceVMProto`
command-line flag for the corresponding `-remoteWrite.url`.
It is possible to tune the compression level for VictoriaMetrics remote write protocol with `-remoteWrite.vmProtoCompressLevel` command-line flag.
Bigger values reduce network usage at the cost of higher CPU usage. Negative values reduce CPU usage at the cost of higher network usage.
`vmagent` automatically switches to Prometheus remote write protocol when it sends data to old versions of VictoriaMetrics components
or to other Prometheus-compatible remote storage systems. It is possible to force switch to Prometheus remote write protocol
by specifying `-remoteWrite.forcePromProto` command-line flag for the corresponding `-remoteWrite.url`.
## Multitenancy
By default `vmagent` collects the data without tenant identifiers and routes it to the configured `-remoteWrite.url`.
@@ -324,7 +354,7 @@ Extra labels can be added to metrics collected by `vmagent` via the following me
## Automatically generated metrics
`vmagent` automatically generates the following metrics per each scrape of every [Prometheus-compatible target](#how-to-collect-metrics-in-prometheus-format)
and attaches target-specific `instance` and `job` labels to these metrics:
and attaches `instance`, `job` and other target-specific labels to these metrics:
* `up` - this metric exposes `1` value on successful scrape and `0` value on unsuccessful scrape. This allows monitoring
failing scrapes with the following [MetricsQL query](https://docs.victoriametrics.com/MetricsQL.html):
@@ -763,7 +793,8 @@ scrape_configs:
## Cardinality limiter
By default `vmagent` doesn't limit the number of time series each scrape target can expose. The limit can be enforced in the following places:
By default `vmagent` doesn't limit the number of time series each scrape target can expose.
The limit can be enforced in the following places:
* Via `-promscrape.seriesLimitPerTarget` command-line option. This limit is applied individually
to all the scrape targets defined in the file pointed by `-promscrape.config`.
@@ -774,10 +805,7 @@ By default `vmagent` doesn't limit the number of time series each scrape target
via [Kubernetes annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) for targets,
which may expose too high number of time series.
See also `sample_limit` option at [scrape_config section](https://docs.victoriametrics.com/sd_configs.html#scrape_configs).
Scraped metrics are dropped for time series exceeding the given limit.
Scraped metrics are dropped for time series exceeding the given limit on the time window of 24h.
`vmagent` creates the following additional per-target metrics for targets with non-zero series limit:
- `scrape_series_limit_samples_dropped` - the number of dropped samples during the scrape when the unique series limit is exceeded.
@@ -791,6 +819,7 @@ These metrics allow building the following alerting rules:
- `scrape_series_current / scrape_series_limit > 0.9` - alerts when the number of series exposed by the target reaches 90% of the limit.
- `sum_over_time(scrape_series_limit_samples_dropped[1h]) > 0` - alerts when some samples are dropped because the series limit on a particular target is reached.
See also `sample_limit` option at [scrape_config section](https://docs.victoriametrics.com/sd_configs.html#scrape_configs).
By default `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`.
The limit can be enforced by setting the following command-line flags:
@@ -881,7 +910,7 @@ If you have suggestions for improvements or have found a bug - please open an is
The number of dropped blocks can be monitored via `vmagent_remotewrite_packets_dropped_total` metric exported at [/metrics page](#monitoring).
* Use `-remoteWrite.queues=1` when `-remoteWrite.url` points to remote storage, which doesn't accept out-of-order samples (aka data backfilling).
Such storage systems include Prometheus, Cortex and Thanos, which typically emit `out of order sample` errors.
Such storage systems include Prometheus, Mimir, Cortex and Thanos, which typically emit `out of order sample` errors.
The best solution is to use remote storage with [backfilling support](https://docs.victoriametrics.com/#backfilling) such as VictoriaMetrics.
* `vmagent` buffers scraped data at the `-remoteWrite.tmpDataPath` directory until it is sent to `-remoteWrite.url`.
@@ -1162,7 +1191,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
-graphiteListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -graphiteListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
@@ -1182,7 +1213,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -httpListenAddr.useProxyProtocol (default ":8429")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 104857600)
@@ -1195,7 +1228,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-influxDBLabel string
Default label for the DB name sent over '?db={db_name}' query parameter (default "db")
-influxListenAddr string
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write . See also -influxListenAddr.useProxyProtocol
-influxListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -influxListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol (default "_")
-influxSkipMeasurement
@@ -1205,7 +1240,13 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-influxTrimTimestamp duration
Trim timestamps for InfluxDB line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringCacheExpireDuration duration
The expire duration for caches for interned strings. See https://en.wikipedia.org/wiki/String_interning . See also -internStringMaxLen and -internStringDisableCache (default 6m0s)
-internStringDisableCache
Whether to disable caches for interned strings. This may reduce memory usage at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringCacheExpireDuration and -internStringMaxLen
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringDisableCache and -internStringCacheExpireDuration (default 500)
-kafka.consumer.topic array
Kafka topic names for data consumption. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
Supports an array of values separated by comma or specified via multiple flags.
@@ -1261,9 +1302,13 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
-opentsdbListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
@@ -1329,6 +1374,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kubernetes_sd_configs for details (default 30s)
-promscrape.kumaSDCheckInterval duration
Interval for checking for changes in kuma service discovery. This works only if kuma_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxResponseHeadersSize size
@@ -1340,18 +1387,18 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-promscrape.minResponseSizeForStreamParse size
The minimum target response size for automatic switching to stream parsing mode, which can reduce memory usage. See https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 1000000)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.nomad.waitTime duration
Wait time used by Nomad service discovery. Default value is used if not set
-promscrape.nomadSDCheckInterval duration
Interval for checking for changes in Nomad. This works only if nomad_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#nomad_sd_configs for details (default 30s)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#openstack_sd_configs for details (default 30s)
-promscrape.seriesLimitPerTarget int
Optional limit on the number of unique time series a single scrape target can expose. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter for more info
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
@@ -1409,6 +1456,12 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.flushInterval duration
Interval for flushing the data to remote storage. This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url (default 1s)
-remoteWrite.forcePromProto array
Whether to force Prometheus remote write protocol for sending data to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.forceVMProto array
Whether to force VictoriaMetrics remote write protocol for sending data to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.headers array
Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'
Supports an array of values separated by comma or specified via multiple flags.
@@ -1468,8 +1521,11 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
The number of significant figures to leave in metric values before writing them to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. This option may be used for improving data compression for the stored metrics. See also -remoteWrite.roundDigits
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.config array
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html . See also -remoteWrite.streamAggr.keepInput
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html . See also -remoteWrite.streamAggr.keepInput and -remoteWrite.streamAggr.dedupInterval
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.dedupInterval array
Input samples are de-duplicated with this interval before being aggregated. Only the last sample per each time series per each interval is aggregated if the interval is greater than zero
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.keepInput array
Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. See https://docs.victoriametrics.com/stream-aggregation.html
Supports array of values separated by comma or specified via multiple flags.
@@ -1491,11 +1547,13 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.tmpDataPath string
Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
-remoteWrite.url array
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Remote storage URL to write data to. It must support either VictoriaMetrics remote write protocol or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelConfig array
Optional path to relabel configs for the corresponding -remoteWrite.url. See also -remoteWrite.relabelConfig. The path can point either to local file or to http url. See https://docs.victoriametrics.com/vmagent.html#relabeling
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.vmProtoCompressLevel int
The compression level for VictoriaMetrics remote write protocol. Higher values reduce network traffic at the cost of higher CPU usage. Negative values reduce CPU usage at the cost of increased network traffic. See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol
-sortLabels
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}Enabled sorting for labels can slow down ingestion performance a bit
-tls

View File

@@ -9,6 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -25,7 +26,7 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return parser.ParseStream(req, func(rows []parser.Row) error {
return stream.Parse(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -9,6 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -28,7 +29,7 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
return err
}
ce := req.Header.Get("Content-Encoding")
return parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return stream.Parse(req.Body, ce, func(series []parser.Series) error {
return insertRows(at, series, extraLabels)
})
}

View File

@@ -7,6 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -19,7 +20,7 @@ var (
//
// See https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol
func InsertHandler(r io.Reader) error {
return parser.ParseStream(r, insertRows)
return stream.Parse(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -15,6 +15,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -36,7 +37,7 @@ var (
//
// See https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener/
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return parser.ParseStream(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return stream.Parse(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return insertRows(nil, db, rows, nil)
})
}
@@ -54,7 +55,7 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return stream.Parse(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(at, db, rows, extraLabels)
})
}

View File

@@ -44,16 +44,30 @@ import (
var (
httpListenAddr = flag.String("httpListenAddr", ":8429", "TCP address to listen for http connections. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write . "+
"See also -influxListenAddr.useProxyProtocol")
influxUseProxyProtocol = flag.Bool("influxListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -influxListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. "+
"See also -graphiteListenAddr.useProxyProtocol")
graphiteUseProxyProtocol = flag.Bool("graphiteListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -graphiteListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
"Usually :4242 must be set. Doesn't work if empty")
opentsdbHTTPListenAddr = flag.String("opentsdbHTTPListenAddr", "", "TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmagent. The following files are checked: "+
"Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol")
opentsdbUseProxyProtocol = flag.Bool("opentsdbListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
opentsdbHTTPListenAddr = flag.String("opentsdbHTTPListenAddr", "", "TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. "+
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol = flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmagent. The following files are checked: "+
"-promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig . "+
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag")
)
@@ -85,7 +99,7 @@ func main() {
if err := promscrape.CheckConfig(); err != nil {
logger.Fatalf("error when checking -promscrape.config: %s", err)
}
logger.Infof("-promscrape.config is ok; exitting with 0 status code")
logger.Infof("-promscrape.config is ok; exiting with 0 status code")
return
}
if *dryRun {
@@ -95,7 +109,7 @@ func main() {
if err := promscrape.CheckConfig(); err != nil {
logger.Fatalf("error when checking -promscrape.config: %s", err)
}
logger.Infof("all the configs are ok; exitting with 0 status code")
logger.Infof("all the configs are ok; exiting with 0 status code")
return
}
@@ -104,26 +118,26 @@ func main() {
remotewrite.Init()
common.StartUnmarshalWorkers()
if len(*influxListenAddr) > 0 {
influxServer = influxserver.MustStart(*influxListenAddr, func(r io.Reader) error {
influxServer = influxserver.MustStart(*influxListenAddr, *influxUseProxyProtocol, func(r io.Reader) error {
return influx.InsertHandlerForReader(r, false)
})
}
if len(*graphiteListenAddr) > 0 {
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, graphite.InsertHandler)
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, *graphiteUseProxyProtocol, graphite.InsertHandler)
}
if len(*opentsdbListenAddr) > 0 {
httpInsertHandler := getOpenTSDBHTTPInsertHandler()
opentsdbServer = opentsdbserver.MustStart(*opentsdbListenAddr, opentsdb.InsertHandler, httpInsertHandler)
opentsdbServer = opentsdbserver.MustStart(*opentsdbListenAddr, *opentsdbUseProxyProtocol, opentsdb.InsertHandler, httpInsertHandler)
}
if len(*opentsdbHTTPListenAddr) > 0 {
httpInsertHandler := getOpenTSDBHTTPInsertHandler()
opentsdbhttpServer = opentsdbhttpserver.MustStart(*opentsdbHTTPListenAddr, httpInsertHandler)
opentsdbhttpServer = opentsdbhttpserver.MustStart(*opentsdbHTTPListenAddr, *opentsdbHTTPUseProxyProtocol, httpInsertHandler)
}
promscrape.Init(remotewrite.Push)
if len(*httpListenAddr) > 0 {
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
}
logger.Infof("started vmagent in %.3f seconds", time.Since(startTime).Seconds())
@@ -195,7 +209,7 @@ func getAuthTokenFromPath(path string) (*auth.Token, error) {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != "GET" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")
@@ -223,7 +237,14 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
statusCode := http.StatusNoContent
if strings.HasPrefix(path, "/prometheus/api/v1/import/prometheus/metrics/job/") ||
strings.HasPrefix(path, "/api/v1/import/prometheus/metrics/job/") {
// Return 200 status code for pushgateway requests.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
statusCode = http.StatusOK
}
w.WriteHeader(statusCode)
return true
}
if strings.HasPrefix(path, "datadog/") {
@@ -233,6 +254,9 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
switch path {
case "/prometheus/api/v1/write", "/api/v1/write":
if common.HandleVMProtoServerHandshake(w, r) {
return true
}
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(nil, r); err != nil {
prometheusWriteErrors.Inc()

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -10,7 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -30,12 +30,12 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzip := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzip, func(block *parser.Block) error {
return stream.Parse(req.Body, isGzip, func(block *stream.Block) error {
return insertRows(at, block, extraLabels)
})
}
func insertRows(at *auth.Token, block *parser.Block, extraLabels []prompbmarshal.Label) error {
func insertRows(at *auth.Token, block *stream.Block, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)

View File

@@ -7,6 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -19,7 +20,7 @@ var (
//
// See http://opentsdb.net/docs/build/html/api_telnet/put.html
func InsertHandler(r io.Reader) error {
return parser.ParseStream(r, insertRows)
return stream.Parse(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -9,6 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -24,7 +25,7 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return parser.ParseStream(req, func(rows []parser.Row) error {
return stream.Parse(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -1,15 +1,16 @@
package prometheusimport
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,16 +32,11 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return stream.Parse(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
}, nil)
}
// InsertHandlerForReader processes metrics from given reader with optional gzip format
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return parser.ParseStream(r, 0, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
}, nil)
}, func(s string) {
httpserver.LogError(req, s)
})
}
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {

View File

@@ -0,0 +1,60 @@
package prometheusimport
import (
"bytes"
"flag"
"log"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
)
var (
srv *httptest.Server
testOutput *bytes.Buffer
)
func TestInsertHandler(t *testing.T) {
setUp()
defer tearDown()
req := httptest.NewRequest(http.MethodPost, "/insert/0/api/v1/import/prometheus", bytes.NewBufferString(`{"foo":"bar"}
go_memstats_alloc_bytes_total 1`))
if err := InsertHandler(nil, req); err != nil {
t.Errorf("unxepected error %s", err)
}
expectedMsg := "cannot unmarshal Prometheus line"
if !strings.Contains(testOutput.String(), expectedMsg) {
t.Errorf("output %q should contain %q", testOutput.String(), expectedMsg)
}
}
func setUp() {
srv = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(204)
}))
flag.Parse()
remoteWriteFlag := "remoteWrite.url"
if err := flag.Lookup(remoteWriteFlag).Value.Set(srv.URL); err != nil {
log.Fatalf("unable to set %q with value %q, err: %v", remoteWriteFlag, srv.URL, err)
}
logger.Init()
common.StartUnmarshalWorkers()
remotewrite.Init()
testOutput = &bytes.Buffer{}
logger.SetOutputForTests(testOutput)
}
func tearDown() {
common.StopUnmarshalWorkers()
srv.Close()
logger.ResetOutputForTest()
tmpDataDir := flag.Lookup("remoteWrite.tmpDataPath").Value.String()
fs.MustRemoveAll(tmpDataDir)
}

View File

@@ -1,7 +1,6 @@
package promremotewrite
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
@@ -11,7 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -28,18 +27,12 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
isVMRemoteWrite := req.Header.Get("Content-Encoding") == "zstd"
return stream.Parse(req.Body, isVMRemoteWrite, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
}
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(at *auth.Token, r io.Reader) error {
return parser.ParseStream(r, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, nil)
})
}
func insertRows(at *auth.Token, timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)

View File

@@ -15,11 +15,17 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/metrics"
)
var (
forcePromProto = flagutil.NewArrayBool("remoteWrite.forcePromProto", "Whether to force Prometheus remote write protocol for sending data "+
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
forceVMProto = flagutil.NewArrayBool("remoteWrite.forceVMProto", "Whether to force VictoriaMetrics remote write protocol for sending data "+
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
rateLimit = flagutil.NewArrayInt("remoteWrite.rateLimit", "Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. "+
"By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data "+
"is sent after temporary unavailability of the remote storage")
@@ -69,8 +75,12 @@ var (
type client struct {
sanitizedURL string
remoteWriteURL string
fq *persistentqueue.FastQueue
hc *http.Client
// Whether to use VictoriaMetrics remote write protocol for sending the data to remoteWriteURL
useVMProto bool
fq *persistentqueue.FastQueue
hc *http.Client
sendBlock func(block []byte) bool
authCfg *promauth.Config
@@ -122,19 +132,39 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
}
tr.Proxy = http.ProxyURL(pu)
}
hc := &http.Client{
Transport: tr,
Timeout: sendTimeout.GetOptionalArgOrDefault(argIdx, time.Minute),
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authCfg: authCfg,
awsCfg: awsCfg,
fq: fq,
hc: &http.Client{
Transport: tr,
Timeout: sendTimeout.GetOptionalArgOrDefault(argIdx, time.Minute),
},
stopCh: make(chan struct{}),
hc: hc,
stopCh: make(chan struct{}),
}
c.sendBlock = c.sendBlockHTTP
useVMProto := forceVMProto.GetOptionalArg(argIdx)
usePromProto := forcePromProto.GetOptionalArg(argIdx)
if useVMProto && usePromProto {
logger.Fatalf("-remoteWrite.useVMProto and -remoteWrite.usePromProto cannot be set simultaneously for -remoteWrite.url=%s", sanitizedURL)
}
if !useVMProto && !usePromProto {
// Auto-detect whether the remote storage supports VictoriaMetrics remote write protocol.
doRequest := func(url string) (*http.Response, error) {
return c.doRequest(url, nil)
}
useVMProto = common.HandleVMProtoClientHandshake(c.remoteWriteURL, doRequest)
if !useVMProto {
logger.Infof("the remote storage at %q doesn't support VictoriaMetrics remote write protocol. Switching to Prometheus remote write protocol. "+
"See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol", sanitizedURL)
}
}
c.useVMProto = useVMProto
return c
}
@@ -291,38 +321,45 @@ func (c *client) runWorker() {
}
}
// sendBlockHTTP returns false only if c.stopCh is closed.
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.register(len(block), c.stopCh)
retryDuration := time.Second
retriesCount := 0
c.bytesSent.Add(len(block))
c.blocksSent.Inc()
sigv4Hash := ""
if c.awsCfg != nil {
sigv4Hash = awsapi.HashHex(block)
}
again:
req, err := http.NewRequest("POST", c.remoteWriteURL, bytes.NewBuffer(block))
func (c *client) doRequest(url string, body []byte) (*http.Response, error) {
reqBody := bytes.NewBuffer(body)
req, err := http.NewRequest(http.MethodPost, url, reqBody)
if err != nil {
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", c.sanitizedURL, err)
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", url, err)
}
c.authCfg.SetHeaders(req, true)
h := req.Header
h.Set("User-Agent", "vmagent")
h.Set("Content-Type", "application/x-protobuf")
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
if c.useVMProto {
h.Set("Content-Encoding", "zstd")
h.Set("X-VictoriaMetrics-Remote-Write-Version", "1")
} else {
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
}
if c.awsCfg != nil {
sigv4Hash := awsapi.HashHex(body)
if err := c.awsCfg.SignRequest(req, sigv4Hash); err != nil {
// there is no need in retry, request will be rejected by client.Do and retried by code below
logger.Warnf("cannot sign remoteWrite request with AWS sigv4: %s", err)
}
}
return c.hc.Do(req)
}
// sendBlockHTTP sends the given block to c.remoteWriteURL.
//
// The function returns false only if c.stopCh is closed.
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.register(len(block), c.stopCh)
retryDuration := time.Second
retriesCount := 0
again:
startTime := time.Now()
resp, err := c.hc.Do(req)
resp, err := c.doRequest(c.remoteWriteURL, block)
c.requestDuration.UpdateDuration(startTime)
if err != nil {
c.errorsCount.Inc()
@@ -347,6 +384,8 @@ again:
if statusCode/100 == 2 {
_ = resp.Body.Close()
c.requestsOKCount.Inc()
c.bytesSent.Add(len(block))
c.blocksSent.Inc()
return true
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -23,6 +24,9 @@ var (
"This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url")
maxUnpackedBlockSize = flagutil.NewBytes("remoteWrite.maxBlockSize", 8*1024*1024, "The maximum block size to send to remote storage. Bigger blocks may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxRowsPerBlock")
maxRowsPerBlock = flag.Int("remoteWrite.maxRowsPerBlock", 10000, "The maximum number of samples to send in each block to remote storage. Higher number may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxBlockSize")
vmProtoCompressLevel = flag.Int("remoteWrite.vmProtoCompressLevel", 0, "The compression level for VictoriaMetrics remote write protocol. "+
"Higher values reduce network traffic at the cost of higher CPU usage. Negative values reduce CPU usage at the cost of increased network traffic. "+
"See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
)
type pendingSeries struct {
@@ -33,9 +37,10 @@ type pendingSeries struct {
periodicFlusherWG sync.WaitGroup
}
func newPendingSeries(pushBlock func(block []byte), significantFigures, roundDigits int) *pendingSeries {
func newPendingSeries(pushBlock func(block []byte), isVMRemoteWrite bool, significantFigures, roundDigits int) *pendingSeries {
var ps pendingSeries
ps.wr.pushBlock = pushBlock
ps.wr.isVMRemoteWrite = isVMRemoteWrite
ps.wr.significantFigures = significantFigures
ps.wr.roundDigits = roundDigits
ps.stopCh = make(chan struct{})
@@ -88,6 +93,9 @@ type writeRequest struct {
// pushBlock is called when whe write request is ready to be sent.
pushBlock func(block []byte)
// Whether to encode the write request with VictoriaMetrics remote write protocol.
isVMRemoteWrite bool
// How many significant figures must be left before sending the writeRequest to pushBlock.
significantFigures int
@@ -104,7 +112,7 @@ type writeRequest struct {
}
func (wr *writeRequest) reset() {
// Do not reset pushBlock, significantFigures and roundDigits, since they are re-used.
// Do not reset lastFlushTime, pushBlock, isVMRemoteWrite, significantFigures and roundDigits, since they are re-used.
wr.wr.Timeseries = nil
@@ -126,7 +134,7 @@ func (wr *writeRequest) flush() {
wr.wr.Timeseries = wr.tss
wr.adjustSampleValues()
atomic.StoreUint64(&wr.lastFlushTime, fasttime.UnixTimestamp())
pushWriteRequest(&wr.wr, wr.pushBlock)
pushWriteRequest(&wr.wr, wr.pushBlock, wr.isVMRemoteWrite)
wr.reset()
}
@@ -188,7 +196,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
wr.buf = buf
}
func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byte)) {
func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byte), isVMRemoteWrite bool) {
if len(wr.Timeseries) == 0 {
// Nothing to push
return
@@ -197,7 +205,11 @@ func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byt
bb.B = prompbmarshal.MarshalWriteRequest(bb.B[:0], wr)
if len(bb.B) <= maxUnpackedBlockSize.IntN() {
zb := snappyBufPool.Get()
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
if isVMRemoteWrite {
zb.B = zstd.CompressLevel(zb.B[:0], bb.B, *vmProtoCompressLevel)
} else {
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
}
writeRequestBufPool.Put(bb)
if len(zb.B) <= persistentqueue.MaxBlockSize {
pushBlock(zb.B)
@@ -221,18 +233,18 @@ func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byt
}
n := len(samples) / 2
wr.Timeseries[0].Samples = samples[:n]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries[0].Samples = samples[n:]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries[0].Samples = samples
return
}
timeseries := wr.Timeseries
n := len(timeseries) / 2
wr.Timeseries = timeseries[:n]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries = timeseries[n:]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries = timeseries
}

View File

@@ -2,40 +2,48 @@ package remotewrite
import (
"fmt"
"math"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/golang/snappy"
)
func TestPushWriteRequest(t *testing.T) {
for _, rowsCount := range []int{1, 10, 100, 1e3, 1e4} {
rowsCounts := []int{1, 10, 100, 1e3, 1e4}
expectedBlockLensProm := []int{216, 1848, 16424, 169882, 1757876}
expectedBlockLensVM := []int{138, 492, 3927, 34995, 288476}
for i, rowsCount := range rowsCounts {
expectedBlockLenProm := expectedBlockLensProm[i]
expectedBlockLenVM := expectedBlockLensVM[i]
t.Run(fmt.Sprintf("%d", rowsCount), func(t *testing.T) {
testPushWriteRequest(t, rowsCount)
testPushWriteRequest(t, rowsCount, expectedBlockLenProm, expectedBlockLenVM)
})
}
}
func testPushWriteRequest(t *testing.T, rowsCount int) {
wr := newTestWriteRequest(rowsCount, 10)
pushBlockLen := 0
pushBlock := func(block []byte) {
if pushBlockLen > 0 {
panic(fmt.Errorf("BUG: pushBlock called multiple times; pushBlockLen=%d at first call, len(block)=%d at second call", pushBlockLen, len(block)))
func testPushWriteRequest(t *testing.T, rowsCount, expectedBlockLenProm, expectedBlockLenVM int) {
f := func(isVMRemoteWrite bool, expectedBlockLen int, tolerancePrc float64) {
t.Helper()
wr := newTestWriteRequest(rowsCount, 20)
pushBlockLen := 0
pushBlock := func(block []byte) {
if pushBlockLen > 0 {
panic(fmt.Errorf("BUG: pushBlock called multiple times; pushBlockLen=%d at first call, len(block)=%d at second call", pushBlockLen, len(block)))
}
pushBlockLen = len(block)
}
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
if math.Abs(float64(pushBlockLen-expectedBlockLen)/float64(expectedBlockLen)*100) > tolerancePrc {
t.Fatalf("unexpected block len for rowsCount=%d, isVMRemoteWrite=%v; got %d bytes; expecting %d bytes +- %.0f%%",
rowsCount, isVMRemoteWrite, pushBlockLen, expectedBlockLen, tolerancePrc)
}
pushBlockLen = len(block)
}
pushWriteRequest(wr, pushBlock)
b := prompbmarshal.MarshalWriteRequest(nil, wr)
zb := snappy.Encode(nil, b)
maxPushBlockLen := len(zb)
minPushBlockLen := maxPushBlockLen / 2
if pushBlockLen < minPushBlockLen {
t.Fatalf("unexpected block len after pushWriteRequest; got %d bytes; must be at least %d bytes", pushBlockLen, minPushBlockLen)
}
if pushBlockLen > maxPushBlockLen {
t.Fatalf("unexpected block len after pushWriteRequest; got %d bytes; must be smaller or equal to %d bytes", pushBlockLen, maxPushBlockLen)
}
// Check Prometheus remote write
f(false, expectedBlockLenProm, 0)
// Check VictoriaMetrics remote write
f(true, expectedBlockLenVM, 15)
}
func newTestWriteRequest(seriesCount, labelsCount int) *prompbmarshal.WriteRequest {

View File

@@ -28,9 +28,9 @@ import (
)
var (
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support Prometheus remote_write API. "+
"It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support either VictoriaMetrics remote write protocol "+
"or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteMultitenantURLs = flagutil.NewArrayString("remoteWrite.multitenantURL", "Base path for multitenant remote storage URL to write data to. "+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
@@ -62,10 +62,12 @@ var (
streamAggrConfig = flagutil.NewArrayString("remoteWrite.streamAggr.config", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/stream-aggregation.html . "+
"See also -remoteWrite.streamAggr.keepInput")
"See also -remoteWrite.streamAggr.keepInput and -remoteWrite.streamAggr.dedupInterval")
streamAggrKeepInput = flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput", "Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. "+
"By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. "+
"See https://docs.victoriametrics.com/stream-aggregation.html")
streamAggrDedupInterval = flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval", "Input samples are de-duplicated with this interval before being aggregated. "+
"Only the last sample per each time series per each interval is aggregated if the interval is greater than zero")
)
var (
@@ -473,6 +475,7 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_pending_inmemory_blocks{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetInmemoryQueueLen())
})
var c *client
switch remoteWriteURL.Scheme {
case "http", "https":
@@ -493,7 +496,7 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
}
pss := make([]*pendingSeries, pssLen)
for i := range pss {
pss[i] = newPendingSeries(fq.MustWriteBlock, sf, rd)
pss[i] = newPendingSeries(fq.MustWriteBlock, c.useVMProto, sf, rd)
}
rwctx := &remoteWriteCtx{
@@ -509,7 +512,8 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
// Initialize sas
sasFile := streamAggrConfig.GetOptionalArg(argIdx)
if sasFile != "" {
sas, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternal)
dedupInterval := streamAggrDedupInterval.GetOptionalArgOrDefault(argIdx, 0)
sas, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternal, dedupInterval)
if err != nil {
logger.Fatalf("cannot initialize stream aggregators from -remoteWrite.streamAggrFile=%q: %s", sasFile, err)
}

View File

@@ -1,7 +1,6 @@
package vmimport
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
@@ -12,6 +11,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,18 +31,11 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return stream.Parse(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return parser.ParseStream(r, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
})
}
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)

View File

@@ -114,6 +114,9 @@ vmalert-linux-arm64:
vmalert-linux-ppc64le:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmalert-linux-s390x:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmalert-linux-386:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -9,6 +9,13 @@ protocol and require `-remoteWrite.url` to be configured.
Vmalert is heavily inspired by [Prometheus](https://prometheus.io/docs/alerting/latest/overview/)
implementation and aims to be compatible with its syntax.
A [single-node](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmalert)
or [cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmalert)
of VictoriaMetrics are capable of proxying requests to vmalert via `-vmalert.proxyURL` command-line flag.
Use this feature for the following cases:
* for proxying requests from [Grafana Alerting UI](https://grafana.com/docs/grafana/latest/alerting/);
* for accessing vmalert's UI through VictoriaMetrics Web interface.
## Features
* Integration with [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) TSDB;
@@ -21,7 +28,8 @@ implementation and aims to be compatible with its syntax.
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite);
* Recording and Alerting rules backfilling (aka `replay`). See [these docs](#rules-backfilling);
* Lightweight and without extra dependencies.
* Supports [reusable templates](#reusable-templates) for annotations.
* Supports [reusable templates](#reusable-templates) for annotations;
* Load of recording and alerting rules from local filesystem, GCS and S3.
## Limitations
@@ -397,6 +405,25 @@ The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz`
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
### Reading rules from object storage
[Enterprise version](https://docs.victoriametrics.com/enterprise.html) of `vmalert` may read alerting and recording rules
from object storage:
- `./bin/vmalert -rule=s3://bucket/dir/alert.rules` would read rules from the given path at S3 bucket
- `./bin/vmalert -rule=gs://bucket/bir/alert.rules` would read rules from the given path at GCS bucket
S3 and GCS paths support only matching by prefix, e.g. `s3://bucket/dir/rule_` matches
all files with prefix `rule_` in the folder `dir`.
The following [command-line flags](#flags) can be used for fine-tuning access to S3 and GCS:
- `-s3.credsFilePath` - path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
- `-s3.configFilePath` - path to file with S3 configs. Configs are loaded from default location if not set.
- `-s3.configProfile` - profile name for S3 configs. If no set, the value of the environment variable will be loaded (`AWS_PROFILE` or `AWS_DEFAULT_PROFILE`).
- `-s3.customEndpoint` - custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set.
- `-s3.forcePathStyle` - prefixing endpoint with bucket name when set false, true by default.
### Topology examples
The following sections are showing how `vmalert` may be used and configured
@@ -594,6 +621,12 @@ can read the same rules configuration as normal, evaluate them on the given time
results via remote write to the configured storage. vmalert supports any PromQL/MetricsQL compatible
data source for backfilling.
Please note, that response caching may lead to unexpected results during and after backfilling process.
In order to avoid this you need to reset cache contents or disable caching when using backfilling
as described in [backfilling docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#backfilling).
See a blogpost about [Rules backfilling via vmalert](https://victoriametrics.com/blog/rules-replay/).
### How it works
In `replay` mode vmalert works as a cli-tool and exits immediately after work is done.
@@ -698,31 +731,42 @@ a review to the dashboard.
## Troubleshooting
vmalert executes configured rules within certain intervals. It is expected that at the moment when rule is executed,
the data is already present in configured `-datasource.url`:
### Data delay
Data delay is one of the most common issues with rules execution.
vmalert executes configured rules within certain intervals at specifics timestamps.
It expects that the data is already present in configured `-datasource.url` at the moment of time when rule is executed:
<img alt="vmalert expected evaluation" src="vmalert_ts_normal.gif">
Usually, troubles start to appear when data in `-datasource.url` is delayed or absent. In such cases, evaluations
may get empty response from datasource and produce empty recording rules or reset alerts state:
may get empty response from the datasource and produce empty recording rules or reset alerts state:
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
This behavior is controlled by `-search.latencyOffset` command-line flag and the `latency_offset` query ag at `vmselect`.
Usually, this results into a 30s shift for recording rules results.
Note that too small value passed to `-search.latencyOffset` or to `latency_offest` query arg may lead to incomplete query results.
Try the following recommendations to reduce the chance of hitting the data delay issue:
Try the following recommendations in such cases:
* Always configure group's `evaluationInterval` to be bigger or equal to `scrape_interval` at which metrics
are delivered to the datasource;
* Always configure group's `evaluationInterval` to be bigger or at least equal to
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution);
* Ensure that `[duration]` value is at least twice bigger than
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`.
If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
* If you know in advance, that data in datasource is delayed - try changing vmalert's `-datasource.lookback`
command-line flag to add a time shift for evaluations;
* If time intervals between datapoints in datasource are irregular or `>=5min` - try changing vmalert's
`-datasource.queryStep` command-line flag to specify how far search query can lookback for the recent datapoint.
The recommendation is to have the step at least two times bigger than `scrape_interval`, since
there are no guarantees that scrape will not fail.
command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
* If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
in datasource is inconsistent or `>=5min` - try changing vmalert's `-datasource.queryStep` command-line flag to specify
how far search query can lookback for the recent datapoint. The recommendation is to have the step
at least two times bigger than the resolution.
> Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring.
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
data on the moment of querying, since metrics collectors won't be able to deliver the data in time.
### Alerts state
Sometimes, it is not clear why some specific alert fired or didn't fire. It is very important to remember, that
alerts with `for: 0` fire immediately when their expression becomes true. And alerts with `for > 0` will fire only
@@ -745,6 +789,8 @@ HTTP request sent by vmalert to the `-datasource.url` during evaluation. If spec
no samples returned and curl command returns data - then it is very likely there was no data in datasource on the
moment when rule was evaluated.
### Debug mode
vmalert allows configuring more detailed logging for specific alerting rule. Just set `debug: true` in rule's configuration
and vmalert will start printing additional log messages:
```terminal
@@ -897,7 +943,13 @@ The shortlist of configuration flags is the following:
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
Address to listen for http connections (default ":8880")
Address to listen for http connections. See also -httpListenAddr.useProxyProtocol (default ":8880")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-insert.maxQueueDuration duration
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
@@ -914,6 +966,8 @@ The shortlist of configuration flags is the following:
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
@@ -1008,7 +1062,7 @@ The shortlist of configuration flags is the following:
-remoteRead.headers string
Optional HTTP headers to send with each request to the corresponding -remoteRead.url. For example, -remoteRead.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteRead.url. Multiple headers must be delimited by '^^': -remoteRead.headers='header1:value1^^header2:value2'
-remoteRead.ignoreRestoreErrors
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
Whether to ignore errors from remote storage when restoring alerts state on startup. DEPRECATED - this flag has no effect and will be removed in the next releases. (default true)
-remoteRead.lookback duration
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.oauth2.clientID string
@@ -1085,8 +1139,8 @@ The shortlist of configuration flags is the following:
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. For example, if -remoteWrite.url=http://127.0.0.1:8428 is specified, then the alerts state will be written to http://127.0.0.1:8428/api/v1/write . See also -remoteWrite.disablePathAppend, '-remoteWrite.showURL'.
-replay.disableProgressBar
Whether to disable rendering progress bars during the replay. Progress bar rendering might be verbose or break the logs parsing, so it is recommended to be disabled when not used in interactive mode.
-replay.maxDatapointsPerQuery int
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
-replay.maxDatapointsPerQuery /query_range
Max number of data points expected in one request. It affects the max time range for every /query_range request during the replay. The higher the value, the less requests will be made during replay. (default 1000)
-replay.ruleRetryAttempts int
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
-replay.rulesDelay duration
@@ -1096,18 +1150,24 @@ The shortlist of configuration flags is the following:
-replay.timeTo string
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
-rule array
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Path to the files with alerting and/or recording rules.
Supports hierarchical patterns and regexpes.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
-rule="dir/*.yaml" -rule="/*.yaml" -rule="gcs://vmalert-rules/tenant_%{TENANT_ID}/prod".
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Enterprise version of vmalert supports S3 and GCS paths to rules.
For example: gs://bucket/path/to/rules, s3://bucket/path/to/rules
S3 and GCS paths support only matching by prefix, e.g. s3://bucket/dir/rule_ matches
all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
Supports an array of values separated by comma or specified via multiple flags.
-rule.configCheckInterval duration
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead
-rule.maxResolveDuration duration
Limits the maximum duration for automatic alert expiration, which is by default equal to 3 evaluation intervals of the parent group.
Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
-rule.resendDelay duration
Minimum amount of time to wait before resending an alert to notifier
-rule.templates array
@@ -1124,6 +1184,18 @@ The shortlist of configuration flags is the following:
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
Whether to validate annotation and label templates (default true)
-s3.configFilePath string
Path to file with S3 configs. Configs are loaded from default location if not set.
See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.configProfile string
Profile name for S3 configs. If no set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both not set, DefaultSharedConfigProfile is used. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.credsFilePath string
Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.customEndpoint string
Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.forcePathStyle
Prefixing endpoint with bucket name when set false, true by default. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default true)
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string

View File

@@ -421,7 +421,9 @@ func (ar *AlertingRule) UpdateWith(r Rule) error {
ar.Labels = nr.Labels
ar.Annotations = nr.Annotations
ar.EvalInterval = nr.EvalInterval
ar.Debug = nr.Debug
ar.q = nr.q
ar.state = nr.state
return nil
}
@@ -498,6 +500,7 @@ func (ar *AlertingRule) ToAPI() APIRule {
LastSamples: lastState.samples,
MaxUpdates: ar.state.size(),
Updates: ar.state.getAll(),
Debug: ar.Debug,
// encode as strings to avoid rounding in JSON
ID: fmt.Sprintf("%d", ar.ID()),
@@ -604,54 +607,59 @@ func alertForToTimeSeries(a *notifier.Alert, timestamp int64) prompbmarshal.Time
return newTimeSeries([]float64{float64(a.ActiveAt.Unix())}, []int64{timestamp}, labels)
}
// Restore restores the state of active alerts basing on previously written time series.
// Restore restores only ActiveAt field. Field State will be always Pending and supposed
// to be updated on next Exec, as well as Value field.
// Only rules with For > 0 will be restored.
func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookback time.Duration, labels map[string]string) error {
if q == nil {
return fmt.Errorf("querier is nil")
// Restore restores the value of ActiveAt field for active alerts,
// based on previously written time series `alertForStateMetricName`.
// Only rules with For > 0 can be restored.
func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, ts time.Time, lookback time.Duration) error {
if ar.For < 1 {
return nil
}
ts := time.Now()
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res, err
ar.alertsMu.Lock()
defer ar.alertsMu.Unlock()
if len(ar.alerts) < 1 {
return nil
}
// account for external labels in filter
var labelsFilter string
for k, v := range labels {
labelsFilter += fmt.Sprintf(",%s=%q", k, v)
}
expr := fmt.Sprintf("last_over_time(%s{alertname=%q%s}[%ds])",
alertForStateMetricName, ar.Name, labelsFilter, int(lookback.Seconds()))
qMetrics, _, err := q.Query(ctx, expr, ts)
if err != nil {
return err
}
for _, m := range qMetrics {
ls := &labelSet{
origin: make(map[string]string, len(m.Labels)),
processed: make(map[string]string, len(m.Labels)),
for _, a := range ar.alerts {
if a.Restored || a.State != notifier.StatePending {
continue
}
for _, l := range m.Labels {
if l.Name == "__name__" {
continue
}
ls.origin[l.Name] = l.Value
ls.processed[l.Name] = l.Value
var labelsFilter []string
for k, v := range a.Labels {
labelsFilter = append(labelsFilter, fmt.Sprintf("%s=%q", k, v))
}
a, err := ar.newAlert(m, ls, time.Unix(int64(m.Values[0]), 0), qFn)
sort.Strings(labelsFilter)
expr := fmt.Sprintf("last_over_time(%s{%s}[%ds])",
alertForStateMetricName, strings.Join(labelsFilter, ","), int(lookback.Seconds()))
ar.logDebugf(ts, nil, "restoring alert state via query %q", expr)
qMetrics, _, err := q.Query(ctx, expr, ts)
if err != nil {
return fmt.Errorf("failed to create alert: %w", err)
return err
}
a.ID = hash(ls.processed)
a.State = notifier.StatePending
if len(qMetrics) < 1 {
ar.logDebugf(ts, nil, "no response was received from restore query")
continue
}
// only one series expected in response
m := qMetrics[0]
// __name__ supposed to be alertForStateMetricName
m.DelLabel("__name__")
// we assume that restore query contains all label matchers,
// so all received labels will match anyway if their number is equal.
if len(m.Labels) != len(a.Labels) {
ar.logDebugf(ts, nil, "state restore query returned not expected label-set %v", m.Labels)
continue
}
a.ActiveAt = time.Unix(int64(m.Values[0]), 0)
a.Restored = true
ar.alerts[a.ID] = a
logger.Infof("alert %q (%d) restored to state at %v", a.Name, a.ID, a.ActiveAt)
}
return nil

View File

@@ -6,12 +6,15 @@ import (
"reflect"
"sort"
"strings"
"sync"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
func TestAlertingRule_ToTimeSeries(t *testing.T) {
@@ -502,118 +505,156 @@ func TestAlertingRule_ExecRange(t *testing.T) {
}
}
func TestAlertingRule_Restore(t *testing.T) {
testCases := []struct {
rule *AlertingRule
metrics []datasource.Metric
expAlerts map[uint64]*notifier.Alert
}{
{
newTestRuleWithLabels("no extra labels"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
),
},
map[uint64]*notifier.Alert{
hash(nil): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("metric labels"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
alertNameLabel, "metric labels",
alertGroupNameLabel, "groupID",
"foo", "bar",
"namespace", "baz",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{
alertNameLabel: "metric labels",
alertGroupNameLabel: "groupID",
"foo": "bar",
"namespace": "baz",
}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("rule labels", "source", "vm"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
"foo", "bar",
"namespace", "baz",
// extra labels set by rule
"source", "vm",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{
"foo": "bar",
"namespace": "baz",
"source": "vm",
}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("multiple alerts"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-1",
),
metricWithValueAndLabels(t, float64(time.Now().Truncate(2*time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-2",
),
metricWithValueAndLabels(t, float64(time.Now().Truncate(3*time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-3",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{"host": "localhost-1"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
hash(map[string]string{"host": "localhost-2"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(2 * time.Hour)},
hash(map[string]string{"host": "localhost-3"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(3 * time.Hour)},
},
},
func TestGroup_Restore(t *testing.T) {
defaultTS := time.Now()
fqr := &fakeQuerierWithRegistry{}
fn := func(rules []config.Rule, expAlerts map[uint64]*notifier.Alert) {
t.Helper()
defer fqr.reset()
for _, r := range rules {
fqr.set(r.Expr, metricWithValueAndLabels(t, 0, "__name__", r.Alert))
}
fg := newGroup(config.Group{Name: "TestRestore", Rules: rules}, fqr, time.Second, nil)
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
nts := func() []notifier.Notifier { return []notifier.Notifier{&fakeNotifier{}} }
fg.start(context.Background(), nts, nil, fqr)
wg.Done()
}()
fg.close()
wg.Wait()
gotAlerts := make(map[uint64]*notifier.Alert)
for _, rs := range fg.Rules {
alerts := rs.(*AlertingRule).alerts
for k, v := range alerts {
if !v.Restored {
// set not restored alerts to predictable timestamp
v.ActiveAt = defaultTS
}
gotAlerts[k] = v
}
}
if len(gotAlerts) != len(expAlerts) {
t.Fatalf("expected %d alerts; got %d", len(expAlerts), len(gotAlerts))
}
for key, exp := range expAlerts {
got, ok := gotAlerts[key]
if !ok {
t.Fatalf("expected to have key %d", key)
}
if got.State != notifier.StatePending {
t.Fatalf("expected state %d; got %d", notifier.StatePending, got.State)
}
if got.ActiveAt != exp.ActiveAt {
t.Fatalf("expected ActiveAt %v; got %v", exp.ActiveAt, got.ActiveAt)
}
}
}
fakeGroup := Group{Name: "TestRule_Exec"}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if err := tc.rule.Restore(context.TODO(), fq, time.Hour, nil); err != nil {
t.Fatalf("unexpected err: %s", err)
}
if len(tc.rule.alerts) != len(tc.expAlerts) {
t.Fatalf("expected %d alerts; got %d", len(tc.expAlerts), len(tc.rule.alerts))
}
for key, exp := range tc.expAlerts {
got, ok := tc.rule.alerts[key]
if !ok {
t.Fatalf("expected to have key %d", key)
}
if got.State != exp.State {
t.Fatalf("expected state %d; got %d", exp.State, got.State)
}
if got.ActiveAt != exp.ActiveAt {
t.Fatalf("expected ActiveAt %v; got %v", exp.ActiveAt, got.ActiveAt)
}
}
stateMetric := func(name string, value time.Time, labels ...string) datasource.Metric {
labels = append(labels, "__name__", alertForStateMetricName)
labels = append(labels, alertNameLabel, name)
labels = append(labels, alertGroupNameLabel, "TestRestore")
return metricWithValueAndLabels(t, float64(value.Unix()), labels...)
}
// one active alert, no previous state
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
})
fqr.reset()
// one active alert with state restore
ts := time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// two rules, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("foo", ts))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)},
{Alert: "bar", Expr: "bar", For: promutils.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// two rules, two active alerts, two with state restored
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("bar", ts))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)},
{Alert: "bar", Expr: "bar", For: promutils.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// one active alert but wrong state restore
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertname="bar",alertgroup="TestRestore"}[3600s])`,
stateMetric("wrong alert", ts))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
})
// one active alert with labels
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
ActiveAt: ts,
},
})
// one active alert with restore labels missmatch
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev", "team", "foo"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
ActiveAt: defaultTS,
},
})
}
}
func TestAlertingRule_Exec_Negative(t *testing.T) {

View File

@@ -5,8 +5,6 @@ import (
"fmt"
"hash/fnv"
"net/url"
"os"
"path/filepath"
"sort"
"strings"
@@ -201,21 +199,37 @@ func (r *Rule) Validate() error {
// ValidateTplFn must validate the given annotations
type ValidateTplFn func(annotations map[string]string) error
// ParseSilent parses rule configs from given file patterns without emitting logs
func ParseSilent(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
files, err := readFromFS(pathPatterns, true)
if err != nil {
return nil, fmt.Errorf("failed to read from the config: %s", err)
}
return parse(files, validateTplFn, validateExpressions)
}
// Parse parses rule configs from given file patterns
func Parse(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
var fp []string
for _, pattern := range pathPatterns {
matches, err := filepath.Glob(pattern)
if err != nil {
return nil, fmt.Errorf("error reading file pattern %s: %w", pattern, err)
}
fp = append(fp, matches...)
files, err := readFromFS(pathPatterns, false)
if err != nil {
return nil, fmt.Errorf("failed to read from the config: %s", err)
}
groups, err := parse(files, validateTplFn, validateExpressions)
if err != nil {
return nil, fmt.Errorf("failed to parse %s: %s", pathPatterns, err)
}
if len(groups) < 1 {
logger.Warnf("no groups found in %s", strings.Join(pathPatterns, ";"))
}
return groups, nil
}
func parse(files map[string][]byte, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
errGroup := new(utils.ErrGroup)
var groups []Group
for _, file := range fp {
for file, data := range files {
uniqueGroups := map[string]struct{}{}
gr, err := parseFile(file)
gr, err := parseConfig(data)
if err != nil {
errGroup.Add(fmt.Errorf("failed to parse file %q: %w", file, err))
continue
@@ -237,20 +251,19 @@ func Parse(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressio
if err := errGroup.Err(); err != nil {
return nil, err
}
if len(groups) < 1 {
logger.Warnf("no groups found in %s", strings.Join(pathPatterns, ";"))
}
sort.SliceStable(groups, func(i, j int) bool {
if groups[i].File != groups[j].File {
return groups[i].File < groups[j].File
}
return groups[i].Name < groups[j].Name
})
return groups, nil
}
func parseFile(path string) ([]Group, error) {
data, err := os.ReadFile(path)
func parseConfig(data []byte) ([]Group, error) {
data, err := envtemplate.ReplaceBytes(data)
if err != nil {
return nil, fmt.Errorf("error reading alert rule file %q: %w", path, err)
}
data, err = envtemplate.ReplaceBytes(data)
if err != nil {
return nil, fmt.Errorf("cannot expand environment vars in %q: %w", path, err)
return nil, fmt.Errorf("cannot expand environment vars: %w", err)
}
g := struct {
Groups []Group `yaml:"groups"`

108
app/vmalert/config/fs.go Normal file
View File

@@ -0,0 +1,108 @@
package config
import (
"fmt"
"strings"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config/fslocal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// FS represent a file system abstract for reading files.
type FS interface {
// Init initializes FS.
Init() error
// String must return human-readable representation of FS.
String() string
// List returns the list of file names which will be read via Read fn
List() ([]string, error)
// Read returns a list of read files in form of a map
// where key is a file name and value is a content of read file.
// Read must be called only after the successful Init call.
Read(files []string) (map[string][]byte, error)
}
var (
fsRegistryMu sync.Mutex
fsRegistry = make(map[string]FS)
)
// readFromFS parses the given path list and inits FS for each item.
// Once inited, readFromFS will try to read and return files from each FS.
// readFromFS returns an error if at least one FS failed to init.
// The function can be called multiple times but each unique path
// will be inited only once.
// If silent == true, readFromFS will not emit any logs.
//
// It is allowed to mix different FS types in path list.
func readFromFS(paths []string, silent bool) (map[string][]byte, error) {
var err error
result := make(map[string][]byte)
for _, path := range paths {
fsRegistryMu.Lock()
fs, ok := fsRegistry[path]
if !ok {
fs, err = newFS(path)
if err != nil {
fsRegistryMu.Unlock()
return nil, fmt.Errorf("error while parsing path %q: %w", path, err)
}
if err := fs.Init(); err != nil {
fsRegistryMu.Unlock()
return nil, fmt.Errorf("error while initializing path %q: %w", path, err)
}
fsRegistry[path] = fs
}
fsRegistryMu.Unlock()
list, err := fs.List()
if err != nil {
return nil, fmt.Errorf("failed to list files from %q", fs)
}
if !silent {
logger.Infof("found %d files to read from %q", len(list), fs)
}
if len(list) < 1 {
continue
}
files, err := fs.Read(list)
if err != nil {
return nil, fmt.Errorf("error while reading files from %q: %w", fs, err)
}
for k, v := range files {
if _, ok := result[k]; ok {
return nil, fmt.Errorf("duplicate found for file name %q: file names must be unique", k)
}
result[k] = v
}
}
return result, nil
}
// newFS creates FS based on the give path.
// Supported file systems are: fs
func newFS(path string) (FS, error) {
scheme := "fs"
n := strings.Index(path, "://")
if n >= 0 {
scheme = path[:n]
path = path[n+len("://"):]
}
if len(path) == 0 {
return nil, fmt.Errorf("path cannot be empty")
}
switch scheme {
case "fs":
return &fslocal.FS{Pattern: path}, nil
default:
return nil, fmt.Errorf("unsupported scheme %q", scheme)
}
}

View File

@@ -0,0 +1,39 @@
package config
import (
"strings"
"testing"
)
func TestNewFS(t *testing.T) {
f := func(path, expStr string) {
t.Helper()
fs, err := newFS(path)
if err != nil {
t.Fatalf("unexpected err: %s", err)
}
if fs.String() != expStr {
t.Fatalf("expected FS %q; got %q", expStr, fs.String())
}
}
f("/foo/bar", "Local FS{MatchPattern: \"/foo/bar\"}")
f("fs:///foo/bar", "Local FS{MatchPattern: \"/foo/bar\"}")
}
func TestNewFSNegative(t *testing.T) {
f := func(path, expErr string) {
t.Helper()
_, err := newFS(path)
if err == nil {
t.Fatalf("expected to have err: %s", expErr)
}
if !strings.Contains(err.Error(), expErr) {
t.Fatalf("expected to have err %q; got %q instead", expErr, err)
}
}
f("", "path cannot be empty")
f("fs://", "path cannot be empty")
f("foobar://baz", `unsupported scheme "foobar"`)
}

View File

@@ -0,0 +1,49 @@
package fslocal
import (
"fmt"
"os"
"path/filepath"
)
// FS represents a local file system
type FS struct {
// Pattern is used for matching one or multiple files.
// The pattern may describe hierarchical names such as
// /usr/*/bin/ed (assuming the Separator is '/').
Pattern string
}
// Init verifies that configured Pattern is correct
func (fs *FS) Init() error {
_, err := filepath.Glob(fs.Pattern)
return err
}
// String implements Stringer interface
func (fs *FS) String() string {
return fmt.Sprintf("Local FS{MatchPattern: %q}", fs.Pattern)
}
// List returns the list of file names which will be read via Read fn
func (fs *FS) List() ([]string, error) {
matches, err := filepath.Glob(fs.Pattern)
if err != nil {
return nil, fmt.Errorf("error while matching files via pattern %s: %w", fs.Pattern, err)
}
return matches, nil
}
// Read returns a map of read files where
// key is the file name and value is file's content.
func (fs *FS) Read(files []string) (map[string][]byte, error) {
result := make(map[string][]byte)
for _, path := range files {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("error while reading file %q: %w", path, err)
}
result[path] = data
}
return result, nil
}

View File

@@ -72,6 +72,15 @@ func (m *Metric) AddLabel(key, value string) {
m.Labels = append(m.Labels, Label{Name: key, Value: value})
}
// DelLabel deletes the given label from the label set
func (m *Metric) DelLabel(key string) {
for i, l := range m.Labels {
if l.Name == key {
m.Labels = append(m.Labels[:i], m.Labels[i+1:]...)
}
}
}
// Label returns the given label value.
// If label is missing empty string will be returned
func (m *Metric) Label(key string) string {

View File

@@ -174,7 +174,7 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
}
func (s *VMStorage) newRequestPOST() (*http.Request, error) {
req, err := http.NewRequest("POST", s.datasourceURL, nil)
req, err := http.NewRequest(http.MethodPost, s.datasourceURL, nil)
if err != nil {
return nil, err
}

View File

@@ -2,6 +2,7 @@ package main
import (
"context"
"errors"
"fmt"
"hash/fnv"
"net/url"
@@ -44,6 +45,9 @@ type Group struct {
// channel accepts new Group obj
// which supposed to update current group
updateCh chan *Group
// evalCancel stores the cancel fn for interrupting
// rules evaluation. Used on groups update() and close().
evalCancel context.CancelFunc
metrics *groupMetrics
}
@@ -158,23 +162,23 @@ func (g *Group) ID() uint64 {
}
// Restore restores alerts state for group rules
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, lookback time.Duration, labels map[string]string) error {
labels = mergeLabels(g.Name, "", labels, g.Labels)
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, ts time.Time, lookback time.Duration) error {
for _, rule := range g.Rules {
rr, ok := rule.(*AlertingRule)
ar, ok := rule.(*AlertingRule)
if !ok {
continue
}
if rr.For < 1 {
if ar.For < 1 {
continue
}
// ignore QueryParams on purpose, because they could contain
// query filters. This may affect the restore procedure.
q := qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: g.Type.String(),
Headers: g.Headers,
DataSourceType: g.Type.String(),
EvaluationInterval: g.Interval,
QueryParams: g.Params,
Headers: g.Headers,
Debug: ar.Debug,
})
if err := rr.Restore(ctx, q, lookback, labels); err != nil {
if err := ar.Restore(ctx, q, ts, lookback); err != nil {
return fmt.Errorf("error while restoring rule %q: %w", rule, err)
}
}
@@ -233,11 +237,24 @@ func (g *Group) updateWith(newGroup *Group) error {
return nil
}
// interruptEval interrupts in-flight rules evaluations
// within the group. It is expected that g.evalCancel
// will be repopulated after the call.
func (g *Group) interruptEval() {
g.mu.RLock()
defer g.mu.RUnlock()
if g.evalCancel != nil {
g.evalCancel()
}
}
func (g *Group) close() {
if g.doneCh == nil {
return
}
close(g.doneCh)
g.interruptEval()
<-g.finishedCh
g.metrics.iterationDuration.Unregister()
@@ -251,14 +268,9 @@ func (g *Group) close() {
var skipRandSleepOnGroupStart bool
func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *remotewrite.Client) {
func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *remotewrite.Client, rr datasource.QuerierBuilder) {
defer func() { close(g.finishedCh) }()
e := &executor{
rw: rw,
notifiers: nts,
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label)}
// Spread group rules evaluation over time in order to reduce load on VictoriaMetrics.
if !skipRandSleepOnGroupStart {
randSleep := uint64(float64(g.Interval) * (float64(g.ID()) / (1 << 64)))
@@ -279,11 +291,16 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
}
}
e := &executor{
rw: rw,
notifiers: nts,
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label)}
evalTS := time.Now()
logger.Infof("group %q started; interval=%v; concurrency=%d", g.Name, g.Interval, g.Concurrency)
eval := func(ts time.Time) {
eval := func(ctx context.Context, ts time.Time) {
g.metrics.iterationTotal.Inc()
start := time.Now()
@@ -305,10 +322,26 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
g.LastEvaluation = start
}
eval(evalTS)
evalCtx, cancel := context.WithCancel(ctx)
g.mu.Lock()
g.evalCancel = cancel
g.mu.Unlock()
defer g.evalCancel()
eval(evalCtx, evalTS)
t := time.NewTicker(g.Interval)
defer t.Stop()
// restore the rules state after the first evaluation
// so only active alerts can be restored.
if rr != nil {
err := g.Restore(ctx, rr, evalTS, *remoteReadLookBack)
if err != nil {
logger.Errorf("error while restoring ruleState for group %q: %s", g.Name, err)
}
}
for {
select {
case <-ctx.Done():
@@ -319,6 +352,14 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
return
case ng := <-g.updateCh:
g.mu.Lock()
// it is expected that g.evalCancel will be evoked
// somewhere else to unblock group from the rules evaluation.
// we recreate the evalCtx and g.evalCancel, so it can
// be called again.
evalCtx, cancel = context.WithCancel(ctx)
g.evalCancel = cancel
err := g.updateWith(ng)
if err != nil {
logger.Errorf("group %q: failed to update: %s", g.Name, err)
@@ -343,7 +384,7 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
}
evalTS = evalTS.Add((missed + 1) * g.Interval)
eval(evalTS)
eval(evalCtx, evalTS)
}
}
}
@@ -417,6 +458,11 @@ func (e *executor) exec(ctx context.Context, rule Rule, ts time.Time, resolveDur
tss, err := rule.Exec(ctx, ts, limit)
if err != nil {
if errors.Is(err, context.Canceled) {
// the context can be cancelled on graceful shutdown
// or on group update. So no need to handle the error as usual.
return nil
}
execErrors.Inc()
return fmt.Errorf("rule %q: failed to execute: %w", rule, err)
}

View File

@@ -209,7 +209,7 @@ func TestGroupStart(t *testing.T) {
fs.add(m1)
fs.add(m2)
go func() {
g.start(context.Background(), func() []notifier.Notifier { return []notifier.Notifier{fn} }, nil)
g.start(context.Background(), func() []notifier.Notifier { return []notifier.Notifier{fn} }, nil, fs)
close(finished)
}()
@@ -474,3 +474,31 @@ func TestFaultyRW(t *testing.T) {
t.Fatalf("expected to get an error from faulty RW client, got nil instead")
}
}
func TestCloseWithEvalInterruption(t *testing.T) {
groups, err := config.Parse([]string{"config/testdata/rules/rules1-good.rules"}, notifier.ValidateTemplates, true)
if err != nil {
t.Fatalf("failed to parse rules: %s", err)
}
const delay = time.Second * 2
fq := &fakeQuerierWithDelay{delay: delay}
const evalInterval = time.Millisecond
g := newGroup(groups[0], fq, evalInterval, nil)
go g.start(context.Background(), nil, nil, nil)
time.Sleep(evalInterval * 20)
go func() {
g.close()
}()
deadline := time.Tick(delay / 2)
select {
case <-deadline:
t.Fatalf("deadline for close exceeded")
case <-g.finishedCh:
}
}

View File

@@ -61,6 +61,67 @@ func (fq *fakeQuerier) Query(_ context.Context, _ string, _ time.Time) ([]dataso
return cp, req, nil
}
type fakeQuerierWithRegistry struct {
sync.Mutex
registry map[string][]datasource.Metric
}
func (fqr *fakeQuerierWithRegistry) set(key string, metrics ...datasource.Metric) {
fqr.Lock()
if fqr.registry == nil {
fqr.registry = make(map[string][]datasource.Metric)
}
fqr.registry[key] = metrics
fqr.Unlock()
}
func (fqr *fakeQuerierWithRegistry) reset() {
fqr.Lock()
fqr.registry = nil
fqr.Unlock()
}
func (fqr *fakeQuerierWithRegistry) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fqr
}
func (fqr *fakeQuerierWithRegistry) QueryRange(ctx context.Context, q string, _, _ time.Time) ([]datasource.Metric, error) {
req, _, err := fqr.Query(ctx, q, time.Now())
return req, err
}
func (fqr *fakeQuerierWithRegistry) Query(_ context.Context, expr string, _ time.Time) ([]datasource.Metric, *http.Request, error) {
fqr.Lock()
defer fqr.Unlock()
req, _ := http.NewRequest(http.MethodPost, "foo.com", nil)
metrics, ok := fqr.registry[expr]
if !ok {
return nil, req, nil
}
cp := make([]datasource.Metric, len(metrics))
copy(cp, metrics)
return cp, req, nil
}
type fakeQuerierWithDelay struct {
fakeQuerier
delay time.Duration
}
func (fqd *fakeQuerierWithDelay) Query(ctx context.Context, expr string, ts time.Time) ([]datasource.Metric, *http.Request, error) {
timer := time.NewTimer(fqd.delay)
select {
case <-ctx.Done():
case <-timer.C:
}
return fqd.fakeQuerier.Query(ctx, expr, ts)
}
func (fqd *fakeQuerierWithDelay) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fqd
}
type fakeNotifier struct {
sync.Mutex
alerts []notifier.Alert

View File

@@ -28,13 +28,19 @@ import (
)
var (
rulePath = flagutil.NewArrayString("rule", `Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
rulePath = flagutil.NewArrayString("rule", `Path to the files with alerting and/or recording rules.
Supports hierarchical patterns and regexpes.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
-rule="dir/*.yaml" -rule="/*.yaml" -rule="gcs://vmalert-rules/tenant_%{TENANT_ID}/prod".
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Enterprise version of vmalert supports S3 and GCS paths to rules.
For example: gs://bucket/path/to/rules, s3://bucket/path/to/rules
S3 and GCS paths support only matching by prefix, e.g. s3://bucket/dir/rule_ matches
all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
`)
ruleTemplatesPath = flagutil.NewArrayString("rule.templates", `Path or glob pattern to location with go template definitions
for rules annotations templating. Flag can be specified multiple times.
@@ -49,13 +55,16 @@ absolute path to all .tpl files in root.`)
configCheckInterval = flag.Duration("configCheckInterval", 0, "Interval for checking for changes in '-rule' or '-notifier.config' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
evaluationInterval = flag.Duration("evaluationInterval", time.Minute, "How often to evaluate the rules")
validateTemplates = flag.Bool("rule.validateTemplates", true, "Whether to validate annotation and label templates")
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
"which is by default equal to 3 evaluation intervals of the parent group.")
"which by default is 4 times evaluationInterval of the parent group.")
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
ruleUpdateEntriesLimit = flag.Int("rule.updateEntriesLimit", 20, "Defines the max number of rule's state updates stored in-memory. "+
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overriden per rule via update_entries_limit param.")
@@ -71,7 +80,7 @@ absolute path to all .tpl files in root.`)
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup. DEPRECATED - this flag has no effect and will be removed in the next releases.")
disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's Name as label to generated alerts and time series.")
@@ -92,6 +101,10 @@ func main() {
logger.Init()
pushmetrics.Init()
if !*remoteReadIgnoreRestoreErrors {
logger.Warnf("flag `remoteRead.ignoreRestoreErrors` is deprecated and will be removed in next releases.")
}
err := templates.Load(*ruleTemplatesPath, true)
if err != nil {
logger.Fatalf("failed to parse %q: %s", *ruleTemplatesPath, err)
@@ -170,7 +183,7 @@ func main() {
go configReload(ctx, manager, groupsCfg, sighupCh)
rh := &requestHandler{m: manager}
go httpserver.Serve(*httpListenAddr, rh.handler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, rh.handler)
sig := procutil.WaitForSigterm()
logger.Infof("service received signal %s", sig)
@@ -332,7 +345,7 @@ func configReload(ctx context.Context, m *manager, groupsCfg []config.Group, sig
logger.Errorf("failed to load new templates: %s", err)
continue
}
newGroupsCfg, err := config.Parse(*rulePath, validateTplFn, *validateExpressions)
newGroupsCfg, err := config.ParseSilent(*rulePath, validateTplFn, *validateExpressions)
if err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)

View File

@@ -82,24 +82,18 @@ func (m *manager) close() {
m.wg.Wait()
}
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) error {
if restore && m.rr != nil {
err := group.Restore(ctx, m.rr, *remoteReadLookBack, m.labels)
if err != nil {
if !*remoteReadIgnoreRestoreErrors {
return fmt.Errorf("failed to restore ruleState for group %q: %w", group.Name, err)
}
logger.Errorf("error while restoring ruleState for group %q: %s", group.Name, err)
}
}
func (m *manager) startGroup(ctx context.Context, g *Group, restore bool) error {
m.wg.Add(1)
id := group.ID()
id := g.ID()
go func() {
group.start(ctx, m.notifiers, m.rw)
m.wg.Done()
defer m.wg.Done()
if restore {
g.start(ctx, m.notifiers, m.rw, m.rr)
} else {
g.start(ctx, m.notifiers, m.rw, nil)
}
}()
m.groups[id] = group
m.groups[id] = g
return nil
}
@@ -153,6 +147,7 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
}
for _, ng := range groupsRegistry {
if err := m.startGroup(ctx, ng, restore); err != nil {
m.groupsMu.Unlock()
return err
}
}
@@ -166,6 +161,7 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
old.updateCh <- new
wg.Done()
}(item.old, item.new)
item.old.interruptEval()
}
wg.Wait()
}

View File

@@ -64,17 +64,18 @@ func TestManagerUpdateConcurrent(t *testing.T) {
wg := sync.WaitGroup{}
wg.Add(workers)
for i := 0; i < workers; i++ {
go func() {
go func(n int) {
defer wg.Done()
r := rand.New(rand.NewSource(int64(n)))
for i := 0; i < iterations; i++ {
rnd := rand.Intn(len(paths))
rnd := r.Intn(len(paths))
cfg, err := config.Parse([]string{paths[rnd]}, notifier.ValidateTemplates, true)
if err != nil { // update can fail and this is expected
continue
}
_ = m.update(context.Background(), cfg, false)
}
}()
}(i)
}
wg.Wait()
}

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -41,7 +41,7 @@ type Alert struct {
LastSent time.Time
// Value stores the value returned from evaluating expression from Expr field
Value float64
// ID is the unique identifer for the Alert
// ID is the unique identifier for the Alert
ID uint64
// Restored is true if Alert was restored after restart
Restored bool

View File

@@ -64,7 +64,7 @@ func (am *AlertManager) send(ctx context.Context, alerts []Alert) error {
b := &bytes.Buffer{}
writeamRequest(b, alerts, am.argFunc, am.relabelConfigs)
req, err := http.NewRequest("POST", am.addr, b)
req, err := http.NewRequest(http.MethodPost, am.addr, b)
if err != nil {
return err
}

View File

@@ -29,7 +29,7 @@ type Config struct {
// ConsulSDConfigs contains list of settings for service discovery via Consul
// see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config
ConsulSDConfigs []consul.SDConfig `yaml:"consul_sd_configs,omitempty"`
// DNSSDConfigs ontains list of settings for service discovery via DNS.
// DNSSDConfigs contains list of settings for service discovery via DNS.
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
DNSSDConfigs []dns.SDConfig `yaml:"dns_sd_configs,omitempty"`

View File

@@ -162,14 +162,15 @@ consul_sd_configs:
wg := sync.WaitGroup{}
wg.Add(workers)
for i := 0; i < workers; i++ {
go func() {
go func(n int) {
defer wg.Done()
r := rand.New(rand.NewSource(int64(n)))
for i := 0; i < iterations; i++ {
rnd := rand.Intn(len(paths))
rnd := r.Intn(len(paths))
_ = cw.reload(paths[rnd]) // update can fail and this is expected
_ = cw.notifiers()
}
}()
}(i)
}
wg.Wait()
}

View File

@@ -225,7 +225,7 @@ func (c *Client) flush(ctx context.Context, wr *prompbmarshal.WriteRequest) {
func (c *Client) send(ctx context.Context, data []byte) error {
r := bytes.NewReader(data)
req, err := http.NewRequest("POST", c.addr, r)
req, err := http.NewRequest(http.MethodPost, c.addr, r)
if err != nil {
return fmt.Errorf("failed to create new HTTP request: %w", err)
}

View File

@@ -27,12 +27,13 @@ func TestClient_Push(t *testing.T) {
if err != nil {
t.Fatalf("failed to create client: %s", err)
}
r := rand.New(rand.NewSource(1))
const rowsN = 1e4
var sent int
for i := 0; i < rowsN; i++ {
s := prompbmarshal.TimeSeries{
Samples: []prompbmarshal.Sample{{
Value: rand.Float64(),
Value: r.Float64(),
Timestamp: time.Now().Unix(),
}},
}

View File

@@ -26,7 +26,7 @@ var (
"and processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule."+
"Keep it equal or bigger than -remoteWrite.flushInterval.")
replayMaxDatapoints = flag.Int("replay.maxDatapointsPerQuery", 1e3,
"Max number of data points expected in one request. The higher the value, the less requests will be made during replay.")
"Max number of data points expected in one request. It affects the max time range for every `/query_range` request during the replay. The higher the value, the less requests will be made during replay.")
replayRuleRetryAttempts = flag.Int("replay.ruleRetryAttempts", 5,
"Defines how many retries to make before giving up on rule if request for it returns an error.")
disableProgressBar = flag.Bool("replay.disableProgressBar", false, "Whether to disable rendering progress bars during the replay. "+

View File

@@ -37,8 +37,6 @@ type ruleState struct {
sync.RWMutex
entries []ruleStateEntry
cur int
// disabled defines whether ruleState tracks ruleStateEntry
disabled bool
}
type ruleStateEntry struct {
@@ -61,7 +59,7 @@ type ruleStateEntry struct {
func newRuleState(size int) *ruleState {
if size < 1 {
return &ruleState{disabled: true}
size = 1
}
return &ruleState{
entries: make([]ruleStateEntry, size),
@@ -69,10 +67,6 @@ func newRuleState(size int) *ruleState {
}
func (s *ruleState) getLast() ruleStateEntry {
if s.disabled {
return ruleStateEntry{}
}
s.RLock()
defer s.RUnlock()
return s.entries[s.cur]
@@ -85,10 +79,6 @@ func (s *ruleState) size() int {
}
func (s *ruleState) getAll() []ruleStateEntry {
if s.disabled {
return nil
}
entries := make([]ruleStateEntry, 0)
s.RLock()
@@ -111,10 +101,6 @@ func (s *ruleState) getAll() []ruleStateEntry {
}
func (s *ruleState) add(e ruleStateEntry) {
if s.disabled {
return
}
s.Lock()
defer s.Unlock()

View File

@@ -14,13 +14,13 @@ func TestRule_stateDisabled(t *testing.T) {
}
state.add(ruleStateEntry{at: time.Now()})
if !e.at.IsZero() {
t.Fatalf("expected entry to be zero")
}
state.add(ruleStateEntry{at: time.Now()})
state.add(ruleStateEntry{at: time.Now()})
if len(state.getAll()) != 0 {
if len(state.getAll()) != 1 {
// state should store at least one update at any circumstances
t.Fatalf("expected for state to have %d entries; got %d",
0, len(state.getAll()),
1, len(state.getAll()),
)
}
}

View File

@@ -225,7 +225,7 @@ func templateFuncs() textTpl.FuncMap {
"toLower": strings.ToLower,
// crlfEscape replaces '\n' and '\r' chars with `\\n` and `\\r`.
// This funcion is deprectated.
// This function is deprecated.
//
// It is better to use quotesEscape, jsonEscape, queryEscape or pathEscape instead -
// these functions properly escape `\n` and `\r` chars according to their purpose.

View File

@@ -57,7 +57,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/", "/vmalert", "/vmalert/":
if r.Method != "GET" {
if r.Method != http.MethodGet {
httpserver.Errorf(w, r, "path %q supports only GET method", r.URL.Path)
return false
}

View File

@@ -40,9 +40,9 @@
for _, g := range groups {
for _, r := range g.Rules {
if r.LastError != "" {
rNotOk[g.Name]++
rNotOk[g.ID]++
} else {
rOk[g.Name]++
rOk[g.ID]++
}
}
}
@@ -50,11 +50,11 @@
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
{% for _, g := range groups %}
<div class="group-heading{% if rNotOk[g.Name] > 0 %} alert-danger{% endif %}" data-bs-target="rules-{%s g.ID %}">
<div class="group-heading{% if rNotOk[g.ID] > 0 %} alert-danger{% endif %}" data-bs-target="rules-{%s g.ID %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s)</a>
{% if rNotOk[g.Name] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.Name] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules withs status Ok">{%d rOk[g.Name] %}</span>
{% if rNotOk[g.ID] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.ID] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules withs status Ok">{%d rOk[g.ID] %}</span>
<p class="fs-6 fw-lighter">{%s g.File %}</p>
{% if len(g.Params) > 0 %}
<div class="fs-6 fw-lighter">Extra params
@@ -427,6 +427,16 @@
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Debug
</div>
<div class="col">
{%v rule.Debug %}
</div>
</div>
</div>
{% endif %}
<div class="container border-bottom p-2">
<div class="row">

View File

@@ -171,9 +171,9 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
for _, g := range groups {
for _, r := range g.Rules {
if r.LastError != "" {
rNotOk[g.Name]++
rNotOk[g.ID]++
} else {
rOk[g.Name]++
rOk[g.ID]++
}
}
}
@@ -189,7 +189,7 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
qw422016.N().S(`
<div class="group-heading`)
//line app/vmalert/web.qtpl:53
if rNotOk[g.Name] > 0 {
if rNotOk[g.ID] > 0 {
//line app/vmalert/web.qtpl:53
qw422016.N().S(` alert-danger`)
//line app/vmalert/web.qtpl:53
@@ -230,11 +230,11 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
qw422016.N().S(`s)</a>
`)
//line app/vmalert/web.qtpl:56
if rNotOk[g.Name] > 0 {
if rNotOk[g.ID] > 0 {
//line app/vmalert/web.qtpl:56
qw422016.N().S(`<span class="badge bg-danger" title="Number of rules with status Error">`)
//line app/vmalert/web.qtpl:56
qw422016.N().D(rNotOk[g.Name])
qw422016.N().D(rNotOk[g.ID])
//line app/vmalert/web.qtpl:56
qw422016.N().S(`</span> `)
//line app/vmalert/web.qtpl:56
@@ -243,7 +243,7 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
qw422016.N().S(`
<span class="badge bg-success" title="Number of rules withs status Ok">`)
//line app/vmalert/web.qtpl:57
qw422016.N().D(rOk[g.Name])
qw422016.N().D(rOk[g.ID])
//line app/vmalert/web.qtpl:57
qw422016.N().S(`</span>
<p class="fs-6 fw-lighter">`)
@@ -1313,10 +1313,24 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Debug
</div>
<div class="col">
`)
//line app/vmalert/web.qtpl:436
qw422016.E().V(rule.Debug)
//line app/vmalert/web.qtpl:436
qw422016.N().S(`
</div>
</div>
</div>
`)
//line app/vmalert/web.qtpl:430
//line app/vmalert/web.qtpl:440
}
//line app/vmalert/web.qtpl:430
//line app/vmalert/web.qtpl:440
qw422016.N().S(`
<div class="container border-bottom p-2">
<div class="row">
@@ -1325,17 +1339,17 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div>
<div class="col">
<a target="_blank" href="`)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.E().S(prefix)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.N().S(`groups#group-`)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.E().S(rule.GroupID)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.E().S(rule.GroupID)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.N().S(`</a>
</div>
</div>
@@ -1343,13 +1357,13 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
<br>
<div class="display-6 pb-3">Last `)
//line app/vmalert/web.qtpl:443
//line app/vmalert/web.qtpl:453
qw422016.N().D(len(rule.Updates))
//line app/vmalert/web.qtpl:443
//line app/vmalert/web.qtpl:453
qw422016.N().S(`/`)
//line app/vmalert/web.qtpl:443
//line app/vmalert/web.qtpl:453
qw422016.N().D(rule.MaxUpdates)
//line app/vmalert/web.qtpl:443
//line app/vmalert/web.qtpl:453
qw422016.N().S(` updates</span>:</div>
<table class="table table-striped table-hover table-sm">
<thead>
@@ -1364,201 +1378,201 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
<tbody>
`)
//line app/vmalert/web.qtpl:456
//line app/vmalert/web.qtpl:466
for _, u := range rule.Updates {
//line app/vmalert/web.qtpl:456
//line app/vmalert/web.qtpl:466
qw422016.N().S(`
<tr`)
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
if u.err != nil {
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
qw422016.N().S(` class="alert-danger"`)
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
}
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
qw422016.N().S(`>
<td>
<span class="badge bg-primary rounded-pill me-3" title="Updated at">`)
//line app/vmalert/web.qtpl:459
//line app/vmalert/web.qtpl:469
qw422016.E().S(u.time.Format(time.RFC3339))
//line app/vmalert/web.qtpl:459
//line app/vmalert/web.qtpl:469
qw422016.N().S(`</span>
</td>
<td class="text-center" wi>`)
//line app/vmalert/web.qtpl:461
//line app/vmalert/web.qtpl:471
qw422016.N().D(u.samples)
//line app/vmalert/web.qtpl:461
//line app/vmalert/web.qtpl:471
qw422016.N().S(`</td>
<td class="text-center">`)
//line app/vmalert/web.qtpl:462
//line app/vmalert/web.qtpl:472
qw422016.N().FPrec(u.duration.Seconds(), 3)
//line app/vmalert/web.qtpl:462
//line app/vmalert/web.qtpl:472
qw422016.N().S(`s</td>
<td class="text-center">`)
//line app/vmalert/web.qtpl:463
//line app/vmalert/web.qtpl:473
qw422016.E().S(u.at.Format(time.RFC3339))
//line app/vmalert/web.qtpl:463
//line app/vmalert/web.qtpl:473
qw422016.N().S(`</td>
<td>
<textarea class="curl-area" rows="1" onclick="this.focus();this.select()">`)
//line app/vmalert/web.qtpl:465
//line app/vmalert/web.qtpl:475
qw422016.E().S(u.curl)
//line app/vmalert/web.qtpl:465
//line app/vmalert/web.qtpl:475
qw422016.N().S(`</textarea>
</td>
</tr>
</li>
`)
//line app/vmalert/web.qtpl:469
//line app/vmalert/web.qtpl:479
if u.err != nil {
//line app/vmalert/web.qtpl:469
//line app/vmalert/web.qtpl:479
qw422016.N().S(`
<tr`)
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
if u.err != nil {
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
qw422016.N().S(` class="alert-danger"`)
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
}
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
qw422016.N().S(`>
<td colspan="5">
<span class="alert-danger">`)
//line app/vmalert/web.qtpl:472
//line app/vmalert/web.qtpl:482
qw422016.E().V(u.err)
//line app/vmalert/web.qtpl:472
//line app/vmalert/web.qtpl:482
qw422016.N().S(`</span>
</td>
</tr>
`)
//line app/vmalert/web.qtpl:475
//line app/vmalert/web.qtpl:485
}
//line app/vmalert/web.qtpl:475
//line app/vmalert/web.qtpl:485
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:476
//line app/vmalert/web.qtpl:486
}
//line app/vmalert/web.qtpl:476
//line app/vmalert/web.qtpl:486
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:478
//line app/vmalert/web.qtpl:488
tpl.StreamFooter(qw422016, r)
//line app/vmalert/web.qtpl:478
//line app/vmalert/web.qtpl:488
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
}
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
func WriteRuleDetails(qq422016 qtio422016.Writer, r *http.Request, rule APIRule) {
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
StreamRuleDetails(qw422016, r, rule)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
}
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
func RuleDetails(r *http.Request, rule APIRule) string {
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
WriteRuleDetails(qb422016, r, rule)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
return qs422016
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
}
//line app/vmalert/web.qtpl:483
//line app/vmalert/web.qtpl:493
func streambadgeState(qw422016 *qt422016.Writer, state string) {
//line app/vmalert/web.qtpl:483
//line app/vmalert/web.qtpl:493
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:485
//line app/vmalert/web.qtpl:495
badgeClass := "bg-warning text-dark"
if state == "firing" {
badgeClass = "bg-danger"
}
//line app/vmalert/web.qtpl:489
//line app/vmalert/web.qtpl:499
qw422016.N().S(`
<span class="badge `)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.E().S(badgeClass)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.E().S(state)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.N().S(`</span>
`)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
}
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
func writebadgeState(qq422016 qtio422016.Writer, state string) {
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
streambadgeState(qw422016, state)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
}
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
func badgeState(state string) string {
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
writebadgeState(qb422016, state)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
return qs422016
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
}
//line app/vmalert/web.qtpl:493
//line app/vmalert/web.qtpl:503
func streambadgeRestored(qw422016 *qt422016.Writer) {
//line app/vmalert/web.qtpl:493
//line app/vmalert/web.qtpl:503
qw422016.N().S(`
<span class="badge bg-warning text-dark" title="Alert state was restored after the service restart from remote storage">restored</span>
`)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
}
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
func writebadgeRestored(qq422016 qtio422016.Writer) {
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
streambadgeRestored(qw422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
}
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
func badgeRestored() string {
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
writebadgeRestored(qb422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
return qs422016
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
}

View File

@@ -113,18 +113,20 @@ type APIRule struct {
// Additional fields
// Type of the rule: recording or alerting
// DatasourceType of the rule: prometheus or graphite
DatasourceType string `json:"datasourceType"`
LastSamples int `json:"lastSamples"`
// ID is a unique Alert's ID within a group
ID string `json:"id"`
// GroupID is an unique Group's ID
GroupID string `json:"group_id"`
// Debug shows whether debug mode is enabled
Debug bool `json:"debug"`
// MaxUpdates is the max number of recorded ruleStateEntry objects
MaxUpdates int `json:"max_updates_entries"`
// Updates contains the ordered list of recorded ruleStateEntry objects
Updates []ruleStateEntry `json:"updates"`
Updates []ruleStateEntry `json:"-"`
}
// WebLink returns a link to the alert which can be used in UI.

View File

@@ -84,6 +84,9 @@ vmauth-linux-arm64:
vmauth-linux-ppc64le:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmauth-linux-s390x:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmauth-linux-386:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -28,7 +28,36 @@ accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.co
## Load balancing
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls.
In the latter case `vmauth` balances load among the configured urls in least-loaded round-robin manner.
`vmauth` retries failing `GET` requests across the configured list of urls.
This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes
in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
## Concurrency limiting
`vmauth` limits the number of concurrent requests it can proxy according to the following command-line flags:
- `-maxConcurrentRequests` limits the global number of concurrent requests `vmauth` can serve across all the configured users.
- `-maxConcurrentPerUserRequests` limits the number of concurrent requests `vmauth` can serve per each configured user.
It is also possible to set individual limits on the number of concurrent requests per each user
with the `max_concurrent_requests` option - see [auth config example](#auth-config).
`vmauth` responds with `429 Too Many Requests` HTTP error when the number of concurrent requests exceeds the configured limits.
The following [metrics](#monitoring) related to concurrency limits are exposed by `vmauth`:
- `vmauth_concurrent_requests_capacity` - the global limit on the number of concurrent requests `vmauth` can serve.
It is set via `-maxConcurrentRequests` command-line flag.
- `vmauth_concurrent_requests_current` - the current number of concurrent requests `vmauth` processes.
- `vmauth_concurrent_requests_limit_reached_total` - the number of requests rejected with `429 Too Many Requests` error
because of the global concurrency limit has been reached.
- `vmauth_user_concurrent_requests_capacity{username="..."}` - the limit on the number of concurrent requests for the given `username`.
- `vmauth_user_concurrent_requests_current{username="..."}` - the current number of concurrent requests for the given `username`.
- `vmauth_user_concurrent_requests_limit_reached_total{username="foo"}` - the number of requests rejected with `429 Too Many Requests` error
because of the concurrency limit has been reached for the given `username`.
## Auth config
@@ -55,26 +84,27 @@ users:
headers:
- "X-Scope-OrgID: foobar"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://localhost:8428 .
# are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
#
# The given user can send maximum 10 concurrent requests according to the provided max_concurrent_requests.
# Excess concurrent requests are rejected with 429 HTTP status code.
# See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
max_concurrent_requests: 10
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# are proxied to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
- username: "local-single-node2"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -84,10 +114,8 @@ users:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
@@ -261,7 +289,11 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8427")
TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol (default ":8427")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-logInvalidAuthTokens
Whether to log requests with invalid auth tokens. Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page
-loggerDisableTimestamps
@@ -280,8 +312,12 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentPerUserRequests int
The maximum number of concurrent requests vmauth can process per each configured user. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option in per-user config (default 300)
-maxConcurrentRequests int
The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options (default 1000)
-maxIdleConnsPerBackend int
The maximum number of idle connections vmauth can open per each backend host (default 100)
The maximum number of idle connections vmauth can open per each backend host. See also -maxConcurrentRequests (default 100)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
@@ -301,6 +337,8 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Supports an array of values separated by comma or specified via multiple flags.
-reloadAuthKey string
Auth key for /-/reload http endpoint. It must be passed as authKey=...
-responseTimeout duration
The timeout for receiving a response from backend (default 5m0s)
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string

View File

@@ -13,6 +13,7 @@ import (
"sync/atomic"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
@@ -32,17 +33,43 @@ type AuthConfig struct {
// UserInfo is user information read from authConfigPath
type UserInfo struct {
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMap []URLMap `yaml:"url_map,omitempty"`
Headers []Header `yaml:"headers,omitempty"`
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMaps []URLMap `yaml:"url_map,omitempty"`
Headers []Header `yaml:"headers,omitempty"`
MaxConcurrentRequests int `yaml:"max_concurrent_requests,omitempty"`
concurrencyLimitCh chan struct{}
concurrencyLimitReached *metrics.Counter
requests *metrics.Counter
}
func (ui *UserInfo) beginConcurrencyLimit() error {
select {
case ui.concurrencyLimitCh <- struct{}{}:
return nil
default:
ui.concurrencyLimitReached.Inc()
return fmt.Errorf("cannot handle more than %d concurrent requests from user %s", ui.getMaxConcurrentRequests(), ui.name())
}
}
func (ui *UserInfo) endConcurrencyLimit() {
<-ui.concurrencyLimitCh
}
func (ui *UserInfo) getMaxConcurrentRequests() int {
mcr := ui.MaxConcurrentRequests
if mcr <= 0 {
mcr = *maxConcurrentPerUserRequests
}
return mcr
}
// Header is `Name: Value` http header, which must be added to the proxied request.
type Header struct {
Name string
@@ -83,16 +110,77 @@ type SrcPath struct {
re *regexp.Regexp
}
// URLPrefix represents pased `url_prefix`
// URLPrefix represents passed `url_prefix`
type URLPrefix struct {
n uint32
urls []*url.URL
n uint32
bus []*backendURL
}
func (up *URLPrefix) getNextURL() *url.URL {
type backendURL struct {
brokenDeadline uint64
concurrentRequests int32
url *url.URL
}
func (bu *backendURL) isBroken() bool {
ct := fasttime.UnixTimestamp()
return ct < atomic.LoadUint64(&bu.brokenDeadline)
}
func (bu *backendURL) setBroken() {
deadline := fasttime.UnixTimestamp() + 3
atomic.StoreUint64(&bu.brokenDeadline, deadline)
}
func (bu *backendURL) put() {
atomic.AddInt32(&bu.concurrentRequests, -1)
}
func (up *URLPrefix) getBackendsCount() int {
return len(up.bus)
}
// getLeastLoadedBackendURL returns the backendURL with the minimum number of concurrent requests.
//
// backendURL.put() must be called on the returned backendURL after the request is complete.
func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
bus := up.bus
if len(bus) == 1 {
// Fast path - return the only backend url.
bu := bus[0]
atomic.AddInt32(&bu.concurrentRequests, 1)
return bu
}
// Slow path - select other backend urls.
n := atomic.AddUint32(&up.n, 1)
idx := n % uint32(len(up.urls))
return up.urls[idx]
for i := uint32(0); i < uint32(len(bus)); i++ {
idx := (n + i) % uint32(len(bus))
bu := bus[idx]
if bu.isBroken() {
continue
}
if atomic.CompareAndSwapInt32(&bu.concurrentRequests, 0, 1) {
// Fast path - return the backend with zero concurrently executed requests.
return bu
}
}
// Slow path - return the backend with the minimum number of concurrently executed requests.
buMin := bus[n%uint32(len(bus))]
minRequests := atomic.LoadInt32(&buMin.concurrentRequests)
for _, bu := range bus {
if bu.isBroken() {
continue
}
if n := atomic.LoadInt32(&bu.concurrentRequests); n < minRequests {
buMin = bu
minRequests = n
}
}
atomic.AddInt32(&buMin.concurrentRequests, 1)
return buMin
}
// UnmarshalYAML unmarshals up from yaml.
@@ -121,31 +209,33 @@ func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
default:
return fmt.Errorf("unexpected type for `url_prefix`: %T; want string or []string", v)
}
pus := make([]*url.URL, len(urls))
bus := make([]*backendURL, len(urls))
for i, u := range urls {
pu, err := url.Parse(u)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", u, err)
}
pus[i] = pu
bus[i] = &backendURL{
url: pu,
}
}
up.urls = pus
up.bus = bus
return nil
}
// MarshalYAML marshals up to yaml.
func (up *URLPrefix) MarshalYAML() (interface{}, error) {
var b []byte
if len(up.urls) == 1 {
u := up.urls[0].String()
if len(up.bus) == 1 {
u := up.bus[0].url.String()
b = strconv.AppendQuote(b, u)
return string(b), nil
}
b = append(b, '[')
for i, pu := range up.urls {
u := pu.String()
for i, bu := range up.bus {
u := bu.url.String()
b = strconv.AppendQuote(b, u)
if i+1 < len(up.urls) {
if i+1 < len(up.bus) {
b = append(b, ',')
}
}
@@ -284,7 +374,7 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
return nil, err
}
}
for _, e := range ui.URLMap {
for _, e := range ui.URLMaps {
if len(e.SrcPaths) == 0 {
return nil, fmt.Errorf("missing `src_paths` in `url_map`")
}
@@ -295,32 +385,47 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
return nil, err
}
}
if len(ui.URLMap) == 0 && ui.URLPrefix == nil {
if len(ui.URLMaps) == 0 && ui.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix`")
}
name := ui.name()
if ui.BearerToken != "" {
name := "bearer_token"
if ui.Name != "" {
name = ui.Name
}
if ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
}
if ui.Username != "" {
name := ui.Username
if ui.Name != "" {
name = ui.Name
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
}
mcr := ui.getMaxConcurrentRequests()
ui.concurrencyLimitCh = make(chan struct{}, mcr)
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_concurrent_requests_limit_reached_total{username=%q}`, name))
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_capacity{username=%q}`, name), func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_current{username=%q}`, name), func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
byAuthToken[at1] = ui
byAuthToken[at2] = ui
}
return byAuthToken, nil
}
func (ui *UserInfo) name() string {
if ui.Name != "" {
return ui.Name
}
if ui.Username != "" {
return ui.Username
}
if ui.BearerToken != "" {
return "bearer_token"
}
return ""
}
func getAuthTokens(bearerToken, username, password string) (string, string) {
if bearerToken != "" {
// Accept the bearerToken as Basic Auth username with empty password
@@ -342,12 +447,12 @@ func getAuthToken(bearerToken, username, password string) string {
}
func (up *URLPrefix) sanitize() error {
for i, pu := range up.urls {
puNew, err := sanitizeURLPrefix(pu)
for _, bu := range up.bus {
puNew, err := sanitizeURLPrefix(bu.url)
if err != nil {
return err
}
up.urls[i] = puNew
bu.url = puNew
}
return nil
}

View File

@@ -218,11 +218,13 @@ users:
- username: foo
password: bar
url_prefix: http://aaa:343/bbb
max_concurrent_requests: 5
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
MaxConcurrentRequests: 5,
},
})
@@ -278,7 +280,7 @@ users:
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
BearerToken: "foo",
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
@@ -304,7 +306,7 @@ users:
},
getAuthToken("", "foo", ""): {
BearerToken: "foo",
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
@@ -390,15 +392,17 @@ func mustParseURL(u string) *URLPrefix {
}
func mustParseURLs(us []string) *URLPrefix {
pus := make([]*url.URL, len(us))
bus := make([]*backendURL, len(us))
for i, u := range us {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
}
pus[i] = pu
bus[i] = &backendURL{
url: pu,
}
}
return &URLPrefix{
urls: pus,
bus: bus,
}
}

View File

@@ -18,26 +18,27 @@ users:
headers:
- "X-Scope-OrgID: foobar"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://localhost:8428 .
# are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
#
# The given user can send maximum 10 concurrent requests according to the provided max_concurrent_requests.
# Excess concurrent requests are rejected with 429 HTTP status code.
# See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
max_concurrent_requests: 10
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# are proxied to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
- username: "local-single-node2"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -47,10 +48,8 @@ users:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write

View File

@@ -3,8 +3,10 @@ package main
import (
"flag"
"fmt"
"io"
"net"
"net/http"
"net/http/httputil"
"net/textproto"
"net/url"
"os"
"strings"
@@ -12,20 +14,32 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host. "+
"See also -maxConcurrentRequests")
responseTimeout = flag.Duration("responseTimeout", 5*time.Minute, "The timeout for receiving a response from backend")
maxConcurrentRequests = flag.Int("maxConcurrentRequests", 1000, "The maximum number of concurrent requests vmauth can process. Other requests are rejected with "+
"'429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options")
maxConcurrentPerUserRequests = flag.Int("maxConcurrentPerUserRequests", 300, "The maximum number of concurrent requests vmauth can process per each configured user. "+
"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option "+
"in per-user config")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
`Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
)
@@ -41,7 +55,7 @@ func main() {
logger.Infof("starting vmauth at %q...", *httpListenAddr)
startTime := time.Now()
initAuthConfig()
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
logger.Infof("started vmauth in %.3f seconds", time.Since(startTime).Seconds())
sig := procutil.WaitForSigterm()
@@ -79,44 +93,175 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
// See https://docs.influxdata.com/influxdb/v2.0/api/
authToken = strings.Replace(authToken, "Token", "Bearer", 1)
}
ac := authConfig.Load().(map[string]*UserInfo)
ui := ac[authToken]
if ui == nil {
invalidAuthTokenRequests.Inc()
err := fmt.Errorf("cannot find the provided auth token %q in config", authToken)
if *logInvalidAuthTokens {
httpserver.Errorf(w, r, "cannot find the provided auth token %q in config", authToken)
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
} else {
errStr := fmt.Sprintf("cannot find the provided auth token %q in config", authToken)
http.Error(w, errStr, http.StatusBadRequest)
http.Error(w, err.Error(), http.StatusUnauthorized)
}
return true
}
ui.requests.Inc()
targetURL, headers, err := createTargetURL(ui, r.URL)
if err != nil {
httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
// Limit the concurrency of requests to backends
concurrencyLimitOnce.Do(concurrencyLimitInit)
select {
case concurrencyLimitCh <- struct{}{}:
if err := ui.beginConcurrencyLimit(); err != nil {
handleConcurrencyLimitError(w, r, err)
<-concurrencyLimitCh
return true
}
default:
concurrentRequestsLimitReached.Inc()
err := fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh))
handleConcurrencyLimitError(w, r, err)
return true
}
r.Header.Set("vm-target-url", targetURL.String())
for _, h := range headers {
r.Header.Set(h.Name, h.Value)
}
proxyRequest(w, r)
processRequest(w, r, ui)
ui.endConcurrencyLimit()
<-concurrencyLimitCh
return true
}
func proxyRequest(w http.ResponseWriter, r *http.Request) {
defer func() {
err := recover()
if err == nil || err == http.ErrAbortHandler {
// Suppress http.ErrAbortHandler panic.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
u := normalizeURL(r.URL)
up, headers, err := ui.getURLPrefixAndHeaders(u)
if err != nil {
httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
return
}
maxAttempts := up.getBackendsCount()
for i := 0; i < maxAttempts; i++ {
bu := up.getLeastLoadedBackendURL()
targetURL := mergeURLs(bu.url, u)
ok := tryProcessingRequest(w, r, targetURL, headers)
bu.put()
if ok {
return
}
// Forward other panics to the caller.
panic(err)
}()
getReverseProxy().ServeHTTP(w, r)
bu.setBroken()
}
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("all the backends for the user %q are unavailable", ui.name()),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
}
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, headers []Header) bool {
// This code has been copied from net/http/httputil/reverseproxy.go
req := sanitizeRequestHeaders(r)
req.URL = targetURL
for _, h := range headers {
req.Header.Set(h.Name, h.Value)
}
transportOnce.Do(transportInit)
res, err := transport.RoundTrip(req)
if err != nil {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
if r.Method == http.MethodPost || r.Method == http.MethodPut {
// It is impossible to retry POST and PUT requests,
// since we already proxied the request body to the backend.
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot proxy the request to %q: %w", targetURL, err),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
return true
}
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying the request to %q: %s", remoteAddr, requestURI, targetURL, err)
return false
}
removeHopHeaders(res.Header)
copyHeader(w.Header(), res.Header)
w.WriteHeader(res.StatusCode)
copyBuf := copyBufPool.Get()
copyBuf.B = bytesutil.ResizeNoCopyNoOverallocate(copyBuf.B, 16*1024)
_, err = io.CopyBuffer(w, res.Body, copyBuf.B)
copyBufPool.Put(copyBuf)
_ = res.Body.Close()
if err != nil && !netutil.IsTrivialNetworkError(err) {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
return true
}
return true
}
var copyBufPool bytesutil.ByteBufferPool
func copyHeader(dst, src http.Header) {
for k, vv := range src {
for _, v := range vv {
dst.Add(k, v)
}
}
}
func sanitizeRequestHeaders(r *http.Request) *http.Request {
// This code has been copied from net/http/httputil/reverseproxy.go
req := r.Clone(r.Context())
removeHopHeaders(req.Header)
if clientIP, _, err := net.SplitHostPort(req.RemoteAddr); err == nil {
// If we aren't the first proxy retain prior
// X-Forwarded-For information as a comma+space
// separated list and fold multiple headers into one.
prior := req.Header["X-Forwarded-For"]
if len(prior) > 0 {
clientIP = strings.Join(prior, ", ") + ", " + clientIP
}
req.Header.Set("X-Forwarded-For", clientIP)
}
return req
}
func removeHopHeaders(h http.Header) {
// remove hop-by-hop headers listed in the "Connection" header of h.
// See RFC 7230, section 6.1
for _, f := range h["Connection"] {
for _, sf := range strings.Split(f, ",") {
if sf = textproto.TrimString(sf); sf != "" {
h.Del(sf)
}
}
}
// Remove hop-by-hop headers to the backend. Especially
// important is "Connection" because we want a persistent
// connection, regardless of what the client sent to us.
for _, key := range hopHeaders {
h.Del(key)
}
}
// Hop-by-hop headers. These are removed when sent to the backend.
// As of RFC 7230, hop-by-hop headers are required to appear in the
// Connection header field. These are the headers defined by the
// obsoleted RFC 2616 (section 13.5.1) and are used for backward
// compatibility.
var hopHeaders = []string{
"Connection",
"Proxy-Connection", // non-standard but still sent by libcurl and rejected by e.g. google
"Keep-Alive",
"Proxy-Authenticate",
"Proxy-Authorization",
"Te", // canonicalized version of "TE"
"Trailer", // not Trailers per URL above; https://www.rfc-editor.org/errata_search.php?eid=4522
"Transfer-Encoding",
"Upgrade",
}
var (
@@ -126,43 +271,41 @@ var (
)
var (
reverseProxy *httputil.ReverseProxy
reverseProxyOnce sync.Once
transport *http.Transport
transportOnce sync.Once
)
func getReverseProxy() *httputil.ReverseProxy {
reverseProxyOnce.Do(initReverseProxy)
return reverseProxy
func transportInit() {
tr := http.DefaultTransport.(*http.Transport).Clone()
tr.ResponseHeaderTimeout = *responseTimeout
// Automatic compression must be disabled in order to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {
tr.MaxIdleConns = tr.MaxIdleConnsPerHost
}
transport = tr
}
// initReverseProxy must be called after flag.Parse(), since it uses command-line flags.
func initReverseProxy() {
reverseProxy = &httputil.ReverseProxy{
Director: func(r *http.Request) {
targetURL := r.Header.Get("vm-target-url")
target, err := url.Parse(targetURL)
if err != nil {
logger.Panicf("BUG: unexpected error when parsing targetURL=%q: %s", targetURL, err)
}
r.URL = target
},
Transport: func() *http.Transport {
tr := http.DefaultTransport.(*http.Transport).Clone()
// Automatic compression must be disabled in order to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {
tr.MaxIdleConns = tr.MaxIdleConnsPerHost
}
return tr
}(),
FlushInterval: time.Second,
ErrorLog: logger.StdErrorLogger(),
}
var (
concurrencyLimitCh chan struct{}
concurrencyLimitOnce sync.Once
)
func concurrencyLimitInit() {
concurrencyLimitCh = make(chan struct{}, *maxConcurrentRequests)
_ = metrics.NewGauge("vmauth_concurrent_requests_capacity", func() float64 {
return float64(*maxConcurrentRequests)
})
_ = metrics.NewGauge("vmauth_concurrent_requests_current", func() float64 {
return float64(len(concurrencyLimitCh))
})
}
var concurrentRequestsLimitReached = metrics.NewCounter("vmauth_concurrent_requests_limit_reached_total")
func usage() {
const s = `
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
@@ -171,3 +314,12 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
`
flagutil.Usage(s)
}
func handleConcurrencyLimitError(w http.ResponseWriter, r *http.Request, err error) {
w.Header().Add("Retry-After", "10")
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusTooManyRequests,
}
httpserver.Errorf(w, r, "%s", err)
}

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -7,11 +7,6 @@ import (
"strings"
)
func (up *URLPrefix) mergeURLs(requestURI *url.URL) *url.URL {
pu := up.getNextURL()
return mergeURLs(pu, requestURI)
}
func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
targetURL := *uiURL
targetURL.Path += requestURI.Path
@@ -35,12 +30,27 @@ func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
return &targetURL
}
func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
func (ui *UserInfo) getURLPrefixAndHeaders(u *url.URL) (*URLPrefix, []Header, error) {
for _, e := range ui.URLMaps {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix, e.Headers, nil
}
}
}
if ui.URLPrefix != nil {
return ui.URLPrefix, ui.Headers, nil
}
missingRouteRequests.Inc()
return nil, nil, fmt.Errorf("missing route for %q", u.String())
}
func normalizeURL(uOrig *url.URL) *url.URL {
u := *uOrig
// Prevent from attacks with using `..` in r.URL.Path
u.Path = path.Clean(u.Path)
if !strings.HasSuffix(u.Path, "/") && strings.HasSuffix(uOrig.Path, "/") {
// The path.Clean() removes traling slash.
// The path.Clean() removes trailing slash.
// Return it back if needed.
// This should fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1752
u.Path += "/"
@@ -52,16 +62,5 @@ func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1554
u.Path = ""
}
for _, e := range ui.URLMap {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix.mergeURLs(&u), e.Headers, nil
}
}
}
if ui.URLPrefix != nil {
return ui.URLPrefix.mergeURLs(&u), ui.Headers, nil
}
missingRouteRequests.Inc()
return nil, nil, fmt.Errorf("missing route for %q", u.String())
return &u
}

View File

@@ -13,10 +13,14 @@ func TestCreateTargetURLSuccess(t *testing.T) {
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target, headers, err := createTargetURL(ui, u)
u = normalizeURL(u)
up, headers, err := ui.getURLPrefixAndHeaders(u)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
bu := up.getLeastLoadedBackendURL()
target := mergeURLs(bu.url, u)
bu.put()
if target.String() != expectedTarget {
t.Fatalf("unexpected target; got %q; want %q", target, expectedTarget)
}
@@ -54,7 +58,7 @@ func TestCreateTargetURLSuccess(t *testing.T) {
// Complex routing with `url_map`
ui := &UserInfo{
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
@@ -86,7 +90,7 @@ func TestCreateTargetURLSuccess(t *testing.T) {
// Complex routing regexp paths in `url_map`
ui = &UserInfo{
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query(_range)?", "/api/v1/label/[^/]+/values"}),
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
@@ -119,20 +123,21 @@ func TestCreateTargetURLFailure(t *testing.T) {
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target, headers, err := createTargetURL(ui, u)
u = normalizeURL(u)
up, headers, err := ui.getURLPrefixAndHeaders(u)
if err == nil {
t.Fatalf("expecting non-nil error")
}
if target != nil {
t.Fatalf("unexpected target=%q; want empty string", target)
if up != nil {
t.Fatalf("unexpected non-empty up=%#v", up)
}
if headers != nil {
t.Fatalf("unexpected headers=%q; want empty string", headers)
t.Fatalf("unexpected non-empty headers=%q", headers)
}
}
f(&UserInfo{}, "/foo/bar")
f(&UserInfo{
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: mustParseURL("http://foobar/baz"),

View File

@@ -75,6 +75,9 @@ vmbackup-linux-arm64:
vmbackup-linux-ppc64le:
APP_NAME=vmbackup CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmbackup-linux-s390x:
APP_NAME=vmbackup CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmbackup-linux-386:
APP_NAME=vmbackup CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -51,27 +51,27 @@ func main() {
if len(*snapshotCreateURL) > 0 {
// create net/url object
createUrl, err := url.Parse(*snapshotCreateURL)
createURL, err := url.Parse(*snapshotCreateURL)
if err != nil {
logger.Fatalf("cannot parse snapshotCreateURL: %s", err)
}
if len(*snapshotName) > 0 {
logger.Fatalf("-snapshotName shouldn't be set if -snapshot.createURL is set, since snapshots are created automatically in this case")
}
logger.Infof("Snapshot create url %s", createUrl.Redacted())
logger.Infof("Snapshot create url %s", createURL.Redacted())
if len(*snapshotDeleteURL) <= 0 {
err := flag.Set("snapshot.deleteURL", strings.Replace(*snapshotCreateURL, "/create", "/delete", 1))
if err != nil {
logger.Fatalf("Failed to set snapshot.deleteURL flag: %v", err)
}
}
deleteUrl, err := url.Parse(*snapshotCreateURL)
deleteURL, err := url.Parse(*snapshotDeleteURL)
if err != nil {
logger.Fatalf("cannot parse snapshotDeleteURL: %s", err)
}
logger.Infof("Snapshot delete url %s", deleteUrl.Redacted())
logger.Infof("Snapshot delete url %s", deleteURL.Redacted())
name, err := snapshot.Create(createUrl.String())
name, err := snapshot.Create(createURL.String())
if err != nil {
logger.Fatalf("cannot create snapshot: %s", err)
}
@@ -81,7 +81,7 @@ func main() {
}
defer func() {
err := snapshot.Delete(deleteUrl.String(), name)
err := snapshot.Delete(deleteURL.String(), name)
if err != nil {
logger.Fatalf("cannot delete snapshot: %s", err)
}
@@ -93,7 +93,7 @@ func main() {
logger.Fatalf("invalid -snapshotName=%q: %s", *snapshotName, err)
}
go httpserver.Serve(*httpListenAddr, nil)
go httpserver.Serve(*httpListenAddr, false, nil)
srcFS, err := newSrcFS()
if err != nil {

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -78,6 +78,9 @@ vmctl-linux-arm64:
vmctl-linux-ppc64le:
APP_NAME=vmctl CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmctl-linux-s390x:
APP_NAME=vmctl CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmctl-linux-386:
APP_NAME=vmctl CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -483,6 +483,10 @@ Processing ranges: 8798 / 8798 [████████████████
2022/10/19 16:45:37 Total time: 1m19.406283424s
```
Migrating big volumes of data may result in remote read client reaching the timeout.
Consider increasing the value of `--remote-read-http-timeout` (default `5m`) command-line flag when seeing
timeouts or `context canceled` errors.
### Filtering
The filtering consists of two parts: by labels and time.
@@ -733,21 +737,33 @@ or higher.
See `./vmctl vm-native --help` for details and full list of flags.
In this mode `vmctl` acts as a proxy between two VM instances, where time series filtering is done by "source" (`src`)
and processing is done by "destination" (`dst`). Because of that, `vmctl` doesn't actually know how much data will be
processed and can't show the progress bar. It will show the current processing speed and total number of processed bytes:
Migration in `vm-native` mode takes two steps:
1. Explore the list of the metrics to migrate via `/api/v1/series` API;
2. Migrate explored metrics one-by-one.
```
./vmctl vm-native --vm-native-src-addr=http://localhost:8528 \
--vm-native-dst-addr=http://localhost:8428 \
--vm-native-filter-match='{job="vmagent"}' \
--vm-native-filter-time-start='2020-01-01T20:07:00Z'
./vmctl vm-native \
--vm-native-src-addr=http://127.0.0.1:8481/select/0/prometheus \
--vm-native-dst-addr=http://localhost:8428 \
--vm-native-filter-time-start='2022-11-20T00:00:00Z' \
--vm-native-filter-match='{__name__=~"vm_cache_.*"}'
VictoriaMetrics Native import mode
Initing export pipe from "http://localhost:8528" with filters:
filter: match[]={job="vmagent"}
Initing import process to "http://localhost:8428":
Total: 336.75 KiB ↖ Speed: 454.46 KiB p/s
2020/10/13 17:04:59 Total time: 952.143376ms
2023/03/02 09:22:02 Initing import process from "http://127.0.0.1:8481/select/0/prometheus/api/v1/export/native" to "http://localhost:8428/api/v1/import/native" with filter
filter: match[]={__name__=~"vm_cache_.*"}
start: 2022-11-20T00:00:00Z
2023/03/02 09:22:02 Exploring metrics...
Found 9 metrics to import. Continue? [Y/n]
2023/03/02 09:22:04 Requests to make: 9
Requests to make: 9 / 9 [███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00%
2023/03/02 09:22:06 Import finished!
2023/03/02 09:22:06 VictoriaMetrics importer stats:
time spent while importing: 3.632638875s;
total bytes: 7.8 MB;
bytes/s: 2.1 MB;
requests: 9;
requests retries: 0;
2023/03/02 09:22:06 Total time: 3.633127625s
```
Importing tips:
@@ -755,6 +771,7 @@ Importing tips:
1. Migrating big volumes of data may result in reaching the safety limits on `src` side.
Please verify that `-search.maxExportDuration` and `-search.maxExportSeries` were set with
proper values for `src`. If hitting the limits, follow the recommendations [here](https://docs.victoriametrics.com/#how-to-export-data-in-native-format).
If hitting `the number of matching timeseries exceeds...` error, adjust filters to match less time series or update `-search.maxSeries` command-line flag on vmselect/vmsingle;
2. Migrating all the metrics from one VM to another may collide with existing application metrics
(prefixed with `vm_`) at destination and lead to confusion when using
[official Grafana dashboards](https://grafana.com/orgs/victoriametrics/dashboards).
@@ -766,71 +783,64 @@ Instead, use [relabeling in VictoriaMetrics](https://github.com/VictoriaMetrics/
5. When importing in or from cluster version remember to use correct [URL format](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format)
and specify `accountID` param.
6. When migrating large volumes of data it might be useful to use `--vm-native-step-interval` flag to split single process into smaller steps.
7. `vmctl` supports `--vm-concurrency` which controls the number of concurrent workers that process the input from source query results.
Please note that each import request can load up to a single vCPU core on VictoriaMetrics. So try to set it according
to allocated CPU resources of your VictoriaMetrics installation.
8. `vmctl` supports `--vm-native-src-headers` and `--vm-native-dst-headers` which defines headers to send with each request
to the corresponding source address.
9. `vmctl` supports `--vm-native-disable-http-keep-alive` to allow `vmctl` to use non-persistent HTTP connections to avoid
error `use of closed network connection` when run a longer export.
10. Migrating data with overlapping time range for destination data can produce duplicates series at destination.
To avoid duplicates on the destination set `-dedup.minScrapeInterval=1ms` for `vmselect` and `vmstorage`.
This will instruct `vmselect` and `vmstorage` to ignore duplicates with match timestamps.
In this mode `vmctl` acts as a proxy between two VM instances, where time series filtering is done by "source" (`src`)
and processing is done by "destination" (`dst`). So no extra memory or CPU resources required on `vmctl` side. Only
`src` and `dst` resource matter.
#### Using time-based chunking of migration
It is possible split migration process into set of smaller batches based on time. This is especially useful when migrating large volumes of data as this adds indication of progress and ability to restore process from certain point in case of failure.
It is possible split migration process into set of smaller batches based on time. This is especially useful when
migrating large volumes of data as this adds indication of progress and ability to restore process from certain point
in case of failure.
To use this you need to specify `--vm-native-step-interval` flag. Supported values are: `month`, `day`, `hour`.
Note that in order to use this it is required `--vm-native-filter-time-start` to be set to calculate time ranges for export process.
To use this you need to specify `--vm-native-step-interval` flag. Supported values are: `month`, `day`, `hour`, `minute`.
Note that in order to use this it is required `--vm-native-filter-time-start` to be set to calculate time ranges for
export process.
Every range is being processed independently, which means that:
- after range processing is finished all data within range is migrated
- if process fails on one of stages it is guaranteed that data of prior stages is already written, so it is possible to restart process starting from failed range
- if process fails on one of stages it is guaranteed that data of prior stages is already written,
so it is possible to restart process starting from failed range.
It is recommended using the `month` step when migrating the data over multiple months, since the migration with `day` and `hour` steps may take longer time to complete
because of additional overhead.
It is recommended using the `month` step when migrating the data over multiple months,
since the migration with `day` and `hour` steps may take longer time to complete because of additional overhead.
Usage example:
```console
./vmctl vm-native
--vm-native-filter-time-start 2022-06-17T00:07:00Z \
--vm-native-filter-time-end 2022-10-03T00:07:00Z \
--vm-native-src-addr http://localhost:8428 \
--vm-native-dst-addr http://localhost:8528 \
--vm-native-step-interval=month
./vmctl vm-native \
--vm-native-src-addr=http://127.0.0.1:8481/select/0/prometheus \
--vm-native-dst-addr=http://localhost:8428 \
--vm-native-filter-time-start='2022-11-20T00:00:00Z' \
--vm-native-step-interval=month \
--vm-native-filter-match='{__name__=~"vm_cache_.*"}'
VictoriaMetrics Native import mode
2022/08/30 19:48:24 Processing range 1/5: 2022-06-17T00:07:00Z - 2022-06-30T23:59:59Z
2022/08/30 19:48:24 Initing export pipe from "http://localhost:8428" with filters:
filter: match[]={__name__!=""}
start: 2022-06-17T00:07:00Z
end: 2022-06-30T23:59:59Z
Initing import process to "http://localhost:8428":
2022/08/30 19:48:24 Import finished!
Total: 16 B ↗ Speed: 28.89 KiB p/s
2022/08/30 19:48:24 Processing range 2/5: 2022-07-01T00:00:00Z - 2022-07-31T23:59:59Z
2022/08/30 19:48:24 Initing export pipe from "http://localhost:8428" with filters:
filter: match[]={__name__!=""}
start: 2022-07-01T00:00:00Z
end: 2022-07-31T23:59:59Z
Initing import process to "http://localhost:8428":
2022/08/30 19:48:24 Import finished!
Total: 16 B ↗ Speed: 164.35 KiB p/s
2022/08/30 19:48:24 Processing range 3/5: 2022-08-01T00:00:00Z - 2022-08-31T23:59:59Z
2022/08/30 19:48:24 Initing export pipe from "http://localhost:8428" with filters:
filter: match[]={__name__!=""}
start: 2022-08-01T00:00:00Z
end: 2022-08-31T23:59:59Z
Initing import process to "http://localhost:8428":
2022/08/30 19:48:24 Import finished!
Total: 16 B ↗ Speed: 191.42 KiB p/s
2022/08/30 19:48:24 Processing range 4/5: 2022-09-01T00:00:00Z - 2022-09-30T23:59:59Z
2022/08/30 19:48:24 Initing export pipe from "http://localhost:8428" with filters:
filter: match[]={__name__!=""}
start: 2022-09-01T00:00:00Z
end: 2022-09-30T23:59:59Z
Initing import process to "http://localhost:8428":
2022/08/30 19:48:24 Import finished!
Total: 16 B ↗ Speed: 141.04 KiB p/s
2022/08/30 19:48:24 Processing range 5/5: 2022-10-01T00:00:00Z - 2022-10-03T00:07:00Z
2022/08/30 19:48:24 Initing export pipe from "http://localhost:8428" with filters:
filter: match[]={__name__!=""}
start: 2022-10-01T00:00:00Z
end: 2022-10-03T00:07:00Z
Initing import process to "http://localhost:8428":
2022/08/30 19:48:24 Import finished!
Total: 16 B ↗ Speed: 186.32 KiB p/s
2022/08/30 19:48:24 Total time: 12.680582ms
2023/03/02 09:18:05 Initing import process from "http://127.0.0.1:8481/select/0/prometheus/api/v1/export/native" to "http://localhost:8428/api/v1/import/native" with filter
filter: match[]={__name__=~"vm_cache_.*"}
start: 2022-11-20T00:00:00Z
2023/03/02 09:18:05 Exploring metrics...
Found 9 metrics to import. Continue? [Y/n]
2023/03/02 09:18:07 Selected time range will be split into 5 ranges according to "month" step. Requests to make: 45.
Requests to make: 45 / 45 [█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00%
2023/03/02 09:18:12 Import finished!
2023/03/02 09:18:12 VictoriaMetrics importer stats:
time spent while importing: 7.111870667s;
total bytes: 7.7 MB;
bytes/s: 1.1 MB;
requests: 45;
requests retries: 0;
2023/03/02 09:18:12 Total time: 7.112405875s
```
#### Cluster-to-cluster migration mode
@@ -842,70 +852,41 @@ Cluster-to-cluster uses `/admin/tenants` endpoint (available starting from [v1.8
To use this mode you need to set `--vm-intercluster` flag to `true`, `--vm-native-src-addr` flag to 'http://vmselect:8481/' and `--vm-native-dst-addr` value to http://vminsert:8480/:
```console
./bin/vmctl vm-native --vm-intercluster=true --vm-native-src-addr=http://localhost:8481/ --vm-native-dst-addr=http://172.17.0.3:8480/
./vmctl vm-native --vm-native-src-addr=http://127.0.0.1:8481/ \
--vm-native-dst-addr=http://127.0.0.1:8480/ \
--vm-native-filter-match='{__name__="vm_app_uptime_seconds"}' \
--vm-native-filter-time-start='2023-02-01T00:00:00Z' \
--vm-native-step-interval=day \
--vm-intercluster
VictoriaMetrics Native import mode
2022/12/05 21:20:06 Discovered tenants: [123:1 12812919:1 1289198:1 1289:1283 12:1 1:0 1:1 1:1231231 1:1271727 1:12819 1:281 812891298:1]
2022/12/05 21:20:06 Initing export pipe from "http://localhost:8481/select/123:1/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/123:1/prometheus/api/v1/import/native":
Total: 61.13 MiB ↖ Speed: 2.05 MiB p/s
Total: 61.13 MiB ↗ Speed: 2.30 MiB p/s
2022/12/05 21:20:33 Initing export pipe from "http://localhost:8481/select/12812919:1/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/12812919:1/prometheus/api/v1/import/native":
Total: 43.14 MiB ↘ Speed: 1.86 MiB p/s
Total: 43.14 MiB ↙ Speed: 2.36 MiB p/s
2022/12/05 21:20:51 Initing export pipe from "http://localhost:8481/select/1289198:1/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1289198:1/prometheus/api/v1/import/native":
Total: 16.64 MiB ↗ Speed: 2.66 MiB p/s
Total: 16.64 MiB ↘ Speed: 2.19 MiB p/s
2022/12/05 21:20:59 Initing export pipe from "http://localhost:8481/select/1289:1283/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1289:1283/prometheus/api/v1/import/native":
Total: 43.33 MiB ↙ Speed: 1.94 MiB p/s
Total: 43.33 MiB ↖ Speed: 2.35 MiB p/s
2022/12/05 21:21:18 Initing export pipe from "http://localhost:8481/select/12:1/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/12:1/prometheus/api/v1/import/native":
Total: 63.78 MiB ↙ Speed: 1.96 MiB p/s
Total: 63.78 MiB ↖ Speed: 2.28 MiB p/s
2022/12/05 21:21:46 Initing export pipe from "http://localhost:8481/select/1:0/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1:0/prometheus/api/v1/import/native":
2022/12/05 21:21:46 Import finished!
Total: 330 B ↗ Speed: 3.53 MiB p/s
2022/12/05 21:21:46 Initing export pipe from "http://localhost:8481/select/1:1/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1:1/prometheus/api/v1/import/native":
Total: 63.81 MiB ↙ Speed: 1.96 MiB p/s
Total: 63.81 MiB ↖ Speed: 2.28 MiB p/s
2022/12/05 21:22:14 Initing export pipe from "http://localhost:8481/select/1:1231231/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1:1231231/prometheus/api/v1/import/native":
Total: 63.84 MiB ↙ Speed: 1.93 MiB p/s
Total: 63.84 MiB ↖ Speed: 2.29 MiB p/s
2022/12/05 21:22:42 Initing export pipe from "http://localhost:8481/select/1:1271727/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1:1271727/prometheus/api/v1/import/native":
Total: 54.37 MiB ↘ Speed: 1.90 MiB p/s
Total: 54.37 MiB ↙ Speed: 2.37 MiB p/s
2022/12/05 21:23:05 Initing export pipe from "http://localhost:8481/select/1:12819/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1:12819/prometheus/api/v1/import/native":
Total: 17.01 MiB ↙ Speed: 1.75 MiB p/s
Total: 17.01 MiB ↖ Speed: 2.15 MiB p/s
2022/12/05 21:23:13 Initing export pipe from "http://localhost:8481/select/1:281/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/1:281/prometheus/api/v1/import/native":
Total: 63.89 MiB ↘ Speed: 1.90 MiB p/s
Total: 63.89 MiB ↙ Speed: 2.29 MiB p/s
2022/12/05 21:23:42 Initing export pipe from "http://localhost:8481/select/812891298:1/prometheus/api/v1/export/native" with filters:
filter: match[]={__name__!=""}
Initing import process to "http://172.17.0.3:8480/insert/812891298:1/prometheus/api/v1/import/native":
Total: 63.84 MiB ↖ Speed: 1.99 MiB p/s
Total: 63.84 MiB ↗ Speed: 2.26 MiB p/s
2022/12/05 21:24:10 Total time: 4m4.1466565s
2023/02/28 10:41:42 Discovering tenants...
2023/02/28 10:41:42 The following tenants were discovered: [0:0 1:0 2:0 3:0 4:0]
2023/02/28 10:41:42 Initing import process from "http://127.0.0.1:8481/select/0:0/prometheus/api/v1/export/native" to "http://127.0.0.1:8480/insert/0:0/prometheus/api/v1/import/native" with filter
filter: match[]={__name__="vm_app_uptime_seconds"}
start: 2023-02-01T00:00:00Z for tenant 0:0
2023/02/28 10:41:42 Exploring metrics...
2023/02/28 10:41:42 Found 1 metrics to import
2023/02/28 10:41:42 Selected time range will be split into 28 ranges according to "day" step.
Requests to make for tenant 0:0: 28 / 28 [███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00%
2023/02/28 10:41:45 Initing import process from "http://127.0.0.1:8481/select/1:0/prometheus/api/v1/export/native" to "http://127.0.0.1:8480/insert/1:0/prometheus/api/v1/import/native" with filter
filter: match[]={__name__="vm_app_uptime_seconds"}
start: 2023-02-01T00:00:00Z for tenant 1:0
2023/02/28 10:41:45 Exploring metrics...
2023/02/28 10:41:45 Found 1 metrics to import
2023/02/28 10:41:45 Selected time range will be split into 28 ranges according to "day" step. Requests to make: 28
Requests to make for tenant 1:0: 28 / 28 [████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00%
...
2023/02/28 10:42:49 Import finished!
2023/02/28 10:42:49 VictoriaMetrics importer stats:
time spent while importing: 1m6.714210417s;
total bytes: 39.7 MB;
bytes/s: 594.4 kB;
requests: 140;
requests retries: 0;
2023/02/28 10:42:49 Total time: 1m7.147971417s
```
## Verifying exported blocks from VictoriaMetrics
@@ -972,6 +953,7 @@ a sign of network issues or VM being overloaded. See the logs during import for
By default `vmctl` waits confirmation from user before starting the import. If this is unwanted
behavior and no user interaction required - pass `-s` flag to enable "silence" mode:
See below the example of `vm-native` migration process:
```
-s Whether to run in silent mode. If set to true no confirmation prompts will appear. (default: false)
```

222
app/vmctl/auth/auth.go Normal file
View File

@@ -0,0 +1,222 @@
package auth
import (
"encoding/base64"
"fmt"
"net/http"
"strings"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
)
// HTTPClientConfig represents http client config.
type HTTPClientConfig struct {
BasicAuth *BasicAuthConfig
BearerToken string
Headers string
}
// NewConfig creates auth config for the given hcc.
func (hcc *HTTPClientConfig) NewConfig() (*Config, error) {
opts := &Options{
BasicAuth: hcc.BasicAuth,
BearerToken: hcc.BearerToken,
Headers: hcc.Headers,
}
return opts.NewConfig()
}
// BasicAuthConfig represents basic auth config.
type BasicAuthConfig struct {
Username string
Password string
PasswordFile string
}
// ConfigOptions options which helps build Config
type ConfigOptions func(config *HTTPClientConfig)
// Generate returns Config based on the given params
func Generate(filterOptions ...ConfigOptions) (*Config, error) {
authCfg := &HTTPClientConfig{}
for _, option := range filterOptions {
option(authCfg)
}
return authCfg.NewConfig()
}
// WithBasicAuth returns AuthConfigOptions and initialized BasicAuthConfig based on given params
func WithBasicAuth(username, password string) ConfigOptions {
return func(config *HTTPClientConfig) {
if username != "" || password != "" {
config.BasicAuth = &BasicAuthConfig{
Username: username,
Password: password,
}
}
}
}
// WithBearer returns AuthConfigOptions and set BearerToken or BearerTokenFile based on given params
func WithBearer(token string) ConfigOptions {
return func(config *HTTPClientConfig) {
if token != "" {
config.BearerToken = token
}
}
}
// WithHeaders returns AuthConfigOptions and set Headers based on the given params
func WithHeaders(headers string) ConfigOptions {
return func(config *HTTPClientConfig) {
if headers != "" {
config.Headers = headers
}
}
}
// Config is auth config.
type Config struct {
getAuthHeader func() string
authHeaderLock sync.Mutex
authHeader string
authHeaderDeadline uint64
headers []keyValue
authDigest string
}
// SetHeaders sets the configured ac headers to req.
func (ac *Config) SetHeaders(req *http.Request, setAuthHeader bool) {
reqHeaders := req.Header
for _, h := range ac.headers {
reqHeaders.Set(h.key, h.value)
}
if setAuthHeader {
if ah := ac.GetAuthHeader(); ah != "" {
reqHeaders.Set("Authorization", ah)
}
}
}
// GetAuthHeader returns optional `Authorization: ...` http header.
func (ac *Config) GetAuthHeader() string {
f := ac.getAuthHeader
if f == nil {
return ""
}
ac.authHeaderLock.Lock()
defer ac.authHeaderLock.Unlock()
if fasttime.UnixTimestamp() > ac.authHeaderDeadline {
ac.authHeader = f()
// Cache the authHeader for a second.
ac.authHeaderDeadline = fasttime.UnixTimestamp() + 1
}
return ac.authHeader
}
type authContext struct {
// getAuthHeader must return <value> for 'Authorization: <value>' http request header
getAuthHeader func() string
// authDigest must contain the digest for the used authorization
// The digest must be changed whenever the original config changes.
authDigest string
}
func (ac *authContext) initFromBasicAuthConfig(ba *BasicAuthConfig) error {
if ba.Username == "" {
return fmt.Errorf("missing `username` in `basic_auth` section")
}
if ba.Password != "" {
ac.getAuthHeader = func() string {
// See https://en.wikipedia.org/wiki/Basic_access_authentication
token := ba.Username + ":" + ba.Password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
return "Basic " + token64
}
ac.authDigest = fmt.Sprintf("basic(username=%q, password=%q)", ba.Username, ba.Password)
return nil
}
return nil
}
func (ac *authContext) initFromBearerToken(bearerToken string) error {
ac.getAuthHeader = func() string {
return "Bearer " + bearerToken
}
ac.authDigest = fmt.Sprintf("bearer(token=%q)", bearerToken)
return nil
}
// Options contain options, which must be passed to NewConfig.
type Options struct {
// BasicAuth contains optional BasicAuthConfig.
BasicAuth *BasicAuthConfig
// BearerToken contains optional bearer token.
BearerToken string
// Headers contains optional http request headers in the form 'Foo: bar'.
Headers string
}
// NewConfig creates auth config from the given opts.
func (opts *Options) NewConfig() (*Config, error) {
var ac authContext
if opts.BasicAuth != nil {
if ac.getAuthHeader != nil {
return nil, fmt.Errorf("cannot use both `authorization` and `basic_auth`")
}
if err := ac.initFromBasicAuthConfig(opts.BasicAuth); err != nil {
return nil, err
}
}
if opts.BearerToken != "" {
if ac.getAuthHeader != nil {
return nil, fmt.Errorf("cannot simultaneously use `authorization`, `basic_auth` and `bearer_token`")
}
if err := ac.initFromBearerToken(opts.BearerToken); err != nil {
return nil, err
}
}
headers, err := parseHeaders(opts.Headers)
if err != nil {
return nil, err
}
c := &Config{
getAuthHeader: ac.getAuthHeader,
headers: headers,
authDigest: ac.authDigest,
}
return c, nil
}
type keyValue struct {
key string
value string
}
func parseHeaders(headers string) ([]keyValue, error) {
if len(headers) == 0 {
return nil, nil
}
var headersSplitByDelimiter = strings.Split(headers, "^^")
kvs := make([]keyValue, len(headersSplitByDelimiter))
for i, h := range headersSplitByDelimiter {
n := strings.IndexByte(h, ':')
if n < 0 {
return nil, fmt.Errorf(`missing ':' in header %q; expecting "key: value" format`, h)
}
kv := &kvs[i]
kv.key = strings.TrimSpace(h[:n])
kv.value = strings.TrimSpace(h[n+1:])
}
return kvs, nil
}

View File

@@ -0,0 +1,61 @@
package backoff
import (
"context"
"errors"
"fmt"
"math"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
const (
backoffRetries = 5
backoffFactor = 1.7
backoffMinDuration = time.Second
)
// retryableFunc describes call back which will repeat on errors
type retryableFunc func() error
// ErrBadRequest is an error returned on bad request
var ErrBadRequest = errors.New("bad request")
// Backoff describes object with backoff policy params
type Backoff struct {
retries int
factor float64
minDuration time.Duration
}
// New initialize backoff object
func New() *Backoff {
return &Backoff{
retries: backoffRetries,
factor: backoffFactor,
minDuration: backoffMinDuration,
}
}
// Retry process retries until all attempts are completed
func (b *Backoff) Retry(ctx context.Context, cb retryableFunc) (uint64, error) {
var attempt uint64
for i := 0; i < b.retries; i++ {
// @TODO we should use context to cancel retries
err := cb()
if err == nil {
return attempt, nil
}
if errors.Is(err, ErrBadRequest) || errors.Is(err, context.Canceled) {
logger.Errorf("unrecoverable error: %s", err)
return attempt, err // fail fast if not recoverable
}
attempt++
backoff := float64(b.minDuration) * math.Pow(b.factor, float64(i))
dur := time.Duration(backoff)
logger.Errorf("got error: %s on attempt: %d; will retry in %v", err, attempt, dur)
time.Sleep(time.Duration(backoff))
}
return attempt, fmt.Errorf("execution failed after %d retry attempts", b.retries)
}

View File

@@ -0,0 +1,96 @@
package backoff
import (
"context"
"fmt"
"testing"
"time"
)
func TestRetry_Do(t *testing.T) {
counter := 0
tests := []struct {
name string
backoffRetries int
backoffFactor float64
backoffMinDuration time.Duration
retryableFunc retryableFunc
ctx context.Context
withCancel bool
want uint64
wantErr bool
}{
{
name: "return bad request",
retryableFunc: func() error {
return ErrBadRequest
},
ctx: context.Background(),
want: 0,
wantErr: true,
},
{
name: "empty retries values",
retryableFunc: func() error {
time.Sleep(time.Millisecond * 100)
return nil
},
ctx: context.Background(),
want: 0,
wantErr: false,
},
{
name: "only one retry test",
backoffRetries: 5,
backoffFactor: 1.7,
backoffMinDuration: time.Millisecond * 10,
retryableFunc: func() error {
t := time.NewTicker(time.Millisecond * 5)
defer t.Stop()
for range t.C {
counter++
if counter%2 == 0 {
return fmt.Errorf("got some error")
}
if counter%3 == 0 {
return nil
}
}
return nil
},
ctx: context.Background(),
want: 1,
wantErr: false,
},
{
name: "all retries failed test",
backoffRetries: 5,
backoffFactor: 0.1,
backoffMinDuration: time.Millisecond * 10,
retryableFunc: func() error {
t := time.NewTicker(time.Millisecond * 5)
defer t.Stop()
for range t.C {
return fmt.Errorf("got some error")
}
return nil
},
ctx: context.Background(),
want: 5,
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
r := New()
got, err := r.Retry(tt.ctx, tt.retryableFunc)
if (err != nil) != tt.wantErr {
t.Errorf("Retry() error = %v, wantErr %v", err, tt.wantErr)
return
}
if got != tt.want {
t.Errorf("Retry() got = %v, want %v", got, tt.want)
}
})
}
}

View File

@@ -325,13 +325,19 @@ const (
vmNativeFilterTimeEnd = "vm-native-filter-time-end"
vmNativeStepInterval = "vm-native-step-interval"
vmNativeSrcAddr = "vm-native-src-addr"
vmNativeSrcUser = "vm-native-src-user"
vmNativeSrcPassword = "vm-native-src-password"
vmNativeDisableHTTPKeepAlive = "vm-native-disable-http-keep-alive"
vmNativeDstAddr = "vm-native-dst-addr"
vmNativeDstUser = "vm-native-dst-user"
vmNativeDstPassword = "vm-native-dst-password"
vmNativeSrcAddr = "vm-native-src-addr"
vmNativeSrcUser = "vm-native-src-user"
vmNativeSrcPassword = "vm-native-src-password"
vmNativeSrcHeaders = "vm-native-src-headers"
vmNativeSrcBearerToken = "vm-native-src-bearer-token"
vmNativeDstAddr = "vm-native-dst-addr"
vmNativeDstUser = "vm-native-dst-user"
vmNativeDstPassword = "vm-native-dst-password"
vmNativeDstHeaders = "vm-native-dst-headers"
vmNativeDstBearerToken = "vm-native-dst-bearer-token"
)
var (
@@ -344,8 +350,9 @@ var (
Value: `{__name__!=""}`,
},
&cli.StringFlag{
Name: vmNativeFilterTimeStart,
Usage: "The time filter may contain either unix timestamp in seconds or RFC3339 values. E.g. '2020-01-01T20:07:00Z'",
Name: vmNativeFilterTimeStart,
Usage: "The time filter may contain either unix timestamp in seconds or RFC3339 values. E.g. '2020-01-01T20:07:00Z'",
Required: true,
},
&cli.StringFlag{
Name: vmNativeFilterTimeEnd,
@@ -355,6 +362,11 @@ var (
Name: vmNativeStepInterval,
Usage: fmt.Sprintf("Split export data into chunks. Requires setting --%s. Valid values are '%s','%s','%s','%s'.", vmNativeFilterTimeStart, stepper.StepMonth, stepper.StepDay, stepper.StepHour, stepper.StepMinute),
},
&cli.BoolFlag{
Name: vmNativeDisableHTTPKeepAlive,
Usage: "Disable HTTP persistent connections for requests made to VictoriaMetrics components during export",
Value: false,
},
&cli.StringFlag{
Name: vmNativeSrcAddr,
Usage: "VictoriaMetrics address to perform export from. \n" +
@@ -372,6 +384,16 @@ var (
Usage: "VictoriaMetrics password for basic auth",
EnvVars: []string{"VM_NATIVE_SRC_PASSWORD"},
},
&cli.StringFlag{
Name: vmNativeSrcHeaders,
Usage: "Optional HTTP headers to send with each request to the corresponding source address. \n" +
"For example, --vm-native-src-headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding source address. \n" +
"Multiple headers must be delimited by '^^': --vm-native-src-headers='header1:value1^^header2:value2'",
},
&cli.StringFlag{
Name: vmNativeSrcBearerToken,
Usage: "Optional bearer auth token to use for the corresponding `--vm-native-src-addr`",
},
&cli.StringFlag{
Name: vmNativeDstAddr,
Usage: "VictoriaMetrics address to perform import to. \n" +
@@ -389,6 +411,16 @@ var (
Usage: "VictoriaMetrics password for basic auth",
EnvVars: []string{"VM_NATIVE_DST_PASSWORD"},
},
&cli.StringFlag{
Name: vmNativeDstHeaders,
Usage: "Optional HTTP headers to send with each request to the corresponding destination address. \n" +
"For example, --vm-native-dst-headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding destination address. \n" +
"Multiple headers must be delimited by '^^': --vm-native-dst-headers='header1:value1^^header2:value2'",
},
&cli.StringFlag{
Name: vmNativeDstBearerToken,
Usage: "Optional bearer auth token to use for the corresponding `--vm-native-dst-addr`",
},
&cli.StringSliceFlag{
Name: vmExtraLabel,
Value: nil,
@@ -406,6 +438,11 @@ var (
fmt.Sprintf(" In this mode --%s flag format is: 'http://vmselect:8481/'. --%s flag format is: http://vminsert:8480/. \n", vmNativeSrcAddr, vmNativeDstAddr) +
" TenantID will be appended automatically after discovering tenants from src.",
},
&cli.UintFlag{
Name: vmConcurrency,
Usage: "Number of workers concurrently performing import requests to VM",
Value: 2,
},
}
)
@@ -485,7 +522,7 @@ var (
},
&cli.DurationFlag{
Name: remoteReadHTTPTimeout,
Usage: "Timeout defines timeout for HTTP write request to remote storage",
Usage: "Timeout defines timeout for HTTP requests made by remote read client",
},
&cli.StringFlag{
Name: remoteReadHeaders,

View File

@@ -162,7 +162,8 @@ func (c *Client) Explore() ([]*Series, error) {
for _, s := range series {
fields, ok := mFields[s.Measurement]
if !ok {
return nil, fmt.Errorf("can't find field keys for measurement %q", s.Measurement)
log.Printf("skip measurement %q since it has no fields", s.Measurement)
continue
}
for _, field := range fields {
is := &Series{

View File

@@ -11,6 +11,9 @@ import (
"syscall"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/backoff"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/native"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/remoteread"
"github.com/urfave/cli/v2"
@@ -20,7 +23,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native/stream"
)
func main() {
@@ -35,9 +38,6 @@ func main() {
Name: "vmctl",
Usage: "VictoriaMetrics command-line tool",
Version: buildinfo.Version,
// Disable `-version` flag to avoid conflict with lib/buildinfo flags
// see https://github.com/urfave/cli/issues/1560
HideVersion: true,
Commands: []*cli.Command{
{
Name: "opentsdb",
@@ -65,7 +65,7 @@ func main() {
// disable progress bars since openTSDB implementation
// does not use progress bar pool
vmCfg.DisableProgressBar = true
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -100,7 +100,7 @@ func main() {
}
vmCfg := initConfigVM(c)
importer, err = vm.NewImporter(vmCfg)
importer, err = vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -137,7 +137,7 @@ func main() {
vmCfg := initConfigVM(c)
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -163,7 +163,7 @@ func main() {
fmt.Println("Prometheus import mode")
vmCfg := initConfigVM(c)
importer, err = vm.NewImporter(vmCfg)
importer, err = vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -192,7 +192,7 @@ func main() {
{
Name: "vm-native",
Usage: "Migrate time series between VictoriaMetrics installations via native binary format",
Flags: vmNativeFlags,
Flags: mergeFlags(globalFlags, vmNativeFlags),
Action: func(c *cli.Context) error {
fmt.Println("VictoriaMetrics Native import mode")
@@ -200,28 +200,51 @@ func main() {
return fmt.Errorf("flag %q can't be empty", vmNativeFilterMatch)
}
var srcExtraLabels []string
srcAddr := strings.Trim(c.String(vmNativeSrcAddr), "/")
srcAuthConfig, err := auth.Generate(
auth.WithBasicAuth(c.String(vmNativeSrcUser), c.String(vmNativeSrcPassword)),
auth.WithBearer(c.String(vmNativeSrcBearerToken)),
auth.WithHeaders(c.String(vmNativeSrcHeaders)))
if err != nil {
return fmt.Errorf("error initilize auth config for source: %s", srcAddr)
}
dstAddr := strings.Trim(c.String(vmNativeDstAddr), "/")
dstExtraLabels := c.StringSlice(vmExtraLabel)
dstAuthConfig, err := auth.Generate(
auth.WithBasicAuth(c.String(vmNativeDstUser), c.String(vmNativeDstPassword)),
auth.WithBearer(c.String(vmNativeDstBearerToken)),
auth.WithHeaders(c.String(vmNativeDstHeaders)))
if err != nil {
return fmt.Errorf("error initilize auth config for destination: %s", dstAddr)
}
p := vmNativeProcessor{
rateLimit: c.Int64(vmRateLimit),
interCluster: c.Bool(vmInterCluster),
filter: filter{
match: c.String(vmNativeFilterMatch),
timeStart: c.String(vmNativeFilterTimeStart),
timeEnd: c.String(vmNativeFilterTimeEnd),
chunk: c.String(vmNativeStepInterval),
filter: native.Filter{
Match: c.String(vmNativeFilterMatch),
TimeStart: c.String(vmNativeFilterTimeStart),
TimeEnd: c.String(vmNativeFilterTimeEnd),
Chunk: c.String(vmNativeStepInterval),
},
src: &vmNativeClient{
addr: strings.Trim(c.String(vmNativeSrcAddr), "/"),
user: c.String(vmNativeSrcUser),
password: c.String(vmNativeSrcPassword),
src: &native.Client{
AuthCfg: srcAuthConfig,
Addr: srcAddr,
ExtraLabels: srcExtraLabels,
DisableHTTPKeepAlive: c.Bool(vmNativeDisableHTTPKeepAlive),
},
dst: &vmNativeClient{
addr: strings.Trim(c.String(vmNativeDstAddr), "/"),
user: c.String(vmNativeDstUser),
password: c.String(vmNativeDstPassword),
extraLabels: c.StringSlice(vmExtraLabel),
dst: &native.Client{
AuthCfg: dstAuthConfig,
Addr: dstAddr,
ExtraLabels: dstExtraLabels,
DisableHTTPKeepAlive: c.Bool(vmNativeDisableHTTPKeepAlive),
},
backoff: backoff.New(),
cc: c.Int(vmConcurrency),
}
return p.run(ctx)
return p.run(ctx, c.Bool(globalSilent))
},
},
{
@@ -247,7 +270,7 @@ func main() {
return cli.Exit(fmt.Errorf("cannot open exported block at path=%q err=%w", blockPath, err), 1)
}
var blocksCount uint64
if err := parser.ParseStream(f, isBlockGzipped, func(block *parser.Block) error {
if err := stream.Parse(f, isBlockGzipped, func(block *stream.Block) error {
atomic.AddUint64(&blocksCount, 1)
return nil
}); err != nil {

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

186
app/vmctl/native/client.go Normal file
View File

@@ -0,0 +1,186 @@
package native
import (
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/auth"
)
const (
nativeTenantsAddr = "admin/tenants"
nativeSeriesAddr = "api/v1/series"
nameLabel = "__name__"
)
// Client is an HTTP client for exporting and importing
// time series via native protocol.
type Client struct {
AuthCfg *auth.Config
Addr string
ExtraLabels []string
DisableHTTPKeepAlive bool
}
// LabelValues represents series from api/v1/series response
type LabelValues map[string]string
// Response represents response from api/v1/series
type Response struct {
Status string `json:"status"`
Series []LabelValues `json:"data"`
}
// Explore finds series by provided filter from api/v1/series
func (c *Client) Explore(ctx context.Context, f Filter, tenantID string) (map[string]struct{}, error) {
url := fmt.Sprintf("%s/%s", c.Addr, nativeSeriesAddr)
if tenantID != "" {
url = fmt.Sprintf("%s/select/%s/prometheus/%s", c.Addr, tenantID, nativeSeriesAddr)
}
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
return nil, fmt.Errorf("cannot create request to %q: %s", url, err)
}
params := req.URL.Query()
if f.TimeStart != "" {
params.Set("start", f.TimeStart)
}
if f.TimeEnd != "" {
params.Set("end", f.TimeEnd)
}
params.Set("match[]", f.Match)
req.URL.RawQuery = params.Encode()
resp, err := c.do(req, http.StatusOK)
if err != nil {
return nil, fmt.Errorf("series request failed: %s", err)
}
var response Response
if err := json.NewDecoder(resp.Body).Decode(&response); err != nil {
return nil, fmt.Errorf("cannot decode series response: %s", err)
}
if err := resp.Body.Close(); err != nil {
return nil, fmt.Errorf("cannot close series response body: %s", err)
}
names := make(map[string]struct{})
for _, series := range response.Series {
// TODO: consider tweaking /api/v1/series API to return metric names only
// this could make explore response much lighter.
for key, value := range series {
if key != nameLabel {
continue
}
if _, ok := names[value]; ok {
continue
}
names[value] = struct{}{}
}
}
return names, nil
}
// ImportPipe uses pipe reader in request to process data
func (c *Client) ImportPipe(ctx context.Context, dstURL string, pr *io.PipeReader) error {
req, err := http.NewRequestWithContext(ctx, http.MethodPost, dstURL, pr)
if err != nil {
return fmt.Errorf("cannot create import request to %q: %s", c.Addr, err)
}
importResp, err := c.do(req, http.StatusNoContent)
if err != nil {
return fmt.Errorf("import request failed: %s", err)
}
if err := importResp.Body.Close(); err != nil {
return fmt.Errorf("cannot close import response body: %s", err)
}
return nil
}
// ExportPipe makes request by provided filter and return io.ReadCloser which can be used to get data
func (c *Client) ExportPipe(ctx context.Context, url string, f Filter) (io.ReadCloser, error) {
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
return nil, fmt.Errorf("cannot create request to %q: %s", c.Addr, err)
}
params := req.URL.Query()
params.Set("match[]", f.Match)
if f.TimeStart != "" {
params.Set("start", f.TimeStart)
}
if f.TimeEnd != "" {
params.Set("end", f.TimeEnd)
}
req.URL.RawQuery = params.Encode()
// disable compression since it is meaningless for native format
req.Header.Set("Accept-Encoding", "identity")
resp, err := c.do(req, http.StatusOK)
if err != nil {
return nil, fmt.Errorf("export request failed: %w", err)
}
return resp.Body, nil
}
// GetSourceTenants discovers tenants by provided filter
func (c *Client) GetSourceTenants(ctx context.Context, f Filter) ([]string, error) {
u := fmt.Sprintf("%s/%s", c.Addr, nativeTenantsAddr)
req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
if err != nil {
return nil, fmt.Errorf("cannot create request to %q: %s", u, err)
}
params := req.URL.Query()
if f.TimeStart != "" {
params.Set("start", f.TimeStart)
}
if f.TimeEnd != "" {
params.Set("end", f.TimeEnd)
}
req.URL.RawQuery = params.Encode()
resp, err := c.do(req, http.StatusOK)
if err != nil {
return nil, fmt.Errorf("tenants request failed: %s", err)
}
var r struct {
Tenants []string `json:"data"`
}
if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
return nil, fmt.Errorf("cannot decode tenants response: %s", err)
}
if err := resp.Body.Close(); err != nil {
return nil, fmt.Errorf("cannot close tenants response body: %s", err)
}
return r.Tenants, nil
}
func (c *Client) do(req *http.Request, expSC int) (*http.Response, error) {
if c.AuthCfg != nil {
c.AuthCfg.SetHeaders(req, true)
}
var httpClient = &http.Client{Transport: &http.Transport{DisableKeepAlives: c.DisableHTTPKeepAlive}}
resp, err := httpClient.Do(req)
if err != nil {
return nil, fmt.Errorf("unexpected error when performing request: %w", err)
}
if resp.StatusCode != expSC {
body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("failed to read response body for status code %d: %s", resp.StatusCode, err)
}
return nil, fmt.Errorf("unexpected response code %d: %s", resp.StatusCode, string(body))
}
return resp, err
}

View File

@@ -0,0 +1,22 @@
package native
import "fmt"
// Filter represents request filter
type Filter struct {
Match string
TimeStart string
TimeEnd string
Chunk string
}
func (f Filter) String() string {
s := fmt.Sprintf("\n\tfilter: match[]=%s", f.Match)
if f.TimeStart != "" {
s += fmt.Sprintf("\n\tstart: %s", f.TimeStart)
}
if f.TimeEnd != "" {
s += fmt.Sprintf("\n\tend: %s", f.TimeEnd)
}
return s
}

View File

@@ -102,6 +102,7 @@ func (pp *prometheusProcessor) do(b tsdb.BlockReader) error {
if err != nil {
return fmt.Errorf("failed to read block: %s", err)
}
var it chunkenc.Iterator
for ss.Next() {
var name string
var labels []vm.LabelPair
@@ -123,7 +124,7 @@ func (pp *prometheusProcessor) do(b tsdb.BlockReader) error {
var timestamps []int64
var values []float64
it := series.Iterator()
it = series.Iterator(it)
for {
typ := it.Next()
if typ == chunkenc.ValNone {

View File

@@ -1,6 +1,7 @@
package main
import (
"context"
"os"
"testing"
"time"
@@ -58,7 +59,7 @@ func Test_prometheusProcessor_run(t *testing.T) {
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}
@@ -95,7 +96,7 @@ func Test_prometheusProcessor_run(t *testing.T) {
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}

View File

@@ -110,6 +110,7 @@ func TestRemoteRead(t *testing.T) {
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
ctx := context.Background()
remoteReadServer := remote_read_integration.NewRemoteReadServer(t)
defer remoteReadServer.Close()
remoteWriteServer := remote_read_integration.NewRemoteWriteServer(t)
@@ -139,7 +140,7 @@ func TestRemoteRead(t *testing.T) {
tt.vmCfg.Addr = remoteWriteServer.URL()
importer, err := vm.NewImporter(tt.vmCfg)
importer, err := vm.NewImporter(ctx, tt.vmCfg)
if err != nil {
t.Fatalf("failed to create VM importer: %s", err)
}
@@ -156,7 +157,6 @@ func TestRemoteRead(t *testing.T) {
cc: 1,
}
ctx := context.Background()
err = rmp.run(ctx, true, false)
if err != nil {
t.Fatalf("failed to run remote read processor: %s", err)
@@ -263,6 +263,7 @@ func TestSteamRemoteRead(t *testing.T) {
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
ctx := context.Background()
remoteReadServer := remote_read_integration.NewRemoteReadStreamServer(t)
defer remoteReadServer.Close()
remoteWriteServer := remote_read_integration.NewRemoteWriteServer(t)
@@ -292,7 +293,7 @@ func TestSteamRemoteRead(t *testing.T) {
tt.vmCfg.Addr = remoteWriteServer.URL()
importer, err := vm.NewImporter(tt.vmCfg)
importer, err := vm.NewImporter(ctx, tt.vmCfg)
if err != nil {
t.Fatalf("failed to create VM importer: %s", err)
}
@@ -309,7 +310,6 @@ func TestSteamRemoteRead(t *testing.T) {
cc: 1,
}
ctx := context.Background()
err = rmp.run(ctx, true, false)
if err != nil {
t.Fatalf("failed to run remote read processor: %s", err)

View File

@@ -21,7 +21,7 @@ import (
)
const (
defaultReadTimeout = 30 * time.Second
defaultReadTimeout = 5 * time.Minute
remoteReadPath = "/api/v1/read"
healthPath = "/-/healthy"
)
@@ -157,7 +157,7 @@ func (c *Client) do(req *http.Request) (*http.Response, error) {
// Ping checks the health of the read source
func (c *Client) Ping() error {
url := c.addr + healthPath
req, err := http.NewRequest("GET", url, nil)
req, err := http.NewRequest(http.MethodGet, url, nil)
if err != nil {
return fmt.Errorf("cannot create request to %q: %s", url, err)
}
@@ -174,7 +174,7 @@ func (c *Client) Ping() error {
func (c *Client) fetch(ctx context.Context, data []byte, streamCb StreamCallback) error {
r := bytes.NewReader(data)
url := c.addr + remoteReadPath
req, err := http.NewRequest("POST", url, r)
req, err := http.NewRequest(http.MethodPost, url, r)
if err != nil {
return fmt.Errorf("failed to create new HTTP request: %w", err)
}

View File

@@ -15,6 +15,7 @@ import (
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/prompb"
"github.com/prometheus/prometheus/storage/remote"
"github.com/prometheus/prometheus/tsdb/chunks"
)
const (
@@ -154,7 +155,7 @@ func (rrs *RemoteReadServer) getStreamReadHandler(t *testing.T) http.Handler {
t.Fatalf("error unmarshal read request: %s", err)
}
var chunks []prompb.Chunk
var chks []prompb.Chunk
ctx := context.Background()
for idx, r := range req.Queries {
startTs := r.StartTimestampMs
@@ -171,9 +172,10 @@ func (rrs *RemoteReadServer) getStreamReadHandler(t *testing.T) http.Handler {
}
ss := q.Select(false, nil, matchers...)
var iter chunks.Iterator
for ss.Next() {
series := ss.At()
iter := series.Iterator()
iter = series.Iterator(iter)
labels := remote.MergeLabels(labelsToLabelsProto(series.Labels()), nil)
frameBytesLeft := maxBytesInFrame
@@ -190,14 +192,14 @@ func (rrs *RemoteReadServer) getStreamReadHandler(t *testing.T) http.Handler {
t.Fatalf("error found not populated chunk returned by SeriesSet at ref: %v", chunk.Ref)
}
chunks = append(chunks, prompb.Chunk{
chks = append(chks, prompb.Chunk{
MinTimeMs: chunk.MinTime,
MaxTimeMs: chunk.MaxTime,
Type: prompb.Chunk_Encoding(chunk.Chunk.Encoding()),
Data: chunk.Chunk.Bytes(),
})
frameBytesLeft -= chunks[len(chunks)-1].Size()
frameBytesLeft -= chks[len(chks)-1].Size()
// We are fine with minor inaccuracy of max bytes per frame. The inaccuracy will be max of full chunk size.
isNext = iter.Next()
@@ -207,7 +209,7 @@ func (rrs *RemoteReadServer) getStreamReadHandler(t *testing.T) http.Handler {
resp := &prompb.ChunkedReadResponse{
ChunkedSeries: []*prompb.ChunkedSeries{
{Labels: labels, Chunks: chunks},
{Labels: labels, Chunks: chks},
},
QueryIndex: int64(idx),
}
@@ -220,7 +222,7 @@ func (rrs *RemoteReadServer) getStreamReadHandler(t *testing.T) http.Handler {
if _, err := stream.Write(b); err != nil {
t.Fatalf("error write to stream: %s", err)
}
chunks = chunks[:0]
chks = chks[:0]
rrs.storage.Reset()
}
if err := iter.Err(); err != nil {

View File

@@ -3,15 +3,16 @@ package vm
import (
"bufio"
"compress/gzip"
"context"
"errors"
"fmt"
"io"
"math"
"net/http"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/backoff"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/barpool"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/limiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
@@ -75,7 +76,8 @@ type Importer struct {
wg sync.WaitGroup
once sync.Once
s *stats
s *stats
backoff *backoff.Backoff
}
// ResetStats resets im stats.
@@ -107,7 +109,7 @@ func AddExtraLabelsToImportPath(path string, extraLabels []string) (string, erro
}
// NewImporter creates new Importer for the given cfg.
func NewImporter(cfg Config) (*Importer, error) {
func NewImporter(ctx context.Context, cfg Config) (*Importer, error) {
if cfg.Concurrency < 1 {
return nil, fmt.Errorf("concurrency can't be lower than 1")
}
@@ -136,6 +138,7 @@ func NewImporter(cfg Config) (*Importer, error) {
close: make(chan struct{}),
input: make(chan *TimeSeries, cfg.Concurrency*4),
errors: make(chan *ImportError, cfg.Concurrency),
backoff: backoff.New(),
}
if err := im.Ping(); err != nil {
return nil, fmt.Errorf("ping to %q failed: %s", addr, err)
@@ -154,7 +157,7 @@ func NewImporter(cfg Config) (*Importer, error) {
}
go func(bar *pb.ProgressBar) {
defer im.wg.Done()
im.startWorker(bar, cfg.BatchSize, cfg.SignificantFigures, cfg.RoundDigits)
im.startWorker(ctx, bar, cfg.BatchSize, cfg.SignificantFigures, cfg.RoundDigits)
}(bar)
}
im.ResetStats()
@@ -205,7 +208,7 @@ func (im *Importer) Close() {
})
}
func (im *Importer) startWorker(bar *pb.ProgressBar, batchSize, significantFigures, roundDigits int) {
func (im *Importer) startWorker(ctx context.Context, bar *pb.ProgressBar, batchSize, significantFigures, roundDigits int) {
var batch []*TimeSeries
var dataPoints int
var waitForBatch time.Time
@@ -219,7 +222,9 @@ func (im *Importer) startWorker(bar *pb.ProgressBar, batchSize, significantFigur
exitErr := &ImportError{
Batch: batch,
}
if err := im.Import(batch); err != nil {
retryableFunc := func() error { return im.Import(batch) }
_, err := im.backoff.Retry(ctx, retryableFunc)
if err != nil {
exitErr.Err = err
}
im.errors <- exitErr
@@ -249,7 +254,7 @@ func (im *Importer) startWorker(bar *pb.ProgressBar, batchSize, significantFigur
im.s.idleDuration += time.Since(waitForBatch)
im.s.Unlock()
if err := im.flush(batch); err != nil {
if err := im.flush(ctx, batch); err != nil {
im.errors <- &ImportError{
Batch: batch,
Err: err,
@@ -264,36 +269,22 @@ func (im *Importer) startWorker(bar *pb.ProgressBar, batchSize, significantFigur
}
}
const (
// TODO: make configurable
backoffRetries = 5
backoffFactor = 1.7
backoffMinDuration = time.Second
)
func (im *Importer) flush(b []*TimeSeries) error {
var err error
for i := 0; i < backoffRetries; i++ {
err = im.Import(b)
if err == nil {
return nil
}
if errors.Is(err, ErrBadRequest) {
return err // fail fast if not recoverable
}
im.s.Lock()
im.s.retries++
im.s.Unlock()
backoff := float64(backoffMinDuration) * math.Pow(backoffFactor, float64(i))
time.Sleep(time.Duration(backoff))
func (im *Importer) flush(ctx context.Context, b []*TimeSeries) error {
retryableFunc := func() error { return im.Import(b) }
attempts, err := im.backoff.Retry(ctx, retryableFunc)
if err != nil {
return fmt.Errorf("import failed with %d retries: %s", attempts, err)
}
return fmt.Errorf("import failed with %d retries: %s", backoffRetries, err)
im.s.Lock()
im.s.retries = attempts
im.s.Unlock()
return nil
}
// Ping sends a ping to im.addr.
func (im *Importer) Ping() error {
url := fmt.Sprintf("%s/health", im.addr)
req, err := http.NewRequest("GET", url, nil)
req, err := http.NewRequest(http.MethodGet, url, nil)
if err != nil {
return fmt.Errorf("cannot create request to %q: %s", im.addr, err)
}
@@ -317,7 +308,7 @@ func (im *Importer) Import(tsBatch []*TimeSeries) error {
}
pr, pw := io.Pipe()
req, err := http.NewRequest("POST", im.importPath, pr)
req, err := http.NewRequest(http.MethodPost, im.importPath, pr)
if err != nil {
return fmt.Errorf("cannot create request to %q: %s", im.addr, err)
}

View File

@@ -2,177 +2,125 @@ package main
import (
"context"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"sync"
"time"
"github.com/cheggaaa/pb/v3"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/backoff"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/limiter"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/native"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/stepper"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/cheggaaa/pb/v3"
)
type vmNativeProcessor struct {
filter filter
rateLimit int64
filter native.Filter
dst *vmNativeClient
src *vmNativeClient
dst *native.Client
src *native.Client
backoff *backoff.Backoff
s *stats
rateLimit int64
interCluster bool
}
type vmNativeClient struct {
addr string
user string
password string
extraLabels []string
}
type filter struct {
match string
timeStart string
timeEnd string
chunk string
}
func (f filter) String() string {
s := fmt.Sprintf("\n\tfilter: match[]=%s", f.match)
if f.timeStart != "" {
s += fmt.Sprintf("\n\tstart: %s", f.timeStart)
}
if f.timeEnd != "" {
s += fmt.Sprintf("\n\tend: %s", f.timeEnd)
}
return s
cc int
}
const (
nativeExportAddr = "api/v1/export/native"
nativeImportAddr = "api/v1/import/native"
nativeTenantsAddr = "admin/tenants"
nativeBarTpl = `Total: {{counters . }} {{ cycle . "↖" "↗" "↘" "↙" }} Speed: {{speed . }} {{string . "suffix"}}`
nativeExportAddr = "api/v1/export/native"
nativeImportAddr = "api/v1/import/native"
nativeBarTpl = `{{ blue "%s:" }} {{ counters . }} {{ bar . "[" "█" (cycle . "█") "▒" "]" }} {{ percent . }}`
)
func (p *vmNativeProcessor) run(ctx context.Context) error {
if p.filter.chunk == "" {
return p.runWithFilter(ctx, p.filter)
func (p *vmNativeProcessor) run(ctx context.Context, silent bool) error {
if p.cc == 0 {
p.cc = 1
}
p.s = &stats{
startTime: time.Now(),
}
startOfRange, err := time.Parse(time.RFC3339, p.filter.timeStart)
start, err := time.Parse(time.RFC3339, p.filter.TimeStart)
if err != nil {
return fmt.Errorf("failed to parse %s, provided: %s, expected format: %s, error: %v", vmNativeFilterTimeStart, p.filter.timeStart, time.RFC3339, err)
return fmt.Errorf("failed to parse %s, provided: %s, expected format: %s, error: %w",
vmNativeFilterTimeStart, p.filter.TimeStart, time.RFC3339, err)
}
var endOfRange time.Time
if p.filter.timeEnd != "" {
endOfRange, err = time.Parse(time.RFC3339, p.filter.timeEnd)
end := time.Now().In(start.Location())
if p.filter.TimeEnd != "" {
end, err = time.Parse(time.RFC3339, p.filter.TimeEnd)
if err != nil {
return fmt.Errorf("failed to parse %s, provided: %s, expected format: %s, error: %v", vmNativeFilterTimeEnd, p.filter.timeEnd, time.RFC3339, err)
return fmt.Errorf("failed to parse %s, provided: %s, expected format: %s, error: %w",
vmNativeFilterTimeEnd, p.filter.TimeEnd, time.RFC3339, err)
}
} else {
endOfRange = time.Now()
}
ranges, err := stepper.SplitDateRange(startOfRange, endOfRange, p.filter.chunk)
if err != nil {
return fmt.Errorf("failed to create date ranges for the given time filters: %v", err)
}
for rangeIdx, r := range ranges {
formattedStartTime := r[0].Format(time.RFC3339)
formattedEndTime := r[1].Format(time.RFC3339)
log.Printf("Processing range %d/%d: %s - %s \n", rangeIdx+1, len(ranges), formattedStartTime, formattedEndTime)
f := filter{
match: p.filter.match,
timeStart: formattedStartTime,
timeEnd: formattedEndTime,
}
err := p.runWithFilter(ctx, f)
ranges := [][]time.Time{{start, end}}
if p.filter.Chunk != "" {
ranges, err = stepper.SplitDateRange(start, end, p.filter.Chunk)
if err != nil {
log.Printf("processing failed for range %d/%d: %s - %s \n", rangeIdx+1, len(ranges), formattedStartTime, formattedEndTime)
return err
return fmt.Errorf("failed to create date ranges for the given time filters: %w", err)
}
}
tenants := []string{""}
if p.interCluster {
log.Printf("Discovering tenants...")
tenants, err = p.src.GetSourceTenants(ctx, p.filter)
if err != nil {
return fmt.Errorf("failed to get tenants: %w", err)
}
question := fmt.Sprintf("The following tenants were discovered: %s.\n Continue?", tenants)
if !silent && !prompt(question) {
return nil
}
}
for _, tenantID := range tenants {
err := p.runBackfilling(ctx, tenantID, ranges, silent)
if err != nil {
return fmt.Errorf("migration failed: %s", err)
}
}
log.Println("Import finished!")
log.Print(p.s)
return nil
}
func (p *vmNativeProcessor) runWithFilter(ctx context.Context, f filter) error {
nativeImportAddr, err := vm.AddExtraLabelsToImportPath(nativeImportAddr, p.dst.extraLabels)
func (p *vmNativeProcessor) do(ctx context.Context, f native.Filter, srcURL, dstURL string) error {
retryableFunc := func() error { return p.runSingle(ctx, f, srcURL, dstURL) }
attempts, err := p.backoff.Retry(ctx, retryableFunc)
p.s.Lock()
p.s.retries += attempts
p.s.Unlock()
if err != nil {
return fmt.Errorf("failed to add labels to import path: %s", err)
}
if !p.interCluster {
srcURL := fmt.Sprintf("%s/%s", p.src.addr, nativeExportAddr)
dstURL := fmt.Sprintf("%s/%s", p.dst.addr, nativeImportAddr)
return p.runSingle(ctx, f, srcURL, dstURL)
}
tenants, err := p.getSourceTenants(ctx, f)
if err != nil {
return fmt.Errorf("failed to get source tenants: %s", err)
}
log.Printf("Discovered tenants: %v", tenants)
for _, tenant := range tenants {
// src and dst expected formats: http://vminsert:8480/ and http://vmselect:8481/
srcURL := fmt.Sprintf("%s/select/%s/prometheus/%s", p.src.addr, tenant, nativeExportAddr)
dstURL := fmt.Sprintf("%s/insert/%s/prometheus/%s", p.dst.addr, tenant, nativeImportAddr)
if err := p.runSingle(ctx, f, srcURL, dstURL); err != nil {
return fmt.Errorf("failed to migrate data for tenant %q: %s", tenant, err)
}
return fmt.Errorf("failed to migrate from %s to %s (retry attempts: %d): %w\nwith fileter %s", srcURL, dstURL, attempts, err, f)
}
return nil
}
func (p *vmNativeProcessor) runSingle(ctx context.Context, f filter, srcURL, dstURL string) error {
log.Printf("Initing export pipe from %q with filters: %s\n", srcURL, f)
func (p *vmNativeProcessor) runSingle(ctx context.Context, f native.Filter, srcURL, dstURL string) error {
exportReader, err := p.exportPipe(ctx, srcURL, f)
exportReader, err := p.src.ExportPipe(ctx, srcURL, f)
if err != nil {
return fmt.Errorf("failed to init export pipe: %s", err)
return fmt.Errorf("failed to init export pipe: %w", err)
}
pr, pw := io.Pipe()
sync := make(chan struct{})
done := make(chan struct{})
go func() {
defer func() { close(sync) }()
req, err := http.NewRequestWithContext(ctx, "POST", dstURL, pr)
if err != nil {
log.Fatalf("cannot create import request to %q: %s", p.dst.addr, err)
}
importResp, err := p.dst.do(req, http.StatusNoContent)
if err != nil {
log.Fatalf("import request failed: %s", err)
}
if err := importResp.Body.Close(); err != nil {
log.Fatalf("cannot close import response body: %s", err)
}
}()
fmt.Printf("Initing import process to %q:\n", dstURL)
pool := pb.NewPool()
bar := pb.ProgressBarTemplate(nativeBarTpl).New(0)
pool.Add(bar)
barReader := bar.NewProxyReader(exportReader)
if err := pool.Start(); err != nil {
log.Printf("error start process bars pool: %s", err)
return err
}
defer func() {
bar.Finish()
if err := pool.Stop(); err != nil {
fmt.Printf("failed to stop barpool: %+v\n", err)
defer func() { close(done) }()
if err := p.dst.ImportPipe(ctx, dstURL, pr); err != nil {
logger.Errorf("error initialize import pipe: %s", err)
return
}
}()
@@ -182,95 +130,176 @@ func (p *vmNativeProcessor) runSingle(ctx context.Context, f filter, srcURL, dst
w = limiter.NewWriteLimiter(pw, rl)
}
_, err = io.Copy(w, barReader)
written, err := io.Copy(w, exportReader)
if err != nil {
return fmt.Errorf("failed to write into %q: %s", p.dst.addr, err)
return fmt.Errorf("failed to write into %q: %s", p.dst.Addr, err)
}
p.s.Lock()
p.s.bytes += uint64(written)
p.s.requests++
p.s.Unlock()
if err := pw.Close(); err != nil {
return err
}
<-sync
<-done
log.Println("Import finished!")
return nil
}
func (p *vmNativeProcessor) getSourceTenants(ctx context.Context, f filter) ([]string, error) {
u := fmt.Sprintf("%s/%s", p.src.addr, nativeTenantsAddr)
req, err := http.NewRequestWithContext(ctx, "GET", u, nil)
func (p *vmNativeProcessor) runBackfilling(ctx context.Context, tenantID string, ranges [][]time.Time, silent bool) error {
exportAddr := nativeExportAddr
srcURL := fmt.Sprintf("%s/%s", p.src.Addr, exportAddr)
importAddr, err := vm.AddExtraLabelsToImportPath(nativeImportAddr, p.dst.ExtraLabels)
if err != nil {
return nil, fmt.Errorf("cannot create request to %q: %s", u, err)
return fmt.Errorf("failed to add labels to import path: %s", err)
}
dstURL := fmt.Sprintf("%s/%s", p.dst.Addr, importAddr)
if p.interCluster {
srcURL = fmt.Sprintf("%s/select/%s/prometheus/%s", p.src.Addr, tenantID, exportAddr)
dstURL = fmt.Sprintf("%s/insert/%s/prometheus/%s", p.dst.Addr, tenantID, importAddr)
}
params := req.URL.Query()
if f.timeStart != "" {
params.Set("start", f.timeStart)
barPrefix := "Requests to make"
initMessage := "Initing import process from %q to %q with filter %s"
initParams := []interface{}{srcURL, dstURL, p.filter.String()}
if p.interCluster {
barPrefix = fmt.Sprintf("Requests to make for tenant %s", tenantID)
initMessage = "Initing import process from %q to %q with filter %s for tenant %s"
initParams = []interface{}{srcURL, dstURL, p.filter.String(), tenantID}
}
if f.timeEnd != "" {
params.Set("end", f.timeEnd)
}
req.URL.RawQuery = params.Encode()
resp, err := p.src.do(req, http.StatusOK)
fmt.Println("") // extra line for better output formatting
log.Printf(initMessage, initParams...)
log.Printf("Exploring metrics...")
metrics, err := p.src.Explore(ctx, p.filter, tenantID)
if err != nil {
return nil, fmt.Errorf("tenants request failed: %s", err)
return fmt.Errorf("cannot get metrics from source %s: %w", p.src.Addr, err)
}
var r struct {
Tenants []string `json:"data"`
}
if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
return nil, fmt.Errorf("cannot decode tenants response: %s", err)
if len(metrics) == 0 {
return fmt.Errorf("no metrics found")
}
if err := resp.Body.Close(); err != nil {
return nil, fmt.Errorf("cannot close tenants response body: %s", err)
}
return r.Tenants, nil
}
func (p *vmNativeProcessor) exportPipe(ctx context.Context, url string, f filter) (io.ReadCloser, error) {
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, fmt.Errorf("cannot create request to %q: %s", p.src.addr, err)
}
params := req.URL.Query()
params.Set("match[]", f.match)
if f.timeStart != "" {
params.Set("start", f.timeStart)
}
if f.timeEnd != "" {
params.Set("end", f.timeEnd)
}
req.URL.RawQuery = params.Encode()
// disable compression since it is meaningless for native format
req.Header.Set("Accept-Encoding", "identity")
resp, err := p.src.do(req, http.StatusOK)
if err != nil {
return nil, fmt.Errorf("export request failed: %s", err)
}
return resp.Body, nil
}
func (c *vmNativeClient) do(req *http.Request, expSC int) (*http.Response, error) {
if c.user != "" {
req.SetBasicAuth(c.user, c.password)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return nil, fmt.Errorf("unexpected error when performing request: %s", err)
}
if resp.StatusCode != expSC {
body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("failed to read response body for status code %d: %s", resp.StatusCode, err)
foundSeriesMsg := fmt.Sprintf("Found %d metrics to import", len(metrics))
if !p.interCluster {
// do not prompt for intercluster because there could be many tenants,
// and we don't want to interrupt the process when moving to the next tenant.
question := foundSeriesMsg + ". Continue?"
if !silent && !prompt(question) {
return nil
}
return nil, fmt.Errorf("unexpected response code %d: %s", resp.StatusCode, string(body))
} else {
log.Print(foundSeriesMsg)
}
return resp, err
processingMsg := fmt.Sprintf("Requests to make: %d", len(metrics)*len(ranges))
if len(ranges) > 1 {
processingMsg = fmt.Sprintf("Selected time range will be split into %d ranges according to %q step. %s", len(ranges), p.filter.Chunk, processingMsg)
}
log.Print(processingMsg)
var bar *pb.ProgressBar
if !silent {
bar = pb.ProgressBarTemplate(fmt.Sprintf(nativeBarTpl, barPrefix)).New(len(metrics) * len(ranges))
bar.Start()
defer bar.Finish()
}
filterCh := make(chan native.Filter)
errCh := make(chan error, p.cc)
var wg sync.WaitGroup
for i := 0; i < p.cc; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for f := range filterCh {
if err := p.do(ctx, f, srcURL, dstURL); err != nil {
errCh <- err
return
}
if bar != nil {
bar.Increment()
}
}
}()
}
// any error breaks the import
for s := range metrics {
for _, times := range ranges {
select {
case <-ctx.Done():
return fmt.Errorf("context canceled")
case infErr := <-errCh:
return fmt.Errorf("native error: %s", infErr)
case filterCh <- native.Filter{
Match: fmt.Sprintf("{%s=%q}", nameLabel, s),
TimeStart: times[0].Format(time.RFC3339),
TimeEnd: times[1].Format(time.RFC3339),
}:
}
}
}
close(filterCh)
wg.Wait()
close(errCh)
for err := range errCh {
return fmt.Errorf("import process failed: %s", err)
}
return nil
}
// stats represents client statistic
// when processing data
type stats struct {
sync.Mutex
startTime time.Time
bytes uint64
requests uint64
retries uint64
}
func (s *stats) String() string {
s.Lock()
defer s.Unlock()
totalImportDuration := time.Since(s.startTime)
totalImportDurationS := totalImportDuration.Seconds()
bytesPerS := byteCountSI(0)
if s.bytes > 0 && totalImportDurationS > 0 {
bytesPerS = byteCountSI(int64(float64(s.bytes) / totalImportDurationS))
}
return fmt.Sprintf("VictoriaMetrics importer stats:\n"+
" time spent while importing: %v;\n"+
" total bytes: %s;\n"+
" bytes/s: %s;\n"+
" requests: %d;\n"+
" requests retries: %d;",
totalImportDuration,
byteCountSI(int64(s.bytes)), bytesPerS,
s.requests, s.retries)
}
func byteCountSI(b int64) string {
const unit = 1000
if b < unit {
return fmt.Sprintf("%d B", b)
}
div, exp := int64(unit), 0
for n := b / unit; n >= unit; n /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB",
float64(b)/float64(div), "kMGTPE"[exp])
}

View File

@@ -5,6 +5,7 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/native"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/stepper"
)
@@ -27,10 +28,10 @@ const (
func Test_vmNativeProcessor_run(t *testing.T) {
t.Skip()
type fields struct {
filter filter
filter native.Filter
rateLimit int64
dst *vmNativeClient
src *vmNativeClient
dst *native.Client
src *native.Client
}
tests := []struct {
name string
@@ -41,16 +42,16 @@ func Test_vmNativeProcessor_run(t *testing.T) {
{
name: "simulate syscall.SIGINT",
fields: fields{
filter: filter{
match: matchFilter,
timeStart: timeStartFilter,
filter: native.Filter{
Match: matchFilter,
TimeStart: timeStartFilter,
},
rateLimit: 0,
dst: &vmNativeClient{
addr: dstAddr,
dst: &native.Client{
Addr: dstAddr,
},
src: &vmNativeClient{
addr: srcAddr,
src: &native.Client{
Addr: srcAddr,
},
},
closer: func(cancelFunc context.CancelFunc) {
@@ -62,16 +63,16 @@ func Test_vmNativeProcessor_run(t *testing.T) {
{
name: "simulate correct work",
fields: fields{
filter: filter{
match: matchFilter,
timeStart: timeStartFilter,
filter: native.Filter{
Match: matchFilter,
TimeStart: timeStartFilter,
},
rateLimit: 0,
dst: &vmNativeClient{
addr: dstAddr,
dst: &native.Client{
Addr: dstAddr,
},
src: &vmNativeClient{
addr: srcAddr,
src: &native.Client{
Addr: srcAddr,
},
},
closer: func(cancelFunc context.CancelFunc) {},
@@ -80,18 +81,18 @@ func Test_vmNativeProcessor_run(t *testing.T) {
{
name: "simulate correct work with chunking",
fields: fields{
filter: filter{
match: matchFilter,
timeStart: timeStartFilter,
timeEnd: timeEndFilter,
chunk: stepper.StepMonth,
filter: native.Filter{
Match: matchFilter,
TimeStart: timeStartFilter,
TimeEnd: timeEndFilter,
Chunk: stepper.StepMonth,
},
rateLimit: 0,
dst: &vmNativeClient{
addr: dstAddr,
dst: &native.Client{
Addr: dstAddr,
},
src: &vmNativeClient{
addr: srcAddr,
src: &native.Client{
Addr: srcAddr,
},
},
closer: func(cancelFunc context.CancelFunc) {},
@@ -110,7 +111,7 @@ func Test_vmNativeProcessor_run(t *testing.T) {
tt.closer(cancelFn)
if err := p.run(ctx); (err != nil) != tt.wantErr {
if err := p.run(ctx, true); (err != nil) != tt.wantErr {
t.Errorf("run() error = %v, wantErr %v", err, tt.wantErr)
}
})

View File

@@ -206,6 +206,56 @@ mwIDAQAB
```
This command will result in 3 keys loaded: 2 keys from files and 1 from command line.
### Using OpenID discovery endpoint for JWT signature verification
`vmgateway` supports using OpenID discovery endpoint for JWKS keys discovery.
In order to enable [OpenID discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) endpoint for JWT signature verification, you need to specify OpenID discovery endpoint URLs by using `auth.oidcDiscoveryEndpoints` flag.
When `auth.oidcDiscoveryEndpoints` is specified `vmageteway` will fetch JWKS keys from the specified endpoint and use them for JWT signature verification.
Example usage for tokens issued by Azure Active Directory:
```console
/bin/vmgateway -eula \
-enable.auth \
-write.url=http://localhost:8480 \
-read.url=http://localhost:8481 \
-auth.oidcDiscoveryEndpoints=https://login.microsoftonline.com/common/v2.0/.well-known/openid-configuration
```
Example usage for tokens issued by Google:
```console
/bin/vmgateway -eula \
-enable.auth \
-write.url=http://localhost:8480 \
-read.url=http://localhost:8481 \
-auth.oidcDiscoveryEndpoints=https://accounts.google.com/.well-known/openid-configuration
```
### Using JWKS endpoint for JWT signature verification
`vmgateway` supports using JWKS endpoint for JWT signature verification.
In order to enable JWKS endpoint for JWT signature verification, you need to specify JWKS endpoint URL by using `auth.jwksEndpoints` flag.
When `auth.jwksEndpoints` is specified `vmageteway` will fetch public keys from the specified endpoint and use them for JWT signature verification.
Example usage for tokens issued by Azure Active Directory:
```console
/bin/vmgateway -eula \
-enable.auth \
-write.url=http://localhost:8480 \
-read.url=http://localhost:8481 \
-auth.jwksEndpoints=https://login.microsoftonline.com/common/discovery/v2.0/keys
```
Example usage for tokens issued by Google:
```console
/bin/vmgateway -eula \
-enable.auth \
-write.url=http://localhost:8480 \
-read.url=http://localhost:8481 \
-auth.jwksEndpoints=https://www.googleapis.com/oauth2/v3/certs
```
## Configuration
The shortlist of configuration flags include the following:
@@ -213,6 +263,12 @@ The shortlist of configuration flags include the following:
```console
-auth.httpHeader string
HTTP header name to look for JWT authorization token (default "Authorization")
-auth.jwksEndpoints array
JWKS endpoints to fetch keys for JWT tokens signature verification
Supports an array of values separated by comma or specified via multiple flags.
-auth.oidcDiscoveryEndpoints array
OpenID Connect discovery endpoints to fetch keys for JWT tokens signature verification
Supports an array of values separated by comma or specified via multiple flags.
-auth.publicKeyFiles array
Path file with public key to verify JWT token signature
Supports an array of values separated by comma or specified via multiple flags.
@@ -304,7 +360,11 @@ The shortlist of configuration flags include the following:
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8431")
TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol (default ":8431")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 300)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int

View File

@@ -146,7 +146,7 @@ func (ctx *InsertCtx) FlushBufs() error {
}
// There is no need in limiting the number of concurrent calls to vmstorage.AddRows() here,
// since the number of concurrent FlushBufs() calls should be already limited via writeconcurrencylimiter
// used at every ParseStream() call under lib/protoparser/*/streamparser.go
// used at every stream.Parse() call under lib/protoparser/*
err := vmstorage.AddRows(ctx.mrs)
ctx.Reset(0)
if err == nil {

View File

@@ -16,10 +16,12 @@ import (
var (
streamAggrConfig = flag.String("streamAggr.config", "", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/stream-aggregation.html . "+
"See also -remoteWrite.streamAggr.keepInput")
"See also -remoteWrite.streamAggr.keepInput and -streamAggr.dedupInterval")
streamAggrKeepInput = flag.Bool("streamAggr.keepInput", false, "Whether to keep input samples after the aggregation with -streamAggr.config. "+
"By default the input is dropped after the aggregation, so only the aggregate data is stored. "+
"See https://docs.victoriametrics.com/stream-aggregation.html")
streamAggrDedupInterval = flag.Duration("streamAggr.dedupInterval", 0, "Input samples are de-duplicated with this interval before being aggregated. "+
"Only the last sample per each time series per each interval is aggregated if the interval is greater than zero")
)
// InitStreamAggr must be called after flag.Parse and before using the common package.
@@ -30,7 +32,7 @@ func InitStreamAggr() {
// Nothing to initialize
return
}
a, err := streamaggr.LoadFromFile(*streamAggrConfig, pushAggregateSeries)
a, err := streamaggr.LoadFromFile(*streamAggrConfig, pushAggregateSeries, *streamAggrDedupInterval)
if err != nil {
logger.Fatalf("cannot load -streamAggr.config=%q: %s", *streamAggrConfig, err)
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -22,7 +23,7 @@ func InsertHandler(req *http.Request) error {
if err != nil {
return err
}
return parser.ParseStream(req, func(rows []parser.Row) error {
return stream.Parse(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -25,7 +26,7 @@ func InsertHandlerForHTTP(req *http.Request) error {
return err
}
ce := req.Header.Get("Content-Encoding")
return parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return stream.Parse(req.Body, ce, func(series []parser.Series) error {
return insertRows(series, extraLabels)
})
}

Some files were not shown because too many files have changed in this diff Show More