Compare commits

433 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
2d36dbcfa9 docs/CHANGELOG.md: cut v1.88.0 2023-02-24 17:54:21 -08:00
Aliaksandr Valialkin
8cfe4064b5 vendor: make vendor-update 2023-02-24 17:26:51 -08:00
Aliaksandr Valialkin
b4bf8dc2f0 docs: update -help output after e1c3267e34 2023-02-24 17:16:13 -08:00
Roman Khavronenko
e1c3267e34 vmselect/promql: check for deadline in count_values fn (#3806)
* vmselect/promql: check for deadline in `count_values` fn

`count_values` could be very slow during data processing.
Checking for the deadline between iterations is supposed to reduce
the probability of exceeding `search.maxQueryDuration`.

The change also adds a new trace record, which captures the time
spent in the aggregation function. Before this change, the trace for
aggregation functions could be confusing, since it didn't account for
all the places where time was spent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 16:59:26 -08:00
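
A minimal Go sketch of the deadline check described in e1c3267e34 above: a potentially slow aggregation loop checks the query deadline between iterations instead of running unbounded. All names here are illustrative, not the actual vmselect/promql code.

```go
package main

import (
	"fmt"
	"time"
)

// countValues groups samples by value, checking the deadline between
// iterations so a slow aggregation cannot exceed the query duration limit.
// All names are illustrative; the real implementation lives in app/vmselect/promql.
func countValues(samples []float64, deadline time.Time) (map[float64]int, error) {
	counts := make(map[float64]int)
	for i, v := range samples {
		// Check the deadline periodically instead of on every iteration
		// to keep the overhead negligible.
		if i%1024 == 0 && time.Now().After(deadline) {
			return nil, fmt.Errorf("timeout exceeded while executing count_values")
		}
		counts[v]++
	}
	return counts, nil
}

func main() {
	samples := make([]float64, 1e6)
	deadline := time.Now().Add(30 * time.Second)
	counts, err := countValues(samples, deadline)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("distinct values:", len(counts))
}
```
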
Aliaksandr Valialkin
c33cc4322c docs/CHANGELOG.md: document v1.87.2 release 2023-02-24 16:14:28 -08:00
Aliaksandr Valialkin
6a88d5402d docs/CHANGELOG.md: document v1.79.9 release 2023-02-24 15:10:48 -08:00
Roman Khavronenko
dac21d874b metricsql: support optional 2nd argument for rollup functions (#3841)
* metricsql: support optional 2nd argument for rollup functions

Support optional 2nd argument `min`, `max` or `avg` for rollup functions:
 * rollup
 * rollup_delta
 * rollup_deriv
 * rollup_increase
 * rollup_rate
 * rollup_scrape_interval

 If the second argument is passed, then the rollup function returns only the selected aggregation type.
 This is useful when only one type of rollup calculation is needed.
 For example, `rollup_rate(requests_total[5m], "max")`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 13:47:52 -08:00
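
Example queries for the optional second argument added in dac21d874b above; `requests_total` is a placeholder metric name:

```
# default behavior: returns min, max and avg rollups over each 5m window
rollup_rate(requests_total[5m])

# returns only the max rollup, as described above
rollup_rate(requests_total[5m], "max")
```
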
Aliaksandr Valialkin
87aeeec3e8 docs/CHANGELOG.md: document d8eaa511b0 2023-02-24 12:42:02 -08:00
Zakhar Bessarab
d8eaa511b0 lib/{fs,mergeset,storage}: skip .must-remove. dirs when creating snapshot (#3858) (#3867) 2023-02-24 12:38:42 -08:00
Aliaksandr Valialkin
a2340e6c95 docs/CHANGELOG.md: typo fix: scrape scrape -> scrape 2023-02-24 12:33:29 -08:00
Aliaksandr Valialkin
c6ad3692ad lib/promscrape: follow-up for 43e104a83f
- Return immediately on context cancel during the backoff sleep.
  This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747

- Add a comment describing why the second attempt to obtain the response from the remote side
  is performed immediately after the first attempt.

- Remove fasthttp dependency from lib/promscrape/discoveryutils

- Set the context deadline before calling doRequestWithPossibleRetry().
  This simplifies doRequestWithPossibleRetry() a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3293
2023-02-24 12:20:42 -08:00
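
A hedged Go sketch of the retry pattern described in c6ad3692ad and 43e104a83f above: the second attempt is performed immediately, later attempts use exponential backoff, and the backoff sleep returns as soon as the context is canceled. Function names and durations are illustrative, not the lib/promscrape code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// doWithRetry retries do() up to attempts times.
func doWithRetry(ctx context.Context, attempts int, do func() error) error {
	backoff := 100 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if i > 1 {
			// Exponential backoff before the third and later attempts.
			t := time.NewTimer(backoff)
			select {
			case <-ctx.Done():
				t.Stop()
				// Return immediately on context cancel during the backoff sleep.
				return ctx.Err()
			case <-t.C:
			}
			backoff *= 2
		}
		// The second attempt (i == 1) is performed immediately after the first one.
		if err = do(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := doWithRetry(ctx, 5, func() error { return fmt.Errorf("transient error") })
	fmt.Println(err)
}
```
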
Zakhar Bessarab
43e104a83f fix: do not use exponential backoff for first retry of scrape request (#3824)
* fix: do not use exponential backoff for first retry of scrape request (#3293)

* lib/promscrape: refactor `doRequestWithPossibleRetry` backoff to simplify logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update lib/promscrape/client.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* lib/promscrape: refactor `doRequestWithPossibleRetry` to make it more straightforward

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-24 11:39:56 -08:00
Aliaksandr Valialkin
c87c7d1e29 app/vmselect/promql: measure the time required for calculating the aggregate function from the prepared source time series 2023-02-23 20:05:14 -08:00
Aliaksandr Valialkin
8b7a828c65 app/vmselect/vmui: make vmui-update after d4fc0ed874 2023-02-23 19:25:52 -08:00
Yury Molodov
d4fc0ed874 vmui: improve mobile ui (#3848)
* feat: improve mobile ui

* feat: improve mobile ui

* fix: change style server url

* fix: improve ExploreMetrics mobile

* fix: display global settings on all pages
2023-02-23 19:18:49 -08:00
Aliaksandr Valialkin
a02cf92fd1 docs: update --help descriptions after recent changes 2023-02-23 19:02:27 -08:00
Aliaksandr Valialkin
c734416f86 lib/protoparser: fix golangci-lint warning after f579cac297 2023-02-23 18:50:34 -08:00
Aliaksandr Valialkin
b285207aa7 app/vmselect: add -search.logQueryMemoryUsage command-line flag for logging queries, which take big amounts of memory
Thanks to @michal-kralik for initial attempts for this feature:

- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3651
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3715

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3553
2023-02-23 18:47:08 -08:00
Aliaksandr Valialkin
c080443fef app/vmagent: automatically detect whether the remote storage supports VictoriaMetrics remote write protocol
Substitute -remoteWrite.useVMProto with -remoteWrite.forcePromProto command-line flag,
which can be used for forcing Prometheus remote write protocol in cases when the remote storage
supports VictoriaMetrics remote write protocol.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-23 17:36:55 -08:00
Aliaksandr Valialkin
e688121de8 lib/promscrape/discovery/kuma: substitute blocking HTTP call with non-blocking HTTP call at discoveryutils.Client 2023-02-23 15:13:08 -08:00
Denys Holius
24915fd4bc Fix some typos and add improvements for packer builds (#3825)
* update helper scripts to latest versions

* added missing command for initialisation of variables from Makefile

* deployment/marketplace/vultr/helper-scripts/vultr-helper.sh: update helper script to latest version

* fixed typo for using VM_VERSION variable

* added an example of specifying the VM_VERSION and tokens for APIs

* set packer logging to STDOUT by default
2023-02-23 12:42:55 +04:00
Aliaksandr Valialkin
637043a40e docs/CHANGELOG.md: document 6d019a3c37
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3830
2023-02-22 19:24:00 -08:00
Mattias Ängehov
6d019a3c37 Azure Service Discovery - Fix token fetch for Container Apps/App Services (#3832)
* Modify API version when running in Container App

* Handle expires on from token response

The response from IMDS does not always contain the "expires in" value, which is
currently used to get the token expiry time. Examples of resources that
don't provide it are Container Apps and App Service.

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>

* Fix client id parameter for user assigned identity

* Apply suggestions from code review

---------

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2023-02-22 19:19:53 -08:00
Aliaksandr Valialkin
510f78a96b all: consistently use http.Method{Get,Post,Put} across the codebase
This is a follow-up after 9dec3c8f80
2023-02-22 18:58:46 -08:00
my-git9
9dec3c8f80 chore: Use http constants to replace numbers (#3846)
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-02-22 18:53:05 -08:00
Alexander Marshalov
aaef0bac00 fix interpolate function for filling only intermediate gaps (#3816) (#3857)
* fix interpolate function for filling only intermediate gaps (#3816)

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-22 18:38:43 -08:00
Yury Molodov
fc720a5a78 fix: change query params update (#3860) 2023-02-22 18:25:32 -08:00
Aliaksandr Valialkin
383bf9689e docs/CHANGELOG.md: document d2b92d3264
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747
2023-02-22 17:51:52 -08:00
Aliaksandr Valialkin
9fbd45a22f lib/promscrape/discovery/kuma: follow-up for 317fef95f9
- Do not generate __meta_server label, since it is unavailable in Prometheus.
- Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md,
  so users could click it and read the docs without the need to search the corresponding docs.
- Remove the kumaTarget struct, since it is easier to generate labels for discovered targets
  directly from the response returned by Kuma. This simplifies the code.
- Store the generated labels for discovered targets inside atomic.Value. This allows reading them
  from concurrent goroutines without the need for a mutex.
- Use synchronous requests to Kuma instead of long polling, since there is little sense
  in long polling when the Kuma server may return a 304 Not Modified response every -promscrape.kumaSDCheckInterval.
- Remove -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used.
- Set default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus.
- Remove unnecessary indirections for string literals, which are used only once, in order to improve code readability.
- Remove unused fields from discoveryRequest and discoveryResponse.
- Update tests.
- Document why fetch_timeout and refresh_interval options are missing in kuma_sd_config.
- Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback,
  since these are public types.

Side note: it is weird that the Prometheus implementation of kuma_sd_configs sets the `instance` label,
since usually this label is set by Prometheus itself to __address__ after the relabeling phase.
See https://www.robustperception.io/life-of-a-label/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3389

See https://github.com/prometheus/prometheus/issues/7919
and https://github.com/prometheus/prometheus/pull/8844
as a reference implementation in Prometheus
2023-02-22 17:51:51 -08:00
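
An illustrative sketch of the atomic.Value pattern mentioned in 9fbd45a22f above for sharing generated target labels across goroutines without a mutex; the types and labels are placeholders, not the actual kuma SD code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// labelsStore holds the most recently generated target labels.
// Readers load the current snapshot without taking a mutex;
// the discovery goroutine replaces the whole slice on each update.
type labelsStore struct {
	v atomic.Value // holds []map[string]string
}

func (ls *labelsStore) Store(labels []map[string]string) {
	ls.v.Store(labels)
}

func (ls *labelsStore) Load() []map[string]string {
	labels, _ := ls.v.Load().([]map[string]string)
	return labels
}

func main() {
	var ls labelsStore
	ls.Store([]map[string]string{
		{"__address__": "10.0.0.1:5670"},
	})
	// Concurrent readers would call Load(); here we just print the snapshot.
	fmt.Println(ls.Load())
}
```
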
Aliaksandr Valialkin
eb08579452 lib/promscrape/discovery: add a comment explaining why duplicates are removed from the generated target labels 2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
4f649a0573 docs/CHANGELOG.md: document 110c3896e7
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3600
2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
f8811411fd app/vmagent/remotewrite: removed unneeded code in testPushWriteRequest after 57801660ab 2023-02-22 17:51:50 -08:00
Alexander Marshalov
b6845951a5 fixed typo in dns+srv documentation (#3861) 2023-02-23 02:40:00 +01:00
Zakhar Bessarab
d2b92d3264 lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (#3853)
* lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (see #3747)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: fix order of params for `doRequestWithPossibleRetry` to follow codestyle

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: accept deadline explicitly and extend passed context for local use

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-22 17:05:16 +01:00
Artem Navoiev
84c82e988d Add Dig Security case study
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-02-22 16:38:57 +01:00
Alexander Marshalov
317fef95f9 add kuma_sd_config for Kuma Control Plane targets discovery (#3389) (#3840) 2023-02-22 13:59:56 +01:00
Dmytro Kozlov
110c3896e7 app/vmctl: add retry backoff policy (#3844)
app/vmctl: move retries logic into a separate pkg
2023-02-22 13:06:55 +01:00
Alexander Marshalov
57801660ab Fix TestPushWriteRequest for remotewriteprotocol for pure-test (with native zstd) (#3850)
app/vmagent: fix TestPushWriteRequest for pure-test (with native zstd)

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-22 11:26:19 +01:00
Artem Navoiev
6b4ccc17c2 docs: fix typo OpentTSDB -> OpenTSDB (#3854)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-02-22 11:24:26 +01:00
Aliaksandr Valialkin
10cf6c9781 app/vmselect: allow zero value for -search.latencyOffset command-line flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2061#issuecomment-1299109836

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/218
2023-02-21 18:06:55 -08:00
Aliaksandr Valialkin
836d56876a vendor: make vendor-update 2023-02-21 18:06:20 -08:00
Aliaksandr Valialkin
d59dc7616d go.mod: update github.com/VictoriaMetrics/fastcache from v1.12.0 to v1.12.1 2023-02-21 17:51:41 -08:00
Aliaksandr Valialkin
ffebc20f6d vendor: update github.com/VictoriaMetrics/fasthttp from v1.1.0 to v1.2.0
The v1.2.0 adds HostClient.DoCtx() function, which is needed by https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747
for implementing fast canceling of pending requests to scrape targets on config update
2023-02-21 17:49:30 -08:00
Aliaksandr Valialkin
f04ec714c2 docs/Articles.md: mention rules backfilling via vmalert article
This is a follow-up for 5446ce0018
2023-02-21 17:44:08 -08:00
Roman Khavronenko
5446ce0018 docs: mention rules replay blogpost in vmalert docs (#3851)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-21 22:16:15 +01:00
panguicai
21546e6922 docs: update operator release name to be consistent with the following (#3845)
Signed-off-by: panguicai008 <1121906548@qq.com>
2023-02-21 16:57:54 +01:00
Aliaksandr Valialkin
7be8a8a37b docs/CHANGELOG.md: fix a link to VictoriaMetrics remote write protocol 2023-02-20 19:59:44 -08:00
Aliaksandr Valialkin
6c38702212 docs/Single-server-VictoriaMetrics.md: remove + from start and end query args, since curl substitutes them with whitespace and breaks the query 2023-02-20 19:32:09 -08:00
Aliaksandr Valialkin
ef1ef0598b docs/guides/migrate-from-influx.md: remove misleading doublequotes from the query example 2023-02-20 19:25:51 -08:00
Zakhar Bessarab
9d91d8fc91 vmgateway: add support of JWKS endpoint usage for JWT keys verification (#521) 2023-02-20 19:22:55 -08:00
Aliaksandr Valialkin
f4f1f2f976 docs/vmagent.md: remove the claim that VictoriaMetrics remote write protocol reduces the network bandwidth usage by up to 10x compared to Prometheus remote write protocol
The 10x savings are reproduced only on artificial data.
The savings on production data are usually in the range 2x-4x.
2023-02-20 19:11:31 -08:00
Aliaksandr Valialkin
a678cbe62e docs/vmagent.md: mention that Mimir doesn't support backfilling 2023-02-20 19:11:30 -08:00
Aliaksandr Valialkin
76f2c70be3 app/vmagent: add support for VictoriaMetrics remote write protocol, which allows saving up to 10x on network bandwidth costs under high load
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-20 19:11:30 -08:00
Roman Khavronenko
74f122294d docs: update vmalert docs (#3843)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-20 16:39:31 +01:00
Corporte Gadfly
d5171c155c docs: typo fix (#3839) 2023-02-20 10:06:19 +01:00
Aliaksandr Valialkin
2cea923ff7 app/vmselect/promql: add share(q) aggregate function for normalizing results across multiple time series in [0..1] value range per each timestamp and aggregation group 2023-02-18 22:42:01 -08:00
Aliaksandr Valialkin
c86f1f1d1b app/vmselect/promql: add range_zscore(q) and range_trim_zscore(z, q) functions
These functions may be useful for dropping outliers at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3759
2023-02-18 22:41:59 -08:00
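
A possible usage sketch for the functions added in c86f1f1d1b above, assuming the semantics implied by the names: points farther than z standard deviations from the mean are dropped. The metric name and threshold are placeholders:

```
# drop points that are more than 3 standard deviations away from the mean
range_trim_zscore(3, process_resident_memory_bytes)

# normalize each series to its z-score over the selected time range
range_zscore(process_resident_memory_bytes)
```
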
Aliaksandr Valialkin
1030be91ae vendor: make vendor-update 2023-02-18 15:36:41 -08:00
Aliaksandr Valialkin
8bd2eee8e0 app/vmalert/README.md: sync with docs/vmalert.md after 6ef6f3a771 2023-02-18 15:21:24 -08:00
Aliaksandr Valialkin
406822a16c app/vmselect/promql: add range_mad(q) and range_trim_outliers(k, q) functions
These functions may help trimming outliers during query time
for the use case described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3759
2023-02-18 15:19:10 -08:00
Aliaksandr Valialkin
c8467f37a9 vendor: update github.com/valyala/gozstd from v1.17.0 to v1.18.0 2023-02-18 15:19:10 -08:00
Haleygo
6ef6f3a771 vmalert: fix maxResolveDuration flag note (#3827)
Signed-off-by: Haleygo <hui.wang@daocloud.io>
2023-02-16 19:26:17 +01:00
Duc Tran
b58107c85a docs: fix links to alerts and alertmanager (#3829) 2023-02-16 19:25:10 +01:00
Aliaksandr Valialkin
87f1ed5d87 app/vmui: tooltip formatting enhancements according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706#issuecomment-1429980038 2023-02-14 23:37:26 -08:00
Aliaksandr Valialkin
11ce30820b all: update Go builder from Go1.20.0 to Go1.20.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.20.1+label%3ACherryPickApproved
2023-02-14 23:05:16 -08:00
Aliaksandr Valialkin
5123a61be9 vendor: make vendor-update 2023-02-13 11:14:17 -08:00
Aliaksandr Valialkin
5c4f5b83fc all: rename ParseStream -> stream.Parse
This is a follow-up for 057698f7fb
2023-02-13 10:52:05 -08:00
Aliaksandr Valialkin
ccdddf7996 lib/protoparser/promremotewrite: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:46:54 -08:00
Aliaksandr Valialkin
9be1398b92 lib/protoparser/native: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:43:05 -08:00
Aliaksandr Valialkin
8830607021 lib/protoparser/graphite: extract stream parsing code into a separate stream package 2023-02-13 10:32:36 -08:00
Aliaksandr Valialkin
a646841c07 lib/protoparser/csvimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:25:46 -08:00
Aliaksandr Valialkin
7568658c19 lib/protoparser/vmimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:20:19 -08:00
Aliaksandr Valialkin
af37717108 lib/protoparser/opentsdbhttp: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:16:03 -08:00
Aliaksandr Valialkin
7720d403c0 lib/protoparser/opentsdb: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:03:16 -08:00
Aliaksandr Valialkin
fe196e0b7a lib/protoparser/influx: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:58:52 -08:00
Aliaksandr Valialkin
f83d6d69b2 lib/protoparser/datadog: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:51:47 -08:00
Aliaksandr Valialkin
f3be9483f4 docs/CHANGELOG.md: improve the docs for 8ea02eaa8e 2023-02-13 09:41:46 -08:00
Roman Khavronenko
057698f7fb lib/protoparser/prometheus: move streamparser to subpackage (#3814)
`lib/protoparser/prometheus` is used by various applications,
such as `app/vmalert`. The recent change to the
`lib/protoparser/prometheus` package introduced a new dependency
on `lib/writeconcurrencylimiter`, which exposes some metrics.
Because of this, all applications which have this
dependency now also expose these metrics.

Creating a new `lib/protoparser/prometheus/stream` package helps
to remove these metrics from apps which use `lib/protoparser/prometheus`
as a dependency.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 09:26:07 -08:00
Roman Khavronenko
a645a95bd6 docs: improve troubleshooting docs for vmalert (#3812)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 17:29:30 +01:00
Roman Khavronenko
934646ccf7 follow-up after d1cbc35cf6 (#3813)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 16:23:50 +01:00
Droxenator
8ea02eaa8e fixed opentsdbListenAddr timestamp conversion (#3810)
Co-authored-by: Andrei Ivanov <a.ivanov@corp.mail.ru>
2023-02-13 16:07:53 +01:00
Artem Navoiev
2e2b1ba87e change docs for VictoriaMetrics Managed (#3803)
* change docs for VictoriaMetrics Managed

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* Update docs/managed-victoriametrics/user-managment.md

Co-authored-by: Max Golionko <8kirk8@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Max Golionko <8kirk8@gmail.com>
2023-02-13 13:29:03 +01:00
Oleksandr Redko
9fff48c3e3 app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
438b2e11bd app/vmauth: allow specifying max_concurrent_requests value on a per-user basis bigger than the -maxConcurrentPerUserRequests value 2023-02-11 20:53:08 -08:00
Aliaksandr Valialkin
e5070c0bcd docs/sd_configs.md: properly escape __address__ string 2023-02-11 14:45:54 -08:00
Aliaksandr Valialkin
c5b9f7f751 docs/sd_configs.md: document how the __address__ label is generated per each discovered target 2023-02-11 14:41:59 -08:00
Aliaksandr Valialkin
f9b3409ee3 lib/promscrape/discovery/openstack: use port 80 for the discovered target by default if it isn't specified in the config 2023-02-11 14:41:58 -08:00
Aliaksandr Valialkin
25a9017a72 app/vmui: show median instead of avg on graph tooltip and line legend
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706
2023-02-11 12:51:12 -08:00
Aliaksandr Valialkin
3ec8a4dc80 lib/{mergeset,storage}: allow at least 3 concurrent flushes during background merges on systems with 1 or 2 CPU cores
This should prevent data ingestion slowdown and query performance degradation
on systems with a small number of CPU cores (1 or 2) when a big merge is performed.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
2023-02-11 12:08:52 -08:00
Aliaksandr Valialkin
4156e78e11 docs/Articles.md: add a link to https://www.techetio.com/2022/08/21/evaluating-backend-options-for-prometheus-metrics/ 2023-02-11 12:08:52 -08:00
Roman Khavronenko
b209d4ace0 dashboards: use median instead of avg (#3800)
`avg` can be affected by just one outlier, which may lead
to false conclusions. `median` is supposed to reflect
reality better by leveling outliers out.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-11 10:01:30 -08:00
Aliaksandr Valialkin
b11bdc46be app/vmselect/promql: add mad_over_time(m[d]) function
See https://github.com/prometheus/prometheus/issues/5514
2023-02-11 01:06:20 -08:00
Aliaksandr Valialkin
ed4492ddd5 all: update alpine base docker image from 1.17.1 to 1.17.2
See https://alpinelinux.org/posts/Alpine-3.17.2-released.html
2023-02-11 00:37:17 -08:00
Aliaksandr Valialkin
776391917f app/vmauth: improve load balancing by sending incoming requests to backends with the lowest number of concurrent requests
While at it, stop sending requests to an unavailable backend for 3 seconds
before the next attempt. This should reduce the amount of useless work
and the number of useless network packets when the backend is temporarily unavailable.
2023-02-11 00:30:31 -08:00
Aliaksandr Valialkin
f3625e4f3f app/vmauth: add -maxConcurrentPerUserRequests command-line option for limiting the number of concurrent requests on a per-user basis
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346
2023-02-10 21:58:21 -08:00
Aliaksandr Valialkin
70f8911ca7 app/vmauth: automatically retry failing GET requests on the remaining backends 2023-02-09 21:05:55 -08:00
Dmytro Kozlov
f582f9e8ab app/vmauth: add concurrent requests limit per auth record (#3749)
* app/vmauth: add concurrent requests limit per auth record

* app/vmauth: added clarification comment

* app/vmauth: remove unused code

* app/vmauth: move read from limiter

* app/vmauth: fix text

* app/vmauth: fix comments

* - Clarify the docs for the max_concurrent_requests option at docs/vmauth.md
- Clarify the description of the change at docs/CHANGELOG.md
- Make sure that the -maxConcurrentRequests takes precedence over per-user max_concurrent_requests
- Update tests for verifying that the max_concurrent_requests option is parsed properly

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 20:03:01 -08:00
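
A hedged sketch of what the per-user limit from f582f9e8ab above might look like in a vmauth -auth.config file; the user credentials and backend URL are placeholders:

```yaml
users:
  - username: "reader"
    password: "secret"
    url_prefix: "http://vmselect:8481/select/0/prometheus"
    # per-user limit; the global -maxConcurrentRequests flag still takes precedence
    max_concurrent_requests: 10
```
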
Aliaksandr Valialkin
73358571ee app/vmalert: follow-up after d3c64aae8768d58781ee7e358bd7f3d8e0eb836d
- Document the change at docs/CHANGELOG.md
- Add `Reading rules from object storage` section to docs/vmalert.md
- Add `s3` prefix to command-line flags related to the configuration of s3 and gcs clients
- Explicitly mention that reading rules from object storage is supported only in enterprise version
2023-02-09 18:52:00 -08:00
Roman Khavronenko
14c20c1843 vmalert: support object storage for rules (#519)
* vmalert: support object storage for rules

Support loading of alerting and recording rules from object
storages `gcs://`, `gs://`, `s3://`.

* review fixes
2023-02-09 18:50:48 -08:00
Aliaksandr Valialkin
c12eb2cc7f deployment/docker: update VictoriaMetrics Docker images from v1.87.0 to v1.87.1 2023-02-09 15:53:08 -08:00
Aliaksandr Valialkin
291c41978e vendor: make vendor-update 2023-02-09 14:48:16 -08:00
Aliaksandr Valialkin
d4b97b69bf docs/CHANGELOG.md: document d621d50d4fb3b43a0bcb4419bee979f0192d38fe 2023-02-09 14:40:07 -08:00
Aliaksandr Valialkin
035a2b5ed5 all: skip issues with low severity at docker scan 2023-02-09 14:25:13 -08:00
Aliaksandr Valialkin
0e0095d350 all: run apk update && apk upgrade in base Alpine Docker image in order to get all the recent security fixes 2023-02-09 14:01:32 -08:00
Aliaksandr Valialkin
a42e3e8dfb docs/CHANGELOG.md: cut v1.87.1 and mark 1.87.x as LTS release 2023-02-09 11:20:57 -08:00
Zakhar Bessarab
f13a255918 lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (#3791)
* lib/promscrape: fix cancelling in-flight scrape requests during configuration reload when using `streamParse` mode (see #3747)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 11:13:06 -08:00
Aliaksandr Valialkin
513707a8c7 app/vmui: UX enhancements for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706
- Display the `min` value in addition to `avg`, `max` and `last`
- Allow copy-n-pasting a metric name with its labels from both the legend and the tooltip
2023-02-09 11:04:51 -08:00
Aliaksandr Valialkin
f40661e7b7 docs/vmagent.md: clarify that automatically generated metrics contain all the target-specific labels, including instance and job 2023-02-09 11:04:51 -08:00
Air
a1432e6b0a Possible spelling fix in the Quick start 2023-02-09 15:50:20 +02:00
Yury Molodov
bff18cb5dd vmui: lazy loading predefined panels (#3795)
* fix: change logic lazy loading predefined panels

* app/vmselect/vmui: `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 00:11:55 -08:00
Yury Molodov
e1063ce3c1 vmui: improve tenant selector (#3794)
* fix: change styles tenant selector (#3792)

* docs/CHANGELOG.md: document the change

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3792

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 00:08:59 -08:00
Aliaksandr Valialkin
46a521191f docs/CHANGELOG.md: document changes at v1.79.8 LTS release 2023-02-08 23:38:46 -08:00
Yury Molodov
8afc0aef8d vmui: add last/max/avg values (#3789)
* feat: add last/max/avg values (#3706)

* fix: change filter exclude values

* app/vmui: wip

- improve the visualization for avg/max/last values
- make getAvgFromArray() function resilient against inf/undefined/nil
- export getLastFromArray() function, which is resilient against inf/undefined/nil
- run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-08 22:41:20 -08:00
Aliaksandr Valialkin
114c14febf docs/CHANGELOG.md: document 75bcf86a31
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3740
2023-02-08 11:24:07 -08:00
Yury Molodov
75bcf86a31 fix: turn off the local dashboards(#3740) (#3793) 2023-02-08 11:13:15 -08:00
Aliaksandr Valialkin
a1ee679042 docs/CHANGELOG.md: add more context to the bugfix description in Nomad service discovery
See 146fd2eca3
2023-02-08 09:24:48 -08:00
Aliaksandr Valialkin
a8e88e74cc lib/backup/azremote: fix after upgrading github.com/Azure/azure-sdk-for-go/sdk/storage/azblob from v0.6.1 to v1.0.0 2023-02-08 09:18:23 -08:00
Aliaksandr Valialkin
c9d2934bb4 vendor: make vendor-update 2023-02-08 08:55:14 -08:00
Aliaksandr Valialkin
6c21b6ec09 docs/CHANGELOG.md: document the change at 67b01329a0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761
2023-02-08 08:42:19 -08:00
Roman Khavronenko
e83f14210d Vmalert fixes (#3788)
* vmalert: use group's ID in UI to avoid collisions

Identical group names are allowed. So we should use IDs
for various groupings and aggregations in the UI.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: prevent disabling state updates tracking

The minimum number of update states to track is now set to 1.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly update `debug` and `update_entries_limit` params on hot-reload

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: display `debug` field for rule in UI

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: exclude `updates` field from json marshaling

This field isn't correctly marshaled right now.
And implementing the correct marshaling for it doesn't
seem right, since json representation is mostly used
by systems like Grafana. And Grafana doesn't expect this
field to be present.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 14:34:03 +01:00
Max Golionko
6495b62866 bump go to 1.20 in ci jobs (#3787) 2023-02-08 14:32:42 +01:00
Roman Khavronenko
86e47177dc docs: follow-up after 2e4bfcce63 (#3785)
2e4bfcce63

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 09:48:05 +01:00
Karan Sharma
146fd2eca3 sd/nomad: panic in nomad watcher because of nil map (#3784)
properly initialize url.Values
2023-02-08 09:43:29 +01:00
Aliaksandr Valialkin
67b01329a0 lib/writeconcurrencylimiter: initialize concurrencyLimitCh before exporting vm_concurrent_insert_capacity and vm_concurrent_insert_current metrics
This will result in proper calculations for the alerting rule:

 avg_over_time(vm_concurrent_insert_current[1m]) >= vm_concurrent_insert_capacity

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761
2023-02-07 11:08:17 -08:00
Aliaksandr Valialkin
f2be447270 Makefile: update golangci-lint from v1.50.1 to v1.51.1 2023-02-07 11:08:11 -08:00
Aliaksandr Valialkin
1901fbf19b docs/CHANGELOG.md: fix formatting for the change from 6fd10e8871 2023-02-07 09:34:57 -08:00
earthgecko
3a51a3bc42 Clarifications between standalone/cluster ingestion endpoints (#3771)
docs: clarifications between standalone/cluster ingestion endpoints

This is an attempt to make it a bit clearer to the user that the cluster version ingestion URLs are different from the standalone ones.  I have also changed the order of the list items to make it a bit clearer and hopefully stop the user simply inferring that `/prometheus/api/v1` is only related to Prometheus data.
2023-02-07 15:17:12 +01:00
Max Golionko
6f24fa2055 CI: speedup build by 2.4x. restore nightly build (#3772)
* setup docker buildx
* add snyk integration
* add go cache for docker build
* cancel redundant job if there is new commit into same PR or branch
2023-02-07 10:12:16 +08:00
Max Golionko
977c642934 docs: update formatting for k8s monitoring with Managed VictoriaMetrics (#3768)
* jekyll formatting madness
2023-02-06 18:33:40 +08:00
Roman Khavronenko
c32d8ea29e vmalert: update docs (#3770)
vmalert: update flags description

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-06 09:51:30 +01:00
Denys Holius
8aa7559462 fixed wrong vmstorage port number (#3769) 2023-02-06 09:08:50 +01:00
Roman Khavronenko
6fd10e8871 vmalert: speed up state restore procedure on start (#3758)
* vmalert: speed up state restore procedure on start

The alerts state restore procedure has been changed to become asynchronous.
It no longer blocks group start, which significantly improves vmalert's startup time.
Instead, state restore is called by each group in its own goroutine after the first rules
evaluation.

While previously a state restore attempt was made for all loaded alerting rules,
now it is made only for alerts which became active after the first evaluation.
This reduces the number of API calls to the configured remote read URL.

This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.

See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* make lint happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-03 19:46:13 -08:00
Aliaksandr Valialkin
0a824d9490 app/vmselect/vmui: make vmui-update after e4c04b6dbe 2023-02-03 19:34:01 -08:00
Yury Molodov
e4c04b6dbe vmui: set light theme for app mode (#3748)
* fix: set light theme for app mode

* fix: check inputTenantID flag

* fix: rename inputTenantID to useTenantID
2023-02-03 19:31:37 -08:00
Aliaksandr Valialkin
98dc968920 docs/CHANGELOG.md: document f63f487787
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3707
2023-02-03 19:30:12 -08:00
Yury Molodov
f63f487787 vmui: mobile view (#3742)
* feat: add detect the system theme

* fix: change logic fetch tenants

* feat: add docs and info to cardinality page

* feat: add mobile view #3707
2023-02-03 19:27:57 -08:00
Aliaksandr Valialkin
88fed0232c dashboards: typo fix Datapoints scanned per series -> Datapoints scanned per query 2023-02-03 19:12:33 -08:00
Aliaksandr Valialkin
af1a9c5eda deployment/docker: update Go builder from Go1.19.5 to Go1.20.0
See https://go.dev/blog/go1.20
2023-02-03 14:17:33 -08:00
Aliaksandr Valialkin
7440f971ab docs/MetricsQL.md: add links to "rollup results" explanation 2023-02-03 11:09:42 -08:00
Aliaksandr Valialkin
12cf8d9f69 docs/managed-victoriametrics/how-to-monitor-k8s.md: rename image files according to docs/assets/README.md 2023-02-03 10:43:02 -08:00
Max Golionko
e18f8e9413 docs: move managed victoria metics guide into right folder (#3750)
* move guide folder
* image width control
2023-02-03 03:13:36 +08:00
Max Golionko
07b7fe83c4 Update docs/managed_victoriametrics/how-to-monitor-k8s.md
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-02 15:43:55 +02:00
Max Golionko
a2ba1f09e4 add guide to list of guides 2023-02-02 15:43:55 +02:00
Max Golionko
79527441ec added k8s guide for managed VM 2023-02-02 15:43:55 +02:00
Max Golionko
df1e545c0e disable codeql for docs. merge build and test back to one job (#3746) 2023-02-02 20:59:08 +08:00
Aliaksandr Valialkin
dc142867b8 docs/CHANGELOG.md: typo fixes 2023-02-01 20:40:44 -08:00
Aliaksandr Valialkin
da539bc286 deployment/docker: update VictoriaMetrics docker image tag from v1.86.2 to v1.87.0 2023-02-01 20:03:04 -08:00
Aliaksandr Valialkin
fe736c5388 docs/CHANGELOG.md: cut v1.87.0 2023-02-01 13:03:11 -08:00
Aliaksandr Valialkin
607b542222 vendor: make vendor-update 2023-02-01 12:23:23 -08:00
Aliaksandr Valialkin
8b9ebf625a lib/promscrape: add a comment explaining the logic behind adding the exported_ prefix to metric names
This is a follow-up for 7b87fac8e7

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3557
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3406
2023-02-01 12:00:52 -08:00
Dmytro Kozlov
7b87fac8e7 lib/promscrape: fix honor_labels behavior (#3739)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-01 11:21:44 -08:00
Aliaksandr Valialkin
7b1caf1db3 docs/CHANGELOG.md: document 9254e494f9 2023-02-01 09:56:52 -08:00
Nikolay
9254e494f9 lib/storage: fixes finalDedup for backfilled data (#3737)
Previously, historical data backfilling could trigger a force merge for the previous month every hour.
This consumes CPU and disk IO and decreases cluster performance.
The following commit fixes it by applying deduplication for InMemoryParts.
2023-02-01 09:54:21 -08:00
Zakhar Bessarab
68985455f1 fix: vmselect multi-level setup panic (#3738)
* app/vmselect/netstorage: fix panic for multi-level cluster setup when `replicationFactor` was set and request contained `trace` parameter (#3734)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmselect/netstorage: use correct context for retry

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-01 08:59:30 -08:00
Zakhar Bessarab
4cf37c5e70 app/vmbackup: fix deleting snapshot after backup completion (#3735) (#3736)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-01 11:23:58 +01:00
Aliaksandr Valialkin
3d331e4c5d app/vmselect/vmui: make vmui-update after dcc5616126 2023-01-31 13:24:43 -08:00
Aliaksandr Valialkin
4ecb61e247 docs/CHANGELOG.md: document 442a9f16b4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3661
2023-01-31 13:03:46 -08:00
Denys Holius
442a9f16b4 Makefile: adds i386 architecture (#3725)
see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3661
2023-01-31 12:58:53 -08:00
Yury Molodov
dcc5616126 vmui: improve the theme (#3731)
* feat: add detect the system theme

* fix: change logic fetch tenants

* feat: add docs and info to cardinality page

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-31 12:54:59 -08:00
Aliaksandr Valialkin
080a3e2396 vendor: make vendor-update 2023-01-31 11:03:20 -08:00
Aliaksandr Valialkin
ac8bc77688 lib/bytesutil/internstring.go: increase the limit on the maximum string length which can be interned
The limit has been increased from 300 bytes to 500 bytes according to the collected production stats.
This allows reducing CPU usage without a significant increase in RAM usage in most practical cases.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
2023-01-31 10:56:55 -08:00
Roman Khavronenko
1cbdcd391c docs: mention -vmalert.proxyURL in vmalert docs (#3730)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-30 16:28:33 +01:00
Aliaksandr Valialkin
0788be35eb lib/promscrape/discovery/azure: add __meta_azure_machine_size label in the same way as Prometheus does
See https://github.com/prometheus/prometheus/pull/11650
2023-01-27 17:07:12 -08:00
Aliaksandr Valialkin
ab57b92932 lib/promscrape/discovery/kubernetes: add support for __meta_kubernetes_pod_container_id
See https://github.com/prometheus/prometheus/issues/11843
and https://github.com/prometheus/prometheus/pull/11844
2023-01-27 16:34:06 -08:00
Aliaksandr Valialkin
6a7faf9f22 vendor: make vendor-update 2023-01-27 15:57:38 -08:00
Yury Molodov
ac14d50c18 vmui: add select of Tenant ID (#3673)
* feat: add select of tenantID

* feat: replace tenantID to default url

* fix: move the tenantID selector to the top header

* fix: hide tenantID selector by condition

* fix: correct z-index

* app/vmselect/vmui: `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-27 15:53:14 -08:00
Aliaksandr Valialkin
51ad94677c docs: update security chapters after bd716d1b0c 2023-01-27 15:45:35 -08:00
Denys Holius
bd716d1b0c Improving docs by adding additional security sections (#3713)
* docs/Cluster-VictoriaMetrics.md: adds security section

* docs/Quick-Start.md: adds Security recommendation section
2023-01-27 15:22:21 -08:00
Aliaksandr Valialkin
06f6b76521 app/vmagent: properly return 200 response code when importing data via Prometheus PushGateway protocol
This is the same fix as has been already applied to app/vminsert at cdb6d651e9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1415
2023-01-27 14:40:02 -08:00
Aliaksandr Valialkin
a0c8b86eab docs/vmauth.md: update docs after ff39a91147
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346
2023-01-27 14:10:19 -08:00
Aliaksandr Valialkin
ff39a91147 app/vmauth: limit the number of concurrent requests served by vmauth with the -maxConcurrentRequests command-line flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346

This commit is based on the https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3486
2023-01-27 14:07:30 -08:00
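
An illustrative Go sketch of capping concurrent requests with a buffered-channel semaphore, in the spirit of the -maxConcurrentRequests flag from ff39a91147 above; this is not the actual vmauth code.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// concurrencyLimiter rejects requests once maxConcurrent handlers are in flight.
type concurrencyLimiter struct {
	ch      chan struct{}
	handler http.Handler
}

func newConcurrencyLimiter(maxConcurrent int, h http.Handler) *concurrencyLimiter {
	return &concurrencyLimiter{
		ch:      make(chan struct{}, maxConcurrent),
		handler: h,
	}
}

func (cl *concurrencyLimiter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	select {
	case cl.ch <- struct{}{}:
		defer func() { <-cl.ch }()
		cl.handler.ServeHTTP(w, r)
	default:
		// Too many concurrent requests - reject instead of queueing indefinitely.
		http.Error(w, "too many concurrent requests", http.StatusTooManyRequests)
	}
}

func main() {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	log.Fatal(http.ListenAndServe(":8080", newConcurrencyLimiter(100, h)))
}
```
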
Aliaksandr Valialkin
372b1688d7 app/vmauth: do not use net/http/httputil.ReverseProxy
This allows better control of requests to backends and provides better error logging.
For example, if the backend was unavailable, then ReverseProxy logged the error
message without the client IP and the initial request URI. This could complicate debugging.

This is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3486
2023-01-27 13:40:05 -08:00
Aliaksandr Valialkin
1b81d8f542 lib/netutil: move IsTrivialNetworkError() function there, since it is used in multiple places across the code 2023-01-27 13:24:30 -08:00
Aliaksandr Valialkin
7e355080ce app/vmauth: pass the target url to reverse proxy via context.Value instead of request header
This is a less hacky way, since it doesn't clash with request headers
2023-01-27 12:15:52 -08:00
Aliaksandr Valialkin
2afa4ae00a docs/managed-victoriametrics: typo fix in links to images 2023-01-27 11:36:20 -08:00
Aliaksandr Valialkin
6342eb5523 docs/assets/README.md: mention that locally placed doc-specific images simplify referring them from various views without the need to deal with folder prefixes 2023-01-27 11:29:23 -08:00
Aliaksandr Valialkin
54a0ccbaca docs/managed-victoriametrics/user-management.md: move the associated images to docs/managed-victoriametrics/ folder with user-management_ prefix according to docs/assets/README.md 2023-01-27 11:25:22 -08:00
Aliaksandr Valialkin
b28cf0faa8 docs/managed-victoriametrics/quickstart.md: move the associated images to docs/managed-victoriametrics/ folder with quickstart_ prefix according to docs/assets/README.md 2023-01-27 11:16:03 -08:00
Aliaksandr Valialkin
3251d392b5 docs/assets: add README.md with the explanation on which files can be put into the docs/assets folder 2023-01-27 11:02:16 -08:00
Aliaksandr Valialkin
119010f7f2 docs/Cluster-VictoriaMetrics.md: move Naive_cluster_scheme.png from the docs/assets/images/ folder into docs/ folder and add Cluster-VictoriaMetrics_ prefix to the image name
The docs/assets folder should be used only for assets specific to docs generation at https://docs.victoriametrics.com, e.g. css, js and images.

All the other assets related to specific docs should be placed in the same folder as the corresponding *.md file.
These assets should have the same name prefix as the corresponding doc file name. This simplifies tracking the lifetime of these assets.
For example, if the doc is removed, it is very easy to remove all assets associated with it with a simple `rm -rf docs/doc-name*` command.

This also simplifies generating correct urls for doc-specific assets from both https://docs.victoriametrics.com
and from https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/ - just refer to the asset name without any directory prefixes.
2023-01-27 10:54:27 -08:00
Aliaksandr Valialkin
eedb294754 lib/netutil: typo fix in the error message 2023-01-27 10:38:38 -08:00
dmitryk-dk
4250b67fe8 docs: move how to register from dbaas to docs 2023-01-27 14:07:09 +02:00
dmitryk-dk
465c889f7f docs: use absolute path 2023-01-27 14:07:09 +02:00
dmitryk-dk
834c18a346 docs: add documentation of user management on managed-vm 2023-01-27 14:07:09 +02:00
Aliaksandr Valialkin
36941d6d75 app/vmauth: consistency renaming: UserInfo.URLMap -> UserInfo.URLMaps
This is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3486
2023-01-27 00:19:02 -08:00
Aliaksandr Valialkin
558165521b docs/Cluster-VictoriaMetrics.md: update command-line descriptions after ebebaecd94 2023-01-27 00:04:41 -08:00
Aliaksandr Valialkin
0890adde67 docs: update command-line descriptions after 73256fe438 2023-01-27 00:00:37 -08:00
Aliaksandr Valialkin
28d92a2f31 lib/netutil: limit the time needed for reading proxy protocol headers
This should prevent from misconfigured proxies and from possible Slowloris-type DoS attacks
(see https://en.wikipedia.org/wiki/Slowloris_(computer_security) )

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-01-26 23:46:51 -08:00
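
A minimal sketch of the idea behind 28d92a2f31 above: bound the time allowed for reading the proxy protocol header so a slow or misconfigured client cannot hold the connection open indefinitely. The helper name and the 5-second limit are illustrative, not the lib/netutil implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"time"
)

// readProxyHeader reads a single proxy-protocol-v1 style header line from a
// freshly accepted connection, giving up if the client doesn't send it in time.
func readProxyHeader(c net.Conn, timeout time.Duration) (string, error) {
	if err := c.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(c).ReadString('\n')
	if err != nil {
		return "", fmt.Errorf("cannot read proxy protocol header within %s: %w", timeout, err)
	}
	// Remove the deadline for the rest of the connection lifetime.
	return line, c.SetReadDeadline(time.Time{})
}

func main() {
	server, client := net.Pipe()
	go func() {
		// A well-behaved proxy sends the header immediately after connecting.
		client.Write([]byte("PROXY TCP4 192.0.2.1 192.0.2.2 12345 8428\r\n"))
	}()
	hdr, err := readProxyHeader(server, 5*time.Second)
	fmt.Printf("%q %v\n", hdr, err)
}
```
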
Aliaksandr Valialkin
cb374677a9 app/vmagent/prometheusimport: delete the temporary directory created by vmagent after the test is complete
This is a follow-up for 1cfa183c2b
2023-01-26 23:21:24 -08:00
Nikolay
73256fe438 lib/netutil: initial implementation of the proxy protocol (#3687)
* lib/netutil: initial implementation of the proxy protocol
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-26 23:08:35 -08:00
Aliaksandr Valialkin
2c1419c687 docs/CHANGELOG.md: make the description for the bugfix from 465a285324 more reader-friendly 2023-01-26 10:08:13 -08:00
Nikolay
465a285324 lib/storage: properly release parts inMerge lock (#3711)
if storage doesn't have enough disk space, finalDedupWatcher holds the inMerge lock for all parts and never releases it until storage restart
2023-01-26 08:05:20 -08:00
Roman Khavronenko
95ee86b600 docs: specify the time window for series_limit (#3708)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-25 09:30:20 -08:00
Aliaksandr Valialkin
28f66f0079 docs: update the list of command-line flags according to the latest changes 2023-01-25 09:20:24 -08:00
Aliaksandr Valialkin
d655d6b047 lib/streamaggr: add ability to de-duplicate input samples before aggregation 2023-01-25 09:14:49 -08:00
Yury Molodov
99d49e3ceb vmui: include fonts in its bundle (#3705)
* feat: include fonts in the build

* fix: reduce size fonts

* wip

- Document the change at docs/CHANGELOG.md
- Run `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-24 09:30:56 -08:00
Yury Molodov
20ad848c5d vmui: improvements to the UI styles (#3704)
* feat: add dark theme

* update packages

* feat: add multilevel menu (#3678)

* fix: correct styles

* fix: update link to cardinality-explorer

* fix: remove unused scss variables

* docs/CHANGELOG.md: document the changes

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-24 09:20:31 -08:00
Roman Khavronenko
1a8875b417 discover/ec2: follow-up after e2b4ab8384 (#3703)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-24 11:02:18 +01:00
Roman Khavronenko
c7c4786f3f discover/ec2: bump API version (#3702)
Switch to the current API version `2016-11-15`,
since the old version doesn't provide access to all
the fields which the implementation expects.
For example, the old API is missing the `zone_id` field
in the `DescribeAvailabilityZonesResponse` response.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3700

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-24 10:42:55 +01:00
Aliaksandr Valialkin
a971bcc3fe lib/bytesutil: do not intern long strings, since they may need large amounts of additional memory for the cache
Allow users to fine-tune the maximum string length for interning via the -internStringMaxLen command-line flag.
This may be used for fine-tuning RAM vs CPU usage for certain workloads.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
2023-01-23 23:36:22 -08:00
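
An illustrative sketch of length-capped string interning in the spirit of a971bcc3fe above; the cache and the 500-byte limit are simplified stand-ins for lib/bytesutil.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	internStringMaxLen = 500 // strings longer than this are not cached
	internCache        sync.Map
)

// internString returns a canonical copy of s, so identical strings can share memory.
// Long strings are returned as-is: caching them would inflate the cache
// with little chance of reuse.
func internString(s string) string {
	if len(s) > internStringMaxLen {
		return s
	}
	if v, ok := internCache.Load(s); ok {
		return v.(string)
	}
	internCache.Store(s, s)
	return s
}

func main() {
	a := internString("node_cpu_seconds_total")
	b := internString("node_cpu_seconds_total")
	fmt.Println(a == b) // true; with interning both also share the same cached copy
}
```
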
Aliaksandr Valialkin
0fd440cdb4 deployment: sync with cluster branch 2023-01-23 22:59:00 -08:00
Aliaksandr Valialkin
c496f06ca3 app/vmagent/{promremotewrite,vmimport}: remove unused functions InsertHandlerForReader()
Thanks to 1cfa183c2b, where the first such function was removed
2023-01-23 22:31:29 -08:00
Aliaksandr Valialkin
f7acdb13db app/{vmagent,vminsert}: follow-up for 1cfa183c2b
- Call httpserver.GetQuotedRemoteAddr() and httpserver.GetRequestURI() only when the error occurs.
  This saves CPU time on the fast path when there are no parsing errors.
- Create a helper function - httpserver.LogError() - for logging the error with the request uri and remote addr context.
2023-01-23 22:26:53 -08:00
Artem Navoiev
1cfa183c2b add error handler for parsing prometheus text format to vmagent and v… (#3693)
* add error handler for parsing prometheus text format to vmagent and vminsert

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix typo

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* typo

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix variables naming and error message

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-23 22:14:34 -08:00
Yury Molodov
3536bef36e vmui: add open graph and twitter card tags (#3697)
* feat: add open graph and twitter card tags

* app/vmui: spelling fixes

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-23 22:04:46 -08:00
Aliaksandr Valialkin
babecd8363 lib/promscrape: follow-up for 393876e52a
- Document the change in docs/CHANGELOG.md
- Reduce memory usage when sending stale markers even more by parsing the response in stream parsing mode
- Update the TestSendStaleSeries

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675
2023-01-23 21:52:59 -08:00
Roman Khavronenko
393876e52a lib/promscrape: limit number of sent stale series at once (#3686)
Stale series are sent when there is a difference between current
and previous scrapes. Those series which disappeared in the current scrape
are marked as stale and sent to the remote storage.

Sending stale series requires memory allocations, and when too many
series disappear at the same time it could result in a noticeable memory spike.
For example, a re-deploy of a big fleet of services can result in
excessive memory usage for vmagent, because all the series with the old
pod name will be marked as stale and sent to the remote write storage.

This change limits the number of stale series which can be sent at once,
so memory usage remains steady.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3675
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-23 21:15:59 -08:00
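
A simplified Go sketch of the batching idea from 393876e52a above: pushing staleness markers for disappeared series in bounded chunks so memory usage stays steady. The batch size and types are placeholders, not the real lib/promscrape structures.

```go
package main

import "fmt"

// sendStaleMarkersInBatches pushes staleness markers for disappeared series
// in chunks of at most maxBatch, so a mass disappearance (e.g. a fleet
// re-deploy) does not allocate one huge write request.
func sendStaleMarkersInBatches(series []string, maxBatch int, push func(batch []string)) {
	for len(series) > 0 {
		n := maxBatch
		if n > len(series) {
			n = len(series)
		}
		push(series[:n])
		series = series[n:]
	}
}

func main() {
	disappeared := []string{"a", "b", "c", "d", "e"}
	sendStaleMarkersInBatches(disappeared, 2, func(batch []string) {
		fmt.Println("sending staleness markers for", batch)
	})
}
```
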
Aliaksandr Valialkin
2c4e384f07 lib/promscrape: properly log the actual response size after c4229a1bba 2023-01-23 21:04:50 -08:00
Aliaksandr Valialkin
ba5a6c851c lib/storage: use deterministic random generator in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 20:10:32 -08:00
Aliaksandr Valialkin
1a3a6ef907 lib/mergeset: use deterministic random generator in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:43:49 -08:00
Aliaksandr Valialkin
7030429958 lib/mergeset: fix data race in BenchmarkInmemoryBlockMarshal 2023-01-23 19:43:18 -08:00
Aliaksandr Valialkin
30e968df6d app/vmselect: use consistent randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:27:25 -08:00
Aliaksandr Valialkin
f2b40dbe9a app/vmalert: use consistent randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:25:10 -08:00
Aliaksandr Valialkin
a11dc6689a lib/decimal: use consistent randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:23:39 -08:00
Aliaksandr Valialkin
0a4d8dc777 lib/uint64set: use repeatable randomizer in tests
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 19:22:58 -08:00
Aliaksandr Valialkin
3d1cb011b6 lib/encoding: make deterministic tests which rely on math/rand
Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3683
2023-01-23 18:41:09 -08:00
Aliaksandr Valialkin
a7f8ce5e3d vendor: make vendor-update 2023-01-23 08:05:54 -08:00
Artem Navoiev
7c1daade15 tests: use DebugFlush instead of vmstorage stop. This simplifies the logic and allows removing test-only methods (#3694)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-23 14:45:59 +01:00
Denys Holius
f3cb412508 Fix/remove vmanomaly from release guide (#3699)
* docs/Release-Guide.md: remove vmanomaly from release guide because it has its own release cycle

* fixed a typo
2023-01-23 14:32:26 +01:00
Aliaksandr Valialkin
edc51b3119 docs/Articles.md: added missing third-party articles about VictoriaMetrics 2023-01-22 14:23:45 -08:00
Aliaksandr Valialkin
cb5d073bc9 docs/Single-server-VictoriaMetrics.md: make it clear that VictoriaMetrics supports both pull and push protocols at how to import time series data chapter 2023-01-22 14:04:29 -08:00
Aliaksandr Valialkin
864c651c03 docs/Articles.md: add https://dev.to/aws-builders/ultra-monitoring-with-victoria-metrics-1p2 2023-01-22 13:51:53 -08:00
Aliaksandr Valialkin
4e25bc2087 lib/vmselectapi: propagate timeout errors from vmselect to vmstorage instead of closing the connection established from vmselect to vmstorage
This is a follow-up for 20e9598254
2023-01-20 19:33:42 -08:00
Aliaksandr Valialkin
74df30456b app/vmselect: make vmui-update after df7b81b44d 2023-01-20 12:07:13 -08:00
Yury Molodov
df7b81b44d vmui: add support for time zone selection for older versions of browsers (#3680)
* fix: add check for support of getting time zones

* vmui: add support for time zone selection for older versions of browsers
2023-01-20 11:47:53 -08:00
Denys Holius
d513230d43 Adds some improvements to release guide docs (#3679)
* docs/Release-Guide.md: fixed a typo

* Release-Guide.md: adds missed steps for updating vmanomaly and vmgateway helm charts
2023-01-19 16:03:42 +01:00
Max Golionko
e8554cd1cb ci: checkout correct branch for build step (#3676) 2023-01-19 08:34:20 +01:00
Aliaksandr Valialkin
59b67f1cfa snap: update Go builder from v1.19.4 to v1.19.5 2023-01-18 14:05:18 -08:00
Aliaksandr Valialkin
adeec6e369 deployment/docker: update VictoriaMetrics components in docker-compose from v1.86.0 to v1.86.2 2023-01-18 12:57:39 -08:00
Aliaksandr Valialkin
9785b3f57f docs/CHANGELOG.md: cut v1.86.2 2023-01-18 12:01:07 -08:00
Aliaksandr Valialkin
9c62391a5c .github/workflows: remove obsolete make targets: install-goling and install-errcheck
These targets became obsolete after ec2c82e800
2023-01-18 11:48:29 -08:00
Max Golionko
59b97f26c0 CI: split js and go codeql, split test and build, enable matrix for test (#3670)
* split js and go codeql, split test and build, enable matrix for test

* checkout before go setup

* enable build for PRs as well

* update filter
2023-01-18 11:42:27 -08:00
Aliaksandr Valialkin
19e28ce7b6 docs/CHANGELOG.md: document 777038fe44
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3672
2023-01-18 11:39:48 -08:00
Tobias Jungel
777038fe44 app/vmbackup: prevent password leaks (#3672)
This prevents vmbackup from leaking passwords into logs, as shown below.

2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:12   build version: vmbackup-20221214-211706-tags-v1.85.1-0-g09a70d3e9
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:13   command-line flags
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:20     -dst="fs:///vm-backups/latest"
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:20     -snapshot.createURL="http://user:super_sercret123@victoriametrics:8428/snapshot/create"
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:20     -storageDataPath="/storage"
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/app/vmbackup/main.go:53 Snapshot create url http://user:super_sercret123@victoriametrics:8428/snapshot/create
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/app/vmbackup/main.go:60 Snapshot delete url http://user:super_sercret123@victoriametrics:8428/snapshot/delete
2023-01-18 11:35:21 -08:00
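A minimal Go sketch of the general technique behind this fix: strip credentials from a URL before it is logged. The function name and redaction format are illustrative assumptions, not the actual vmbackup code:

```go
package main

import (
	"fmt"
	"net/url"
)

// redactURL returns a copy of rawURL with any user:password part
// replaced by "xxx:xxx", so the value is safe to log.
func redactURL(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		// Do not echo the unparsable value back - it may contain secrets.
		return "<invalid url>"
	}
	if u.User != nil {
		u.User = url.UserPassword("xxx", "xxx")
	}
	return u.String()
}

func main() {
	fmt.Println(redactURL("http://user:super_sercret123@victoriametrics:8428/snapshot/create"))
	// Output: http://xxx:xxx@victoriametrics:8428/snapshot/create
}
```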
Aliaksandr Valialkin
e867df5ef5 app/vmui: increase perceived performance by 2.5x by reducing the delay before the query execution from 0.8s to 0.3s
The delay cannot be removed, since it is used for limiting the rate of queries sent to VictoriaMetrics during graph scrolling.
2023-01-18 01:33:48 -08:00
Aliaksandr Valialkin
2ac530eb28 lib/{storage,mergeset}: wake up background merges as soon as there is a potential work for them
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
2023-01-18 01:10:18 -08:00
Aliaksandr Valialkin
b8409d6600 lib/{storage,mergeset}: do not run assisted merges when flushing pending samples to parts
Assisted merges are intended to be performed by goroutines, which accept the incoming samples,
in order to limit the data ingestion rate.

The worker, which converts pending samples to parts, shouldn't be penalized by assisted merges,
since this may result in increased number of pending rows as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647#issuecomment-1385039142
when the assisted merge takes too much time.
2023-01-18 00:20:58 -08:00
Aliaksandr Valialkin
1ac025bbc9 lib/storage: use better naming for a function returning new []rawRows - newRawRowsBlock() -> newRawRows() 2023-01-18 00:01:03 -08:00
Aliaksandr Valialkin
0c625185cb app/vmselect/promql: updates tests for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3664 2023-01-17 23:25:45 -08:00
Aliaksandr Valialkin
68463c9e87 lib/promscrape: follow-up for d79f1b106c
- Document the fix at docs/CHANGELOG.md
- Limit the concurrency for sendStaleMarkers() function in order to limit its memory usage
  when big number of targets disappear and staleness markers are sent
  for all the metrics exposed by these targets.
- Make sure that the writeRequestCtx is returned to the pool
  when there is no need to send staleness markers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
2023-01-17 23:11:56 -08:00
lzfhust
d79f1b106c using writeRequestCtxPool when delete kubernetes clusters from kubernetes_sd_configs (#3669) 2023-01-17 22:57:56 -08:00
Zakhar Bessarab
322d96bfe5 discovery/{consul,nomad}: fix cancelling serviceWatcher in-flight requests (#3658)
* lib/promscrape/discovery/{consul,nomad}: fix background service update watches not canceling requests on serviceWatcher stop

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/{consul,nomad}: fix closing serviseWatcher during scrape job restart

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-17 21:47:11 -08:00
Scott Kevill
46b3b76d6d lib/fs: use unix.Statfs() / unix.Statvfs() when using a path (#3663) 2023-01-17 21:19:26 -08:00
Roman Khavronenko
5cb8ce8174 ci: disable JS codeQL check (#3659)
We have limited amount of time used by Github CI runners
and JS analysis accounts for a half of it.
Since JS represents only a small fraction of the codebase
and is solely maintained by one person - I suggest to disable
the CodeQL check in order to save CI runners time.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-17 21:05:27 -08:00
Yury Molodov
fcef2ff6b2 vmui: correctly display range results in Table view (#3657)
* fix: properly display range results

* fix: set range values to empty array

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-17 21:03:28 -08:00
Aliaksandr Valialkin
1ffa793322 docs/CHANGELOG.md: fix the link to feature request implemented at e58921aa8f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3571
2023-01-17 20:27:43 -08:00
Yury Molodov
e58921aa8f vmui: give more visually different colors to graph lines (#3656)
* feat: make more different colors of graph lines

* docs/CHANGELOG.md: give more visually different colors to graph lines
2023-01-17 20:25:37 -08:00
Aliaksandr Valialkin
c94020f7dc app/vmui: do not round the number in formatPrettyNumber() if the range isn't set 2023-01-17 19:50:44 -08:00
Aliaksandr Valialkin
006af394ff vendor: update github.com/VictoriaMetrics/metricsql from v0.51.1 to v0.51.2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3664
2023-01-17 11:26:41 -08:00
Aliaksandr Valialkin
289af65071 lib/promscrape: properly apply series limit
Fix the following issues:

- Series limit wasn't applied when staleness tracking was disabled.
- Series limit didn't prevent from sending staleness markers for new series exceeding the limit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660

Thanks to @hagen1778 for the initial attempt to fix the issue
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3665
2023-01-17 10:14:49 -08:00
Aliaksandr Valialkin
e06168f489 docs/CHANGELOG.md: document the 74f196c37813a18c2c8f28831696ed7d9f535604 2023-01-17 09:10:41 -08:00
Aliaksandr Valialkin
09d7fa2737 lib/{mergeset,storage}: do not slow down concurrently executed queries during assisted merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2023-01-16 14:31:52 -08:00
Aliaksandr Valialkin
094ae82df5 vendor: make vendor-update 2023-01-15 14:16:34 -08:00
Aliaksandr Valialkin
092e2c8f2d github.com/VictoriaMetrics/metrics: update from v1.23.0 to v1.23.1
See https://github.com/VictoriaMetrics/metrics/issues/42
2023-01-15 14:06:43 -08:00
Yury Molodov
04c408e986 vmui: make the step input field global across all the tabs and views (#3644)
* feat: make the step input field global

* fix: correct get step from url

* fix: set minimumSignificantDigits to 1

* app/vmselect/vmui: `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-15 13:47:08 -08:00
Aliaksandr Valialkin
cdb6d651e9 app/vminsert: return 200 OK status code when importing data in pushgateway format
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1415
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
2023-01-15 13:29:53 -08:00
Aliaksandr Valialkin
207a62a3c2 app/vmselect/promql: reduce memory allocations when searching for time series pairs with identical labelsets in q1 op q2 queries
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
2023-01-15 13:03:23 -08:00
Aliaksandr Valialkin
27afe7bc38 app/vmselect/promql: reduce the number of memory allocations inside getCommonLabelFilters()
This should improve performance a bit for `q1 op q2` queries
2023-01-15 13:03:23 -08:00
Denys Holius
ffe6e6fe59 guides/guide-delete-or-replace-metrics.md: fixed wrong curl command (#3652) 2023-01-15 13:02:52 -08:00
Yury Molodov
7fd82c0d3a feat: make nav menu as links (#3646) 2023-01-13 09:45:07 +01:00
Corporte Gadfly
f32a79189b docs: fix typo (#3648) 2023-01-13 09:39:11 +01:00
Aliaksandr Valialkin
be8fba9b6a app/vmselect/netstorage: tune the number of blocks per series which should be unpacked by a single goroutine instead of spinning up multiple goroutines
This reduces overhead on time series data unpacking for typical cases,
thus reducing CPU usage at vmselect
2023-01-12 09:31:44 -08:00
Aliaksandr Valialkin
98de06ff38 docs/CHANGELOG.md: update the description of the change at 20f28eb9d6 2023-01-12 08:58:52 -08:00
Nikolay
20f28eb9d6 /lib/promscrape: use correct err logger for scrape unmarshalling (#3645)
/lib/promscrape: use correct err logger for scrape unmarshalling
It correctly suppresses scrape errors and adds correct context for err msg
2023-01-12 17:40:42 +01:00
Roman Khavronenko
ec7c3f45ba dashboards: bump operator dash to v9 of Grafana (#3642)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-12 16:31:26 +01:00
Aliaksandr Valialkin
7067e8206c app/vmselect/promql: reduce memory allocations at getCommonLabelFilters() function
Intern tag keys and values there
2023-01-12 01:27:41 -08:00
Aliaksandr Valialkin
e2498af530 lib/promscrape: log the number of unsuccessful scrapes during the last -promscrape.suppressScrapeErrorsDelay
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3413
Thanks to @jelmd for the pull request.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2575
2023-01-12 01:09:32 -08:00
Aliaksandr Valialkin
9f5b5708ff app/vmselect: handle the /custom-dashboards request from /graph/ page in the same way as from the /vmui/ page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3322
2023-01-11 23:41:04 -08:00
Aliaksandr Valialkin
9fdd1a10c6 docs/Cluster-VictoriaMetrics.md: mention the -vmui.customDashboardsPath command-line flag at vmselect
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3322
2023-01-11 23:39:55 -08:00
Aliaksandr Valialkin
a194982117 app/vmselect: follow-up after 820312a2b1
- Move the feature description at the correct place at docs/CHANGELOG.md
- Run `make vmui-update`
- Various cosmetic fixes

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3322
2023-01-11 23:28:00 -08:00
Dmytro Kozlov
820312a2b1 app/vmui: define custom path for dashboards json file (#3545)
* app/vmui: define custom path for dashboards json file

* app/vmui: remove unneeded code

* app/vmui: move handler to own file, fix show dashboards,

* app/vmui: move flag to handler, add flag description

* app/vmauth: fix part of the comments

* feat: add store for dashboards

* fix: prevent fetch dashboards for app mode

* app/vmauth: use simple cache for predefined dashboards

* app/vmauth: update dashboards doc

* app/vmauth: fix ci

* app/vmui: decrease timeout

* app/vmselect: removed cache, fix comments

* app/vmselect: remove unused const

* app/vmselect: fix error log, use byte slice instead of struct

Co-authored-by: Yury Moladau <yurymolodov@gmail.com>
2023-01-11 23:06:07 -08:00
Aliaksandr Valialkin
ec23ab6bc2 lib/promscrape/discovery: missing changes after b4ad3a3b4c 2023-01-11 23:02:45 -08:00
Dmytro Kozlov
20af29294e app/vmctl: add remote read protocol integration tests (#3626) 2023-01-11 23:00:10 -08:00
Aliaksandr Valialkin
b4ad3a3b4c lib/promscrape: follow-up for 8537533beb
- Add a comment describing the purpose of the `role` field inside `apiConfig` struct
- Revert changes at lib/promscrape/discovery/dockerswarm/dockerswarm.go ,
  since they reduce code readability. E.g. the reader needs to look up the named string constants
  in order to get their values.
2023-01-11 22:54:18 -08:00
Zakhar Bessarab
8537533beb lib/promscrape/discovery/dockerswarm: fix discovery filters being applied to all objects (#3632)
* lib/promscrape/discovery/dockerswarm: fix discovery filters being applied to all objects

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/CHANGELOG.md

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-11 22:50:34 -08:00
Yury Molodov
5c8bb029b5 vmui: small changes on explore metrics page (#3634)
* fix: change issue link

* fix: remove legend toggle

* fix: move select graph size

* feat: save url params on explore metrics page

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-11 22:16:10 -08:00
Artem Navoiev
13a1a7f826 fix operator docs hugo schema
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-11 17:28:34 +01:00
Artem Navoiev
83c87f822a docs: prepare operator docs to migration
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-11 18:19:40 +02:00
Artem Navoiev
54aec2b1ba docs: operator *.MD -> *.md
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-11 16:29:29 +01:00
Roman Khavronenko
b3a70b8284 dashboards: fix the tooltip info for 1.86 (#3628)
See c63755c316 (diff-bba263a473e7fbc9d0fde075ebef6b3d4e32c322ee1210a3e07182292c7723aaR18)

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-11 11:30:12 +01:00
Aliaksandr Valialkin
351fc152e0 docs/CHANGELOG.md: cut v1.86.1 2023-01-11 01:26:43 -08:00
Aliaksandr Valialkin
975bb8722f lib/vmselectapi: properly calculate query timeout
vmselect passes query timeout to vmstorage in seconds.
The commit 20e9598254 treated it as timeout in nanoseconds.
Fix this in order to prevent the following errors under vmstorage load:

cannot process vmselect request: cannot execute "search_v7": couldn't start executing the request in 0.000 seconds,
since -search.maxConcurrentRequests=... concurrent requests are already executed.
2023-01-11 01:22:33 -08:00
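A hedged illustration of the unit mismatch described above: a timeout value received in seconds must be multiplied by time.Second before being used as a Go time.Duration (variable names are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	timeoutSeconds := int64(30) // value received over the wire, in seconds

	// Wrong: treats the raw number as nanoseconds, yielding ~0.000 seconds.
	wrong := time.Duration(timeoutSeconds)

	// Right: multiply by time.Second to get an actual 30s deadline.
	right := time.Duration(timeoutSeconds) * time.Second

	fmt.Println(wrong, right) // 30ns 30s
}
```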
Roman Khavronenko
f73953a049 app/vmselect: fix typo (#3629)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-11 01:09:47 -08:00
Aliaksandr Valialkin
dff47c73b7 app/vmselect: improve logging when the incoming query cannot be executed because of timeout in the wait queue 2023-01-11 01:06:05 -08:00
Aliaksandr Valialkin
65b9dcfcca app/vmselect/promql: typo fix after 0771d57860 2023-01-11 01:05:31 -08:00
Aliaksandr Valialkin
0771d57860 app/vmselect/promql: make a copy of per-series timestamps before their modification
The per-series timestamps are usually shared among series, so it is unsafe to modify them.

The issue appeared after the optimization at 2f3ddd4884
2023-01-11 00:59:13 -08:00
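The copy-before-modify pattern referenced above, shown as a small self-contained Go sketch (illustrative function, not the promql code):

```go
package main

import "fmt"

// adjustTimestamps returns timestamps shifted by offsetMs.
// It copies the input first, because the slice may be shared
// with other time series and must not be modified in place.
func adjustTimestamps(shared []int64, offsetMs int64) []int64 {
	ts := make([]int64, len(shared))
	copy(ts, shared)
	for i := range ts {
		ts[i] += offsetMs
	}
	return ts
}

func main() {
	shared := []int64{1000, 2000, 3000}
	shifted := adjustTimestamps(shared, 500)
	fmt.Println(shared)  // [1000 2000 3000] - untouched
	fmt.Println(shifted) // [1500 2500 3500]
}
```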
Aliaksandr Valialkin
31fc29599f app/vmselect/promql: move the eval function args in parallel query trace outside the loop 2023-01-10 22:23:30 -08:00
Aliaksandr Valialkin
a0f9cb27f9 docs/CHANGELOG.md: document v1.79.7 LTS release 2023-01-10 21:24:28 -08:00
Aliaksandr Valialkin
4faf7ea41e vendor: make vendor-update 2023-01-10 18:58:34 -08:00
Aliaksandr Valialkin
c449714c0a deployment/docker: update Go builder from v1.19.4 to v1.19.5
See https://github.com/golang/go/issues?q=milestone%3AGo1.19.5+label%3ACherryPickApproved
2023-01-10 18:43:04 -08:00
Aliaksandr Valialkin
37fe04999c docs/CHANGELOG.md: add release date for v1.86.0 2023-01-10 17:45:57 -08:00
Aliaksandr Valialkin
eda940106a deployment/docker: update Docker tag for VictoriaMetrics components from v1.85.3 to v1.86.0 2023-01-10 17:26:37 -08:00
Aliaksandr Valialkin
28df7c2a96 docs/CHANGELOG.md: cut v1.86.0 2023-01-10 16:22:26 -08:00
Aliaksandr Valialkin
2d294cca59 deployment/docker: update Alpine base image from v3.17.0 to v3.17.1
See https://alpinelinux.org/posts/Alpine-3.17.1-released.html
2023-01-10 16:14:40 -08:00
Aliaksandr Valialkin
95ce1ba6ce lib/httpserver: directly pass flag value to CheckAuthFlag()
There is no sense in passing a pointer to flag value there.

This is a follow-up for 4225a0bd75
2023-01-10 15:52:23 -08:00
Zakhar Bessarab
4225a0bd75 Use httpAuth.* flags as a fallback for endpoints protected by *AuthKey flags (#3582)
* {lib/server, app/}: use `httpAuth.*` flag as fallback for `*AuthKey` if it is not set

* lib/ingestserver/opentsdbhttp: fix opentdb HTTP handler not respecting `httpAuth.*` flags

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-10 15:46:13 -08:00
Dmytro Kozlov
0811000bb0 app/vmctl: Add insecure skip verify flag for remote read protocol (#3611)
* app/vmctl: Add insecure skip verify flag for remote read protocol
2023-01-10 23:18:49 +01:00
Artem Navoiev
37c52ccaf4 vmagent: add minimal scrape file example for vmagent quick start (#3627)
* vmagent: add minimal scrape file example for vmagent quick start

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* replace example with link to your prometheus.yml in docker

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-10 23:16:10 +01:00
Aliaksandr Valialkin
cbe62f23ba lib/promscrape/discovery/gce: follow-up for b2ccdaaa2f
- Use promutils.Labels.GetLabels() instead of comparing promutils.Labels.Labels to nil.
  This makes the code more consistent with other places.

- Mention the release where the issue has been introduced at docs/CHANGELOG.md.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3624
2023-01-10 13:51:03 -08:00
Aliaksandr Valialkin
53d871d0b1 app/vmselect/netstorage: reduce tail latency during query processing
Previously the selected time series were split evenly among available CPU cores
for further processing, e.g. unpacking the data and applying the given rollup
function to the unpacked data.
Some time series could be processed slower than others.
This could result in uneven work distribution among available CPU cores,
e.g. some CPU cores could complete their work sooner than others.
This could slow down query execution.

The new algorithm allows stealing time series to process from other CPU cores
when all the local work is done. This should reduce the maximum time
needed for query execution (aka tail latency).

The new algorithm should also scale better on systems with many CPU cores,
since every CPU processes locally assigned time series without inter-CPU communications.

The inter-CPU communications are used only when all the local work is finished
and the pending work from other CPUs needs to be stolen.
2023-01-10 13:43:14 -08:00
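A rough Go sketch of the work-stealing idea described in the commit above: each worker drains its own queue first and then steals leftover items from other workers via atomic cursors. All names are illustrative; this is not the netstorage implementation:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// queue is a per-worker list of work items with an atomic cursor,
// so idle workers can steal the remaining items from busy workers.
type queue struct {
	items []int
	next  atomic.Int64
}

func (q *queue) pop() (int, bool) {
	i := q.next.Add(1) - 1
	if int(i) >= len(q.items) {
		return 0, false
	}
	return q.items[i], true
}

func main() {
	workers := runtime.GOMAXPROCS(0)
	queues := make([]*queue, workers)
	for w := range queues {
		q := &queue{}
		// Distribute 100 work items (e.g. time series to unpack) round-robin.
		for item := w; item < 100; item += workers {
			q.items = append(q.items, item)
		}
		queues[w] = q
	}

	var processed atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Drain the local queue first; only then steal from the others.
			for i := 0; i < workers; i++ {
				q := queues[(w+i)%workers]
				for {
					item, ok := q.pop()
					if !ok {
						break
					}
					_ = item * item // stand-in for the expensive per-series work
					processed.Add(1)
				}
			}
		}(w)
	}
	wg.Wait()
	fmt.Println("processed items:", processed.Load()) // always 100
}
```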
Zakhar Bessarab
b2ccdaaa2f lib/promscrape/discovery/gce: fix crash in case instance does not have any labels set (#3625)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-10 11:07:11 +01:00
Denys Holius
b06e795a1e docs/Release-Guide.md: added missed link to rpm repository (#3623) 2023-01-10 10:14:56 +01:00
Aliaksandr Valialkin
e640ff72f1 app/vmselect/netstorage: reduce memory allocations when unpacking time series
Unpack time series with fewer than 400K samples in the currently running goroutine.
Previously a new goroutine was started for unpacking the samples.
This required additional memory allocations.
2023-01-09 23:18:17 -08:00
Aliaksandr Valialkin
9a563a6aef app/vmselect/promql: eliminate memory allocation when sorting values inside float64s 2023-01-09 23:06:46 -08:00
Aliaksandr Valialkin
30ed33fae0 app/vmselect/promql: pre-allocate memory for values to be merged in mergeTimeseries()
This should reduce the number of memory re-allocations
2023-01-09 22:51:17 -08:00
Aliaksandr Valialkin
645c24dc5f app/vmselect/promql: consistently intern series names obtained from marshalMetricNameSorted
This reduces memory allocations when the returned series names are used as map keys later
2023-01-09 22:45:40 -08:00
Aliaksandr Valialkin
2f3ddd4884 app/vmselect/promql: avoid memory allocations and copying from source timeseries to the returned result at timeseriesToResult() 2023-01-09 22:38:59 -08:00
Aliaksandr Valialkin
26cf680468 app/vmselect/promql: remove memory allocations from sortMetricTags() 2023-01-09 22:22:15 -08:00
Aliaksandr Valialkin
4f0c11ee93 app/vmselect/promql: intern output series names inside timeseriesToResult()
This reduces the number of memory allocations for repeated queries,
which return (almost) the same set of time series.
2023-01-09 22:19:56 -08:00
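Several commits above intern series names so repeated queries reuse a single string instance. A simplified interning sketch based on sync.Map (illustrative; the actual helpers live in lib/bytesutil):

```go
package main

import (
	"fmt"
	"sync"
)

var internedStrings sync.Map // string -> string

// internString returns a canonical instance of s, so identical series
// names produced by repeated queries share a single stored copy.
func internString(s string) string {
	if v, ok := internedStrings.Load(s); ok {
		return v.(string)
	}
	v, _ := internedStrings.LoadOrStore(s, s)
	return v.(string)
}

func main() {
	internString(`http_requests_total{job="api"}`)
	internString(`http_requests_total{job="api"}`)

	entries := 0
	internedStrings.Range(func(_, _ any) bool { entries++; return true })
	fmt.Println("interned entries:", entries) // 1: duplicates share one copy
}
```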
Aliaksandr Valialkin
562d6bca08 app/vmselect/promql: intern output series names during normal aggregation 2023-01-09 22:15:24 -08:00
Aliaksandr Valialkin
21ee9a1fab app/vmselect/promql: intern output series names during incremental aggregation
This should reduce the number of memory allocations for repeated queries
2023-01-09 22:11:36 -08:00
Aliaksandr Valialkin
df2a494a7c app/vmselect/netstorage: pre-allocate 4 block references per each time series during querying
Usually the number of blocks returned per each time series during queries is around 4.
So it is a good idea to pre-allocate 4 block references per time series
in order to reduce the number of memory allocations.
2023-01-09 22:03:23 -08:00
Aliaksandr Valialkin
c5e0f527bc app/vmselect/netstorage: cache canonical MetricName for time series returned from the storage
This reduces memory allocations for repeated queries, which return (almost) the same set of time series.
2023-01-09 21:53:10 -08:00
Aliaksandr Valialkin
7afcca0c51 all: use metricsql.CompileRegexp instead of regexp.Compile for compiling regexps used in graphite queries
This should speed up repeated queries, since metricsql.CompileRegexp returns regexps from the cache
on subsequent calls for the same input regexp.
2023-01-09 21:43:08 -08:00
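A simplified sketch of the caching idea behind metricsql.CompileRegexp: compile each pattern once and return the cached *regexp.Regexp on subsequent calls (illustrative code, not the metricsql implementation):

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

var (
	regexpCacheMu sync.Mutex
	regexpCache   = map[string]*regexp.Regexp{}
)

// compileRegexpCached compiles re once and returns the cached
// compiled regexp for repeated calls with the same pattern.
func compileRegexpCached(re string) (*regexp.Regexp, error) {
	regexpCacheMu.Lock()
	defer regexpCacheMu.Unlock()
	if r, ok := regexpCache[re]; ok {
		return r, nil
	}
	r, err := regexp.Compile(re)
	if err != nil {
		return nil, err
	}
	regexpCache[re] = r
	return r, nil
}

func main() {
	r1, _ := compileRegexpCached("node_.*")
	r2, _ := compileRegexpCached("node_.*")
	fmt.Println(r1 == r2) // true - the same cached instance is reused
}
```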
Aliaksandr Valialkin
67ab49baa9 vendor: make vendor-update 2023-01-09 21:34:34 -08:00
Aliaksandr Valialkin
e5eca54951 lib/promscrape/discovery/nomad: sync nomad_sd_configs fields with the Prometheus implementation
See the list of configs supported by Prometheus at f88a0a7d83/discovery/nomad/nomad.go (L76-L84)

- Removed "token" option. In can be set either via NOMAD_TOKEN env var or via `bearer_token` config option.
- Removed "scheme" option. It is automatically detected depending on whether the `tls_config` is set.
- Removed "services" and "tags" options, since they aren't supported by Prometheus.
- Added "region" option. If it is missing, then the region is read from NOMAD_REGION env var.
  If this var is empty, then it is set to "global" in the same way as Nomad client does.
  See 865ee8d37c/api/api.go (L297)
  and 865ee8d37c/api/api.go (L555-L556)
- If the "server" option is missing, then it is read from NOMAD_ADDR in the same way
  as Nomad client does - see 865ee8d37c/api/api.go (L294-L296)

This is a follow-up for 8aee209c53

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367
2023-01-09 21:14:48 -08:00
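A small sketch of the env-var fallbacks described above for the Nomad SD config; the local default address is an assumption mirroring the Nomad client, and the function names are illustrative:

```go
package main

import (
	"fmt"
	"os"
)

// nomadAPIServer returns the configured server address, falling back
// to the NOMAD_ADDR environment variable and then to a local default
// (assumed here to match the Nomad client default).
func nomadAPIServer(configured string) string {
	if configured != "" {
		return configured
	}
	if addr := os.Getenv("NOMAD_ADDR"); addr != "" {
		return addr
	}
	return "http://127.0.0.1:4646"
}

// nomadRegion falls back to NOMAD_REGION and then to "global",
// in the same way as the Nomad client does.
func nomadRegion(configured string) string {
	if configured != "" {
		return configured
	}
	if r := os.Getenv("NOMAD_REGION"); r != "" {
		return r
	}
	return "global"
}

func main() {
	fmt.Println(nomadAPIServer(""), nomadRegion(""))
}
```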
Aliaksandr Valialkin
c38a10e143 app/vmselect/netstorage: eliminate memory allocation for sortBlocksHeap arg when calling mergeSortBlocks() 2023-01-09 21:08:51 -08:00
Aliaksandr Valialkin
1f9d605988 app/vmselect/netstorage: consistently select the sample with the biggest value out of samples with identical timestamps
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333

This fix is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3620 ,
but doesn't slow down the common case with merging replicated data blocks so significantly.

Benchmark results:

Before the change:

BenchmarkMergeSortBlocks/replicationFactor-1-4         	   13968	     85643 ns/op	 956.53 MB/s	    1700 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-2-4         	   10806	    109171 ns/op	1500.77 MB/s	    2191 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-3-4         	    8887	    130623 ns/op	1881.45 MB/s	    2660 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-4-4         	    7440	    157348 ns/op	2082.52 MB/s	    3174 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-5-4         	    6534	    184473 ns/op	2220.38 MB/s	    3612 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4  	   13419	     85205 ns/op	 961.44 MB/s	    2213 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 	     579	   1894900 ns/op	  43.23 MB/s	   46760 B/op	       1 allocs/op

After the change:

BenchmarkMergeSortBlocks/replicationFactor-1-4         	   13832	     85298 ns/op	 960.40 MB/s	    1716 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-2-4         	    8833	    134222 ns/op	1220.66 MB/s	    2675 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-3-4         	    6487	    184830 ns/op	1329.65 MB/s	    3636 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-4-4         	    4977	    236318 ns/op	1386.61 MB/s	    4733 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-5-4         	    4088	    296734 ns/op	1380.36 MB/s	    5761 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4  	   14083	     84067 ns/op	 974.47 MB/s	    2110 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 	     536	   2043534 ns/op	  40.09 MB/s	   50511 B/op	       1 allocs/op
2023-01-09 13:01:48 -08:00
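A standalone sketch of the dedup rule above: among samples with identical timestamps, keep the one with the biggest value, so the result doesn't depend on which replica answered first (illustrative types, not the netstorage code):

```go
package main

import "fmt"

type sample struct {
	timestampMs int64
	value       float64
}

// dedupSamples assumes the input is sorted by timestamp and collapses
// runs with identical timestamps, keeping the biggest value.
func dedupSamples(samples []sample) []sample {
	result := samples[:0:0]
	for _, s := range samples {
		n := len(result)
		if n > 0 && result[n-1].timestampMs == s.timestampMs {
			if s.value > result[n-1].value {
				result[n-1].value = s.value
			}
			continue
		}
		result = append(result, s)
	}
	return result
}

func main() {
	merged := []sample{{1000, 1}, {1000, 3}, {2000, 2}}
	fmt.Println(dedupSamples(merged)) // [{1000 3} {2000 2}]
}
```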
Denys Holius
fe0e199859 deployment/docker: update Alertmanager tag from v0.24.0 to v0.25.0 in docker-compose files (#3619)
deployment/docker: bump alertmanager to latest v0.25.0
2023-01-09 12:37:14 +01:00
Roman Khavronenko
8aee209c53 lib/promscrape: remove datacenter field from nomad_sd_config (#3612)
Looks like `datacenter` field isn't part of `/v1/services` API.
See https://developer.hashicorp.com/nomad/api-docs/services#list-services
and https://developer.hashicorp.com/nomad/api-docs/services#read-service

Related issues:
https://github.com/traefik/traefik/issues/9109
https://github.com/prometheus/prometheus/issues/11776

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-09 09:07:40 +01:00
Aliaksandr Valialkin
28f8dc41b0 lib/promscrape/discoveryutils: cleanup after 5df9fddaf2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468
2023-01-07 01:26:54 -08:00
Zakhar Bessarab
5df9fddaf2 lib/promscrape/discoveryutils: use correct timeout for blocking requests (#3609)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-07 01:13:03 -08:00
Aliaksandr Valialkin
41e00a0df7 lib/storage: simplify the fix from 488940502c
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3566
2023-01-07 01:04:43 -08:00
Dmytro Kozlov
488940502c lib/storage: fix returning camelcase label names (#3608)
* lib/storage: fix returning camelcase label names

* doc: add change log

* Update docs/CHANGELOG.md

* Update docs/CHANGELOG.md

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-07 00:50:14 -08:00
Aliaksandr Valialkin
5fe7ff24c2 lib/streamaggr: limit the number of concurrent flushes of the aggregate data to the exact number of available CPUs
This should reduce the maximum memory usage during concurrent flushes of the aggregate data
2023-01-07 00:18:51 -08:00
Aliaksandr Valialkin
ad5bfe3089 lib/promscrape: reduce the number of concurrently executed processScrapedData calls from 2x the number of CPUs to the number of CPUs
This should reduce the maximum memory usage of the processScrapedData() function by 2x.
The only part of processScrapedData() which can be IO-bound is the pushData() call,
which buffers data to the persistent queue if the remote storage cannot keep up
with the data ingestion speed. In this case it is OK if the scrape pace is limited.
2023-01-07 00:14:30 -08:00
Aliaksandr Valialkin
af263fe881 all: small improvements in error messages and command-line flag descriptions related to concurrency limiters 2023-01-07 00:11:44 -08:00
Aliaksandr Valialkin
45f39e291e lib/writeconcurrencylimiter: moved the error generation from incConcurrency() to the caller place 2023-01-06 23:45:58 -08:00
Aliaksandr Valialkin
986a05e18d lib/promscrape: limit the concurrency during parsing and relabeling the scraped samples
This should reduce memory usage when scraping a big number of targets,
since it limits the total memory usage during concurrent parsing and relabeling
to the number of available CPU cores.
2023-01-06 22:59:17 -08:00
Aliaksandr Valialkin
293e4dc77b app/{vminsert,vmstorage}: add comments on why storage.AddRows() is called without limiting the number of concurrent calls 2023-01-06 22:40:07 -08:00
Aliaksandr Valialkin
5c4bd4f7c1 lib/streamaggr: limit the number of concurrent flushes of aggregate metrics in order to limit memory usage 2023-01-06 22:39:13 -08:00
Aliaksandr Valialkin
c63755c316 lib/writeconcurrencylimiter: improve the logic behind -maxConcurrentInserts limit
Previously the -maxConcurrentInserts was limiting the number of established client connections,
which write data to VictoriaMetrics. Some of these connections could be idle.
Such connections do not consume big amounts of CPU and RAM, so there is little sense in limiting
the number of such connections. So now the -maxConcurrentInserts command-line option
limits the number of concurrently executed insert requests, not including idle connections.

It is recommended to remove the -maxConcurrentInserts command-line option, since the default value
for this option should work well for most cases.
2023-01-06 22:20:19 -08:00
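A generic Go sketch of limiting concurrently executed requests (rather than open connections) with a buffered-channel semaphore; the limit source and its default here are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// In VictoriaMetrics the limit comes from -maxConcurrentInserts;
	// here it is simply assumed to equal the number of available CPU cores.
	maxConcurrentInserts := runtime.GOMAXPROCS(0)
	sem := make(chan struct{}, maxConcurrentInserts)

	var wg sync.WaitGroup
	for req := 0; req < 100; req++ {
		wg.Add(1)
		go func(req int) {
			defer wg.Done()
			// Only requests that are actively being processed hold a slot;
			// an idle connection would not occupy one.
			sem <- struct{}{}
			defer func() { <-sem }()
			_ = req * 2 // stand-in for parsing and inserting the request data
		}(req)
	}
	wg.Wait()
	fmt.Println("all requests processed")
}
```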
Aliaksandr Valialkin
f299d2ca1a lib/vmselectapi: limit the number of concurrently executed requests
This should prevent out-of-memory errors when a big number of vmselect
nodes send many concurrent requests to vmstorage

The limit can be controlled at vmstorage via the following command-line flags:
- search.maxConcurrentRequests
- search.maxQueueDuration

See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#resource-usage-limits
2023-01-06 22:11:34 -08:00
Aliaksandr Valialkin
e7637885a6 app/vmselect: improve error message when the request cannot be started because too many concurrent requests are already executed 2023-01-06 22:10:42 -08:00
Aliaksandr Valialkin
463b957e54 lib/promscrape/discovery/{consul,nomad}: wait until the deleted serviceWatchers are stopped inside updateServices() call
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367
2023-01-05 21:52:33 -08:00
Aliaksandr Valialkin
f392913d00 lib/promscrape: follow-up after bced9fb978
- Document the bugfix at docs/CHANGELOG.md
- Wait until all the worker goroutines are done in consulWatcher.mustStop()
- Do not log `context canceled` errors when discovering consul serviceNames
- Removed explicit handling of gzipped responses at lib/promscrape/discoveryutils.Client,
  since this handling is automatically performed by net/http.Transport.
  See DisableCompression option at https://pkg.go.dev/net/http#Transport .
- Remove explicit handling of the proxyURL, since it is automatically handled
  by net/http.Transport. See Proxy option at https://pkg.go.dev/net/http#Transport .
- Explicitly set MaxIdleConnsPerHost, since its default value equals 2.
  Such a small value may result in excess TCP connection churn
  when more than 2 concurrent requests are processed by lib/promscrape/discoveryutils.Client.
- Do not explicitly set the `Host` request header, since it is automatically set by net/http.Client.
- Backport the bugfix to the recently added nomad_sd_configs - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468
2023-01-05 21:13:06 -08:00
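A hedged sketch of an http.Transport configured along the lines described above: rely on the transport for gzip and proxy handling and raise MaxIdleConnsPerHost above its default of 2. The concrete values are illustrative, not the discoveryutils settings:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	tr := &http.Transport{
		// Proxy handling is delegated to the transport instead of custom code.
		Proxy: http.ProxyFromEnvironment,
		// Keep DisableCompression at false so gzip responses are
		// transparently requested and decoded by net/http.
		DisableCompression: false,
		// The default of 2 idle connections per host causes churn when
		// more than 2 concurrent requests hit the same discovery endpoint.
		MaxIdleConnsPerHost: 64,
		IdleConnTimeout:     90 * time.Second,
	}
	client := &http.Client{Transport: tr, Timeout: 30 * time.Second}
	fmt.Printf("client uses %T\n", client.Transport)
}
```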
Zakhar Bessarab
bced9fb978 lib/promscrape/discoveryutils: switch to native http client from fasthttp (#3568) 2023-01-05 19:34:47 -08:00
Roman Khavronenko
5bdd880142 vmstorage: add more context to the flock acquiring msg (#3584)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3578

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-05 18:30:42 -08:00
Aliaksandr Valialkin
9f348cf8a1 lib/promscrape/discovery/nomad: follow-up after 48f371a46c
- Remove undocumented `username` and `password` config options from `nomad_sd_config`.
  TODO: probably, remove these options from `consul_sd_config` too?
  These options exist there for backwards compatibility purposes.

- Add __meta_nomad_service_alloc_id and __meta_nomad_service_job_id meta-labels
  These labels contain AllocID and JobID fields for the discovered Nomad services.

- Various typo fixes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367
2023-01-05 18:07:20 -08:00
Aliaksandr Valialkin
cad8553c01 Makefile: remove trailing space after golangci-lint run command
It is left after ec2c82e800
2023-01-05 16:59:07 -08:00
Aliaksandr Valialkin
1a28f0e5b3 lib/promrelabel: pass query args via query string at /metric-relabel-debug and /target-relabel-debug pages if their length doesn't exceed 1000
This allows copy-n-pasting the url to another browser window and seeing the same result.

The limit of 1000 chars is selected in order to prevent potential issues with systems
which limit the URL length, such as Internet Explorer - see https://stackoverflow.com/questions/812925/what-is-the-maximum-possible-length-of-a-query-string

If the limit is exceeded, then query args are sent via POST method and aren't visible in the url.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3580
2023-01-05 16:48:04 -08:00
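A small sketch of the GET-vs-POST decision described above: short encoded args go into the query string so the URL stays shareable, while longer ones are sent in a POST body. The helper name and constant are illustrative:

```go
package main

import (
	"fmt"
	"net/url"
)

const maxQueryStringLen = 1000

// buildRelabelDebugRequest decides whether the debug form data can be put
// into the query string (copy-pastable URL) or must be sent via POST.
func buildRelabelDebugRequest(basePath string, args url.Values) (method, target, body string) {
	encoded := args.Encode()
	if len(encoded) <= maxQueryStringLen {
		return "GET", basePath + "?" + encoded, ""
	}
	return "POST", basePath, encoded
}

func main() {
	args := url.Values{}
	args.Set("metric", `up{job="node"}`)
	args.Set("relabel_configs", "- action: drop\n  source_labels: [job]\n  regex: node")
	method, target, _ := buildRelabelDebugRequest("/metric-relabel-debug", args)
	fmt.Println(method, target)
}
```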
Karan Sharma
48f371a46c lib/promscrape: add Prometheus-compatible service discovery for Nomad (#3549)
Add nomad_sd_config support for service discovery
2023-01-05 23:03:58 +01:00
Denys Holius
043b28c725 .github/workflows/nightly-build.yml: added dockerhub login (#3594) 2023-01-05 16:54:14 +01:00
Luke Palmer
ec2c82e800 Lint and errcheck using golangci-lint (#3558) 2023-01-05 16:12:46 +01:00
Zakhar Bessarab
01bc0c94ab doc: add vmbackupmanager monitoring section (#3605)
* doc: add vmbackupmanager monitoring section

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-05 16:03:06 +01:00
Thomas Danielsson
9d1104d812 dashboards: fix operator datasource variable (#3604)
Got "Failed to upgrade legacy queries Datasource $ds was not found" in
Grafana on operator dashboard.
It's datasource variable was incorrectly named `datasource`.

Also made the rest of the dashboards have homogeneous datasource-variable
names and selections, matching the vmagent dashboard.
2023-01-05 14:59:56 +01:00
Artem Navoiev
8b763175ff Add Understand Your Setup Size Guide (#3572)
docs: add Understand Your Setup Size Guide

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-05 14:56:50 +01:00
Aliaksandr Valialkin
2ee81a5dbb docs/CHANGELOG.md: add missing dot 2023-01-05 03:35:02 -08:00
Zakhar Bessarab
185cdcd813 lib/promscrape/discovery/dockerswarm: fix query encoding of filters (#3586)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-05 03:34:25 -08:00
Aliaksandr Valialkin
0dea3b71da lib/promscrape: pre-fetch metric_relabel_configs rules when debugging metric relabeling for a particular target
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3407
2023-01-05 03:26:49 -08:00
Aliaksandr Valialkin
a1076abcbf lib/promscrape: follow-up for a7e29c38bc
- Document the bugfix at docs/CHANGELOG.md
- Make the fix more durable against future changes when droppedTargetsMap.Register may be called from other places.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3580
2023-01-05 02:52:08 -08:00
Zakhar Bessarab
a7e29c38bc lib/promscrape/targetstatus: fix crash during droppedTarget registration (#3595)
* lib/promscrape/targetstatus: fix crash during droppedTarget registration in case original labels are not present

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/targetstatus: address review comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-05 02:39:31 -08:00
Yury Molodov
2460e0f51e vmui: improve Explore metrics (#3598)
* feat: add multiple select

* feat: improve explore interface

* app/vmselect/vmui: `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-05 02:23:04 -08:00
Aliaksandr Valialkin
0e1f0ade31 lib/streamaggr: sort by and without labels in the aggregate output metric name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460
2023-01-05 02:08:44 -08:00
Aliaksandr Valialkin
04dff34de4 vendor: update github.com/VictoriaMetrics/metricsql from v0.50.0 to v0.51.0
Updates https://github.com/VictoriaMetrics/metricsql/pull/7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3589
2023-01-05 01:50:10 -08:00
Aliaksandr Valialkin
66947ee5a2 lib/streamaggr: remove unused fields 2023-01-04 13:33:46 -08:00
Roman Khavronenko
9d0e1f8e68 dashboards: add backupmanager dashboard (#3599)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-04 17:26:15 +01:00
Roman Khavronenko
63bf583b3c github: rm plaintext render (#3597)
`render: plain text` makes fields not formattable and prevents
 pasting screenshots.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-04 14:48:17 +01:00
Denys Holius
09fe346d18 docs/Release-Guide.md: adds release scenario for RPM LTS packages (#3588) 2023-01-04 14:36:14 +01:00
Zakhar Bessarab
59f20c1034 github: use github templates for filling in feature requests or bug reports (#3587)
github: use github templates for filling in feature requests or bug reports
2023-01-04 14:34:19 +01:00
Artem Navoiev
0ff85d00a4 update year in License
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-04 09:46:38 +01:00
Aliaksandr Valialkin
fafece1af8 vendor: make vendor-update 2023-01-03 23:36:42 -08:00
Aliaksandr Valialkin
5bca3a5be2 app/vmselect: remove dependency on lib/promscrape from app/vmselect 2023-01-03 23:28:27 -08:00
Aliaksandr Valialkin
fd175ad80b docs: update -help outputs for vm* tools 2023-01-03 23:27:06 -08:00
Aliaksandr Valialkin
fa13bbc48a app/{vmagent,vminsert}: add support for streaming aggregation
See https://docs.victoriametrics.com/stream-aggregation.html

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460
2023-01-03 22:19:21 -08:00
Aliaksandr Valialkin
add2c4bf07 lib/bytesutil: add InternBytes() function as a shortcut to InternString(ToUnsafeString(..)) 2023-01-03 22:16:22 -08:00
Aliaksandr Valialkin
f33e687723 app/vmagent/remotewrite: improve descriptions for -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig
- Cross-reference these command-line flags.
- Add a link to https://docs.victoriametrics.com/vmagent.html#relabeling
2023-01-03 22:04:49 -08:00
Aliaksandr Valialkin
d3a1c36842 docs/Single-server-VictoriaMetrics.md: improve formatting for the prominent features chapter 2023-01-03 21:58:21 -08:00
Aliaksandr Valialkin
6db5c3801e app/vmbackup: remove superfluous whitespace after 8fe21ec707 2023-01-03 21:52:40 -08:00
Aliaksandr Valialkin
189f85b24c docs/vmbackupmanager.md: run make docs-sync after fa842d6534 2023-01-03 21:50:28 -08:00
Aliaksandr Valialkin
7b264b0c23 lib/promrelabel: allow calling Match on nil IfExpression
This simplifies the caller side of IfExpression
2023-01-03 21:44:03 -08:00
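A tiny illustration of the nil-receiver pattern this change relies on: a method that treats a nil receiver as "match everything", so callers need no explicit nil check (illustrative type, not the promrelabel code):

```go
package main

import "fmt"

type ifExpression struct {
	expr string
}

// Match reports whether the given metric matches the expression.
// A nil receiver matches everything, so callers don't need an
// explicit `if ie != nil` check before calling it.
func (ie *ifExpression) Match(metric string) bool {
	if ie == nil {
		return true
	}
	return ie.expr == metric // stand-in for real expression matching
}

func main() {
	var ie *ifExpression // nil: no `if` condition configured
	fmt.Println(ie.Match(`up{job="node"}`)) // true
}
```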
yanggang
8fe21ec707 Fix flag help message for the backup types. (#3577)
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-01-03 10:56:42 +01:00
yanggang
fa842d6534 fix typo for json values. (#3576)
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-01-03 10:55:38 +01:00
yanggang
93e935bcaa Fix vmctl command hint for vm-native-step-interval (#3575)
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-01-03 10:54:53 +01:00
Artem Navoiev
0a519c93ef run checks only for master/cluster branches (#3581)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-03 11:08:44 +04:00
Roman Khavronenko
4b3d8eb573 vmalert: mention specifics of Alertmanager HA mode (#3573)
Stress the importance of specifying all Alertmanager
URLs in vmalert's `-notifier.url` or `notifier.config`
if it runs in cluster mode.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3547

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-30 09:54:03 +01:00
Aliaksandr Valialkin
05fa11b296 app/vmui: reuse the series color in line tooltip 2022-12-29 15:32:41 -08:00
Aliaksandr Valialkin
1794f3d46e app/vmui: small usability improvements
- Show in the line tooltip the number of the query which generates the given line.
  This simplifies comparison of lines generated by multiple queries.

- Show metric name as __name__ label in the line tooltip in the same way as other labels are shown there.
  This makes the label information in the tooltip more consistent.

- Properly quote label values with JSON.stringify(). This prevents improper formatting
  when label values contain double-quote chars.

- Remove double curly braces artifact at graph legend for lines without names and labels.

- Properly use modifier for regular expressions across the code.
2022-12-29 14:52:51 -08:00
Aliaksandr Valialkin
59e1e84a92 docs/CHANGELOG.md: document 1720bddb4f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3530
2022-12-29 12:31:48 -08:00
Aliaksandr Valialkin
b8bc62431a app/vmselect/vmui: make vmui-update after 1720bddb4f 2022-12-29 12:20:06 -08:00
Yury Molodov
1720bddb4f fix: display correct graph tooltip title (#3562) 2022-12-29 12:06:48 -08:00
Roman Khavronenko
2cedb3e883 csvimport: support empty values (#3565)
Before, if the imported line contained multiple metrics and one
or more of them had empty values, the whole line was ignored.

Now, only metrics with empty values are ignored, and the rest
of the metrics are accepted successfully.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3540

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-12-29 11:52:10 -08:00
Roman Khavronenko
83870aeb8d tests: attempt to fix flaky graphite test (#3567)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-29 11:48:47 -08:00
Aliaksandr Valialkin
7af8857b68 docs/CHANGELOG.md: add a link to the docs and the pull request for update_entries_limit option in vmalert
This is a follow up for 6588fcbfca
2022-12-29 10:38:26 -08:00
Aliaksandr Valialkin
c90752a8be vendor: update github.com/valyala/fastjson/fastfloat from v1.6.3 to v1.6.4
This should properly parse floating-point numbers with missing integer or fractional parts.
For example, 123. or .123

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3544
2022-12-29 10:34:11 -08:00
ChenyuanHu
0a9e1d64fe app/vmselect/prometheus: no need to manually call queryDuration.UpdateDuration (#3564)
There is no need to manually call `queryDuration.UpdateDuration(startTime)`, because `defer queryDuration.UpdateDuration(startTime)` is already registered at the beginning of the function (L660).
2022-12-29 14:18:00 +01:00
Roman Khavronenko
6588fcbfca vmalert: allow configuring the default number of stored rule's update states (#3556)
Allow configuring the default number of stored rule's update states in memory
 via global `-rule.updateEntriesLimit` command-line flag or per-rule via rule's
 `update_entries_limit` configuration param.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-29 12:36:44 +01:00
Aliaksandr Valialkin
3dc684634e app/vmselect/searchutils: accept partial RFC3339 values at time, start and end query args
This simplifies manual usage of the APIs. For example, the following query
would return the results over the year 2022.

  /api/v1/query_range?start=2022&end=2023&step=1d&query=...

This is equivalent to:

  /api/v1/query_range?start=2022-01-01T00:00:00Z&end=2023-01-01T00:00:00Z&step=1d&query=...
2022-12-28 19:41:54 -08:00
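One possible way to accept partial RFC3339 values like `2022` or `2022-07` is to try progressively shorter layouts; this sketch illustrates the idea and is not the exact searchutils implementation:

```go
package main

import (
	"fmt"
	"time"
)

// parsePartialRFC3339 accepts full RFC3339 timestamps as well as
// truncated forms such as "2022", "2022-07" or "2022-07-15".
func parsePartialRFC3339(s string) (time.Time, error) {
	layouts := []string{
		time.RFC3339,
		"2006-01-02T15:04:05",
		"2006-01-02T15:04",
		"2006-01-02",
		"2006-01",
		"2006",
	}
	var err error
	for _, layout := range layouts {
		var t time.Time
		t, err = time.Parse(layout, s)
		if err == nil {
			return t, nil
		}
	}
	return time.Time{}, err
}

func main() {
	start, _ := parsePartialRFC3339("2022")
	end, _ := parsePartialRFC3339("2023")
	fmt.Println(start, end) // 2022-01-01 00:00:00 +0000 UTC 2023-01-01 00:00:00 +0000 UTC
}
```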
Yury Molodov
d2f89b55b7 vmui: fix step field (#3561)
* feat: use a unit next to the step value

* app/vmselect/vmui: `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-12-28 16:00:51 -08:00
Aliaksandr Valialkin
3d082ed6db vendor: make vendor-update 2022-12-28 15:00:02 -08:00
Aliaksandr Valialkin
c4229a1bba lib/promscrape: log the actual response size in the error message when the response size exceeds -promscrape.maxScrapeSize
This is a follow-up for 7ad9fff7e5
2022-12-28 14:42:11 -08:00
Aliaksandr Valialkin
1b16118e17 lib/{storage,mergeset}: tune the threshold for assisted merge
The https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425#issuecomment-1359117221
reveals that CPU usage for incoming queries may significantly increase when the number
of in-memory parts becomes too big.

This commit reduces the maximum number of in-memory parts before starting the assisted merge
during data ingestion. This should reduce CPU usage for incoming queries,
since they need to inspect a lower number of in-memory parts.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
2022-12-28 14:39:24 -08:00
Clément Nussbaumer
7ad9fff7e5 fix(promscrape): check MaxScrapeSize after gzip decompression (#3550) 2022-12-28 12:19:41 -08:00
Aliaksandr Valialkin
293dda7169 lib/snapshot: improve log message on unexpected status code during attempts to create or delete snapshots
Use "unexpected status code returned from %q: %d; expecting %d" log message format
instead of the less clear format "unexpected status code returned from %q; expecting %d; got %d"

This is a follow-up for c612bb165e
2022-12-28 11:41:50 -08:00
Aliaksandr Valialkin
ed9161a04f docs: remove a case study for Dreamteam.gg, since it looks like the company no longer exists 2022-12-28 11:34:40 -08:00
Zakhar Bessarab
c612bb165e lib/snapshot: fix error message format for failed HTTP request (#3559) 2022-12-28 18:04:11 +01:00
Artem Navoiev
34c705988e fix broken links
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2022-12-27 13:17:22 +02:00
Artem Navoiev
7d9c4bebc0 update links to grafana dashboards (#3534)
docs: update links to grafana dashboards

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2022-12-25 17:36:20 +01:00
Aliaksandr Valialkin
b11c806d1c app/vmui: show min, max and avg lines at Explore metrics graphs when instance is selected in the same way as when only the job is selected
This improves consistency of the graphs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-23 23:21:11 -08:00
Aliaksandr Valialkin
2a7e392bb3 app/vmselect/vmui: make vmui-update after 0dca224ec3 2022-12-23 22:25:04 -08:00
Aliaksandr Valialkin
0dca224ec3 app/vmui: small improvements for header panel
- Rename `Custom panel` tab to more clear `Query` tab
- Rename `Cardinality` tab to `Explore cardinality`, so it becomes consistent with `Explore metrics` tab
- Move `Dashboards` tab to the end, since it isn't used too much
2022-12-23 22:24:31 -08:00
Aliaksandr Valialkin
8ef1fe2047 app/vmui: move the Explore metrics tab closer to Custom panel tab
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-23 22:16:21 -08:00
Aliaksandr Valialkin
f599e1bd34 app/vmui: show less lines at metrics explorer when the instance isn't selected
Show min, max and avg graphs across instances for the selected job.
This should improve usability of such graphs when the job contains many instances.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-23 22:15:01 -08:00
Aliaksandr Valialkin
4deea604bf app/vmui: follow-up after f6d31f5216
- Document the feature at docs/CHANGELOG.md.
- Document the metrics explorer at https://docs.victoriametrics.com/#metrics-explorer .
- Properly set `start` and `end` args for the selected time range
  when performing the request, which returns metric names.
- Improve queries, so they return a lower number of lines and labels.
  This should improve metrics exploration.
- Properly encode label filters and query args before passing them to VictoriaMetrics.
- Various cosmetic fixes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-22 17:17:01 -08:00
Yury Molodov
f6d31f5216 vmui: add explore tab for exploration of metrics, which belong to a particular job/instance (#3470)
* feat: add "Explore" page

* feat: add graphs for explore page

* vmui: add explore tab for exploration of metrics, which belong to a particular job/instance

* refactor: rename variables

* refactor: extract graph to ExploreMetricItemGraph.tsx

* feat: add searchable for Select.tsx

* feat: improve metrics explorer

* feat: set document title by page

* feat: add page to view icons

* fix: improve styles

* fix: add encodeURIComponent to query
2022-12-22 15:24:40 -08:00
Aliaksandr Valialkin
f6ac045933 docs/CHANGELOG.md: document 1df87807fd
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3513
2022-12-22 15:22:09 -08:00
Yury Molodov
1df87807fd vmui: step (#3521)
* feat: add step rounding

* fix: change step in URL parameters

* refactor: change comment for roundStep
2022-12-22 14:54:28 -08:00
Aliaksandr Valialkin
0076422350 lib/promscrape/discovery/azure: typo fix 2022-12-21 21:25:16 -08:00
Aliaksandr Valialkin
f1441a598f app/vmselect/promql: add tests for d3de110070 2022-12-21 20:25:21 -08:00
Aliaksandr Valialkin
fa236c5a84 lib/promrelabel: make fmt after d3de110070 2022-12-21 20:24:57 -08:00
Aliaksandr Valialkin
d3de110070 app/vmselect/promql: make sure that label_replace() doesn't create an empty dst_label if the src_label doesn't match regex 2022-12-21 20:20:01 -08:00
Aliaksandr Valialkin
31886aef3d lib/promrelabel: add support for keepequal and dropequal relabeling actions
These actions are supported by Prometheus starting from v2.41.0

See https://github.com/prometheus/prometheus/pull/11564 ,
https://github.com/prometheus/prometheus/issues/11556
and https://github.com/prometheus/prometheus/issues/3756

Side note:

It's a pity that Prometheus developers decided to invent `keepequal` and `dropequal`
relabeling actions instead of adding support for `keep_if_equal` and `drop_if_equal` relabeling
actions supported by VictoriaMetrics since June 2020 - see 2a39ba639d .
2022-12-21 20:04:55 -08:00
Aliaksandr Valialkin
3300546eab lib/bytesutil: make sure that the cleanup code is performed only by a single goroutine out of many concurrently running goroutines
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3466
2022-12-21 13:07:24 -08:00
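A sketch of letting only one of many concurrent goroutines perform a cleanup pass, using an atomic compare-and-swap flag (illustrative, not the lib/bytesutil code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

var cleanupInProgress atomic.Bool

// maybeCleanup runs the cleanup only if no other goroutine is already
// running it; concurrent callers return immediately instead of piling up.
func maybeCleanup(cleanups *atomic.Int64) {
	if !cleanupInProgress.CompareAndSwap(false, true) {
		return // another goroutine owns the cleanup right now
	}
	defer cleanupInProgress.Store(false)
	time.Sleep(10 * time.Millisecond) // stand-in for the actual cache cleanup
	cleanups.Add(1)
}

func main() {
	var cleanups atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			maybeCleanup(&cleanups)
		}()
	}
	wg.Wait()
	// Only a handful of cleanups run; at most one is in flight at any moment.
	fmt.Println("cleanups executed:", cleanups.Load())
}
```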
Yury Molodov
0f8ffc7df9 vmui: fix change timezone (#3519)
vmui: fix time picker with changed time zone
2022-12-21 17:22:37 +01:00
Aliaksandr Valialkin
192db0f0b1 deployment/docker: update VictoriaMetrics tag from v1.85.2 to v1.85.3 in docker-compose files 2022-12-20 15:34:23 -08:00
Aliaksandr Valialkin
f443fad56d docs/CHANGELOG.md: cut v1.85.3 2022-12-20 14:51:23 -08:00
Aliaksandr Valialkin
77874d6055 app/vmselect/vmui: make vmui-update after 731d189fa9 2022-12-20 14:51:07 -08:00
Roman Khavronenko
7ca49290b6 docs: fix the image link (#3509)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3495
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-20 14:36:25 -08:00
Roman Khavronenko
e4e9dfb785 vmagent: respect -usePromCompatibleNaming if no relabeling is set (#3511)
* vmagent: respect `-usePromCompatibleNaming` if no relabeling is set

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3493

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmagent: upd test

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-12-20 14:32:45 -08:00
Yury Molodov
731d189fa9 fix: change the logic for hide query (#3514) 2022-12-20 14:29:46 -08:00
Zakhar Bessarab
4be4645142 app/vmbackupmanager: add metrics for better observability (#488)
* app/vmbackupmanager: add metrics for better observability, include more information to `/api/v1/backups` API call response

* app/vmbackupmanager: drop old metrics before creating new ones

* app/vmbackupmanager: use `_total` postfix for counter metrics

* app/vmbackupmanager: remove `_total` postfix for gauge-like metrics

* app/vmbackupmanager: add `_last_run_failed` metrics for backups and retention

* app/vmbackupmanager: address review feedback

* app/vmbackupmanager: fix metric name

* app/vmbackupmanager: address review feedback, remove background updates of metrics, add restoring state of `_last_run_failed` metric from remote storage

* app/vmbackupmanager: improve performance for backup size calculation

* app/vmbackupmanager: refactor backup and retention runs to deduplicate each run logic

* {app/vmbackupmanager,lib/formatutil}: move HumanizeBytes into lib package

* app/vmbackupmanager: fix creating new metrics instead of reusing existing ones

* lib/formatutil: add comment to make linter happy

* app/vmbackupmanager: address review feedback
2022-12-20 14:18:06 -08:00
Aliaksandr Valialkin
4e55b67a44 lib/storage: clear the err if it is set to io.EOF when searching for the TSID by metricID
This is an expected error when recently added indexdb data isn't available for search yet
or wasn't flushed to disk after an unclean shutdown of VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3515
2022-12-20 14:05:29 -08:00
Aliaksandr Valialkin
8f5e822565 Makefile: update golangci-lint version from v1.48.0 to v1.50.1 2022-12-20 13:09:40 -08:00
Aliaksandr Valialkin
cad90c7ac1 Makefile: publish release docker images at DockerHub with the stable tag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2911
2022-12-20 12:27:06 -08:00
Aliaksandr Valialkin
a48510573e Revert "docs/Release-Guide.md: add LATEST_TAG=stable env var for make publish-release in order to create stable tag for the published components at DockerHub"
This reverts commit 8afa7ef837.
2022-12-20 12:24:27 -08:00
Aliaksandr Valialkin
8afa7ef837 docs/Release-Guide.md: add LATEST_TAG=stable env var for make publish-release in order to create stable tag for the published components at DockerHub
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2911
2022-12-20 12:22:42 -08:00
Aliaksandr Valialkin
1d7b5cb83c docs/CHANGELOG.md: document the change at 547f07463b29c09c62c9af35eac9cee6764b3286
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612
2022-12-20 10:22:07 -08:00
Zakhar Bessarab
18e55d14c6 app/vmbackupmanager: update doc to include cluster to cluster restore example (#3506) 2022-12-20 14:54:56 +01:00
Roman Khavronenko
8122191368 docs: fix link typo in operator docs (#3508) 2022-12-20 14:52:47 +01:00
Aliaksandr Valialkin
6bf46c7bf5 docs/CHANGELOG.md: formatting fix 2022-12-20 01:06:34 -08:00
Aliaksandr Valialkin
9fa3f1dc57 app/vmselect/promql: do not extend too short lookbehind window for rate() function if it is set explicitly
Previously a too short lookbehind window d for rate(m[d]) could be automatically extended
if it didn't cover at least two raw samples. This was needed in order to guarantee
non-empty results from rate(m[d]) on short time ranges.

Now the lookbehind window isn't extended if it is set explicitly,
since it is expected that the user knows what he is doing.

The lookbehind window continues to be extended when needed if it isn't set explicitly.
For example, in the case of rate(m).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3483
2022-12-20 00:18:20 -08:00
Aliaksandr Valialkin
eabb8762ee docs/Articles.md: add a link to https://www.youtube.com/watch?v=Mesc6JBFNhQ 2022-12-19 21:40:51 -08:00
Michal Kralik
fd53f86c84 build: fix issue with missing docker scan (#3501) 2022-12-19 15:22:45 -08:00
1214 changed files with 63933 additions and 18693 deletions

.github/ISSUE_TEMPLATE/bug_report.md

@@ -1,46 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
It would be great to [upgrade](https://docs.victoriametrics.com/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
and verify whether the bug is reproducible there.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html).
**To Reproduce**
Steps to reproduce the behavior.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Logs**
Check if any warnings or errors were logged by VictoriaMetrics components
or components in communication with VictoriaMetrics (e.g. Prometheus, Grafana).
**Screenshots**
If applicable, add screenshots to help explain your problem.
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)
See how to setup monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
**Version**
The line returned when passing `--version` command line flag to the binary. For example:
```
$ ./victoria-metrics-prod --version
victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
```
**Used command-line flags**
Please provide the command-line flags used for running VictoriaMetrics and its components.

.github/ISSUE_TEMPLATE/bug_report.yml

@@ -0,0 +1,86 @@
name: Bug report
description: Create a report to help us improve
labels: [bug]
body:
- type: markdown
attributes:
value: |
Before filling a bug report it would be great to [upgrade](https://docs.victoriametrics.com/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
and verify whether the bug is reproducible there.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html) first.
- type: textarea
id: describe-the-bug
attributes:
label: Describe the bug
description: |
A clear and concise description of what the bug is.
placeholder: |
When I do `A` VictoriaMetrics does `B`. I expect it to do `C`.
validations:
required: true
- type: textarea
id: to-reproduce
attributes:
label: To Reproduce
description: |
Steps to reproduce the behavior.
If reproducing an issue requires some specific configuration file, please paste it here.
placeholder: |
Steps to reproduce the behavior.
validations:
required: true
- type: textarea
id: version
attributes:
label: Version
description: |
The line returned when passing `--version` command line flag to the binary. For example:
```
$ ./victoria-metrics-prod --version
victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
```
validations:
required: true
- type: textarea
id: logs
attributes:
label: Logs
description: |
Check if any warnings or errors were logged by VictoriaMetrics components
or components in communication with VictoriaMetrics (e.g. Prometheus, Grafana).
validations:
required: false
- type: textarea
id: screenshots
attributes:
label: Screenshots
description: |
If applicable, add screenshots to help explain your problem.
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/)
See how to set up monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
validations:
required: false
- type: textarea
id: flags
attributes:
label: Used command-line flags
description: |
Please provide the command-line flags used for running VictoriaMetrics and its components.
validations:
required: false
- type: textarea
id: additional-info
attributes:
label: Additional information
placeholder: |
Additional information that doesn't fit elsewhere
validations:
required: false


@@ -0,0 +1,5 @@
blank_issues_enabled: true
contact_links:
- name: Ask on Slack
url: https://slack.victoriametrics.com/
about: You can ask for help here!


@@ -1,20 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.


@@ -0,0 +1,43 @@
name: Feature request
description: Suggest an idea for this project
labels: [enhancement]
body:
- type: textarea
id: describe-the-problem
attributes:
label: Is your feature request related to a problem? Please describe
description: |
A clear and concise description of what the problem is.
placeholder: |
Ex. I'm always frustrated when [...]
validations:
required: false
- type: textarea
id: describe-the-solution
attributes:
label: Describe the solution you'd like
description: |
A clear and concise description of what you want to happen.
validations:
required: true
- type: textarea
id: alternative-solutions
attributes:
label: Describe alternatives you've considered
description: |
A clear and concise description of any alternative solutions or features you've considered.
placeholder: |
I have tried to do `A`, but that doesn't solve the problem completely.
I have tried to do `A` and `B`, but implementing this would be better.
validations:
required: false
- type: textarea
id: feature-additional-info
attributes:
label: Additional information
description: |
Additional information which you consider helpful for implementing this feature.
placeholder: |
Add any other context or screenshots about the feature request here.
validations:
required: false


@@ -17,7 +17,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.4
go-version: 1.21.0
id: go
- name: Code checkout
uses: actions/checkout@master


@@ -0,0 +1,46 @@
name: "CodeQL - JS"
on:
push:
branches: [master, cluster]
paths:
- "**.js"
pull_request:
# The branches below must be a subset of the branches above
branches: [master, cluster]
paths:
- "**.js"
schedule:
- cron: "30 18 * * 2"
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: ["javascript"]
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
with:
category: "javascript"


@@ -13,12 +13,26 @@ name: "CodeQL"
on:
push:
branches: [ master, cluster ]
branches: [master, cluster]
paths-ignore:
- "docs/**"
- "**.md"
- "**.txt"
- "**.js"
pull_request:
# The branches below must be a subset of the branches above
branches: [ master, cluster ]
branches: [master, cluster]
paths-ignore:
- "docs/**"
- "**.md"
- "**.txt"
- "**.js"
schedule:
- cron: '30 18 * * 2'
- cron: "30 18 * * 2"
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
analyze:
@@ -32,45 +46,47 @@ jobs:
strategy:
fail-fast: false
matrix:
language: [ 'go', 'javascript' ]
language: ["go"]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
# Learn more about CodeQL language support at https://git.io/codeql-language-support
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Checkout repository
uses: actions/checkout@v3
- name: Set up Go
uses: actions/setup-go@v2
with:
go-version: 1.19
if: ${{ matrix.language == 'go' }}
- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.20.1
check-latest: true
cache: true
if: ${{ matrix.language == 'go' }}
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
# ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
# and modify them (or add more) to build your code if your project
# uses a compiled language
# ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
# and modify them (or add more) to build your code if your project
# uses a compiled language
#- run: |
# make bootstrap
# make release
#- run: |
# make bootstrap
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2


@@ -1,45 +1,96 @@
name: main
on:
push:
branches:
- master
- cluster
paths-ignore:
- 'docs/**'
- '**.md'
- "docs/**"
- "**.md"
pull_request:
branches:
- master
- cluster
paths-ignore:
- 'docs/**'
- '**.md'
- "docs/**"
- "**.md"
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build:
name: Build
lint:
name: lint
runs-on: ubuntu-latest
steps:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.4
id: go
- name: Code checkout
uses: actions/checkout@master
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.20.1
check-latest: true
cache: true
- name: Dependencies
run: |
make install-golint
make install-errcheck
make install-golangci-lint
- name: Build
run: |
export PATH=$PATH:$(go env GOPATH)/bin # temporary fix. See https://github.com/actions/setup-go/issues/14
make check-all
git diff --exit-code
make test-full
make test-pure
make test-full-386
make victoria-metrics-crossbuild
make vmutils-crossbuild
test:
needs: lint
strategy:
matrix:
scenario: ["test-full", "test-pure", "test-full-386"]
name: test
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.20.1
check-latest: true
cache: true
- name: run tests
run: |
make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.txt
build:
needs: test
name: build
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
id: go
uses: actions/setup-go@v3
with:
go-version: 1.20.1
check-latest: true
cache: true
- uses: actions/cache@v3
with:
path: gocache-for-docker
key: gocache-docker-${{ runner.os }}-${{ steps.go.outputs.go-version }}-${{ hashFiles('go.mod') }}
- name: Build
run: |
make victoria-metrics-crossbuild
make vmutils-crossbuild


@@ -12,13 +12,37 @@ jobs:
name: Build
runs-on: ubuntu-latest
steps:
- name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.4
go-version: 1.20.1
id: go
- name: Setup docker scan
run: |
mkdir -p ~/.docker/cli-plugins && \
curl https://github.com/docker/scan-cli-plugin/releases/latest/download/docker-scan_linux_amd64 -L -s -S -o ~/.docker/cli-plugins/docker-scan &&\
chmod +x ~/.docker/cli-plugins/docker-scan
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Code checkout
uses: actions/checkout@master
- name: Publish
- uses: actions/cache@v3
with:
path: gocache-for-docker
key: gocache-docker-${{ runner.os }}-${{ steps.go.outputs.go-version }}-${{ hashFiles('go.mod') }}
- name: build & publish
run: |
docker scan --severity=medium --login --token "$SNYK_TOKEN" --accept-license
LATEST_TAG=nightly PKG_TAG=nightly make publish
env:
SNYK_TOKEN: ${{ secrets.SNYK_AUTH_TOKEN }}

.golangci.yml (new file, 15 lines)

@@ -0,0 +1,15 @@
run:
timeout: 2m
enable:
- revive
issues:
exclude-rules:
- linters:
- staticcheck
text: "SA(4003|1019|5011):"
linters-settings:
errcheck:
exclude: ./errcheck_excludes.txt


@@ -175,7 +175,7 @@
END OF TERMS AND CONDITIONS
Copyright 2019-2022 VictoriaMetrics, Inc.
Copyright 2019-2023 VictoriaMetrics, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.


@@ -144,6 +144,7 @@ vmutils-windows-amd64: \
vmctl-windows-amd64
victoria-metrics-crossbuild: \
victoria-metrics-linux-386 \
victoria-metrics-linux-amd64 \
victoria-metrics-linux-arm64 \
victoria-metrics-linux-arm \
@@ -155,6 +156,7 @@ victoria-metrics-crossbuild: \
victoria-metrics-openbsd-amd64
vmutils-crossbuild: \
vmutils-linux-386 \
vmutils-linux-amd64 \
vmutils-linux-arm64 \
vmutils-linux-arm \
@@ -167,16 +169,17 @@ vmutils-crossbuild: \
vmutils-windows-amd64
publish-release:
git checkout $(TAG) && $(MAKE) release publish && \
git checkout $(TAG)-cluster && $(MAKE) release publish && \
git checkout $(TAG)-enterprise && $(MAKE) release publish && \
git checkout $(TAG)-enterprise-cluster && $(MAKE) release publish
git checkout $(TAG) && LATEST_TAG=stable $(MAKE) release publish && \
git checkout $(TAG)-cluster && LATEST_TAG=cluster-stable $(MAKE) release publish && \
git checkout $(TAG)-enterprise && LATEST_TAG=enterprise-stable $(MAKE) release publish && \
git checkout $(TAG)-enterprise-cluster && LATEST_TAG=enterprise-cluster-stable $(MAKE) release publish
release: \
release-victoria-metrics \
release-vmutils
release-victoria-metrics: \
release-victoria-metrics-linux-386 \
release-victoria-metrics-linux-amd64 \
release-victoria-metrics-linux-arm \
release-victoria-metrics-linux-arm64 \
@@ -185,6 +188,10 @@ release-victoria-metrics: \
release-victoria-metrics-freebsd-amd64 \
release-victoria-metrics-openbsd-amd64
# adds i386 arch
release-victoria-metrics-linux-386:
GOOS=linux GOARCH=386 $(MAKE) release-victoria-metrics-goos-goarch
release-victoria-metrics-linux-amd64:
GOOS=linux GOARCH=amd64 $(MAKE) release-victoria-metrics-goos-goarch
@@ -216,6 +223,7 @@ release-victoria-metrics-goos-goarch: victoria-metrics-$(GOOS)-$(GOARCH)-prod
cd bin && rm -rf victoria-metrics-$(GOOS)-$(GOARCH)-prod
release-vmutils: \
release-vmutils-linux-386 \
release-vmutils-linux-amd64 \
release-vmutils-linux-arm64 \
release-vmutils-linux-arm \
@@ -225,6 +233,9 @@ release-vmutils: \
release-vmutils-openbsd-amd64 \
release-vmutils-windows-amd64
release-vmutils-linux-386:
GOOS=linux GOARCH=386 $(MAKE) release-vmutils-goos-goarch
release-vmutils-linux-amd64:
GOOS=linux GOARCH=amd64 $(MAKE) release-vmutils-goos-goarch
@@ -315,21 +326,7 @@ vet:
go vet ./lib/...
go vet ./app/...
lint: install-golint
golint lib/...
golint app/...
install-golint:
which golint || go install golang.org/x/lint/golint@latest
errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./lib/...
errcheck -exclude=errcheck_excludes.txt ./app/...
install-errcheck:
which errcheck || go install github.com/kisielk/errcheck@latest
check-all: fmt vet lint errcheck golangci-lint govulncheck
check-all: fmt vet golangci-lint govulncheck
test:
go test ./lib/... ./app/...
@@ -380,10 +377,10 @@ install-qtc:
golangci-lint: install-golangci-lint
golangci-lint run --exclude '(SA4003|SA1019|SA5011):' -D errcheck -D structcheck --timeout 2m
golangci-lint run
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.48.0
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.51.1
govulncheck: install-govulncheck
govulncheck ./...

README.md (180 lines changed)

@@ -40,19 +40,36 @@ VictoriaMetrics has the following prominent features:
* It can be used as a drop-in replacement for Prometheus in Grafana, because it supports [Prometheus querying API](#prometheus-querying-api-usage).
* It can be used as a drop-in replacement for Graphite in Grafana, because it supports [Graphite API](#graphite-api-usage).
* It features easy setup and operation:
* VictoriaMetrics consists of a single [small executable](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d) without external dependencies.
* VictoriaMetrics consists of a single [small executable](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d)
without external dependencies.
* All the configuration is done via explicit command-line flags with reasonable defaults.
* All the data is stored in a single directory pointed by `-storageDataPath` command-line flag.
* Easy and fast backups from [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) to S3 or GCS can be done with [vmbackup](https://docs.victoriametrics.com/vmbackup.html) / [vmrestore](https://docs.victoriametrics.com/vmrestore.html) tools. See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
* It implements PromQL-based query language - [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html), which provides improved functionality on top of PromQL.
* Easy and fast backups from [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
can be done with [vmbackup](https://docs.victoriametrics.com/vmbackup.html) / [vmrestore](https://docs.victoriametrics.com/vmrestore.html) tools.
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
* It implements PromQL-like query language - [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html), which provides improved functionality on top of PromQL.
* It provides global query view. Multiple Prometheus instances or any other data sources may ingest data into VictoriaMetrics. Later this data may be queried via a single query.
* It provides high performance and good vertical and horizontal scalability for both [data ingestion](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b) and [data querying](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4). It [outperforms InfluxDB and TimescaleDB by up to 20x](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae).
* It [uses 10x less RAM than InfluxDB](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) and [up to 7x less RAM than Prometheus, Thanos or Cortex](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f) when dealing with millions of unique time series (aka [high cardinality](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality)).
* It provides high performance and good vertical and horizontal scalability for both
[data ingestion](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b)
and [data querying](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4).
It [outperforms InfluxDB and TimescaleDB by up to 20x](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae).
* It [uses 10x less RAM than InfluxDB](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893)
and [up to 7x less RAM than Prometheus, Thanos or Cortex](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f)
when dealing with millions of unique time series (aka [high cardinality](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality)).
* It is optimized for time series with [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate).
* It provides high data compression, so [up to 70x more data points](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4) may be crammed into limited storage comparing to TimescaleDB and [up to 7x less storage space is required compared to Prometheus, Thanos or Cortex](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f).
* It is optimized for storage with high-latency IO and low IOPS (HDD and network storage in AWS, Google Cloud, Microsoft Azure, etc). See [disk IO graphs from these benchmarks](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b).
* A single-node VictoriaMetrics may substitute moderately sized clusters built with competing solutions such as Thanos, M3DB, Cortex, InfluxDB or TimescaleDB. See [vertical scalability benchmarks](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae), [comparing Thanos to VictoriaMetrics cluster](https://medium.com/@valyala/comparing-thanos-to-victoriametrics-cluster-b193bea1683) and [Remote Write Storage Wars](https://promcon.io/2019-munich/talks/remote-write-storage-wars/) talk from [PromCon 2019](https://promcon.io/2019-munich/talks/remote-write-storage-wars/).
* It protects the storage from data corruption on unclean shutdown (i.e. OOM, hardware reset or `kill -9`) thanks to [the storage architecture](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
* It provides high data compression, so up to 70x more data points may be stored in limited storage compared to TimescaleDB
according to [these benchmarks](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4)
and up to 7x less storage space is required compared to Prometheus, Thanos or Cortex
according to [this benchmark](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f).
* It is optimized for storage with high-latency IO and low IOPS (HDD and network storage in AWS, Google Cloud, Microsoft Azure, etc).
See [disk IO graphs from these benchmarks](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b).
* A single-node VictoriaMetrics may substitute moderately sized clusters built with competing solutions such as Thanos, M3DB, Cortex, InfluxDB or TimescaleDB.
See [vertical scalability benchmarks](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae),
[comparing Thanos to VictoriaMetrics cluster](https://medium.com/@valyala/comparing-thanos-to-victoriametrics-cluster-b193bea1683)
and [Remote Write Storage Wars](https://promcon.io/2019-munich/talks/remote-write-storage-wars/) talk
from [PromCon 2019](https://promcon.io/2019-munich/talks/remote-write-storage-wars/).
* It protects the storage from data corruption on unclean shutdown (i.e. OOM, hardware reset or `kill -9`) thanks to
[the storage architecture](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
* It supports metrics' scraping, ingestion and [backfilling](#backfilling) via the following protocols:
* [Metrics scraping from Prometheus exporters](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* [Prometheus remote write API](#prometheus-setup).
@@ -65,11 +82,15 @@ VictoriaMetrics has the following prominent features:
* [Arbitrary CSV data](#how-to-import-csv-data).
* [Native binary format](#how-to-import-data-in-native-format).
* [DataDog agent or DogStatsD](#how-to-send-data-from-datadog-agent).
* It supports powerful [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html), which can be used as a [statsd](https://github.com/statsd/statsd) alternative.
* It supports metrics [relabeling](#relabeling).
* It can deal with [high cardinality issues](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality) and [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate) issues via [series limiter](#cardinality-limiter).
* It ideally works with big amounts of time series data from APM, Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data and various [Enterprise workloads](https://docs.victoriametrics.com/enterprise.html).
* It can deal with [high cardinality issues](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality) and
[high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate) issues via [series limiter](#cardinality-limiter).
* It ideally works with big amounts of time series data from APM, Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data
and various [Enterprise workloads](https://docs.victoriametrics.com/enterprise.html).
* It has open source [cluster version](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
* It can store data on [NFS-based storages](https://en.wikipedia.org/wiki/Network_File_System) such as [Amazon EFS](https://aws.amazon.com/efs/) and [Google Filestore](https://cloud.google.com/filestore).
* It can store data on [NFS-based storages](https://en.wikipedia.org/wiki/Network_File_System) such as [Amazon EFS](https://aws.amazon.com/efs/)
and [Google Filestore](https://cloud.google.com/filestore).
See also [various Articles about VictoriaMetrics](https://docs.victoriametrics.com/Articles.html).
@@ -84,7 +105,7 @@ Case studies:
* [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch)
* [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern)
* [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl)
* [Dreamteam](https://docs.victoriametrics.com/CaseStudies.html#dreamteam)
* [Dig Security](https://docs.victoriametrics.com/CaseStudies.html#dig-security)
* [Fly.io](https://docs.victoriametrics.com/CaseStudies.html#flyio)
* [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence)
* [Grammarly](https://docs.victoriametrics.com/CaseStudies.html#grammarly)
@@ -271,6 +292,8 @@ Prometheus doesn't drop data during VictoriaMetrics restart. See [this article](
VictoriaMetrics provides UI for query troubleshooting and exploration. The UI is available at `http://victoriametrics:8428/vmui`.
The UI allows exploring query results via graphs and tables.
It also provides the following features:
- [metrics explorer](#metrics-explorer)
- [cardinality explorer](#cardinality-explorer)
- [query tracer](#query-tracing)
- [top queries explorer](#top-queries)
@@ -306,9 +329,21 @@ See the [example VMUI at VictoriaMetrics playground](https://play.victoriametric
* queries with the biggest average execution duration;
* queries that took the most summary time for execution.
## Metrics explorer
[VMUI](#vmui) provides an ability to explore metrics exported by a particular `job` / `instance` in the following way:
1. Open the `vmui` at `http://victoriametrics:8428/vmui/`.
2. Click the `Explore metrics` tab.
3. Select the `job` you want to explore.
4. Optionally select the `instance` for the selected job to explore.
5. Select metrics you want to explore and compare.
It is possible to change the selected time range for the graphs in the top right corner.
## Cardinality explorer
VictoriaMetrics provides an ability to explore time series cardinality at `cardinality` tab in [vmui](#vmui) in the following ways:
VictoriaMetrics provides an ability to explore time series cardinality at `Explore cardinality` tab in [vmui](#vmui) in the following ways:
- To identify metric names with the highest number of series.
- To identify labels with the highest number of series.
@@ -363,7 +398,7 @@ DataDog agent allows configuring destinations for metrics sending via ENV variab
or via [configuration file](https://docs.datadoghq.com/agent/guide/agent-configuration-files/) in section `dd_url`.
<p align="center">
<img src="Single-server-VictoriaMetrics-sending_DD_metrics_to_VM.png" width="800">
<img src="docs/Single-server-VictoriaMetrics-sending_DD_metrics_to_VM.png" width="800">
</p>
To configure DataDog agent via ENV variable add the following prefix:
@@ -397,7 +432,7 @@ DataDog allows configuring [Dual Shipping](https://docs.datadoghq.com/agent/guid
sending via ENV variable `DD_ADDITIONAL_ENDPOINTS` or via configuration file `additional_endpoints`.
<p align="center">
<img src="Single-server-VictoriaMetrics-sending_DD_metrics_to_VM_and_DD.png" width="800">
<img src="docs/Single-server-VictoriaMetrics-sending_DD_metrics_to_VM_and_DD.png" width="800">
</p>
Run DataDog using the following ENV variable with VictoriaMetrics as additional metrics receiver:
@@ -701,8 +736,7 @@ VictoriaMetrics accepts optional `extra_label=<label_name>=<label_value>` query
VictoriaMetrics accepts optional `extra_filters[]=series_selector` query arg, which can be used for enforcing arbitrary label filters for queries. For example,
`/api/v1/query_range?extra_filters[]={env=~"prod|staging",user="xyz"}&query=<query>` would automatically add `{env=~"prod|staging",user="xyz"}` label filters to the given `<query>`. This functionality can be used for limiting the scope of time series visible to the given tenant. It is expected that the `extra_filters[]` query args are automatically set by auth proxy sitting in front of VictoriaMetrics. See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args additionally to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
VictoriaMetrics accepts multiple formats for `time`, `start` and `end` query args - see [these docs](#timestamp-formats).
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
@@ -727,6 +761,18 @@ Additionally, VictoriaMetrics provides the following handlers:
For example, request to `/api/v1/status/top_queries?topN=5&maxLifetime=30s` would return up to 5 queries per list, which were executed during the last 30 seconds.
VictoriaMetrics tracks the last `-search.queryStats.lastQueriesCount` queries with durations at least `-search.queryStats.minQueryDuration`.
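The request from the example above, expressed as a curl call (the address is a placeholder):

```console
curl 'http://<victoriametrics-addr>:8428/api/v1/status/top_queries?topN=5&maxLifetime=30s'
```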
### Timestamp formats
VictoriaMetrics accepts the following formats for `time`, `start` and `end` query args
in [query APIs](https://docs.victoriametrics.com/#prometheus-querying-api-usage) and
in [export APIs](https://docs.victoriametrics.com/#how-to-export-time-series).
- Unix timestamps in seconds with optional milliseconds after the point. For example, `1562529662.678`.
- [RFC3339](https://www.ietf.org/rfc/rfc3339.txt). For example, `2022-03-29T01:02:03Z`.
- Partial RFC3339. Examples: `2022`, `2022-03`, `2022-03-29`, `2022-03-29T01`, `2022-03-29T01:02`.
- Relative duration compared to the current time. For example, `1h5m` means `one hour and five minutes ago`.
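A sketch combining the formats listed above in range queries (the query, address and values are placeholders):

```console
# partial RFC3339 dates
curl http://<victoriametrics-addr>:8428/api/v1/query_range -d 'query=up' -d 'start=2022-03-29' -d 'end=2022-03-30' -d 'step=5m'

# relative duration: data for the last hour and five minutes
curl http://<victoriametrics-addr>:8428/api/v1/query_range -d 'query=up' -d 'start=1h5m' -d 'step=5m'
```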
## Graphite API usage
VictoriaMetrics supports data ingestion in Graphite protocol - see [these docs](#how-to-send-data-from-graphite-compatible-agents-such-as-statsd) for details.
@@ -946,12 +992,13 @@ Each JSON line contains samples for a single time series. An example output:
{"metric":{"__name__":"up","job":"prometheus","instance":"localhost:9090"},"values":[1,1,1],"timestamps":[1549891461511,1549891476511,1549891491511]}
```
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data. These args may contain either
unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
Optional `max_rows_per_line` arg may be added to the request for limiting the maximum number of rows exported per each JSON line.
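An illustrative call, assuming an arbitrary limit of 1000 rows per line (selector and address are placeholders):

```console
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'max_rows_per_line=1000'
```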
@@ -994,12 +1041,13 @@ where:
* `<timeseries_selector_for_export>` may contain any [time series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors)
for metrics to export.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data. These args may contain either
unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
The exported CSV data can be imported to VictoriaMetrics via [/api/v1/import/csv](#how-to-import-csv-data).
@@ -1021,12 +1069,13 @@ wget -O- -q 'http://your_victoriametrics_instance:8428/api/v1/series/count' | jq
# relaunch victoriametrics with search.maxExportSeries more than value from previous command
```
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data. These args may contain either
unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
The exported data can be imported to VictoriaMetrics via [/api/v1/import/native](#how-to-import-data-in-native-format).
@@ -1037,7 +1086,9 @@ The [deduplication](#deduplication) isn't applied for the data exported in nativ
## How to import time series data
Time series data can be imported into VictoriaMetrics via any supported data ingestion protocol:
VictoriaMetrics can discover and scrape metrics from Prometheus-compatible targets (aka "pull" protocol) -
see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
Additionally, VictoriaMetrics can accept metrics via the following popular data ingestion protocols (aka "push" protocols):
* [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). See [these docs](#prometheus-setup) for details.
* DataDog `submit metrics` API. See [these docs](#how-to-send-data-from-datadog-agent) for details.
@@ -1258,11 +1309,12 @@ VictoriaMetrics exports [Prometheus-compatible federation data](https://promethe
at `http://<victoriametrics-addr>:8428/federate?match[]=<timeseries_selector_for_federation>`.
Optional `start` and `end` args may be added to the request in order to scrape the last point for each selected time series on the `[start ... end]` interval.
`start` and `end` may contain either unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
By default, the last point on the interval `[now - max_lookback ... now]` is scraped for each time series. The default value for `max_lookback` is `5m` (5 minutes), but it can be overridden with `max_lookback` query arg.
For instance, `/federate?match[]=up&max_lookback=1h` would return last points on the `[now - 1h ... now]` interval. This may be useful for time series federation
@@ -1305,6 +1357,7 @@ By default VictoriaMetrics is tuned for an optimal resource usage under typical
- `-search.maxSamplesPerQuery` limits the number of raw samples a single query can process. This allows limiting CPU usage for heavy queries.
- `-search.maxPointsPerTimeseries` limits the number of calculated points, which can be returned per each matching time series from [range query](https://docs.victoriametrics.com/keyConcepts.html#range-query).
- `-search.maxPointsSubqueryPerTimeseries` limits the number of calculated points, which can be generated per each matching time series during [subquery](https://docs.victoriametrics.com/MetricsQL.html#subqueries) evaluation.
- `-search.maxSeriesPerAggrFunc` limits the number of time series, which can be generated by [MetricsQL aggregate functions](https://docs.victoriametrics.com/MetricsQL.html#aggregate-functions) in a single query.
- `-search.maxSeries` limits the number of time series, which may be returned from [/api/v1/series](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers). This endpoint is used mostly by Grafana for auto-completion of metric names, label names and label values. Queries to this endpoint may take big amounts of CPU time and memory when the database contains a big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set `-search.maxSeries` to a quite low value in order to limit CPU and memory usage.
- `-search.maxTagKeys` limits the number of items, which may be returned from [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names). This endpoint is used mostly by Grafana for auto-completion of label names. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxTagKeys` to quite low value in order to limit CPU and memory usage.
- `-search.maxTagValues` limits the number of items, which may be returned from [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values). This endpoint is used mostly by Grafana for auto-completion of label values. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxTagValues` to quite low value in order to limit CPU and memory usage.
@@ -1426,8 +1479,8 @@ This increases overhead during data querying, since VictoriaMetrics needs to rea
bigger number of parts per each request. That's why it is recommended to have at least 20%
of free disk space under directory pointed by `-storageDataPath` command-line flag.
Information about merging process is available in [the dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
and [the dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176).
Information about merging process is available in [the dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
and [the dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/).
See more details in [monitoring docs](#monitoring).
See [this article](https://valyala.medium.com/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) for more details.
@@ -1565,7 +1618,9 @@ VictoriaMetrics provides the following security-related command-line flags:
Explicitly set internal network interface for TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats.
For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<internal_iface_ip>:2003`. This protects from unexpected requests from untrusted network interfaces.
VictoriaMetrics has achieved security certifications for Database Software Development and Software-Based Monitoring Services. We apply strict security measures in everything we do. See our [Security page](https://victoriametrics.com/security/) for more details.
See also [security recommendation for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#security)
and [the general security page at VictoriaMetrics website](https://victoriametrics.com/security/).
## Tuning
@@ -1594,8 +1649,8 @@ Alternatively, single-node VictoriaMetrics can self-scrape the metrics when `-se
set to duration greater than 0. For example, `-selfScrapeInterval=10s` would enable self-scraping of `/metrics` page
with 10 seconds interval.
Official Grafana dashboards available for [single-node](https://grafana.com/dashboards/10229)
and [clustered](https://grafana.com/grafana/dashboards/11176) VictoriaMetrics.
Official Grafana dashboards are available for [single-node](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
and [clustered](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/) VictoriaMetrics.
See an [alternative dashboard for clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11831)
created by community.
@@ -1844,8 +1899,8 @@ The following metrics for each type of cache are exported at [`/metrics` page](#
* `vm_cache_misses_total` - the number of cache misses
* `vm_cache_entries` - the number of entries in the cache
Both Grafana dashboards for [single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
and [clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11176)
Both Grafana dashboards for [single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
and [clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/)
contain `Caches` section with cache metrics visualized. The panels show the current
memory usage by each type of cache, and also a cache hit rate. If hit rate is close to 100%
then cache efficiency is already very high and does not need any tuning.
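The same cache metrics can also be checked without Grafana by reading the `/metrics` page directly (address is a placeholder):

```console
curl -s http://<victoriametrics-addr>:8428/metrics | grep -E 'vm_cache_(misses_total|entries)'
```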
@@ -2121,7 +2176,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
-graphiteListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -graphiteListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
@@ -2141,7 +2198,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol (default ":8428")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 104857600)
@@ -2154,7 +2213,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-influxDBLabel string
Default label for the DB name sent over '?db={db_name}' query parameter (default "db")
-influxListenAddr string
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write . See also -influxListenAddr.useProxyProtocol
-influxListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -influxListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol (default "_")
-influxSkipMeasurement
@@ -2166,7 +2227,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-inmemoryDataFlushInterval duration
The interval for guaranteed saving of in-memory data to disk. The saved data survives unclean shutdown such as OOM crash, hardware reset, SIGKILL, etc. Bigger intervals may help increasing lifetime of flash storage with limited write cycles (e.g. Raspberry PI). Smaller intervals increase disk IO load. Minimum supported value is 1s (default 5s)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-logNewSeries
Whether to log new series. This option is for debug purposes only. It can lead to performance issues when big number of new series are ingested into VictoriaMetrics
-loggerDisableTimestamps
@@ -2175,6 +2238,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
@@ -2184,7 +2249,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 33554432)
@@ -2200,9 +2265,13 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
-opentsdbListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
@@ -2270,6 +2339,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kubernetes_sd_configs for details (default 30s)
-promscrape.kumaSDCheckInterval duration
Interval for checking for changes in kuma service discovery. This works only if kuma_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxResponseHeadersSize size
@@ -2283,6 +2354,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 1000000)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.nomad.waitTime duration
Wait time used by Nomad service discovery. Default value is used if not set
-promscrape.nomadSDCheckInterval duration
Interval for checking for changes in Nomad. This works only if nomad_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#nomad_sd_configs for details (default 30s)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#openstack_sd_configs for details (default 30s)
-promscrape.seriesLimitPerTarget int
@@ -2327,8 +2402,11 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The interval between datapoints stored in the database. It is used at Graphite Render API handler for normalizing the interval between datapoints in case it isn't normalized. It can be overridden by sending 'storage_step' query arg to /render API or by sending the desired interval via 'Storage-Step' http header during querying /render API. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default 10s)
-search.latencyOffset duration
The time when data points become visible in query results after the collection. It can be overridden on per-query basis via latency_offset arg. Too small value can result in incomplete last points for query results (default 30s)
-search.logQueryMemoryUsage size
Log queries, which require more memory than specified by this flag. This may help detecting and optimizing heavy queries. Query logging is disabled by default. See also -search.logSlowQueryDuration and -search.maxMemoryPerQuery
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
Log queries with execution time exceeding this value. Zero disables slow query logging. See also -search.logQueryMemoryUsage (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores, while many concurrently executed requests may require high amounts of memory. See also -search.maxQueueDuration and -search.maxMemoryPerQuery (default 8)
-search.maxExportDuration duration
@@ -2342,7 +2420,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-search.maxLookback duration
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
-search.maxMemoryPerQuery size
The maximum amounts of memory a single query may consume. Queries requiring more memory are rejected. The total memory limit for concurrently executed queries can be estimated as -search.maxMemoryPerQuery multiplied by -search.maxConcurrentRequests
The maximum amounts of memory a single query may consume. Queries requiring more memory are rejected. The total memory limit for concurrently executed queries can be estimated as -search.maxMemoryPerQuery multiplied by -search.maxConcurrentRequests . See also -search.logQueryMemoryUsage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as VMUI or Grafana. There is no sense in setting this limit to values bigger than the horizontal resolution of the graph (default 30000)
@@ -2361,6 +2439,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage (default 30000000)
-search.maxSeries int
The maximum number of time series, which can be returned from /api/v1/series. This option allows limiting memory usage (default 30000)
-search.maxSeriesPerAggrFunc int
The maximum number of time series an aggregate MetricsQL function can generate (default 1000000)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.setLookbackToStep' flag
-search.maxStatusRequestDuration duration
@@ -2427,6 +2507,12 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 10000000)
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-streamAggr.config string
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html . See also -remoteWrite.streamAggr.keepInput and -streamAggr.dedupInterval
-streamAggr.dedupInterval duration
Input samples are de-duplicated with this interval before being aggregated. Only the last sample per each time series per each interval is aggregated if the interval is greater than zero
-streamAggr.keepInput
Whether to keep input samples after the aggregation with -streamAggr.config. By default the input is dropped after the aggregation, so only the aggregate data is stored. See https://docs.victoriametrics.com/stream-aggregation.html
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
@@ -2444,4 +2530,6 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Show VictoriaMetrics version
-vmalert.proxyURL string
Optional URL for proxying requests to vmalert. For example, if -vmalert.proxyURL=http://vmalert:8880 , then alerting API requests such as /api/v1/rules from Grafana will be proxied to http://vmalert:8880/api/v1/rules
-vmui.customDashboardsPath string
Optional path to vmui dashboards. See https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmui/packages/vmui/public/dashboards
```
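As an illustration of the new stream aggregation flags listed above, a single-node VictoriaMetrics instance could be started as follows; the config path and interval are placeholders, not recommendations:

```console
# Minimal sketch: enable stream aggregation from a local config file,
# de-duplicate input samples over 30s and keep the raw input samples too.
/path/to/victoria-metrics \
  -streamAggr.config=/etc/victoriametrics/stream-aggr.yml \
  -streamAggr.dedupInterval=30s \
  -streamAggr.keepInput
```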

View File

@@ -24,7 +24,9 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections")
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Leave only the last sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication and https://docs.victoriametrics.com/#downsampling")
dryRun = flag.Bool("dryRun", false, "Whether to check only -promscrape.config and then exit. "+
@@ -64,7 +66,7 @@ func main() {
vminsert.Init()
startSelfScraper()
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
logger.Infof("started VictoriaMetrics in %.3f seconds", time.Since(startTime).Seconds())
sig := procutil.WaitForSigterm()
@@ -90,7 +92,7 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != "GET" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")

View File

@@ -129,10 +129,10 @@ func setUp() {
storagePath = filepath.Join(os.TempDir(), testStorageSuffix)
processFlags()
logger.Init()
vmstorage.InitWithoutMetrics(promql.ResetRollupResultCacheIfNeeded)
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
vmselect.Init()
vminsert.Init()
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, false, requestHandler)
readyStorageCheckFunc := func() bool {
resp, err := http.Get(testHealthHTTPPath)
if err != nil {
@@ -189,10 +189,8 @@ func tearDown() {
func TestWriteRead(t *testing.T) {
t.Run("write", testWrite)
vmstorage.Storage.DebugFlush()
time.Sleep(1 * time.Second)
vmstorage.Stop()
// open storage after stop in write
vmstorage.InitWithoutMetrics(promql.ResetRollupResultCacheIfNeeded)
t.Run("read", testRead)
}
@@ -261,7 +259,7 @@ func testRead(t *testing.T) {
for _, q := range test.Query {
q = testutil.PopulateTimeTplString(q, insertionTime)
if test.Issue != "" {
test.Issue = "Regression in " + test.Issue
test.Issue = "\nRegression in " + test.Issue
}
switch true {
case strings.HasPrefix(q, "/api/v1/export"):
@@ -284,7 +282,7 @@ func testRead(t *testing.T) {
queryResult := Query{}
httpReadStruct(t, testReadHTTPPath, q, &queryResult)
if err := checkQueryResult(queryResult, test.ResultQuery); err != nil {
t.Fatalf("Query. %s fails with error %s.%s", q, err, test.Issue)
t.Fatalf("Query. %s fails with error: %s.%s", q, err, test.Issue)
}
default:
t.Fatalf("unsupported read query %s", q)

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -12,7 +12,7 @@ func TestPopulateTimeTplString(t *testing.T) {
}
f := func(s, resultExpected string) {
t.Helper()
result := PopulateTimeTplString(s, now)
result := PopulateTimeTplString(s, now.UTC())
if result != resultExpected {
t.Fatalf("unexpected result; got %q; want %q", result, resultExpected)
}

View File

@@ -6,7 +6,7 @@
"forms_daily_count;item=x 2 {TIME_S-2m}",
"forms_daily_count;item=y 3 {TIME_S-1m}",
"forms_daily_count;item=y 4 {TIME_S-2m}"],
"query": ["/api/v1/query?query=min%20by%20(item)%20(min_over_time(forms_daily_count[10m:1m]))&time={TIME_S-1m}"],
"query": ["/api/v1/query?query=min%20by%20(item)%20(min_over_time(forms_daily_count[10m:1m]))&time={TIME_S-1m}&latency_offset=1ms"],
"result_query": {
"status":"success",
"data":{"resultType":"vector","result":[{"metric":{"item":"x"},"value":["{TIME_S-1m}","2"]},{"metric":{"item":"y"},"value":["{TIME_S-1m}","4"]}]}

View File

@@ -24,8 +24,10 @@ additionally to [discovering Prometheus-compatible targets and scraping metrics
see [these docs](https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* Can add, remove and modify labels (aka tags) via Prometheus relabeling. Can filter data before sending it to remote storage. See [these docs](#relabeling) for details.
* Can accept data via all the ingestion protocols supported by VictoriaMetrics - see [these docs](#how-to-push-data-to-vmagent).
* Can replicate collected metrics simultaneously to multiple remote storage systems -
see [these docs](#replication-and-high-availability).
* Can aggregate incoming samples by time and by labels before sending them to remote storage - see [these docs](https://docs.victoriametrics.com/stream-aggregation.html).
* Can replicate collected metrics simultaneously to multiple Prometheus-compatible remote storage systems - see [these docs](#replication-and-high-availability).
* Can save egress network bandwidth usage costs when [VictoriaMetrics remote write protocol](#victoriametrics-remote-write-protocol)
is used for sending the data to VictoriaMetrics.
* Works smoothly in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
@@ -45,14 +47,15 @@ additionally to [discovering Prometheus-compatible targets and scraping metrics
Please download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) (
`vmagent` is also available in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags)),
unpack it and pass the following flags to the `vmagent` binary in order to start scraping Prometheus-compatible targets:
unpack it and pass the following flags to the `vmagent` binary in order to start scraping Prometheus-compatible targets
and sending the data to the Prometheus-compatible remote storage:
* `-promscrape.config` with the path to Prometheus config file (usually located at `/etc/prometheus/prometheus.yml`).
* `-promscrape.config` with the path to [Prometheus config file](https://docs.victoriametrics.com/sd_configs.html) (usually located at `/etc/prometheus/prometheus.yml`).
The path can point either to local file or to http url. `vmagent` doesn't support some sections of Prometheus config file,
so you may need either to delete these sections or to run `vmagent` with `-promscrape.config.strictParse=false` command-line flag.
In this case `vmagent` ignores unsupported sections. See [the list of unsupported sections](#unsupported-prometheus-config-sections).
* `-remoteWrite.url` with the remote storage endpoint such as VictoriaMetrics, the `-remoteWrite.url` argument can be specified
multiple times to replicate data concurrently to an arbitrary number of remote storage systems. See [various use cases](#use-cases).
* `-remoteWrite.url` with Prometheus-compatible remote storage endpoint such as VictoriaMetrics, the `-remoteWrite.url` argument can be specified
multiple times to replicate data concurrently to multiple remote storage systems. See [various use cases](#use-cases).
Example command line:
@@ -73,6 +76,8 @@ and sending it to the provided `-remoteWrite.url`:
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
`vmagent` can save network bandwidth usage costs under high load when [VictoriaMetrics remote write protocol is enabled](#victoriametrics-remote-write-protocol).
See [troubleshooting docs](#troubleshooting) if you encounter common issues with `vmagent`.
Pass `-help` to `vmagent` in order to see [the full list of supported command-line flags with their descriptions](#advanced-usage).
@@ -118,7 +123,8 @@ data to the remote storage. It re-tries sending the data to remote storage until
The maximum on-disk size for the buffered metrics can be limited with `-remoteWrite.maxDiskUsagePerURL`.
`vmagent` works on various architectures from the IoT world - 32-bit arm, 64-bit arm, ppc64, 386, amd64.
See [the corresponding Makefile rules](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmagent/Makefile) for details.
The `vmagent` can save network bandwidth usage costs by using [VictoriaMetrics remote write protocol](#victoriametrics-remote-write-protocol).
### Drop-in replacement for Prometheus
@@ -126,6 +132,12 @@ If you use Prometheus only for scraping metrics from various targets and forward
then `vmagent` can replace Prometheus. Typically, `vmagent` requires lower amounts of RAM, CPU and network bandwidth compared with Prometheus.
See [these docs](#how-to-collect-metrics-in-prometheus-format) for details.
### Statsd alternative
`vmagent` can be used as an alternative to [statsd](https://github.com/statsd/statsd)
when [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) is enabled.
See [these docs](https://docs.victoriametrics.com/stream-aggregation.html#statsd-alternative) for details.
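For example, a minimal stream aggregation config for counting application events might look like the sketch below; the metric name, interval and grouping labels are illustrative only:

```yaml
# Hypothetical stream-aggr.yml referenced via -remoteWrite.streamAggr.config
- match: 'app_events_total'   # selector for the input samples to aggregate
  interval: 1m                # aggregation window
  by: [event_type]            # labels to group the aggregates by
  outputs: [total]            # emit running totals instead of raw samples
```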
### Flexible metrics relay
`vmagent` can accept metrics in [various popular data ingestion protocols](#how-to-push-data-to-vmagent), apply [relabeling](#relabeling)
@@ -167,6 +179,28 @@ the `-remoteWrite.url` command-line flag should be configured as `<schema>://<vm
according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format).
There is also support for multitenant writes. See [these docs](#multitenancy).
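For instance, writing the collected data into a single tenant of a cluster installation could look like this; the host and the tenant ID `42` are placeholders, and the URL format follows the cluster docs linked above:

```console
/path/to/vmagent -remoteWrite.url=http://vminsert:8480/insert/42/prometheus/api/v1/write
```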
## VictoriaMetrics remote write protocol
`vmagent` supports sending data to the configured `-remoteWrite.url` either via Prometheus remote write protocol
or via VictoriaMetrics remote write protocol.
VictoriaMetrics remote write protocol provides the following benefits compared to Prometheus remote write protocol:
- Reduced network bandwidth usage by 2x-5x. This allows saving network bandwidth usage costs when `vmagent` and
the configured remote storage systems are located in different datacenters, availability zones or regions.
- Reduced disk read/write IO and disk space usage at `vmagent` when the remote storage is temporarily unavailable.
In this case `vmagent` buffers the incoming data to disk using the VictoriaMetrics remote write format.
This reduces disk read/write IO and disk space usage by 2x-5x compared to the Prometheus remote write format.
`vmagent` automatically uses VictoriaMetrics remote write protocol when it sends data to VictoriaMetrics components such as other `vmagent` instances,
[single-node VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html)
or `vminsert` at [cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
`vmagent` automatically switches to Prometheus remote write protocol when it sends data to old versions of VictoriaMetrics components
or to other Prometheus-compatible remote storage systems. It is possible to force the switch to Prometheus remote write protocol
by specifying the `-remoteWrite.forcePromProto` command-line flag for the corresponding `-remoteWrite.url`.
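A minimal sketch of forcing the Prometheus remote write protocol for a particular remote storage (the URL is a placeholder):

```console
# Send data using the Prometheus remote write protocol even if the remote side
# also supports the VictoriaMetrics remote write protocol.
/path/to/vmagent \
  -remoteWrite.url=http://remote-storage:8080/api/v1/write \
  -remoteWrite.forcePromProto=true
```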
## Multitenancy
By default `vmagent` collects the data without tenant identifiers and routes it to the configured `-remoteWrite.url`.
@@ -316,7 +350,7 @@ Extra labels can be added to metrics collected by `vmagent` via the following me
## Automatically generated metrics
`vmagent` automatically generates the following metrics per each scrape of every [Prometheus-compatible target](#how-to-collect-metrics-in-prometheus-format)
and attaches target-specific `instance` and `job` labels to these metrics:
and attaches `instance`, `job` and other target-specific labels to these metrics:
* `up` - this metric exposes `1` value on successful scrape and `0` value on unsuccessful scrape. This allows monitoring
failing scrapes with the following [MetricsQL query](https://docs.victoriametrics.com/MetricsQL.html):
@@ -598,8 +632,8 @@ provide the following tools for debugging target-level and metric-level relabeli
- Target-level debugging (e.g. `relabel_configs` section at [scrape_configs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs))
can be performed by navigating to `http://vmagent:8429/targets` page (`http://victoriametrics:8428/targets` page for single-node VictoriaMetrics)
and clicking the `debug` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual relabeling rules applied to the target labels.
and clicking the `debug target relabeling` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual target relabeling rules applied to the discovered target labels.
The `http://vmagent:8429/targets` page shows only active targets. If you need to understand why some target
is dropped during the relabeling, then navigate to `http://vmagent:8429/service-discovery` page
@@ -608,11 +642,9 @@ provide the following tools for debugging target-level and metric-level relabeli
which result in target drop.
- Metric-level debugging (e.g. `metric_relabel_configs` section at [scrape_configs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs)
and all the relabeling, which can be set up via `-relabelConfig`, `-remoteWrite.relabelConfig` and `-remoteWrite.urlRelabelConfig`
command-line flags) can be performed by navigating to `http://vmagent:8429/metric-relabel-debug` page
(`http://victoriametrics:8428/metric-relabel-debug` page for single-node VictoriaMetrics)
and submitting there relabeling rules together with the metric to be relabeled.
The page will show step-by-step results for the entered relabeling rules executed against the entered metric.
can be performed by navigating to `http://vmagent:8429/targets` page (`http://victoriametrics:8428/targets` page for single-node VictoriaMetrics)
and clicking the `debug metrics relabeling` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual metric relabeling rules applied to the given target labels.
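For instance, the following `metric_relabel_configs` snippet is the kind of rule set whose step-by-step execution the debug page visualizes; the job, metric and mountpoint patterns are made up for illustration:

```yaml
scrape_configs:
- job_name: node
  static_configs:
  - targets: ["node-exporter:9100"]
  metric_relabel_configs:
  # Drop filesystem metrics for temporary mountpoints before they are sent to remote storage.
  - source_labels: [__name__, mountpoint]
    regex: "node_filesystem_.*;/tmp/.*"
    action: drop
```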
## Prometheus staleness markers
@@ -757,7 +789,8 @@ scrape_configs:
## Cardinality limiter
By default `vmagent` doesn't limit the number of time series each scrape target can expose. The limit can be enforced in the following places:
By default `vmagent` doesn't limit the number of time series each scrape target can expose.
The limit can be enforced in the following places:
* Via `-promscrape.seriesLimitPerTarget` command-line option. This limit is applied individually
to all the scrape targets defined in the file pointed by `-promscrape.config`.
@@ -768,10 +801,7 @@ By default `vmagent` doesn't limit the number of time series each scrape target
via [Kubernetes annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) for targets,
which may expose too high number of time series.
See also `sample_limit` option at [scrape_config section](https://docs.victoriametrics.com/sd_configs.html#scrape_configs).
Scraped metrics are dropped for time series exceeding the given limit.
Scraped metrics are dropped for time series exceeding the given limit on the time window of 24h.
`vmagent` creates the following additional per-target metrics for targets with non-zero series limit:
- `scrape_series_limit_samples_dropped` - the number of dropped samples during the scrape when the unique series limit is exceeded.
@@ -785,6 +815,7 @@ These metrics allow building the following alerting rules:
- `scrape_series_current / scrape_series_limit > 0.9` - alerts when the number of series exposed by the target reaches 90% of the limit.
- `sum_over_time(scrape_series_limit_samples_dropped[1h]) > 0` - alerts when some samples are dropped because the series limit on a particular target is reached.
See also `sample_limit` option at [scrape_config section](https://docs.victoriametrics.com/sd_configs.html#scrape_configs).
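The per-target limit can also be set directly in the scrape config via the `series_limit` option; a minimal sketch with an arbitrary job name and limit value:

```yaml
scrape_configs:
- job_name: my-app
  series_limit: 5000            # drop samples for new unique series above this per-target limit
  static_configs:
  - targets: ["my-app:8080"]
```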
By default `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`.
The limit can be enforced by setting the following command-line flags:
@@ -816,7 +847,7 @@ See also [cardinality explorer docs](https://docs.victoriametrics.com/#cardinali
We recommend setting up regular scraping of this page either through `vmagent` itself or by Prometheus
so that the exported metrics may be analyzed later.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for `vmagent` state overview.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683-victoriametrics-vmagent/) for `vmagent` state overview.
Graphs on this dashboard contain useful hints - hover the `i` icon at the top left corner of each graph in order to read it.
If you have suggestions for improvements or have found a bug - please open an issue on github or add a review to the dashboard.
@@ -875,7 +906,7 @@ If you have suggestions for improvements or have found a bug - please open an is
The number of dropped blocks can be monitored via `vmagent_remotewrite_packets_dropped_total` metric exported at [/metrics page](#monitoring).
* Use `-remoteWrite.queues=1` when `-remoteWrite.url` points to remote storage, which doesn't accept out-of-order samples (aka data backfilling).
Such storage systems include Prometheus, Cortex and Thanos, which typically emit `out of order sample` errors.
Such storage systems include Prometheus, Mimir, Cortex and Thanos, which typically emit `out of order sample` errors.
The best solution is to use remote storage with [backfilling support](https://docs.victoriametrics.com/#backfilling) such as VictoriaMetrics.
* `vmagent` buffers scraped data at the `-remoteWrite.tmpDataPath` directory until it is sent to `-remoteWrite.url`.
@@ -1156,7 +1187,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
-graphiteListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -graphiteListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
@@ -1176,7 +1209,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -httpListenAddr.useProxyProtocol (default ":8429")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 104857600)
@@ -1189,7 +1224,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-influxDBLabel string
Default label for the DB name sent over '?db={db_name}' query parameter (default "db")
-influxListenAddr string
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write . See also -influxListenAddr.useProxyProtocol
-influxListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -influxListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol (default "_")
-influxSkipMeasurement
@@ -1199,7 +1236,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-influxTrimTimestamp duration
Trim timestamps for InfluxDB line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-kafka.consumer.topic array
Kafka topic names for data consumption. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
Supports an array of values separated by comma or specified via multiple flags.
@@ -1232,6 +1271,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
@@ -1241,7 +1282,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 33554432)
@@ -1253,9 +1294,13 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
-opentsdbListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
@@ -1321,6 +1366,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kubernetes_sd_configs for details (default 30s)
-promscrape.kumaSDCheckInterval duration
Interval for checking for changes in kuma service discovery. This works only if kuma_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxResponseHeadersSize size
@@ -1334,6 +1381,10 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 1000000)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.nomad.waitTime duration
Wait time used by Nomad service discovery. Default value is used if not set
-promscrape.nomadSDCheckInterval duration
Interval for checking for changes in Nomad. This works only if nomad_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#nomad_sd_configs for details (default 30s)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#openstack_sd_configs for details (default 30s)
-promscrape.seriesLimitPerTarget int
@@ -1397,6 +1448,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.flushInterval duration
Interval for flushing the data to remote storage. This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url (default 1s)
-remoteWrite.forcePromProto array
Whether to force Prometheus remote write protocol for sending data to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.headers array
Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'
Supports an array of values separated by comma or specified via multiple flags.
@@ -1443,7 +1497,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.relabelConfig string
Optional path to file with relabel_config entries. The path can point either to local file or to http url. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details
Optional path to file with relabeling configs, which are applied to all the metrics before sending them to -remoteWrite.url. See also -remoteWrite.urlRelabelConfig. The path can point either to local file or to http url. See https://docs.victoriametrics.com/vmagent.html#relabeling
-remoteWrite.roundDigits array
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
Supports array of values separated by comma or specified via multiple flags.
@@ -1455,6 +1509,15 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.significantFigures array
The number of significant figures to leave in metric values before writing them to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. This option may be used for improving data compression for the stored metrics. See also -remoteWrite.roundDigits
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.config array
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html . See also -remoteWrite.streamAggr.keepInput and -remoteWrite.streamAggr.dedupInterval
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.dedupInterval array
Input samples are de-duplicated with this interval before being aggregated. Only the last sample per each time series per each interval is aggregated if the interval is greater than zero
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.keepInput array
Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. See https://docs.victoriametrics.com/stream-aggregation.html
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. By default system CA is used
Supports an array of values separated by comma or specified via multiple flags.
@@ -1473,10 +1536,10 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.tmpDataPath string
Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
-remoteWrite.url array
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Remote storage URL to write data to. It must support either VictoriaMetrics remote write protocol or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelConfig array
Optional path to relabel config for the corresponding -remoteWrite.url. The path can point either to local file or to http url
Optional path to relabel configs for the corresponding -remoteWrite.url. See also -remoteWrite.relabelConfig. The path can point either to local file or to http url. See https://docs.victoriametrics.com/vmagent.html#relabeling
Supports an array of values separated by comma or specified via multiple flags.
-sortLabels
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}
Enabled sorting for labels can slow down ingestion performance a bit

View File

@@ -9,8 +9,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -26,10 +26,8 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
return stream.Parse(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -9,8 +9,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -28,11 +28,9 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
ce := req.Header.Get("Content-Encoding")
return parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return insertRows(at, series, extraLabels)
})
ce := req.Header.Get("Content-Encoding")
return stream.Parse(req.Body, ce, func(series []parser.Series) error {
return insertRows(at, series, extraLabels)
})
}

View File

@@ -7,7 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -20,9 +20,7 @@ var (
//
// See https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol
func InsertHandler(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, insertRows)
})
return stream.Parse(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -15,8 +15,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -37,10 +37,8 @@ var (
//
// See https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener/
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return insertRows(nil, db, rows, nil)
})
return stream.Parse(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return insertRows(nil, db, rows, nil)
})
}
@@ -52,15 +50,13 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
q := req.URL.Query()
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(at, db, rows, extraLabels)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
q := req.URL.Query()
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return stream.Parse(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(at, db, rows, extraLabels)
})
}

View File

@@ -38,23 +38,35 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8429", "TCP address to listen for http connections. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write . "+
"See also -influxListenAddr.useProxyProtocol")
influxUseProxyProtocol = flag.Bool("influxListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -influxListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. "+
"See also -graphiteListenAddr.useProxyProtocol")
graphiteUseProxyProtocol = flag.Bool("graphiteListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -graphiteListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
"Usually :4242 must be set. Doesn't work if empty")
opentsdbHTTPListenAddr = flag.String("opentsdbHTTPListenAddr", "", "TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmagent. The following files are checked: "+
"Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol")
opentsdbUseProxyProtocol = flag.Bool("opentsdbListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
opentsdbHTTPListenAddr = flag.String("opentsdbHTTPListenAddr", "", "TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. "+
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol = flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmagent. The following files are checked: "+
"-promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig . "+
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag")
)
@@ -104,28 +116,27 @@ func main() {
startTime := time.Now()
remotewrite.Init()
common.StartUnmarshalWorkers()
writeconcurrencylimiter.Init()
if len(*influxListenAddr) > 0 {
influxServer = influxserver.MustStart(*influxListenAddr, func(r io.Reader) error {
influxServer = influxserver.MustStart(*influxListenAddr, *influxUseProxyProtocol, func(r io.Reader) error {
return influx.InsertHandlerForReader(r, false)
})
}
if len(*graphiteListenAddr) > 0 {
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, graphite.InsertHandler)
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, *graphiteUseProxyProtocol, graphite.InsertHandler)
}
if len(*opentsdbListenAddr) > 0 {
httpInsertHandler := getOpenTSDBHTTPInsertHandler()
opentsdbServer = opentsdbserver.MustStart(*opentsdbListenAddr, opentsdb.InsertHandler, httpInsertHandler)
opentsdbServer = opentsdbserver.MustStart(*opentsdbListenAddr, *opentsdbUseProxyProtocol, opentsdb.InsertHandler, httpInsertHandler)
}
if len(*opentsdbHTTPListenAddr) > 0 {
httpInsertHandler := getOpenTSDBHTTPInsertHandler()
opentsdbhttpServer = opentsdbhttpserver.MustStart(*opentsdbHTTPListenAddr, httpInsertHandler)
opentsdbhttpServer = opentsdbhttpserver.MustStart(*opentsdbHTTPListenAddr, *opentsdbHTTPUseProxyProtocol, httpInsertHandler)
}
promscrape.Init(remotewrite.Push)
if len(*httpListenAddr) > 0 {
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
}
logger.Infof("started vmagent in %.3f seconds", time.Since(startTime).Seconds())
@@ -197,7 +208,7 @@ func getAuthTokenFromPath(path string) (*auth.Token, error) {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != "GET" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")
@@ -225,7 +236,14 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
statusCode := http.StatusNoContent
if strings.HasPrefix(path, "/prometheus/api/v1/import/prometheus/metrics/job/") ||
strings.HasPrefix(path, "/api/v1/import/prometheus/metrics/job/") {
// Return 200 status code for pushgateway requests.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
statusCode = http.StatusOK
}
w.WriteHeader(statusCode)
return true
}
if strings.HasPrefix(path, "datadog/") {
@@ -235,6 +253,9 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
switch path {
case "/prometheus/api/v1/write", "/api/v1/write":
if common.HandleVMProtoServerHandshake(w, r) {
return true
}
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(nil, r); err != nil {
prometheusWriteErrors.Inc()
@@ -349,12 +370,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/prometheus/config", "/config":
if *configAuthKey != "" && r.FormValue("authKey") != *configAuthKey {
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("The provided authKey doesn't match -configAuthKey"),
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
return true
}
promscrapeConfigRequests.Inc()
@@ -363,12 +379,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
case "/prometheus/api/v1/status/config", "/api/v1/status/config":
// See https://prometheus.io/docs/prometheus/latest/querying/api/#config
if *configAuthKey != "" && r.FormValue("authKey") != *configAuthKey {
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("The provided authKey doesn't match -configAuthKey"),
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
return true
}
promscrapeStatusConfigRequests.Inc()

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -10,9 +10,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,14 +30,12 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzip := req.Header.Get("Content-Encoding") == "gzip"
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req.Body, isGzip, func(block *parser.Block) error {
return insertRows(at, block, extraLabels)
})
return stream.Parse(req.Body, isGzip, func(block *stream.Block) error {
return insertRows(at, block, extraLabels)
})
}
func insertRows(at *auth.Token, block *parser.Block, extraLabels []prompbmarshal.Label) error {
func insertRows(at *auth.Token, block *stream.Block, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)

View File

@@ -7,7 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -20,9 +20,7 @@ var (
//
// See http://opentsdb.net/docs/build/html/api_telnet/put.html
func InsertHandler(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, insertRows)
})
return stream.Parse(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -9,7 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -25,10 +25,8 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
return stream.Parse(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -1,17 +1,17 @@
package prometheusimport
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,20 +31,11 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
}, nil)
})
}
// InsertHandlerForReader processes metrics from given reader with optional gzip format
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, 0, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
}, nil)
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return stream.Parse(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
}, func(s string) {
httpserver.LogError(req, s)
})
}

View File

@@ -0,0 +1,60 @@
package prometheusimport
import (
"bytes"
"flag"
"log"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
)
var (
srv *httptest.Server
testOutput *bytes.Buffer
)
func TestInsertHandler(t *testing.T) {
setUp()
defer tearDown()
req := httptest.NewRequest(http.MethodPost, "/insert/0/api/v1/import/prometheus", bytes.NewBufferString(`{"foo":"bar"}
go_memstats_alloc_bytes_total 1`))
if err := InsertHandler(nil, req); err != nil {
t.Errorf("unxepected error %s", err)
}
expectedMsg := "cannot unmarshal Prometheus line"
if !strings.Contains(testOutput.String(), expectedMsg) {
t.Errorf("output %q should contain %q", testOutput.String(), expectedMsg)
}
}
func setUp() {
srv = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(204)
}))
flag.Parse()
remoteWriteFlag := "remoteWrite.url"
if err := flag.Lookup(remoteWriteFlag).Value.Set(srv.URL); err != nil {
log.Fatalf("unable to set %q with value %q, err: %v", remoteWriteFlag, srv.URL, err)
}
logger.Init()
common.StartUnmarshalWorkers()
remotewrite.Init()
testOutput = &bytes.Buffer{}
logger.SetOutputForTests(testOutput)
}
func tearDown() {
common.StopUnmarshalWorkers()
srv.Close()
logger.ResetOutputForTest()
tmpDataDir := flag.Lookup("remoteWrite.tmpDataPath").Value.String()
fs.MustRemoveAll(tmpDataDir)
}


@@ -1,7 +1,6 @@
package promremotewrite
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
@@ -11,9 +10,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -29,19 +27,9 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
})
}
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(at *auth.Token, r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, nil)
})
isVMRemoteWrite := req.Header.Get("Content-Encoding") == "zstd"
return stream.Parse(req.Body, isVMRemoteWrite, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
}
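The handler above picks the decompression scheme from the request's `Content-Encoding` header: `zstd` marks the VictoriaMetrics remote write protocol, anything else is treated as snappy-compressed Prometheus remote write. Below is a minimal standalone sketch of that detection step; it assumes the public `github.com/golang/snappy` and `github.com/klauspost/compress/zstd` modules and is not the actual `stream.Parse` implementation.

```go
package main

import (
	"fmt"
	"io"
	"net/http"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

// decodeRemoteWriteBody returns the uncompressed remote write payload.
// A "zstd" Content-Encoding marks the VictoriaMetrics remote write protocol;
// everything else is treated as snappy-compressed Prometheus remote write.
func decodeRemoteWriteBody(req *http.Request) ([]byte, error) {
	body, err := io.ReadAll(req.Body)
	if err != nil {
		return nil, fmt.Errorf("cannot read request body: %w", err)
	}
	if req.Header.Get("Content-Encoding") == "zstd" {
		dec, err := zstd.NewReader(nil)
		if err != nil {
			return nil, err
		}
		defer dec.Close()
		return dec.DecodeAll(body, nil)
	}
	return snappy.Decode(nil, body)
}

func main() {
	http.HandleFunc("/api/v1/write", func(w http.ResponseWriter, r *http.Request) {
		data, err := decodeRemoteWriteBody(r)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// data now holds the raw WriteRequest protobuf ready for unmarshaling.
		fmt.Printf("received %d uncompressed bytes\n", len(data))
		w.WriteHeader(http.StatusNoContent)
	})
	_ = http.ListenAndServe(":8429", nil)
}
```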


@@ -67,10 +67,11 @@ var (
)
type client struct {
sanitizedURL string
remoteWriteURL string
fq *persistentqueue.FastQueue
hc *http.Client
sanitizedURL string
remoteWriteURL string
isVMRemoteWrite bool
fq *persistentqueue.FastQueue
hc *http.Client
sendBlock func(block []byte) bool
authCfg *promauth.Config
@@ -92,7 +93,7 @@ type client struct {
stopCh chan struct{}
}
func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int) *client {
func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int, isVMRemoteWrite bool) *client {
authCfg, err := getAuthConfig(argIdx)
if err != nil {
logger.Panicf("FATAL: cannot initialize auth config for remoteWrite.url=%q: %s", remoteWriteURL, err)
@@ -122,17 +123,19 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
}
tr.Proxy = http.ProxyURL(pu)
}
hc := &http.Client{
Transport: tr,
Timeout: sendTimeout.GetOptionalArgOrDefault(argIdx, time.Minute),
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authCfg: authCfg,
awsCfg: awsCfg,
fq: fq,
hc: &http.Client{
Transport: tr,
Timeout: sendTimeout.GetOptionalArgOrDefault(argIdx, time.Minute),
},
stopCh: make(chan struct{}),
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
isVMRemoteWrite: isVMRemoteWrite,
authCfg: authCfg,
awsCfg: awsCfg,
fq: fq,
hc: hc,
stopCh: make(chan struct{}),
}
c.sendBlock = c.sendBlockHTTP
return c
@@ -291,7 +294,9 @@ func (c *client) runWorker() {
}
}
// sendBlockHTTP returns false only if c.stopCh is closed.
// sendBlockHTTP sends the given block to c.remoteWriteURL.
//
// The function returns false only if c.stopCh is closed.
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.register(len(block), c.stopCh)
@@ -305,7 +310,7 @@ func (c *client) sendBlockHTTP(block []byte) bool {
}
again:
req, err := http.NewRequest("POST", c.remoteWriteURL, bytes.NewBuffer(block))
req, err := http.NewRequest(http.MethodPost, c.remoteWriteURL, bytes.NewBuffer(block))
if err != nil {
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", c.sanitizedURL, err)
}
@@ -313,8 +318,13 @@ again:
h := req.Header
h.Set("User-Agent", "vmagent")
h.Set("Content-Type", "application/x-protobuf")
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
if c.isVMRemoteWrite {
h.Set("Content-Encoding", "zstd")
h.Set("X-VictoriaMetrics-Remote-Write-Version", "1")
} else {
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
}
if c.awsCfg != nil {
if err := c.awsCfg.SignRequest(req, sigv4Hash); err != nil {
// there is no need in retry, request will be rejected by client.Do and retried by code below


@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -33,9 +34,10 @@ type pendingSeries struct {
periodicFlusherWG sync.WaitGroup
}
func newPendingSeries(pushBlock func(block []byte), significantFigures, roundDigits int) *pendingSeries {
func newPendingSeries(pushBlock func(block []byte), isVMRemoteWrite bool, significantFigures, roundDigits int) *pendingSeries {
var ps pendingSeries
ps.wr.pushBlock = pushBlock
ps.wr.isVMRemoteWrite = isVMRemoteWrite
ps.wr.significantFigures = significantFigures
ps.wr.roundDigits = roundDigits
ps.stopCh = make(chan struct{})
@@ -88,6 +90,9 @@ type writeRequest struct {
// pushBlock is called when the write request is ready to be sent.
pushBlock func(block []byte)
// Whether to encode the write request with VictoriaMetrics remote write protocol.
isVMRemoteWrite bool
// How many significant figures must be left before sending the writeRequest to pushBlock.
significantFigures int
@@ -104,7 +109,7 @@ type writeRequest struct {
}
func (wr *writeRequest) reset() {
// Do not reset pushBlock, significantFigures and roundDigits, since they are re-used.
// Do not reset lastFlushTime, pushBlock, isVMRemoteWrite, significantFigures and roundDigits, since they are re-used.
wr.wr.Timeseries = nil
@@ -126,7 +131,7 @@ func (wr *writeRequest) flush() {
wr.wr.Timeseries = wr.tss
wr.adjustSampleValues()
atomic.StoreUint64(&wr.lastFlushTime, fasttime.UnixTimestamp())
pushWriteRequest(&wr.wr, wr.pushBlock)
pushWriteRequest(&wr.wr, wr.pushBlock, wr.isVMRemoteWrite)
wr.reset()
}
@@ -188,7 +193,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
wr.buf = buf
}
func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byte)) {
func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byte), isVMRemoteWrite bool) {
if len(wr.Timeseries) == 0 {
// Nothing to push
return
@@ -197,7 +202,11 @@ func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byt
bb.B = prompbmarshal.MarshalWriteRequest(bb.B[:0], wr)
if len(bb.B) <= maxUnpackedBlockSize.IntN() {
zb := snappyBufPool.Get()
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
if isVMRemoteWrite {
zb.B = zstd.CompressLevel(zb.B[:0], bb.B, 0)
} else {
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
}
writeRequestBufPool.Put(bb)
if len(zb.B) <= persistentqueue.MaxBlockSize {
pushBlock(zb.B)
@@ -221,18 +230,18 @@ func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byt
}
n := len(samples) / 2
wr.Timeseries[0].Samples = samples[:n]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries[0].Samples = samples[n:]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries[0].Samples = samples
return
}
timeseries := wr.Timeseries
n := len(timeseries) / 2
wr.Timeseries = timeseries[:n]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries = timeseries[n:]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries = timeseries
}
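The size difference between the two protocols comes from swapping snappy for zstd when compressing the marshaled `WriteRequest`, as the expected block lengths in the test below illustrate. A rough standalone sketch of that comparison follows; it uses the public `github.com/golang/snappy` and `github.com/klauspost/compress/zstd` modules instead of the internal `lib/encoding/zstd` wrapper, so the exact numbers are illustrative only.

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/golang/snappy"
	"github.com/klauspost/compress/zstd"
)

func main() {
	// A stand-in for a marshaled WriteRequest: highly repetitive label data,
	// which zstd compresses much better than snappy.
	payload := bytes.Repeat([]byte(`{__name__="http_requests_total",job="vmagent",instance="host-1"} 42 `), 1000)

	// Prometheus remote write protocol: snappy block compression.
	snappyBlock := snappy.Encode(nil, payload)

	// VictoriaMetrics remote write protocol: zstd compression.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	zstdBlock := enc.EncodeAll(payload, nil)
	_ = enc.Close()

	fmt.Printf("raw=%d snappy=%d zstd=%d bytes\n", len(payload), len(snappyBlock), len(zstdBlock))
}
```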


@@ -2,40 +2,48 @@ package remotewrite
import (
"fmt"
"math"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/golang/snappy"
)
func TestPushWriteRequest(t *testing.T) {
for _, rowsCount := range []int{1, 10, 100, 1e3, 1e4} {
rowsCounts := []int{1, 10, 100, 1e3, 1e4}
expectedBlockLensProm := []int{216, 1848, 16424, 169882, 1757876}
expectedBlockLensVM := []int{138, 492, 3927, 34995, 288476}
for i, rowsCount := range rowsCounts {
expectedBlockLenProm := expectedBlockLensProm[i]
expectedBlockLenVM := expectedBlockLensVM[i]
t.Run(fmt.Sprintf("%d", rowsCount), func(t *testing.T) {
testPushWriteRequest(t, rowsCount)
testPushWriteRequest(t, rowsCount, expectedBlockLenProm, expectedBlockLenVM)
})
}
}
func testPushWriteRequest(t *testing.T, rowsCount int) {
wr := newTestWriteRequest(rowsCount, 10)
pushBlockLen := 0
pushBlock := func(block []byte) {
if pushBlockLen > 0 {
panic(fmt.Errorf("BUG: pushBlock called multiple times; pushBlockLen=%d at first call, len(block)=%d at second call", pushBlockLen, len(block)))
func testPushWriteRequest(t *testing.T, rowsCount, expectedBlockLenProm, expectedBlockLenVM int) {
f := func(isVMRemoteWrite bool, expectedBlockLen int, tolerancePrc float64) {
t.Helper()
wr := newTestWriteRequest(rowsCount, 20)
pushBlockLen := 0
pushBlock := func(block []byte) {
if pushBlockLen > 0 {
panic(fmt.Errorf("BUG: pushBlock called multiple times; pushBlockLen=%d at first call, len(block)=%d at second call", pushBlockLen, len(block)))
}
pushBlockLen = len(block)
}
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
if math.Abs(float64(pushBlockLen-expectedBlockLen)/float64(expectedBlockLen)*100) > tolerancePrc {
t.Fatalf("unexpected block len for rowsCount=%d, isVMRemoteWrite=%v; got %d bytes; expecting %d bytes +- %.0f%%",
rowsCount, isVMRemoteWrite, pushBlockLen, expectedBlockLen, tolerancePrc)
}
pushBlockLen = len(block)
}
pushWriteRequest(wr, pushBlock)
b := prompbmarshal.MarshalWriteRequest(nil, wr)
zb := snappy.Encode(nil, b)
maxPushBlockLen := len(zb)
minPushBlockLen := maxPushBlockLen / 2
if pushBlockLen < minPushBlockLen {
t.Fatalf("unexpected block len after pushWriteRequest; got %d bytes; must be at least %d bytes", pushBlockLen, minPushBlockLen)
}
if pushBlockLen > maxPushBlockLen {
t.Fatalf("unexpected block len after pushWriteRequest; got %d bytes; must be smaller or equal to %d bytes", pushBlockLen, maxPushBlockLen)
}
// Check Prometheus remote write
f(false, expectedBlockLenProm, 0)
// Check VictoriaMetrics remote write
f(true, expectedBlockLenVM, 15)
}
func newTestWriteRequest(seriesCount, labelsCount int) *prompbmarshal.WriteRequest {


@@ -15,11 +15,13 @@ import (
var (
unparsedLabelsGlobal = flagutil.NewArrayString("remoteWrite.label", "Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. "+
"Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage")
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabel_config entries. "+
"The path can point either to local file or to http url. These entries are applied to all the metrics "+
"before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details")
relabelConfigPaths = flagutil.NewArrayString("remoteWrite.urlRelabelConfig", "Optional path to relabel config for the corresponding -remoteWrite.url. "+
"The path can point either to local file or to http url")
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabeling configs, which are applied "+
"to all the metrics before sending them to -remoteWrite.url. See also -remoteWrite.urlRelabelConfig. "+
"The path can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmagent.html#relabeling")
relabelConfigPaths = flagutil.NewArrayString("remoteWrite.urlRelabelConfig", "Optional path to relabel configs for the corresponding -remoteWrite.url. "+
"See also -remoteWrite.relabelConfig. The path can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmagent.html#relabeling")
usePromCompatibleNaming = flag.Bool("usePromCompatibleNaming", false, "Whether to replace characters unsupported by Prometheus with underscores "+
"in the ingested metric names and label names. For example, foo.bar{a.b='c'} is transformed into foo_bar{a_b='c'} during data ingestion if this flag is set. "+
@@ -86,7 +88,7 @@ func initLabelsGlobal() {
}
func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, extraLabels []prompbmarshal.Label, pcs *promrelabel.ParsedConfigs) []prompbmarshal.TimeSeries {
if len(extraLabels) == 0 && pcs.Len() == 0 {
if len(extraLabels) == 0 && pcs.Len() == 0 && !*usePromCompatibleNaming {
// Nothing to change.
return tss
}


@@ -0,0 +1,49 @@
package remotewrite
import (
"reflect"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
func TestApplyRelabeling(t *testing.T) {
f := func(extraLabels []prompbmarshal.Label, pcs *promrelabel.ParsedConfigs, sTss, sExpTss string) {
rctx := &relabelCtx{}
tss, expTss := parseSeries(sTss), parseSeries(sExpTss)
gotTss := rctx.applyRelabeling(tss, extraLabels, pcs)
if !reflect.DeepEqual(gotTss, expTss) {
t.Fatalf("expected to have: \n%v;\ngot: \n%v", expTss, gotTss)
}
}
f(nil, nil, "up", "up")
f([]prompbmarshal.Label{{Name: "foo", Value: "bar"}}, nil, "up", `up{foo="bar"}`)
f([]prompbmarshal.Label{{Name: "foo", Value: "bar"}}, nil, `up{foo="baz"}`, `up{foo="bar"}`)
pcs, err := promrelabel.ParseRelabelConfigsData([]byte(`
- target_label: "foo"
replacement: "aaa"
- action: labeldrop
regex: "env.*"
`))
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
f(nil, pcs, `up{foo="baz", env="prod"}`, `up{foo="aaa"}`)
oldVal := *usePromCompatibleNaming
*usePromCompatibleNaming = true
f(nil, nil, `foo.bar`, `foo_bar`)
*usePromCompatibleNaming = oldVal
}
func parseSeries(data string) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
tss = append(tss, prompbmarshal.TimeSeries{
Labels: promutils.MustNewLabelsFromString(data).GetLabels(),
})
return tss
}


@@ -21,18 +21,22 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
)
var (
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support Prometheus remote_write API. "+
"It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support either VictoriaMetrics remote write protocol "+
"or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteMultitenantURLs = flagutil.NewArrayString("remoteWrite.multitenantURL", "Base path for multitenant remote storage URL to write data to. "+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
forcePromProto = flagutil.NewArrayBool("remoteWrite.forcePromProto", "Whether to force Prometheus remote write protocol for sending data "+
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored. "+
"See also -remoteWrite.maxDiskUsagePerURL")
queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
@@ -58,6 +62,15 @@ var (
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
maxDailySeries = flag.Int("remoteWrite.maxDailySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
streamAggrConfig = flagutil.NewArrayString("remoteWrite.streamAggr.config", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/stream-aggregation.html . "+
"See also -remoteWrite.streamAggr.keepInput and -remoteWrite.streamAggr.dedupInterval")
streamAggrKeepInput = flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput", "Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. "+
"By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. "+
"See https://docs.victoriametrics.com/stream-aggregation.html")
streamAggrDedupInterval = flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval", "Input samples are de-duplicated with this interval before being aggregated. "+
"Only the last sample per each time series per each interval is aggregated if the interval is greater than zero")
)
var (
@@ -140,6 +153,7 @@ func Init() {
logger.Fatalf("cannot load relabel configs: %s", err)
}
allRelabelConfigs.Store(rcs)
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
@@ -435,9 +449,13 @@ var (
)
type remoteWriteCtx struct {
idx int
fq *persistentqueue.FastQueue
c *client
idx int
fq *persistentqueue.FastQueue
c *client
sas *streamaggr.Aggregators
streamAggrKeepInput bool
pss []*pendingSeries
pssNextIdx uint64
@@ -460,15 +478,28 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_pending_inmemory_blocks{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetInmemoryQueueLen())
})
// Auto-detect whether the remote storage supports VictoriaMetrics remote write protocol.
isVMRemoteWrite := false
usePromProto := forcePromProto.GetOptionalArg(argIdx)
if !usePromProto {
isVMRemoteWrite = common.HandleVMProtoClientHandshake(remoteWriteURL)
if !isVMRemoteWrite {
logger.Infof("the remote storage at %q doesn't support VictoriaMetrics remote write protocol. Switching to Prometheus remote write protocol. "+
"See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol", sanitizedURL)
}
}
var c *client
switch remoteWriteURL.Scheme {
case "http", "https":
c = newHTTPClient(argIdx, remoteWriteURL.String(), sanitizedURL, fq, *queues)
c = newHTTPClient(argIdx, remoteWriteURL.String(), sanitizedURL, fq, *queues, isVMRemoteWrite)
default:
logger.Fatalf("unsupported scheme: %s for remoteWriteURL: %s, want `http`, `https`", remoteWriteURL.Scheme, sanitizedURL)
}
c.init(argIdx, *queues, sanitizedURL)
// Initialize pss
sf := significantFigures.GetOptionalArgOrDefault(argIdx, 0)
rd := roundDigits.GetOptionalArgOrDefault(argIdx, 100)
pssLen := *queues
@@ -479,9 +510,10 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
}
pss := make([]*pendingSeries, pssLen)
for i := range pss {
pss[i] = newPendingSeries(fq.MustWriteBlock, sf, rd)
pss[i] = newPendingSeries(fq.MustWriteBlock, isVMRemoteWrite, sf, rd)
}
return &remoteWriteCtx{
rwctx := &remoteWriteCtx{
idx: argIdx,
fq: fq,
c: c,
@@ -490,6 +522,20 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
rowsPushedAfterRelabel: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_rows_pushed_after_relabel_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
rowsDroppedByRelabel: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_relabel_metrics_dropped_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
}
// Initialize sas
sasFile := streamAggrConfig.GetOptionalArg(argIdx)
if sasFile != "" {
dedupInterval := streamAggrDedupInterval.GetOptionalArgOrDefault(argIdx, 0)
sas, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternal, dedupInterval)
if err != nil {
logger.Fatalf("cannot initialize stream aggregators from -remoteWrite.streamAggrFile=%q: %s", sasFile, err)
}
rwctx.sas = sas
rwctx.streamAggrKeepInput = streamAggrKeepInput.GetOptionalArg(argIdx)
}
return rwctx
}
func (rwctx *remoteWriteCtx) MustStop() {
@@ -501,6 +547,8 @@ func (rwctx *remoteWriteCtx) MustStop() {
rwctx.fq.UnblockAllReaders()
rwctx.c.MustStop()
rwctx.c = nil
rwctx.sas.MustStop()
rwctx.sas = nil
rwctx.fq.MustClose()
rwctx.fq = nil
@@ -509,6 +557,7 @@ func (rwctx *remoteWriteCtx) MustStop() {
}
func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
// Apply relabeling
var rctx *relabelCtx
var v *[]prompbmarshal.TimeSeries
rcs := allRelabelConfigs.Load().(*relabelConfigs)
@@ -526,11 +575,17 @@ func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
rowsCountAfterRelabel := getRowsCount(tss)
rwctx.rowsDroppedByRelabel.Add(rowsCountBeforeRelabel - rowsCountAfterRelabel)
}
pss := rwctx.pss
idx := atomic.AddUint64(&rwctx.pssNextIdx, 1) % uint64(len(pss))
rowsCount := getRowsCount(tss)
rwctx.rowsPushedAfterRelabel.Add(rowsCount)
pss[idx].Push(tss)
// Apply stream aggregation if any
rwctx.sas.Push(tss)
if rwctx.sas == nil || rwctx.streamAggrKeepInput {
// Push samples to the remote storage
rwctx.pushInternal(tss)
}
// Return back relabeling contexts to the pool
if rctx != nil {
*v = prompbmarshal.ResetTimeSeries(tss)
tssRelabelPool.Put(v)
@@ -538,6 +593,12 @@ func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
}
}
func (rwctx *remoteWriteCtx) pushInternal(tss []prompbmarshal.TimeSeries) {
pss := rwctx.pss
idx := atomic.AddUint64(&rwctx.pssNextIdx, 1) % uint64(len(pss))
pss[idx].Push(tss)
}
var tssRelabelPool = &sync.Pool{
New: func() interface{} {
a := []prompbmarshal.TimeSeries{}
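The `Push` changes above boil down to a simple rule: every batch is fed to the stream aggregator when one is configured, and the original samples are forwarded to the remote write queues only when there is no aggregator or `-remoteWrite.streamAggr.keepInput` is set. Below is a minimal sketch of that decision with a hypothetical aggregator interface (not the actual `lib/streamaggr` API).

```go
package main

import "fmt"

// sampleAggregator is a hypothetical stand-in for a stream aggregator.
type sampleAggregator interface {
	Push(samples []float64)
}

type sumAggregator struct{ total float64 }

func (a *sumAggregator) Push(samples []float64) {
	for _, v := range samples {
		a.total += v
	}
}

// push mirrors the aggregate-or-forward decision: aggregate when an aggregator
// is configured, and forward the raw samples only if no aggregator is set
// or keepInput is enabled.
func push(agg sampleAggregator, keepInput bool, samples []float64, forward func([]float64)) {
	if agg != nil {
		agg.Push(samples)
	}
	if agg == nil || keepInput {
		forward(samples)
	}
}

func main() {
	forward := func(s []float64) { fmt.Println("forwarded", len(s), "samples") }
	agg := &sumAggregator{}

	push(nil, false, []float64{1, 2, 3}, forward) // no aggregator: forwarded as-is
	push(agg, false, []float64{1, 2, 3}, forward) // aggregated only
	push(agg, true, []float64{4, 5}, forward)     // aggregated and forwarded
	fmt.Println("aggregated total:", agg.total)
}
```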


@@ -1,7 +1,6 @@
package vmimport
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
@@ -12,8 +11,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,20 +30,9 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
})
}
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return stream.Parse(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}


@@ -9,6 +9,13 @@ protocol and require `-remoteWrite.url` to be configured.
Vmalert is heavily inspired by [Prometheus](https://prometheus.io/docs/alerting/latest/overview/)
implementation and aims to be compatible with its syntax.
A [single-node](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmalert)
or [cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmalert)
of VictoriaMetrics is capable of proxying requests to vmalert via the `-vmalert.proxyURL` command-line flag.
Use this feature for the following cases:
* for proxying requests from [Grafana Alerting UI](https://grafana.com/docs/grafana/latest/alerting/);
* for accessing vmalert's UI through VictoriaMetrics Web interface.
## Features
* Integration with [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) TSDB;
@@ -21,7 +28,8 @@ implementation and aims to be compatible with its syntax.
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite);
* Recording and Alerting rules backfilling (aka `replay`). See [these docs](#rules-backfilling);
* Lightweight and without extra dependencies.
* Supports [reusable templates](#reusable-templates) for annotations.
* Supports [reusable templates](#reusable-templates) for annotations;
* Loading of recording and alerting rules from local filesystem, GCS and S3.
## Limitations
@@ -69,16 +77,17 @@ Then configure `vmalert` accordingly:
-external.label=replica=a # Multiple external labels may be set
```
Note there's a separate `remoteWrite.url` to allow writing results of
Note there's a separate `-remoteWrite.url` command-line flag to allow writing results of
alerting/recording rules into a different storage than the initial data that's
queried. This allows using `vmalert` to aggregate data from a short-term,
high-frequency, high-cardinality storage into a long-term storage with
decreased cardinality and a bigger interval between samples.
See also [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html).
See the full list of configuration flags in [configuration](#configuration) section.
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
to specify different `-external.label` command-line flags in order to define which `vmalert` instance generated a given rule or alert.
Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
@@ -191,6 +200,11 @@ expr: <string>
# Is applicable to alerting rules only.
[ debug: <bool> | default = false ]
# Defines the number of rule's update entries stored in memory
# and available for view on rule's Details page.
# Overrides `rule.updateEntriesLimit` value for this specific rule.
[ update_entries_limit: <integer> | default 0 ]
# Labels to add or overwrite for each alert.
labels:
[ <labelname>: <tmpl_string> ]
@@ -319,6 +333,12 @@ expr: <string>
# Labels to add or overwrite before storing the result.
labels:
[ <labelname>: <labelvalue> ]
# Defines the number of rule's update entries stored in memory
# and available for view on rule's Details page.
# Overrides `rule.updateEntriesLimit` value for this specific rule.
[ update_entries_limit: <integer> | default 0 ]
```
For recording rules to work `-remoteWrite.url` must be specified.
@@ -385,6 +405,25 @@ The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz`
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
### Reading rules from object storage
[Enterprise version](https://docs.victoriametrics.com/enterprise.html) of `vmalert` may read alerting and recording rules
from object storage:
- `./bin/vmalert -rule=s3://bucket/dir/alert.rules` would read rules from the given path at S3 bucket
- `./bin/vmalert -rule=gs://bucket/dir/alert.rules` would read rules from the given path at GCS bucket
S3 and GCS paths support only matching by prefix, e.g. `s3://bucket/dir/rule_` matches
all files with prefix `rule_` in the folder `dir`.
The following [command-line flags](#flags) can be used for fine-tuning access to S3 and GCS:
- `-s3.credsFilePath` - path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
- `-s3.configFilePath` - path to file with S3 configs. Configs are loaded from default location if not set.
- `-s3.configProfile` - profile name for S3 configs. If not set, the value of the environment variable will be loaded (`AWS_PROFILE` or `AWS_DEFAULT_PROFILE`).
- `-s3.customEndpoint` - custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set.
- `-s3.forcePathStyle` - prefixing endpoint with bucket name when set to false; true by default.
### Topology examples
The following sections are showing how `vmalert` may be used and configured
@@ -469,7 +508,8 @@ Alertmanager will automatically deduplicate alerts with identical labels, so ens
all `vmalert`s are having the same config.
Don't forget to configure [cluster mode](https://prometheus.io/docs/alerting/latest/alertmanager/)
for Alertmanagers for better reliability.
for Alertmanagers for better reliability. List all Alertmanager URLs in vmalert's `-notifier.url`
to ensure [high availability](https://github.com/prometheus/alertmanager#high-availability).
This example uses single-node VM server for the sake of simplicity.
Check how to replace it with [cluster VictoriaMetrics](#cluster-victoriametrics) if needed.
@@ -502,8 +542,8 @@ groups:
expr: avg_over_time(http_requests[5m])
```
Ability of `vmalert` to be configured with different `datasource.url` and `remoteWrite.url` allows
reading data from one data source and backfilling results to another. This helps to build a system
Ability of `vmalert` to be configured with different `-datasource.url` and `-remoteWrite.url` command-line flags
allows reading data from one data source and backfilling results to another. This helps to build a system
for aggregating and downsampling the data.
The following example shows how to build a topology where `vmalert` will process data from one cluster
@@ -527,7 +567,7 @@ Please note, [replay](#rules-backfilling) feature may be used for transforming h
Flags `-remoteRead.url` and `-notifier.url` are omitted since we assume only recording rules are used.
See also [downsampling docs](https://docs.victoriametrics.com/#downsampling).
See also [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) and [downsampling](https://docs.victoriametrics.com/#downsampling).
#### Multiple remote writes
@@ -581,6 +621,8 @@ can read the same rules configuration as normal, evaluate them on the given time
results via remote write to the configured storage. vmalert supports any PromQL/MetricsQL compatible
data source for backfilling.
See a blogpost about [Rules backfilling via vmalert](https://victoriametrics.com/blog/rules-replay/).
### How it works
In `replay` mode vmalert works as a cli-tool and exits immediately after work is done.
The default list of alerting rules for these metrics can be found [here](https://
We recommend setting up regular scraping of this page either through `vmagent` or by Prometheus so that the exported
metrics may be analyzed later.
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950) for `vmalert` overview.
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950-victoriametrics-vmalert/) for `vmalert` overview.
Graphs on this dashboard contain useful hints - hover the `i` icon in the top left corner of each graph in order to read it.
If you have suggestions for improvements or have found a bug - please open an issue on github or add
a review to the dashboard.
## Troubleshooting
vmalert executes configured rules within certain intervals. It is expected that at the moment when rule is executed,
the data is already present in configured `-datasource.url`:
### Data delay
Data delay is one of the most common issues with rules execution.
vmalert executes configured rules within certain intervals at specific timestamps.
It expects that the data is already present in the configured `-datasource.url` at the moment when the rule is executed:
<img alt="vmalert expected evaluation" src="vmalert_ts_normal.gif">
Usually, troubles start to appear when data in `-datasource.url` is delayed or absent. In such cases, evaluations
may get empty response from datasource and produce empty recording rules or reset alerts state:
may get empty response from the datasource and produce empty recording rules or reset alerts state:
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
By default recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
This behavior is controlled by the `-search.latencyOffset` command-line flag and the `latency_offset` query arg at `vmselect`.
Usually, this results in a 30s shift for recording rules results.
Note that too small a value passed to `-search.latencyOffset` or to the `latency_offset` query arg may lead to incomplete query results.
Try the following recommendations to reduce the chance of hitting the data delay issue:
Try the following recommendations in such cases:
* Always configure group's `evaluationInterval` to be bigger or equal to `scrape_interval` at which metrics
are delivered to the datasource;
* Always configure group's `evaluationInterval` to be bigger or at least equal to
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution);
* Ensure that `[duration]` value is at least twice bigger than
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`.
If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
* If you know in advance that data in the datasource is delayed - try changing vmalert's `-datasource.lookback`
command-line flag to add a time shift for evaluations;
* If time intervals between datapoints in datasource are irregular or `>=5min` - try changing vmalert's
`-datasource.queryStep` command-line flag to specify how far search query can lookback for the recent datapoint.
The recommendation is to have the step at least two times bigger than `scrape_interval`, since
there are no guarantees that scrape will not fail.
command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
* If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
in datasource is inconsistent or `>=5min` - try changing vmalert's `-datasource.queryStep` command-line flag to specify
how far the search query can look back for the recent datapoint. The recommendation is to have the step
at least two times bigger than the resolution.
> Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring it.
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
data at the moment of querying, since metrics collectors may not be able to deliver the data in time.
### Alerts state
Sometimes, it is not clear why some specific alert fired or didn't fire. It is very important to remember that
alerts with `for: 0` fire immediately when their expression becomes true. And alerts with `for > 0` will fire only
@@ -721,8 +774,9 @@ If `-remoteWrite.url` command-line flag is configured, vmalert will persist aler
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how alerts state
changed in time.
vmalert also stores last N state updates for each rule. To check updates, click on `Details` link next to rule's name
on `/vmalert/groups` page and check the `Last updates` section:
vmalert stores last `-rule.updateEntriesLimit` (or `update_entries_limit` [per-rule config](https://docs.victoriametrics.com/vmalert.html#alerting-rules))
state updates for each rule. To check updates, click on `Details` link next to rule's name on `/vmalert/groups` page
and check the `Last updates` section:
<img alt="vmalert state" src="vmalert_state.png">
@@ -731,7 +785,9 @@ HTTP request sent by vmalert to the `-datasource.url` during evaluation. If spec
no samples returned and curl command returns data - then it is very likely there was no data in datasource on the
moment when rule was evaluated.
vmalert also allows configuring more detailed logging for a specific rule. Just set `debug: true` in rule's configuration
### Debug mode
vmalert allows configuring more detailed logging for specific alerting rule. Just set `debug: true` in rule's configuration
and vmalert will start printing additional log messages:
```terminal
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
@@ -883,13 +939,21 @@ The shortlist of configuration flags is the following:
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
Address to listen for http connections (default ":8880")
Address to listen for http connections. See also -httpListenAddr.useProxyProtocol (default ":8880")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-insert.maxQueueDuration duration
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
@@ -898,6 +962,8 @@ The shortlist of configuration flags is the following:
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
@@ -955,7 +1021,7 @@ The shortlist of configuration flags is the following:
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
Prometheus alertmanager URL, e.g. http://127.0.0.1:9093
Prometheus Alertmanager URL, e.g. http://127.0.0.1:9093. List all Alertmanager URLs if it runs in cluster mode to ensure high availability.
Supports an array of values separated by comma or specified via multiple flags.
-pprofAuthKey string
Auth key for /debug/pprof/* endpoints. It must be passed via authKey query arg. It overrides httpAuth.* settings
@@ -992,7 +1058,7 @@ The shortlist of configuration flags is the following:
-remoteRead.headers string
Optional HTTP headers to send with each request to the corresponding -remoteRead.url. For example, -remoteRead.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteRead.url. Multiple headers must be delimited by '^^': -remoteRead.headers='header1:value1^^header2:value2'
-remoteRead.ignoreRestoreErrors
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
Whether to ignore errors from remote storage when restoring alerts state on startup. DEPRECATED - this flag has no effect and will be removed in the next releases. (default true)
-remoteRead.lookback duration
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.oauth2.clientID string
@@ -1069,8 +1135,8 @@ The shortlist of configuration flags is the following:
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. For example, if -remoteWrite.url=http://127.0.0.1:8428 is specified, then the alerts state will be written to http://127.0.0.1:8428/api/v1/write . See also -remoteWrite.disablePathAppend, '-remoteWrite.showURL'.
-replay.disableProgressBar
Whether to disable rendering progress bars during the replay. Progress bar rendering might be verbose or break the logs parsing, so it is recommended to be disabled when not used in interactive mode.
-replay.maxDatapointsPerQuery int
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
-replay.maxDatapointsPerQuery /query_range
Max number of data points expected in one request. It affects the max time range for every /query_range request during the replay. The higher the value, the less requests will be made during replay. (default 1000)
-replay.ruleRetryAttempts int
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
-replay.rulesDelay duration
@@ -1080,18 +1146,24 @@ The shortlist of configuration flags is the following:
-replay.timeTo string
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
-rule array
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Path to the files with alerting and/or recording rules.
Supports hierarchical patterns and regexps.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
-rule="dir/*.yaml" -rule="/*.yaml" -rule="gcs://vmalert-rules/tenant_%{TENANT_ID}/prod".
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Enterprise version of vmalert supports S3 and GCS paths to rules.
For example: gs://bucket/path/to/rules, s3://bucket/path/to/rules
S3 and GCS paths support only matching by prefix, e.g. s3://bucket/dir/rule_ matches
all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
Supports an array of values separated by comma or specified via multiple flags.
-rule.configCheckInterval duration
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead
-rule.maxResolveDuration duration
Limits the maximum duration for automatic alert expiration, which is by default equal to 3 evaluation intervals of the parent group.
Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
-rule.resendDelay duration
Minimum amount of time to wait before resending an alert to notifier
-rule.templates array
@@ -1102,10 +1174,24 @@ The shortlist of configuration flags is the following:
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
absolute path to all .tpl files in root.
Supports an array of values separated by comma or specified via multiple flags.
-rule.updateEntriesLimit int
Defines the max number of rule's state updates stored in-memory. Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param. (default 20)
-rule.validateExpressions
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
Whether to validate annotation and label templates (default true)
-s3.configFilePath string
Path to file with S3 configs. Configs are loaded from default location if not set.
See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.configProfile string
Profile name for S3 configs. If not set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both are not set, DefaultSharedConfigProfile is used. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.credsFilePath string
Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.customEndpoint string
Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.forcePathStyle
Prefixing endpoint with bucket name when set to false; true by default. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default true)
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
@@ -1185,6 +1271,8 @@ dns_sd_configs:
```
The list of configured or discovered Notifiers can be explored via [UI](#Web).
If Alertmanager runs in cluster mode then all its URLs need to be available during discovery
to ensure [high availability](https://github.com/prometheus/alertmanager#high-availability).
The configuration file [specification](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/notifier/config.go)
is the following:


@@ -74,10 +74,15 @@ func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
Debug: cfg.Debug,
}),
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
metrics: &alertingRuleMetrics{},
}
if cfg.UpdateEntriesLimit != nil {
ar.state = newRuleState(*cfg.UpdateEntriesLimit)
} else {
ar.state = newRuleState(*ruleUpdateEntriesLimit)
}
labels := fmt.Sprintf(`alertname=%q, group=%q, id="%d"`, ar.Name, group.Name, ar.ID())
ar.metrics.pending = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_alerts_pending{%s}`, labels),
func() float64 {
@@ -416,7 +421,9 @@ func (ar *AlertingRule) UpdateWith(r Rule) error {
ar.Labels = nr.Labels
ar.Annotations = nr.Annotations
ar.EvalInterval = nr.EvalInterval
ar.Debug = nr.Debug
ar.q = nr.q
ar.state = nr.state
return nil
}
@@ -491,7 +498,9 @@ func (ar *AlertingRule) ToAPI() APIRule {
State: "inactive",
Alerts: ar.AlertsToAPI(),
LastSamples: lastState.samples,
MaxUpdates: ar.state.size(),
Updates: ar.state.getAll(),
Debug: ar.Debug,
// encode as strings to avoid rounding in JSON
ID: fmt.Sprintf("%d", ar.ID()),
@@ -598,54 +607,59 @@ func alertForToTimeSeries(a *notifier.Alert, timestamp int64) prompbmarshal.Time
return newTimeSeries([]float64{float64(a.ActiveAt.Unix())}, []int64{timestamp}, labels)
}
// Restore restores the state of active alerts basing on previously written time series.
// Restore restores only ActiveAt field. Field State will be always Pending and supposed
// to be updated on next Exec, as well as Value field.
// Only rules with For > 0 will be restored.
func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookback time.Duration, labels map[string]string) error {
if q == nil {
return fmt.Errorf("querier is nil")
// Restore restores the value of ActiveAt field for active alerts,
// based on previously written time series `alertForStateMetricName`.
// Only rules with For > 0 can be restored.
func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, ts time.Time, lookback time.Duration) error {
if ar.For < 1 {
return nil
}
ts := time.Now()
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res, err
ar.alertsMu.Lock()
defer ar.alertsMu.Unlock()
if len(ar.alerts) < 1 {
return nil
}
// account for external labels in filter
var labelsFilter string
for k, v := range labels {
labelsFilter += fmt.Sprintf(",%s=%q", k, v)
}
expr := fmt.Sprintf("last_over_time(%s{alertname=%q%s}[%ds])",
alertForStateMetricName, ar.Name, labelsFilter, int(lookback.Seconds()))
qMetrics, _, err := q.Query(ctx, expr, ts)
if err != nil {
return err
}
for _, m := range qMetrics {
ls := &labelSet{
origin: make(map[string]string, len(m.Labels)),
processed: make(map[string]string, len(m.Labels)),
for _, a := range ar.alerts {
if a.Restored || a.State != notifier.StatePending {
continue
}
for _, l := range m.Labels {
if l.Name == "__name__" {
continue
}
ls.origin[l.Name] = l.Value
ls.processed[l.Name] = l.Value
var labelsFilter []string
for k, v := range a.Labels {
labelsFilter = append(labelsFilter, fmt.Sprintf("%s=%q", k, v))
}
a, err := ar.newAlert(m, ls, time.Unix(int64(m.Values[0]), 0), qFn)
sort.Strings(labelsFilter)
expr := fmt.Sprintf("last_over_time(%s{%s}[%ds])",
alertForStateMetricName, strings.Join(labelsFilter, ","), int(lookback.Seconds()))
ar.logDebugf(ts, nil, "restoring alert state via query %q", expr)
qMetrics, _, err := q.Query(ctx, expr, ts)
if err != nil {
return fmt.Errorf("failed to create alert: %w", err)
return err
}
a.ID = hash(ls.processed)
a.State = notifier.StatePending
if len(qMetrics) < 1 {
ar.logDebugf(ts, nil, "no response was received from restore query")
continue
}
// only one series expected in response
m := qMetrics[0]
// __name__ supposed to be alertForStateMetricName
m.DelLabel("__name__")
// we assume that restore query contains all label matchers,
// so all received labels will match anyway if their number is equal.
if len(m.Labels) != len(a.Labels) {
ar.logDebugf(ts, nil, "state restore query returned not expected label-set %v", m.Labels)
continue
}
a.ActiveAt = time.Unix(int64(m.Values[0]), 0)
a.Restored = true
ar.alerts[a.ID] = a
logger.Infof("alert %q (%d) restored to state at %v", a.Name, a.ID, a.ActiveAt)
}
return nil


@@ -6,12 +6,15 @@ import (
"reflect"
"sort"
"strings"
"sync"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
func TestAlertingRule_ToTimeSeries(t *testing.T) {
@@ -502,118 +505,156 @@ func TestAlertingRule_ExecRange(t *testing.T) {
}
}
func TestAlertingRule_Restore(t *testing.T) {
testCases := []struct {
rule *AlertingRule
metrics []datasource.Metric
expAlerts map[uint64]*notifier.Alert
}{
{
newTestRuleWithLabels("no extra labels"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
),
},
map[uint64]*notifier.Alert{
hash(nil): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("metric labels"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
alertNameLabel, "metric labels",
alertGroupNameLabel, "groupID",
"foo", "bar",
"namespace", "baz",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{
alertNameLabel: "metric labels",
alertGroupNameLabel: "groupID",
"foo": "bar",
"namespace": "baz",
}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("rule labels", "source", "vm"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
"foo", "bar",
"namespace", "baz",
// extra labels set by rule
"source", "vm",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{
"foo": "bar",
"namespace": "baz",
"source": "vm",
}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("multiple alerts"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-1",
),
metricWithValueAndLabels(t, float64(time.Now().Truncate(2*time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-2",
),
metricWithValueAndLabels(t, float64(time.Now().Truncate(3*time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-3",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{"host": "localhost-1"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
hash(map[string]string{"host": "localhost-2"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(2 * time.Hour)},
hash(map[string]string{"host": "localhost-3"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(3 * time.Hour)},
},
},
func TestGroup_Restore(t *testing.T) {
defaultTS := time.Now()
fqr := &fakeQuerierWithRegistry{}
fn := func(rules []config.Rule, expAlerts map[uint64]*notifier.Alert) {
t.Helper()
defer fqr.reset()
for _, r := range rules {
fqr.set(r.Expr, metricWithValueAndLabels(t, 0, "__name__", r.Alert))
}
fg := newGroup(config.Group{Name: "TestRestore", Rules: rules}, fqr, time.Second, nil)
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
nts := func() []notifier.Notifier { return []notifier.Notifier{&fakeNotifier{}} }
fg.start(context.Background(), nts, nil, fqr)
wg.Done()
}()
fg.close()
wg.Wait()
gotAlerts := make(map[uint64]*notifier.Alert)
for _, rs := range fg.Rules {
alerts := rs.(*AlertingRule).alerts
for k, v := range alerts {
if !v.Restored {
// set not restored alerts to predictable timestamp
v.ActiveAt = defaultTS
}
gotAlerts[k] = v
}
}
if len(gotAlerts) != len(expAlerts) {
t.Fatalf("expected %d alerts; got %d", len(expAlerts), len(gotAlerts))
}
for key, exp := range expAlerts {
got, ok := gotAlerts[key]
if !ok {
t.Fatalf("expected to have key %d", key)
}
if got.State != notifier.StatePending {
t.Fatalf("expected state %d; got %d", notifier.StatePending, got.State)
}
if got.ActiveAt != exp.ActiveAt {
t.Fatalf("expected ActiveAt %v; got %v", exp.ActiveAt, got.ActiveAt)
}
}
}
fakeGroup := Group{Name: "TestRule_Exec"}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if err := tc.rule.Restore(context.TODO(), fq, time.Hour, nil); err != nil {
t.Fatalf("unexpected err: %s", err)
}
if len(tc.rule.alerts) != len(tc.expAlerts) {
t.Fatalf("expected %d alerts; got %d", len(tc.expAlerts), len(tc.rule.alerts))
}
for key, exp := range tc.expAlerts {
got, ok := tc.rule.alerts[key]
if !ok {
t.Fatalf("expected to have key %d", key)
}
if got.State != exp.State {
t.Fatalf("expected state %d; got %d", exp.State, got.State)
}
if got.ActiveAt != exp.ActiveAt {
t.Fatalf("expected ActiveAt %v; got %v", exp.ActiveAt, got.ActiveAt)
}
}
stateMetric := func(name string, value time.Time, labels ...string) datasource.Metric {
labels = append(labels, "__name__", alertForStateMetricName)
labels = append(labels, alertNameLabel, name)
labels = append(labels, alertGroupNameLabel, "TestRestore")
return metricWithValueAndLabels(t, float64(value.Unix()), labels...)
}
// one active alert, no previous state
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
})
fqr.reset()
// one active alert with state restore
ts := time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// two rules, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("foo", ts))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)},
{Alert: "bar", Expr: "bar", For: promutils.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// two rules, two active alerts, two with state restored
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("bar", ts))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)},
{Alert: "bar", Expr: "bar", For: promutils.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// one active alert but wrong state restore
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertname="bar",alertgroup="TestRestore"}[3600s])`,
stateMetric("wrong alert", ts))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
})
// one active alert with labels
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
ActiveAt: ts,
},
})
// one active alert with restore labels mismatch
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev", "team", "foo"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
ActiveAt: defaultTS,
},
})
}
}
func TestAlertingRule_Exec_Negative(t *testing.T) {
@@ -709,7 +750,6 @@ func TestAlertingRule_Template(t *testing.T) {
"summary": `{{ $labels.alertname }}: Too high connection number for "{{ $labels.instance }}"`,
},
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
},
[]datasource.Metric{
metricWithValueAndLabels(t, 1, "instance", "foo"),
@@ -749,7 +789,6 @@ func TestAlertingRule_Template(t *testing.T) {
"description": `{{ $labels.alertname}}: It is {{ $value }} connections for "{{ $labels.instance }}"`,
},
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
},
[]datasource.Metric{
metricWithValueAndLabels(t, 2, "__name__", "first", "instance", "foo", alertNameLabel, "override"),
@@ -789,7 +828,6 @@ func TestAlertingRule_Template(t *testing.T) {
"summary": `Alert "{{ $labels.alertname }}({{ $labels.alertgroup }})" for instance {{ $labels.instance }}`,
},
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
},
[]datasource.Metric{
metricWithValueAndLabels(t, 1,
@@ -820,6 +858,7 @@ func TestAlertingRule_Template(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
tc.rule.state = newRuleState(10)
fq.add(tc.metrics...)
if _, err := tc.rule.Exec(context.TODO(), time.Now(), 0); err != nil {
t.Fatalf("unexpected err: %s", err)
@@ -936,6 +975,6 @@ func newTestAlertingRule(name string, waitFor time.Duration) *AlertingRule {
For: waitFor,
EvalInterval: waitFor,
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
state: newRuleState(10),
}
}

View File

@@ -5,8 +5,6 @@ import (
"fmt"
"hash/fnv"
"net/url"
"os"
"path/filepath"
"sort"
"strings"
@@ -114,6 +112,9 @@ type Rule struct {
Labels map[string]string `yaml:"labels,omitempty"`
Annotations map[string]string `yaml:"annotations,omitempty"`
Debug bool `yaml:"debug,omitempty"`
// UpdateEntriesLimit defines max number of rule's state updates stored in memory.
// Overrides `-rule.updateEntriesLimit`.
UpdateEntriesLimit *int `yaml:"update_entries_limit,omitempty"`
// Catches all undefined fields and must be empty after parsing.
XXX map[string]interface{} `yaml:",inline"`
@@ -200,19 +201,15 @@ type ValidateTplFn func(annotations map[string]string) error
// Parse parses rule configs from given file patterns
func Parse(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
var fp []string
for _, pattern := range pathPatterns {
matches, err := filepath.Glob(pattern)
if err != nil {
return nil, fmt.Errorf("error reading file pattern %s: %w", pattern, err)
}
fp = append(fp, matches...)
files, err := readFromFS(pathPatterns)
if err != nil {
return nil, fmt.Errorf("failed to read from the config: %s", err)
}
errGroup := new(utils.ErrGroup)
var groups []Group
for _, file := range fp {
for file, data := range files {
uniqueGroups := map[string]struct{}{}
gr, err := parseFile(file)
gr, err := parseConfig(data)
if err != nil {
errGroup.Add(fmt.Errorf("failed to parse file %q: %w", file, err))
continue
@@ -240,14 +237,10 @@ func Parse(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressio
return groups, nil
}
func parseFile(path string) ([]Group, error) {
data, err := os.ReadFile(path)
func parseConfig(data []byte) ([]Group, error) {
data, err := envtemplate.ReplaceBytes(data)
if err != nil {
return nil, fmt.Errorf("error reading alert rule file %q: %w", path, err)
}
data, err = envtemplate.ReplaceBytes(data)
if err != nil {
return nil, fmt.Errorf("cannot expand environment vars in %q: %w", path, err)
return nil, fmt.Errorf("cannot expand environment vars: %w", err)
}
g := struct {
Groups []Group `yaml:"groups"`

View File

@@ -550,6 +550,20 @@ rules:
- alert: foo
expr: sum by(job) (up == 1)
debug: true
`)
})
t.Run("`update_entries_limit` change", func(t *testing.T) {
f(t, `
name: TestGroup
rules:
- alert: foo
expr: sum by(job) (up == 1)
`, `
name: TestGroup
rules:
- alert: foo
expr: sum by(job) (up == 1)
update_entries_limit: 33
`)
})
}

app/vmalert/config/fs.go (new file, 89 lines)
View File

@@ -0,0 +1,89 @@
package config
import (
"fmt"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config/fslocal"
"strings"
"sync"
)
// FS represents a file system abstraction for reading files.
type FS interface {
// Init initializes FS.
Init() error
// String must return human-readable representation of FS.
String() string
// Read returns a list of read files in form of a map
// where key is a file name and value is a content of read file.
// Read must be called only after the successful Init call.
Read() (map[string][]byte, error)
}
var (
fsRegistryMu sync.Mutex
fsRegistry = make(map[string]FS)
)
// readFromFS parses the given path list and inits FS for each item.
// Once inited, readFromFS will try to read and return files from each FS.
// readFromFS returns an error if at least one FS failed to init.
// The function can be called multiple times but each unique path
// will be inited only once.
//
// It is allowed to mix different FS types in path list.
func readFromFS(paths []string) (map[string][]byte, error) {
var err error
result := make(map[string][]byte)
for _, path := range paths {
fsRegistryMu.Lock()
fs, ok := fsRegistry[path]
if !ok {
fs, err = newFS(path)
if err != nil {
fsRegistryMu.Unlock()
return nil, fmt.Errorf("error while parsing path %q: %w", path, err)
}
if err := fs.Init(); err != nil {
fsRegistryMu.Unlock()
return nil, fmt.Errorf("error while initializing path %q: %w", path, err)
}
fsRegistry[path] = fs
}
fsRegistryMu.Unlock()
files, err := fs.Read()
if err != nil {
return nil, fmt.Errorf("error while reading files from %q: %w", fs, err)
}
for k, v := range files {
if _, ok := result[k]; ok {
return nil, fmt.Errorf("duplicate found for file name %q: file names must be unique", k)
}
result[k] = v
}
}
return result, nil
}
// newFS creates FS based on the given path.
// Supported file systems are: fs
func newFS(path string) (FS, error) {
scheme := "fs"
n := strings.Index(path, "://")
if n >= 0 {
scheme = path[:n]
path = path[n+len("://"):]
}
if len(path) == 0 {
return nil, fmt.Errorf("path cannot be empty")
}
switch scheme {
case "fs":
return &fslocal.FS{Pattern: path}, nil
default:
return nil, fmt.Errorf("unsupported scheme %q", scheme)
}
}
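
For orientation, here is a minimal sketch of how the new abstraction is driven, mirroring the `readFromFS(pathPatterns)` call in `Parse` above. The helper name and rule paths below are purely illustrative and not part of the change:

```go
package config

import "fmt"

// illustrateReadFromFS is a hypothetical helper showing how readFromFS
// resolves plain globs and fs:// paths via newFS and returns the contents
// of every matched file, keyed by file name.
func illustrateReadFromFS() error {
	files, err := readFromFS([]string{
		"/etc/vmalert/rules/*.yaml",          // no scheme -> defaults to "fs"
		"fs:///opt/vmalert/extra-rules.yaml", // explicit local FS scheme
	})
	if err != nil {
		// init failures, unsupported schemes and duplicate file names end up here
		return fmt.Errorf("cannot read rule files: %w", err)
	}
	for name, data := range files {
		fmt.Printf("loaded %s: %d bytes\n", name, len(data))
	}
	return nil
}
```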

View File

@@ -0,0 +1,39 @@
package config
import (
"strings"
"testing"
)
func TestNewFS(t *testing.T) {
f := func(path, expStr string) {
t.Helper()
fs, err := newFS(path)
if err != nil {
t.Fatalf("unexpected err: %s", err)
}
if fs.String() != expStr {
t.Fatalf("expected FS %q; got %q", expStr, fs.String())
}
}
f("/foo/bar", "Local FS{MatchPattern: \"/foo/bar\"}")
f("fs:///foo/bar", "Local FS{MatchPattern: \"/foo/bar\"}")
}
func TestNewFSNegative(t *testing.T) {
f := func(path, expErr string) {
t.Helper()
_, err := newFS(path)
if err == nil {
t.Fatalf("expected to have err: %s", expErr)
}
if !strings.Contains(err.Error(), expErr) {
t.Fatalf("expected to have err %q; got %q instead", expErr, err)
}
}
f("", "path cannot be empty")
f("fs://", "path cannot be empty")
f("foobar://baz", `unsupported scheme "foobar"`)
}

View File

@@ -0,0 +1,44 @@
package fslocal
import (
"fmt"
"os"
"path/filepath"
)
// FS represents a local file system
type FS struct {
// Pattern is used for matching one or multiple files.
// The pattern may describe hierarchical names such as
// /usr/*/bin/ed (assuming the Separator is '/').
Pattern string
}
// Init verifies that configured Pattern is correct
func (fs *FS) Init() error {
_, err := filepath.Glob(fs.Pattern)
return err
}
// String implements Stringer interface
func (fs *FS) String() string {
return fmt.Sprintf("Local FS{MatchPattern: %q}", fs.Pattern)
}
// Read returns a map of read files where
// key is the file name and value is file's content.
func (fs *FS) Read() (map[string][]byte, error) {
matches, err := filepath.Glob(fs.Pattern)
if err != nil {
return nil, fmt.Errorf("error while matching files via pattern %s: %w", fs.Pattern, err)
}
result := make(map[string][]byte)
for _, path := range matches {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("error while reading file %q: %w", path, err)
}
result[path] = data
}
return result, nil
}

View File

@@ -12,6 +12,7 @@ groups:
expr: vm_tcplistener_conns > 0
for: 3m
debug: true
update_entries_limit: 40
annotations:
labels: "Available labels: {{ $labels }}"
summary: Too high connection number for {{ $labels.instance }}
@@ -20,6 +21,7 @@ groups:
{{ end }}
description: "It is {{ $value }} connections for {{$labels.instance}}"
- alert: ExampleAlertAlwaysFiring
update_entries_limit: -1
expr: sum by(job)
(up == 1)
labels:

View File

@@ -7,6 +7,7 @@ groups:
- alert: Conns
expr: filterSeries(sumSeries(host.receiver.interface.cons),'last','>', 500)
for: 3m
annotations:
summary: Too high connection number for {{$labels.instance}}
description: "It is {{ $value }} connections for {{$labels.instance}}"

View File

@@ -72,6 +72,15 @@ func (m *Metric) AddLabel(key, value string) {
m.Labels = append(m.Labels, Label{Name: key, Value: value})
}
// DelLabel deletes the given label from the label set
func (m *Metric) DelLabel(key string) {
for i, l := range m.Labels {
if l.Name == key {
m.Labels = append(m.Labels[:i], m.Labels[i+1:]...)
}
}
}
// Label returns the given label value.
// If label is missing empty string will be returned
func (m *Metric) Label(key string) string {

View File

@@ -174,7 +174,7 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
}
func (s *VMStorage) newRequestPOST() (*http.Request, error) {
req, err := http.NewRequest("POST", s.datasourceURL, nil)
req, err := http.NewRequest(http.MethodPost, s.datasourceURL, nil)
if err != nil {
return nil, err
}

View File

@@ -158,23 +158,23 @@ func (g *Group) ID() uint64 {
}
// Restore restores alerts state for group rules
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, lookback time.Duration, labels map[string]string) error {
labels = mergeLabels(g.Name, "", labels, g.Labels)
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, ts time.Time, lookback time.Duration) error {
for _, rule := range g.Rules {
rr, ok := rule.(*AlertingRule)
ar, ok := rule.(*AlertingRule)
if !ok {
continue
}
if rr.For < 1 {
if ar.For < 1 {
continue
}
// ignore QueryParams on purpose, because they could contain
// query filters. This may affect the restore procedure.
q := qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: g.Type.String(),
Headers: g.Headers,
DataSourceType: g.Type.String(),
EvaluationInterval: g.Interval,
QueryParams: g.Params,
Headers: g.Headers,
Debug: ar.Debug,
})
if err := rr.Restore(ctx, q, lookback, labels); err != nil {
if err := ar.Restore(ctx, q, ts, lookback); err != nil {
return fmt.Errorf("error while restoring rule %q: %w", rule, err)
}
}
@@ -251,7 +251,7 @@ func (g *Group) close() {
var skipRandSleepOnGroupStart bool
func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *remotewrite.Client) {
func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *remotewrite.Client, rr datasource.QuerierBuilder) {
defer func() { close(g.finishedCh) }()
e := &executor{
@@ -259,26 +259,6 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
notifiers: nts,
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label)}
// Spread group rules evaluation over time in order to reduce load on VictoriaMetrics.
if !skipRandSleepOnGroupStart {
randSleep := uint64(float64(g.Interval) * (float64(g.ID()) / (1 << 64)))
sleepOffset := uint64(time.Now().UnixNano()) % uint64(g.Interval)
if randSleep < sleepOffset {
randSleep += uint64(g.Interval)
}
randSleep -= sleepOffset
sleepTimer := time.NewTimer(time.Duration(randSleep))
select {
case <-ctx.Done():
sleepTimer.Stop()
return
case <-g.doneCh:
sleepTimer.Stop()
return
case <-sleepTimer.C:
}
}
evalTS := time.Now()
logger.Infof("group %q started; interval=%v; concurrency=%d", g.Name, g.Interval, g.Concurrency)
@@ -309,6 +289,16 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
t := time.NewTicker(g.Interval)
defer t.Stop()
// restore the rules state after the first evaluation
// so only active alerts can be restored.
if rr != nil {
err := g.Restore(ctx, rr, evalTS, *remoteReadLookBack)
if err != nil {
logger.Errorf("error while restoring ruleState for group %q: %s", g.Name, err)
}
}
for {
select {
case <-ctx.Done():

View File

@@ -209,7 +209,7 @@ func TestGroupStart(t *testing.T) {
fs.add(m1)
fs.add(m2)
go func() {
g.start(context.Background(), func() []notifier.Notifier { return []notifier.Notifier{fn} }, nil)
g.start(context.Background(), func() []notifier.Notifier { return []notifier.Notifier{fn} }, nil, fs)
close(finished)
}()
@@ -460,7 +460,7 @@ func TestFaultyRW(t *testing.T) {
r := &RecordingRule{
Name: "test",
state: newRuleState(),
state: newRuleState(10),
q: fq,
}

View File

@@ -61,6 +61,49 @@ func (fq *fakeQuerier) Query(_ context.Context, _ string, _ time.Time) ([]dataso
return cp, req, nil
}
type fakeQuerierWithRegistry struct {
sync.Mutex
registry map[string][]datasource.Metric
}
func (fqr *fakeQuerierWithRegistry) set(key string, metrics ...datasource.Metric) {
fqr.Lock()
if fqr.registry == nil {
fqr.registry = make(map[string][]datasource.Metric)
}
fqr.registry[key] = metrics
fqr.Unlock()
}
func (fqr *fakeQuerierWithRegistry) reset() {
fqr.Lock()
fqr.registry = nil
fqr.Unlock()
}
func (fqr *fakeQuerierWithRegistry) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fqr
}
func (fqr *fakeQuerierWithRegistry) QueryRange(ctx context.Context, q string, _, _ time.Time) ([]datasource.Metric, error) {
req, _, err := fqr.Query(ctx, q, time.Now())
return req, err
}
func (fqr *fakeQuerierWithRegistry) Query(_ context.Context, expr string, _ time.Time) ([]datasource.Metric, *http.Request, error) {
fqr.Lock()
defer fqr.Unlock()
req, _ := http.NewRequest(http.MethodPost, "foo.com", nil)
metrics, ok := fqr.registry[expr]
if !ok {
return nil, req, nil
}
cp := make([]datasource.Metric, len(metrics))
copy(cp, metrics)
return cp, req, nil
}
type fakeNotifier struct {
sync.Mutex
alerts []notifier.Alert

View File

@@ -28,13 +28,19 @@ import (
)
var (
rulePath = flagutil.NewArrayString("rule", `Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
rulePath = flagutil.NewArrayString("rule", `Path to the files with alerting and/or recording rules.
Supports hierarchical patterns and regexps.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
-rule="dir/*.yaml" -rule="/*.yaml" -rule="gcs://vmalert-rules/tenant_%{TENANT_ID}/prod".
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Enterprise version of vmalert supports S3 and GCS paths to rules.
For example: gs://bucket/path/to/rules, s3://bucket/path/to/rules
S3 and GCS paths support only matching by prefix, e.g. s3://bucket/dir/rule_ matches
all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
`)
ruleTemplatesPath = flagutil.NewArrayString("rule.templates", `Path or glob pattern to location with go template definitions
for rules annotations templating. Flag can be specified multiple times.
@@ -49,14 +55,18 @@ absolute path to all .tpl files in root.`)
configCheckInterval = flag.Duration("configCheckInterval", 0, "Interval for checking for changes in '-rule' or '-notifier.config' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
evaluationInterval = flag.Duration("evaluationInterval", time.Minute, "How often to evaluate the rules")
validateTemplates = flag.Bool("rule.validateTemplates", true, "Whether to validate annotation and label templates")
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
"which is by default equal to 3 evaluation intervals of the parent group.")
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
"which by default is 4 times evaluationInterval of the parent group.")
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
ruleUpdateEntriesLimit = flag.Int("rule.updateEntriesLimit", 20, "Defines the max number of rule's state updates stored in-memory. "+
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overriden per rule via update_entries_limit param.")
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager `+
@@ -69,7 +79,7 @@ absolute path to all .tpl files in root.`)
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup. DEPRECATED - this flag has no effect and will be removed in the next releases.")
disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's Name as label to generated alerts and time series.")
@@ -90,6 +100,10 @@ func main() {
logger.Init()
pushmetrics.Init()
if !*remoteReadIgnoreRestoreErrors {
logger.Warnf("flag `remoteRead.ignoreRestoreErrors` is deprecated and will be removed in next releases.")
}
err := templates.Load(*ruleTemplatesPath, true)
if err != nil {
logger.Fatalf("failed to parse %q: %s", *ruleTemplatesPath, err)
@@ -168,7 +182,7 @@ func main() {
go configReload(ctx, manager, groupsCfg, sighupCh)
rh := &requestHandler{m: manager}
go httpserver.Serve(*httpListenAddr, rh.handler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, rh.handler)
sig := procutil.WaitForSigterm()
logger.Infof("service received signal %s", sig)

View File

@@ -6,6 +6,7 @@ import (
"net/url"
"sort"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
@@ -82,24 +83,38 @@ func (m *manager) close() {
m.wg.Wait()
}
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) error {
if restore && m.rr != nil {
err := group.Restore(ctx, m.rr, *remoteReadLookBack, m.labels)
if err != nil {
if !*remoteReadIgnoreRestoreErrors {
return fmt.Errorf("failed to restore ruleState for group %q: %w", group.Name, err)
}
logger.Errorf("error while restoring ruleState for group %q: %s", group.Name, err)
}
}
func (m *manager) startGroup(ctx context.Context, g *Group, restore bool) error {
m.wg.Add(1)
id := group.ID()
id := g.ID()
go func() {
group.start(ctx, m.notifiers, m.rw)
// Spread group rules evaluation over time in order to reduce load on VictoriaMetrics.
if !skipRandSleepOnGroupStart {
randSleep := uint64(float64(g.Interval) * (float64(g.ID()) / (1 << 64)))
sleepOffset := uint64(time.Now().UnixNano()) % uint64(g.Interval)
if randSleep < sleepOffset {
randSleep += uint64(g.Interval)
}
randSleep -= sleepOffset
sleepTimer := time.NewTimer(time.Duration(randSleep))
select {
case <-ctx.Done():
sleepTimer.Stop()
return
case <-g.doneCh:
sleepTimer.Stop()
return
case <-sleepTimer.C:
}
}
if restore {
g.start(ctx, m.notifiers, m.rw, m.rr)
} else {
g.start(ctx, m.notifiers, m.rw, nil)
}
m.wg.Done()
}()
m.groups[id] = group
m.groups[id] = g
return nil
}

View File

@@ -64,17 +64,18 @@ func TestManagerUpdateConcurrent(t *testing.T) {
wg := sync.WaitGroup{}
wg.Add(workers)
for i := 0; i < workers; i++ {
go func() {
go func(n int) {
defer wg.Done()
r := rand.New(rand.NewSource(int64(n)))
for i := 0; i < iterations; i++ {
rnd := rand.Intn(len(paths))
rnd := r.Intn(len(paths))
cfg, err := config.Parse([]string{paths[rnd]}, notifier.ValidateTemplates, true)
if err != nil { // update can fail and this is expected
continue
}
_ = m.update(context.Background(), cfg, false)
}
}()
}(i)
}
wg.Wait()
}

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -41,7 +41,7 @@ type Alert struct {
LastSent time.Time
// Value stores the value returned from evaluating expression from Expr field
Value float64
// ID is the unique identifer for the Alert
// ID is the unique identifier for the Alert
ID uint64
// Restored is true if Alert was restored after restart
Restored bool

View File

@@ -64,7 +64,7 @@ func (am *AlertManager) send(ctx context.Context, alerts []Alert) error {
b := &bytes.Buffer{}
writeamRequest(b, alerts, am.argFunc, am.relabelConfigs)
req, err := http.NewRequest("POST", am.addr, b)
req, err := http.NewRequest(http.MethodPost, am.addr, b)
if err != nil {
return err
}

View File

@@ -29,7 +29,7 @@ type Config struct {
// ConsulSDConfigs contains list of settings for service discovery via Consul
// see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config
ConsulSDConfigs []consul.SDConfig `yaml:"consul_sd_configs,omitempty"`
// DNSSDConfigs ontains list of settings for service discovery via DNS.
// DNSSDConfigs contains list of settings for service discovery via DNS.
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
DNSSDConfigs []dns.SDConfig `yaml:"dns_sd_configs,omitempty"`

View File

@@ -162,14 +162,15 @@ consul_sd_configs:
wg := sync.WaitGroup{}
wg.Add(workers)
for i := 0; i < workers; i++ {
go func() {
go func(n int) {
defer wg.Done()
r := rand.New(rand.NewSource(int64(n)))
for i := 0; i < iterations; i++ {
rnd := rand.Intn(len(paths))
rnd := r.Intn(len(paths))
_ = cw.reload(paths[rnd]) // update can fail and this is expected
_ = cw.notifiers()
}
}()
}(i)
}
wg.Wait()
}

View File

@@ -17,7 +17,8 @@ var (
configPath = flag.String("notifier.config", "", "Path to configuration file for notifiers")
suppressDuplicateTargetErrors = flag.Bool("notifier.suppressDuplicateTargetErrors", false, "Whether to suppress 'duplicate target' errors during discovery")
addrs = flagutil.NewArrayString("notifier.url", "Prometheus alertmanager URL, e.g. http://127.0.0.1:9093")
addrs = flagutil.NewArrayString("notifier.url", "Prometheus Alertmanager URL, e.g. http://127.0.0.1:9093. "+
"List all Alertmanager URLs if it runs in the cluster mode to ensure high availability.")
basicAuthUsername = flagutil.NewArrayString("notifier.basicAuth.username", "Optional basic auth username for -notifier.url")
basicAuthPassword = flagutil.NewArrayString("notifier.basicAuth.password", "Optional basic auth password for -notifier.url")

View File

@@ -58,7 +58,6 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
Labels: cfg.Labels,
GroupID: group.ID(),
metrics: &recordingRuleMetrics{},
state: newRuleState(),
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: group.Type.String(),
EvaluationInterval: group.Interval,
@@ -67,6 +66,12 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
}),
}
if cfg.UpdateEntriesLimit != nil {
rr.state = newRuleState(*cfg.UpdateEntriesLimit)
} else {
rr.state = newRuleState(*ruleUpdateEntriesLimit)
}
labels := fmt.Sprintf(`recording=%q, group=%q, id="%d"`, rr.Name, group.Name, rr.ID())
rr.metrics.errors = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_error{%s}`, labels),
func() float64 {
@@ -212,6 +217,7 @@ func (rr *RecordingRule) ToAPI() APIRule {
EvaluationTime: lastState.duration.Seconds(),
Health: "ok",
LastSamples: lastState.samples,
MaxUpdates: rr.state.size(),
Updates: rr.state.getAll(),
// encode as strings to avoid rounding

View File

@@ -19,7 +19,7 @@ func TestRecordingRule_Exec(t *testing.T) {
expTS []prompbmarshal.TimeSeries
}{
{
&RecordingRule{Name: "foo", state: newRuleState()},
&RecordingRule{Name: "foo"},
[]datasource.Metric{metricWithValueAndLabels(t, 10,
"__name__", "bar",
)},
@@ -30,7 +30,7 @@ func TestRecordingRule_Exec(t *testing.T) {
},
},
{
&RecordingRule{Name: "foobarbaz", state: newRuleState()},
&RecordingRule{Name: "foobarbaz"},
[]datasource.Metric{
metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "foo"),
metricWithValueAndLabels(t, 2, "__name__", "bar", "job", "bar"),
@@ -53,8 +53,7 @@ func TestRecordingRule_Exec(t *testing.T) {
},
{
&RecordingRule{
Name: "job:foo",
state: newRuleState(),
Name: "job:foo",
Labels: map[string]string{
"source": "test",
}},
@@ -80,6 +79,7 @@ func TestRecordingRule_Exec(t *testing.T) {
fq := &fakeQuerier{}
fq.add(tc.metrics...)
tc.rule.q = fq
tc.rule.state = newRuleState(10)
tss, err := tc.rule.Exec(context.TODO(), time.Now(), 0)
if err != nil {
t.Fatalf("unexpected Exec err: %s", err)
@@ -198,7 +198,7 @@ func TestRecordingRuleLimit(t *testing.T) {
metricWithValuesAndLabels(t, []float64{2, 3}, "__name__", "bar", "job", "bar"),
metricWithValuesAndLabels(t, []float64{4, 5, 6}, "__name__", "baz", "job", "baz"),
}
rule := &RecordingRule{Name: "job:foo", state: newRuleState(), Labels: map[string]string{
rule := &RecordingRule{Name: "job:foo", state: newRuleState(10), Labels: map[string]string{
"source": "test_limit",
}}
var err error
@@ -216,7 +216,7 @@ func TestRecordingRuleLimit(t *testing.T) {
func TestRecordingRule_ExecNegative(t *testing.T) {
rr := &RecordingRule{
Name: "job:foo",
state: newRuleState(),
state: newRuleState(10),
Labels: map[string]string{
"job": "test",
},

View File

@@ -225,7 +225,7 @@ func (c *Client) flush(ctx context.Context, wr *prompbmarshal.WriteRequest) {
func (c *Client) send(ctx context.Context, data []byte) error {
r := bytes.NewReader(data)
req, err := http.NewRequest("POST", c.addr, r)
req, err := http.NewRequest(http.MethodPost, c.addr, r)
if err != nil {
return fmt.Errorf("failed to create new HTTP request: %w", err)
}

View File

@@ -27,12 +27,13 @@ func TestClient_Push(t *testing.T) {
if err != nil {
t.Fatalf("failed to create client: %s", err)
}
r := rand.New(rand.NewSource(1))
const rowsN = 1e4
var sent int
for i := 0; i < rowsN; i++ {
s := prompbmarshal.TimeSeries{
Samples: []prompbmarshal.Sample{{
Value: rand.Float64(),
Value: r.Float64(),
Timestamp: time.Now().Unix(),
}},
}

View File

@@ -26,7 +26,7 @@ var (
"and processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule."+
"Keep it equal or bigger than -remoteWrite.flushInterval.")
replayMaxDatapoints = flag.Int("replay.maxDatapointsPerQuery", 1e3,
"Max number of data points expected in one request. The higher the value, the less requests will be made during replay.")
"Max number of data points expected in one request. It affects the max time range for every `/query_range` request during the replay. The higher the value, the less requests will be made during replay.")
replayRuleRetryAttempts = flag.Int("replay.ruleRetryAttempts", 5,
"Defines how many retries to make before giving up on rule if request for it returns an error.")
disableProgressBar = flag.Bool("replay.disableProgressBar", false, "Whether to disable rendering progress bars during the replay. "+

View File

@@ -57,11 +57,12 @@ type ruleStateEntry struct {
curl string
}
const defaultStateEntriesLimit = 20
func newRuleState() *ruleState {
func newRuleState(size int) *ruleState {
if size < 1 {
size = 1
}
return &ruleState{
entries: make([]ruleStateEntry, defaultStateEntriesLimit),
entries: make([]ruleStateEntry, size),
}
}
@@ -71,6 +72,12 @@ func (s *ruleState) getLast() ruleStateEntry {
return s.entries[s.cur]
}
func (s *ruleState) size() int {
s.RLock()
defer s.RUnlock()
return len(s.entries)
}
func (s *ruleState) getAll() []ruleStateEntry {
entries := make([]ruleStateEntry, 0)

View File

@@ -6,8 +6,27 @@ import (
"time"
)
func TestRule_stateDisabled(t *testing.T) {
state := newRuleState(-1)
e := state.getLast()
if !e.at.IsZero() {
t.Fatalf("expected entry to be zero")
}
state.add(ruleStateEntry{at: time.Now()})
state.add(ruleStateEntry{at: time.Now()})
state.add(ruleStateEntry{at: time.Now()})
if len(state.getAll()) != 1 {
// state should store at least one update under any circumstances
t.Fatalf("expected for state to have %d entries; got %d",
1, len(state.getAll()),
)
}
}
func TestRule_state(t *testing.T) {
state := newRuleState()
stateEntriesN := 20
state := newRuleState(stateEntriesN)
e := state.getLast()
if !e.at.IsZero() {
t.Fatalf("expected entry to be zero")
@@ -39,7 +58,7 @@ func TestRule_state(t *testing.T) {
}
var last time.Time
for i := 0; i < defaultStateEntriesLimit*2; i++ {
for i := 0; i < stateEntriesN*2; i++ {
last = time.Now()
state.add(ruleStateEntry{at: last})
}
@@ -50,9 +69,9 @@ func TestRule_state(t *testing.T) {
e.at, last)
}
if len(state.getAll()) != defaultStateEntriesLimit {
if len(state.getAll()) != stateEntriesN {
t.Fatalf("expected for state to have %d entries only; got %d",
defaultStateEntriesLimit, len(state.getAll()),
stateEntriesN, len(state.getAll()),
)
}
}
@@ -61,7 +80,7 @@ func TestRule_state(t *testing.T) {
// execution of state updates.
// Should be executed with -race flag
func TestRule_stateConcurrent(t *testing.T) {
state := newRuleState()
state := newRuleState(20)
const workers = 50
const iterations = 100

View File

@@ -27,11 +27,11 @@ import (
"strconv"
"strings"
"sync"
textTpl "text/template"
"time"
textTpl "text/template"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/formatutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
@@ -225,7 +225,7 @@ func templateFuncs() textTpl.FuncMap {
"toLower": strings.ToLower,
// crlfEscape replaces '\n' and '\r' chars with `\\n` and `\\r`.
// This funcion is deprectated.
// This function is deprecated.
//
// It is better to use quotesEscape, jsonEscape, queryEscape or pathEscape instead -
// these functions properly escape `\n` and `\r` chars according to their purpose.
@@ -350,15 +350,7 @@ func templateFuncs() textTpl.FuncMap {
if math.Abs(v) <= 1 || math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v), nil
}
prefix := ""
for _, p := range []string{"ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"} {
if math.Abs(v) < 1024 {
break
}
prefix = p
v /= 1024
}
return fmt.Sprintf("%.4g%s", v, prefix), nil
return formatutil.HumanizeBytes(v), nil
},
// humanizeDuration converts given seconds to a human-readable duration

View File

@@ -1,6 +1,7 @@
package templates
import (
"math"
"strings"
"testing"
textTpl "text/template"
@@ -50,6 +51,31 @@ func TestTemplateFuncs(t *testing.T) {
if !ok {
t.Fatalf("unexpected mismatch")
}
formatting := func(funcName string, p interface{}, resultExpected string) {
t.Helper()
v := funcs[funcName]
fLocal := v.(func(s interface{}) (string, error))
result, err := fLocal(p)
if err != nil {
t.Fatalf("unexpected error for %s(%f): %s", funcName, p, err)
}
if result != resultExpected {
t.Fatalf("unexpected result for %s(%f); got\n%s\nwant\n%s", funcName, p, result, resultExpected)
}
}
formatting("humanize1024", float64(0), "0")
formatting("humanize1024", math.Inf(0), "+Inf")
formatting("humanize1024", math.NaN(), "NaN")
formatting("humanize1024", float64(127087), "124.1ki")
formatting("humanize1024", float64(130137088), "124.1Mi")
formatting("humanize1024", float64(133260378112), "124.1Gi")
formatting("humanize1024", float64(136458627186688), "124.1Ti")
formatting("humanize1024", float64(139733634239168512), "124.1Pi")
formatting("humanize1024", float64(143087241460908556288), "124.1Ei")
formatting("humanize1024", float64(146521335255970361638912), "124.1Zi")
formatting("humanize1024", float64(150037847302113650318245888), "124.1Yi")
formatting("humanize1024", float64(153638755637364377925883789312), "1.271e+05Yi")
}
func mkTemplate(current, replacement interface{}) textTemplate {

View File

@@ -57,7 +57,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/", "/vmalert", "/vmalert/":
if r.Method != "GET" {
if r.Method != http.MethodGet {
httpserver.Errorf(w, r, "path %q supports only GET method", r.URL.Path)
return false
}

View File

@@ -40,9 +40,9 @@
for _, g := range groups {
for _, r := range g.Rules {
if r.LastError != "" {
rNotOk[g.Name]++
rNotOk[g.ID]++
} else {
rOk[g.Name]++
rOk[g.ID]++
}
}
}
@@ -50,11 +50,11 @@
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
{% for _, g := range groups %}
<div class="group-heading{% if rNotOk[g.Name] > 0 %} alert-danger{% endif %}" data-bs-target="rules-{%s g.ID %}">
<div class="group-heading{% if rNotOk[g.ID] > 0 %} alert-danger{% endif %}" data-bs-target="rules-{%s g.ID %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s)</a>
{% if rNotOk[g.Name] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.Name] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules withs status Ok">{%d rOk[g.Name] %}</span>
{% if rNotOk[g.ID] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.ID] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules withs status Ok">{%d rOk[g.ID] %}</span>
<p class="fs-6 fw-lighter">{%s g.File %}</p>
{% if len(g.Params) > 0 %}
<div class="fs-6 fw-lighter">Extra params
@@ -427,6 +427,16 @@
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Debug
</div>
<div class="col">
{%v rule.Debug %}
</div>
</div>
</div>
{% endif %}
<div class="container border-bottom p-2">
<div class="row">
@@ -440,7 +450,7 @@
</div>
<br>
<div class="display-6 pb-3">Last {%d len(rule.Updates) %} updates</span>:</div>
<div class="display-6 pb-3">Last {%d len(rule.Updates) %}/{%d rule.MaxUpdates %} updates</span>:</div>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>

View File

@@ -171,9 +171,9 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
for _, g := range groups {
for _, r := range g.Rules {
if r.LastError != "" {
rNotOk[g.Name]++
rNotOk[g.ID]++
} else {
rOk[g.Name]++
rOk[g.ID]++
}
}
}
@@ -189,7 +189,7 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
qw422016.N().S(`
<div class="group-heading`)
//line app/vmalert/web.qtpl:53
if rNotOk[g.Name] > 0 {
if rNotOk[g.ID] > 0 {
//line app/vmalert/web.qtpl:53
qw422016.N().S(` alert-danger`)
//line app/vmalert/web.qtpl:53
@@ -230,11 +230,11 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
qw422016.N().S(`s)</a>
`)
//line app/vmalert/web.qtpl:56
if rNotOk[g.Name] > 0 {
if rNotOk[g.ID] > 0 {
//line app/vmalert/web.qtpl:56
qw422016.N().S(`<span class="badge bg-danger" title="Number of rules with status Error">`)
//line app/vmalert/web.qtpl:56
qw422016.N().D(rNotOk[g.Name])
qw422016.N().D(rNotOk[g.ID])
//line app/vmalert/web.qtpl:56
qw422016.N().S(`</span> `)
//line app/vmalert/web.qtpl:56
@@ -243,7 +243,7 @@ func StreamListGroups(qw422016 *qt422016.Writer, r *http.Request, groups []APIGr
qw422016.N().S(`
<span class="badge bg-success" title="Number of rules withs status Ok">`)
//line app/vmalert/web.qtpl:57
qw422016.N().D(rOk[g.Name])
qw422016.N().D(rOk[g.ID])
//line app/vmalert/web.qtpl:57
qw422016.N().S(`</span>
<p class="fs-6 fw-lighter">`)
@@ -1313,10 +1313,24 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Debug
</div>
<div class="col">
`)
//line app/vmalert/web.qtpl:436
qw422016.E().V(rule.Debug)
//line app/vmalert/web.qtpl:436
qw422016.N().S(`
</div>
</div>
</div>
`)
//line app/vmalert/web.qtpl:430
//line app/vmalert/web.qtpl:440
}
//line app/vmalert/web.qtpl:430
//line app/vmalert/web.qtpl:440
qw422016.N().S(`
<div class="container border-bottom p-2">
<div class="row">
@@ -1325,17 +1339,17 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
</div>
<div class="col">
<a target="_blank" href="`)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.E().S(prefix)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.N().S(`groups#group-`)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.E().S(rule.GroupID)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.E().S(rule.GroupID)
//line app/vmalert/web.qtpl:437
//line app/vmalert/web.qtpl:447
qw422016.N().S(`</a>
</div>
</div>
@@ -1343,9 +1357,13 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
<br>
<div class="display-6 pb-3">Last `)
//line app/vmalert/web.qtpl:443
//line app/vmalert/web.qtpl:453
qw422016.N().D(len(rule.Updates))
//line app/vmalert/web.qtpl:443
//line app/vmalert/web.qtpl:453
qw422016.N().S(`/`)
//line app/vmalert/web.qtpl:453
qw422016.N().D(rule.MaxUpdates)
//line app/vmalert/web.qtpl:453
qw422016.N().S(` updates</span>:</div>
<table class="table table-striped table-hover table-sm">
<thead>
@@ -1360,201 +1378,201 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
<tbody>
`)
//line app/vmalert/web.qtpl:456
//line app/vmalert/web.qtpl:466
for _, u := range rule.Updates {
//line app/vmalert/web.qtpl:456
//line app/vmalert/web.qtpl:466
qw422016.N().S(`
<tr`)
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
if u.err != nil {
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
qw422016.N().S(` class="alert-danger"`)
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
}
//line app/vmalert/web.qtpl:457
//line app/vmalert/web.qtpl:467
qw422016.N().S(`>
<td>
<span class="badge bg-primary rounded-pill me-3" title="Updated at">`)
//line app/vmalert/web.qtpl:459
//line app/vmalert/web.qtpl:469
qw422016.E().S(u.time.Format(time.RFC3339))
//line app/vmalert/web.qtpl:459
//line app/vmalert/web.qtpl:469
qw422016.N().S(`</span>
</td>
<td class="text-center" wi>`)
//line app/vmalert/web.qtpl:461
//line app/vmalert/web.qtpl:471
qw422016.N().D(u.samples)
//line app/vmalert/web.qtpl:461
//line app/vmalert/web.qtpl:471
qw422016.N().S(`</td>
<td class="text-center">`)
//line app/vmalert/web.qtpl:462
//line app/vmalert/web.qtpl:472
qw422016.N().FPrec(u.duration.Seconds(), 3)
//line app/vmalert/web.qtpl:462
//line app/vmalert/web.qtpl:472
qw422016.N().S(`s</td>
<td class="text-center">`)
//line app/vmalert/web.qtpl:463
//line app/vmalert/web.qtpl:473
qw422016.E().S(u.at.Format(time.RFC3339))
//line app/vmalert/web.qtpl:463
//line app/vmalert/web.qtpl:473
qw422016.N().S(`</td>
<td>
<textarea class="curl-area" rows="1" onclick="this.focus();this.select()">`)
//line app/vmalert/web.qtpl:465
//line app/vmalert/web.qtpl:475
qw422016.E().S(u.curl)
//line app/vmalert/web.qtpl:465
//line app/vmalert/web.qtpl:475
qw422016.N().S(`</textarea>
</td>
</tr>
</li>
`)
//line app/vmalert/web.qtpl:469
//line app/vmalert/web.qtpl:479
if u.err != nil {
//line app/vmalert/web.qtpl:469
//line app/vmalert/web.qtpl:479
qw422016.N().S(`
<tr`)
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
if u.err != nil {
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
qw422016.N().S(` class="alert-danger"`)
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
}
//line app/vmalert/web.qtpl:470
//line app/vmalert/web.qtpl:480
qw422016.N().S(`>
<td colspan="5">
<span class="alert-danger">`)
//line app/vmalert/web.qtpl:472
//line app/vmalert/web.qtpl:482
qw422016.E().V(u.err)
//line app/vmalert/web.qtpl:472
//line app/vmalert/web.qtpl:482
qw422016.N().S(`</span>
</td>
</tr>
`)
//line app/vmalert/web.qtpl:475
//line app/vmalert/web.qtpl:485
}
//line app/vmalert/web.qtpl:475
//line app/vmalert/web.qtpl:485
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:476
//line app/vmalert/web.qtpl:486
}
//line app/vmalert/web.qtpl:476
//line app/vmalert/web.qtpl:486
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:478
//line app/vmalert/web.qtpl:488
tpl.StreamFooter(qw422016, r)
//line app/vmalert/web.qtpl:478
//line app/vmalert/web.qtpl:488
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
}
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
func WriteRuleDetails(qq422016 qtio422016.Writer, r *http.Request, rule APIRule) {
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
StreamRuleDetails(qw422016, r, rule)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
}
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
func RuleDetails(r *http.Request, rule APIRule) string {
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
WriteRuleDetails(qb422016, r, rule)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
return qs422016
//line app/vmalert/web.qtpl:479
//line app/vmalert/web.qtpl:489
}
//line app/vmalert/web.qtpl:483
//line app/vmalert/web.qtpl:493
func streambadgeState(qw422016 *qt422016.Writer, state string) {
//line app/vmalert/web.qtpl:483
//line app/vmalert/web.qtpl:493
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:485
//line app/vmalert/web.qtpl:495
badgeClass := "bg-warning text-dark"
if state == "firing" {
badgeClass = "bg-danger"
}
//line app/vmalert/web.qtpl:489
//line app/vmalert/web.qtpl:499
qw422016.N().S(`
<span class="badge `)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.E().S(badgeClass)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.E().S(state)
//line app/vmalert/web.qtpl:490
//line app/vmalert/web.qtpl:500
qw422016.N().S(`</span>
`)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
}
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
func writebadgeState(qq422016 qtio422016.Writer, state string) {
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
streambadgeState(qw422016, state)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
}
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
func badgeState(state string) string {
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
writebadgeState(qb422016, state)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
return qs422016
//line app/vmalert/web.qtpl:491
//line app/vmalert/web.qtpl:501
}
//line app/vmalert/web.qtpl:493
//line app/vmalert/web.qtpl:503
func streambadgeRestored(qw422016 *qt422016.Writer) {
//line app/vmalert/web.qtpl:493
//line app/vmalert/web.qtpl:503
qw422016.N().S(`
<span class="badge bg-warning text-dark" title="Alert state was restored after the service restart from remote storage">restored</span>
`)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
}
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
func writebadgeRestored(qq422016 qtio422016.Writer) {
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
streambadgeRestored(qw422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
}
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
func badgeRestored() string {
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
writebadgeRestored(qb422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
return qs422016
//line app/vmalert/web.qtpl:495
//line app/vmalert/web.qtpl:505
}

View File

@@ -17,7 +17,7 @@ func TestHandler(t *testing.T) {
alerts: map[uint64]*notifier.Alert{
0: {State: notifier.StateFiring},
},
state: newRuleState(),
state: newRuleState(10),
}
g := &Group{
Name: "group",

View File

@@ -113,16 +113,20 @@ type APIRule struct {
// Additional fields
// Type of the rule: recording or alerting
// DatasourceType of the rule: prometheus or graphite
DatasourceType string `json:"datasourceType"`
LastSamples int `json:"lastSamples"`
// ID is a unique Alert's ID within a group
ID string `json:"id"`
// GroupID is an unique Group's ID
GroupID string `json:"group_id"`
// Debug shows whether debug mode is enabled
Debug bool `json:"debug"`
// MaxUpdates is the max number of recorded ruleStateEntry objects
MaxUpdates int `json:"max_updates_entries"`
// Updates contains the ordered list of recorded ruleStateEntry objects
Updates []ruleStateEntry `json:"updates"`
Updates []ruleStateEntry `json:"-"`
}
// WebLink returns a link to the alert which can be used in UI.

View File

@@ -28,7 +28,36 @@ accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.co
## Load balancing
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls.
In the latter case `vmauth` balances load among the configured urls in least-loaded round-robin manner.
`vmauth` retries failing `GET` requests across the configured list of urls.
This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes
in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
## Concurrency limiting
`vmauth` limits the number of concurrent requests it can proxy according to the following command-line flags:
- `-maxConcurrentRequests` limits the global number of concurrent requests `vmauth` can serve across all the configured users.
- `-maxConcurrentPerUserRequests` limits the number of concurrent requests `vmauth` can serve per each configured user.
It is also possible to set individual limits on the number of concurrent requests per each user
with the `max_concurrent_requests` option - see [auth config example](#auth-config).
`vmauth` responds with `429 Too Many Requests` HTTP error when the number of concurrent requests exceeds the configured limits.
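For illustration, a sketch combining the global flags with a per-user override (the username and numbers are arbitrary; the `max_concurrent_requests` option also appears in the auth config example below):

```yaml
# vmauth is assumed to be started with:
#   -maxConcurrentRequests=1000 -maxConcurrentPerUserRequests=300
users:
  - username: "heavy-dashboards"
    password: "***"
    url_prefix: "http://localhost:8428"
    # this user alone is capped at 10 concurrent requests;
    # other users fall back to the -maxConcurrentPerUserRequests limit
    max_concurrent_requests: 10
```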
The following [metrics](#monitoring) related to concurrency limits are exposed by `vmauth`:
- `vmauth_concurrent_requests_capacity` - the global limit on the number of concurrent requests `vmauth` can serve.
It is set via `-maxConcurrentRequests` command-line flag.
- `vmauth_concurrent_requests_current` - the current number of concurrent requests `vmauth` processes.
- `vmauth_concurrent_requests_limit_reached_total` - the number of requests rejected with `429 Too Many Requests` error
because the global concurrency limit has been reached.
- `vmauth_user_concurrent_requests_capacity{username="..."}` - the limit on the number of concurrent requests for the given `username`.
- `vmauth_user_concurrent_requests_current{username="..."}` - the current number of concurrent requests for the given `username`.
- `vmauth_user_concurrent_requests_limit_reached_total{username="foo"}` - the number of requests rejected with `429 Too Many Requests` error
because the concurrency limit for the given `username` has been reached.
## Auth config
@@ -55,26 +84,27 @@ users:
headers:
- "X-Scope-OrgID: foobar"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://localhost:8428 .
# are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
#
# The given user can send at most 10 concurrent requests according to the provided max_concurrent_requests.
# Excess concurrent requests are rejected with 429 HTTP status code.
# See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
max_concurrent_requests: 10
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# are proxied to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
- username: "local-single-node2"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -84,10 +114,8 @@ users:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
@@ -261,7 +289,11 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8427")
TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol (default ":8427")
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-logInvalidAuthTokens
Whether to log requests with invalid auth tokens. Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page
-loggerDisableTimestamps
@@ -270,6 +302,8 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
@@ -278,8 +312,12 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentPerUserRequests int
The maximum number of concurrent requests vmauth can process per each configured user. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option in per-user config (default 300)
-maxConcurrentRequests int
The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options (default 1000)
-maxIdleConnsPerBackend int
The maximum number of idle connections vmauth can open per each backend host (default 100)
The maximum number of idle connections vmauth can open per each backend host. See also -maxConcurrentRequests (default 100)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
@@ -299,6 +337,8 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Supports an array of values separated by comma or specified via multiple flags.
-reloadAuthKey string
Auth key for /-/reload http endpoint. It must be passed as authKey=...
-responseTimeout duration
The timeout for receiving a response from backend (default 5m0s)
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string

View File

@@ -13,6 +13,7 @@ import (
"sync/atomic"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
@@ -32,17 +33,43 @@ type AuthConfig struct {
// UserInfo is user information read from authConfigPath
type UserInfo struct {
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMap []URLMap `yaml:"url_map,omitempty"`
Headers []Header `yaml:"headers,omitempty"`
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMaps []URLMap `yaml:"url_map,omitempty"`
Headers []Header `yaml:"headers,omitempty"`
MaxConcurrentRequests int `yaml:"max_concurrent_requests,omitempty"`
concurrencyLimitCh chan struct{}
concurrencyLimitReached *metrics.Counter
requests *metrics.Counter
}
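// beginConcurrencyLimit tries to reserve a concurrency slot for the user without blocking.
// It returns an error when ui.getMaxConcurrentRequests() requests are already in flight for this user.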
func (ui *UserInfo) beginConcurrencyLimit() error {
select {
case ui.concurrencyLimitCh <- struct{}{}:
return nil
default:
ui.concurrencyLimitReached.Inc()
return fmt.Errorf("cannot handle more than %d concurrent requests from user %s", ui.getMaxConcurrentRequests(), ui.name())
}
}
func (ui *UserInfo) endConcurrencyLimit() {
<-ui.concurrencyLimitCh
}
func (ui *UserInfo) getMaxConcurrentRequests() int {
mcr := ui.MaxConcurrentRequests
if mcr <= 0 {
mcr = *maxConcurrentPerUserRequests
}
return mcr
}
// Header is `Name: Value` http header, which must be added to the proxied request.
type Header struct {
Name string
@@ -83,16 +110,77 @@ type SrcPath struct {
re *regexp.Regexp
}
// URLPrefix represents pased `url_prefix`
// URLPrefix represents passed `url_prefix`
type URLPrefix struct {
n uint32
urls []*url.URL
n uint32
bus []*backendURL
}
func (up *URLPrefix) getNextURL() *url.URL {
type backendURL struct {
brokenDeadline uint64
concurrentRequests int32
url *url.URL
}
func (bu *backendURL) isBroken() bool {
ct := fasttime.UnixTimestamp()
return ct < atomic.LoadUint64(&bu.brokenDeadline)
}
func (bu *backendURL) setBroken() {
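// Mark the backend as temporarily unavailable for the next 3 seconds,
// so getLeastLoadedBackendURL skips it when selecting a backend.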
deadline := fasttime.UnixTimestamp() + 3
atomic.StoreUint64(&bu.brokenDeadline, deadline)
}
func (bu *backendURL) put() {
atomic.AddInt32(&bu.concurrentRequests, -1)
}
func (up *URLPrefix) getBackendsCount() int {
return len(up.bus)
}
// getLeastLoadedBackendURL returns the backendURL with the minimum number of concurrent requests.
//
// backendURL.put() must be called on the returned backendURL after the request is complete.
func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
bus := up.bus
if len(bus) == 1 {
// Fast path - return the only backend url.
bu := bus[0]
atomic.AddInt32(&bu.concurrentRequests, 1)
return bu
}
// Slow path - select other backend urls.
n := atomic.AddUint32(&up.n, 1)
idx := n % uint32(len(up.urls))
return up.urls[idx]
for i := uint32(0); i < uint32(len(bus)); i++ {
idx := (n + i) % uint32(len(bus))
bu := bus[idx]
if bu.isBroken() {
continue
}
if atomic.CompareAndSwapInt32(&bu.concurrentRequests, 0, 1) {
// Fast path - return the backend with zero concurrently executed requests.
return bu
}
}
// Slow path - return the backend with the minimum number of concurrently executed requests.
buMin := bus[n%uint32(len(bus))]
minRequests := atomic.LoadInt32(&buMin.concurrentRequests)
for _, bu := range bus {
if bu.isBroken() {
continue
}
if n := atomic.LoadInt32(&bu.concurrentRequests); n < minRequests {
buMin = bu
minRequests = n
}
}
atomic.AddInt32(&buMin.concurrentRequests, 1)
return buMin
}
// UnmarshalYAML unmarshals up from yaml.
@@ -121,31 +209,33 @@ func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
default:
return fmt.Errorf("unexpected type for `url_prefix`: %T; want string or []string", v)
}
pus := make([]*url.URL, len(urls))
bus := make([]*backendURL, len(urls))
for i, u := range urls {
pu, err := url.Parse(u)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", u, err)
}
pus[i] = pu
bus[i] = &backendURL{
url: pu,
}
}
up.urls = pus
up.bus = bus
return nil
}
// MarshalYAML marshals up to yaml.
func (up *URLPrefix) MarshalYAML() (interface{}, error) {
var b []byte
if len(up.urls) == 1 {
u := up.urls[0].String()
if len(up.bus) == 1 {
u := up.bus[0].url.String()
b = strconv.AppendQuote(b, u)
return string(b), nil
}
b = append(b, '[')
for i, pu := range up.urls {
u := pu.String()
for i, bu := range up.bus {
u := bu.url.String()
b = strconv.AppendQuote(b, u)
if i+1 < len(up.urls) {
if i+1 < len(up.bus) {
b = append(b, ',')
}
}
@@ -284,7 +374,7 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
return nil, err
}
}
for _, e := range ui.URLMap {
for _, e := range ui.URLMaps {
if len(e.SrcPaths) == 0 {
return nil, fmt.Errorf("missing `src_paths` in `url_map`")
}
@@ -295,32 +385,47 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
return nil, err
}
}
if len(ui.URLMap) == 0 && ui.URLPrefix == nil {
if len(ui.URLMaps) == 0 && ui.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix`")
}
name := ui.name()
if ui.BearerToken != "" {
name := "bearer_token"
if ui.Name != "" {
name = ui.Name
}
if ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
}
if ui.Username != "" {
name := ui.Username
if ui.Name != "" {
name = ui.Name
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
}
mcr := ui.getMaxConcurrentRequests()
ui.concurrencyLimitCh = make(chan struct{}, mcr)
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_concurrent_requests_limit_reached_total{username=%q}`, name))
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_capacity{username=%q}`, name), func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_current{username=%q}`, name), func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
byAuthToken[at1] = ui
byAuthToken[at2] = ui
}
return byAuthToken, nil
}
func (ui *UserInfo) name() string {
if ui.Name != "" {
return ui.Name
}
if ui.Username != "" {
return ui.Username
}
if ui.BearerToken != "" {
return "bearer_token"
}
return ""
}
func getAuthTokens(bearerToken, username, password string) (string, string) {
if bearerToken != "" {
// Accept the bearerToken as Basic Auth username with empty password
@@ -342,12 +447,12 @@ func getAuthToken(bearerToken, username, password string) string {
}
func (up *URLPrefix) sanitize() error {
for i, pu := range up.urls {
puNew, err := sanitizeURLPrefix(pu)
for _, bu := range up.bus {
puNew, err := sanitizeURLPrefix(bu.url)
if err != nil {
return err
}
up.urls[i] = puNew
bu.url = puNew
}
return nil
}

View File

@@ -218,11 +218,13 @@ users:
- username: foo
password: bar
url_prefix: http://aaa:343/bbb
max_concurrent_requests: 5
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
MaxConcurrentRequests: 5,
},
})
@@ -278,7 +280,7 @@ users:
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
BearerToken: "foo",
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
@@ -304,7 +306,7 @@ users:
},
getAuthToken("", "foo", ""): {
BearerToken: "foo",
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
@@ -390,15 +392,17 @@ func mustParseURL(u string) *URLPrefix {
}
func mustParseURLs(us []string) *URLPrefix {
pus := make([]*url.URL, len(us))
bus := make([]*backendURL, len(us))
for i, u := range us {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
}
pus[i] = pu
bus[i] = &backendURL{
url: pu,
}
}
return &URLPrefix{
urls: pus,
bus: bus,
}
}

View File

@@ -18,26 +18,27 @@ users:
headers:
- "X-Scope-OrgID: foobar"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://localhost:8428 .
# are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
#
# The given user can send at most 10 concurrent requests according to the provided max_concurrent_requests.
# Excess concurrent requests are rejected with 429 HTTP status code.
# See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
max_concurrent_requests: 10
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# are proxied to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
- username: "local-single-node2"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -47,10 +48,8 @@ users:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write

View File

@@ -3,8 +3,10 @@ package main
import (
"flag"
"fmt"
"io"
"net"
"net/http"
"net/http/httputil"
"net/textproto"
"net/url"
"os"
"strings"
@@ -12,20 +14,31 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host. "+
"See also -maxConcurrentRequests")
responseTimeout = flag.Duration("responseTimeout", 5*time.Minute, "The timeout for receiving a response from backend")
maxConcurrentRequests = flag.Int("maxConcurrentRequests", 1000, "The maximum number of concurrent requests vmauth can process. Other requests are rejected with "+
"'429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options")
maxConcurrentPerUserRequests = flag.Int("maxConcurrentPerUserRequests", 300, "The maximum number of concurrent requests vmauth can process per each configured user. "+
"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option "+
"in per-user config")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
`Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
)
@@ -41,7 +54,7 @@ func main() {
logger.Infof("starting vmauth at %q...", *httpListenAddr)
startTime := time.Now()
initAuthConfig()
go httpserver.Serve(*httpListenAddr, requestHandler)
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
logger.Infof("started vmauth in %.3f seconds", time.Since(startTime).Seconds())
sig := procutil.WaitForSigterm()
@@ -60,9 +73,7 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/-/reload":
authKey := r.FormValue("authKey")
if authKey != *reloadAuthKey {
httpserver.Errorf(w, r, "invalid authKey %q. It must match the value from -reloadAuthKey command line flag", authKey)
if !httpserver.CheckAuthFlag(w, r, *reloadAuthKey, "reloadAuthKey") {
return true
}
configReloadRequests.Inc()
@@ -81,44 +92,175 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
// See https://docs.influxdata.com/influxdb/v2.0/api/
authToken = strings.Replace(authToken, "Token", "Bearer", 1)
}
ac := authConfig.Load().(map[string]*UserInfo)
ui := ac[authToken]
if ui == nil {
invalidAuthTokenRequests.Inc()
err := fmt.Errorf("cannot find the provided auth token %q in config", authToken)
if *logInvalidAuthTokens {
httpserver.Errorf(w, r, "cannot find the provided auth token %q in config", authToken)
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
} else {
errStr := fmt.Sprintf("cannot find the provided auth token %q in config", authToken)
http.Error(w, errStr, http.StatusBadRequest)
http.Error(w, err.Error(), http.StatusUnauthorized)
}
return true
}
ui.requests.Inc()
targetURL, headers, err := createTargetURL(ui, r.URL)
if err != nil {
httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
// Limit the concurrency of requests to backends
concurrencyLimitOnce.Do(concurrencyLimitInit)
select {
case concurrencyLimitCh <- struct{}{}:
if err := ui.beginConcurrencyLimit(); err != nil {
handleConcurrencyLimitError(w, r, err)
<-concurrencyLimitCh
return true
}
default:
concurrentRequestsLimitReached.Inc()
err := fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh))
handleConcurrencyLimitError(w, r, err)
return true
}
r.Header.Set("vm-target-url", targetURL.String())
for _, h := range headers {
r.Header.Set(h.Name, h.Value)
}
proxyRequest(w, r)
processRequest(w, r, ui)
ui.endConcurrencyLimit()
<-concurrencyLimitCh
return true
}
func proxyRequest(w http.ResponseWriter, r *http.Request) {
defer func() {
err := recover()
if err == nil || err == http.ErrAbortHandler {
// Suppress http.ErrAbortHandler panic.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
u := normalizeURL(r.URL)
up, headers, err := ui.getURLPrefixAndHeaders(u)
if err != nil {
httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
return
}
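// Try each configured backend at most once; a backend that fails is marked broken,
// so it is skipped for the next few seconds.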
maxAttempts := up.getBackendsCount()
for i := 0; i < maxAttempts; i++ {
bu := up.getLeastLoadedBackendURL()
targetURL := mergeURLs(bu.url, u)
ok := tryProcessingRequest(w, r, targetURL, headers)
bu.put()
if ok {
return
}
// Forward other panics to the caller.
panic(err)
}()
getReverseProxy().ServeHTTP(w, r)
bu.setBroken()
}
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("all the backends for the user %q are unavailable", ui.name()),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
}
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, headers []Header) bool {
// This code has been copied from net/http/httputil/reverseproxy.go
req := sanitizeRequestHeaders(r)
req.URL = targetURL
for _, h := range headers {
req.Header.Set(h.Name, h.Value)
}
transportOnce.Do(transportInit)
res, err := transport.RoundTrip(req)
if err != nil {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
if r.Method == http.MethodPost || r.Method == http.MethodPut {
// It is impossible to retry POST and PUT requests,
// since we already proxied the request body to the backend.
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot proxy the request to %q: %w", targetURL, err),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
return true
}
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying the request to %q: %s", remoteAddr, requestURI, targetURL, err)
return false
}
removeHopHeaders(res.Header)
copyHeader(w.Header(), res.Header)
w.WriteHeader(res.StatusCode)
copyBuf := copyBufPool.Get()
copyBuf.B = bytesutil.ResizeNoCopyNoOverallocate(copyBuf.B, 16*1024)
_, err = io.CopyBuffer(w, res.Body, copyBuf.B)
copyBufPool.Put(copyBuf)
_ = res.Body.Close()
if err != nil && !netutil.IsTrivialNetworkError(err) {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
return true
}
return true
}
var copyBufPool bytesutil.ByteBufferPool
func copyHeader(dst, src http.Header) {
for k, vv := range src {
for _, v := range vv {
dst.Add(k, v)
}
}
}
func sanitizeRequestHeaders(r *http.Request) *http.Request {
// This code has been copied from net/http/httputil/reverseproxy.go
req := r.Clone(r.Context())
removeHopHeaders(req.Header)
if clientIP, _, err := net.SplitHostPort(req.RemoteAddr); err == nil {
// If we aren't the first proxy retain prior
// X-Forwarded-For information as a comma+space
// separated list and fold multiple headers into one.
prior := req.Header["X-Forwarded-For"]
if len(prior) > 0 {
clientIP = strings.Join(prior, ", ") + ", " + clientIP
}
req.Header.Set("X-Forwarded-For", clientIP)
}
return req
}
func removeHopHeaders(h http.Header) {
// remove hop-by-hop headers listed in the "Connection" header of h.
// See RFC 7230, section 6.1
for _, f := range h["Connection"] {
for _, sf := range strings.Split(f, ",") {
if sf = textproto.TrimString(sf); sf != "" {
h.Del(sf)
}
}
}
// Remove hop-by-hop headers to the backend. Especially
// important is "Connection" because we want a persistent
// connection, regardless of what the client sent to us.
for _, key := range hopHeaders {
h.Del(key)
}
}
// Hop-by-hop headers. These are removed when sent to the backend.
// As of RFC 7230, hop-by-hop headers are required to appear in the
// Connection header field. These are the headers defined by the
// obsoleted RFC 2616 (section 13.5.1) and are used for backward
// compatibility.
var hopHeaders = []string{
"Connection",
"Proxy-Connection", // non-standard but still sent by libcurl and rejected by e.g. google
"Keep-Alive",
"Proxy-Authenticate",
"Proxy-Authorization",
"Te", // canonicalized version of "TE"
"Trailer", // not Trailers per URL above; https://www.rfc-editor.org/errata_search.php?eid=4522
"Transfer-Encoding",
"Upgrade",
}
var (
@@ -128,43 +270,41 @@ var (
)
var (
reverseProxy *httputil.ReverseProxy
reverseProxyOnce sync.Once
transport *http.Transport
transportOnce sync.Once
)
func getReverseProxy() *httputil.ReverseProxy {
reverseProxyOnce.Do(initReverseProxy)
return reverseProxy
func transportInit() {
tr := http.DefaultTransport.(*http.Transport).Clone()
tr.ResponseHeaderTimeout = *responseTimeout
// Automatic compression must be disabled in order to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {
tr.MaxIdleConns = tr.MaxIdleConnsPerHost
}
transport = tr
}
// initReverseProxy must be called after flag.Parse(), since it uses command-line flags.
func initReverseProxy() {
reverseProxy = &httputil.ReverseProxy{
Director: func(r *http.Request) {
targetURL := r.Header.Get("vm-target-url")
target, err := url.Parse(targetURL)
if err != nil {
logger.Panicf("BUG: unexpected error when parsing targetURL=%q: %s", targetURL, err)
}
r.URL = target
},
Transport: func() *http.Transport {
tr := http.DefaultTransport.(*http.Transport).Clone()
// Automatic compression must be disabled in order to fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/535
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
if tr.MaxIdleConns != 0 && tr.MaxIdleConns < tr.MaxIdleConnsPerHost {
tr.MaxIdleConns = tr.MaxIdleConnsPerHost
}
return tr
}(),
FlushInterval: time.Second,
ErrorLog: logger.StdErrorLogger(),
}
var (
concurrencyLimitCh chan struct{}
concurrencyLimitOnce sync.Once
)
func concurrencyLimitInit() {
concurrencyLimitCh = make(chan struct{}, *maxConcurrentRequests)
_ = metrics.NewGauge("vmauth_concurrent_requests_capacity", func() float64 {
return float64(*maxConcurrentRequests)
})
_ = metrics.NewGauge("vmauth_concurrent_requests_current", func() float64 {
return float64(len(concurrencyLimitCh))
})
}
var concurrentRequestsLimitReached = metrics.NewCounter("vmauth_concurrent_requests_limit_reached_total")
func usage() {
const s = `
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
@@ -173,3 +313,12 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
`
flagutil.Usage(s)
}
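// handleConcurrencyLimitError asks the client to retry after 10 seconds
// and responds with 429 Too Many Requests.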
func handleConcurrencyLimitError(w http.ResponseWriter, r *http.Request, err error) {
w.Header().Add("Retry-After", "10")
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusTooManyRequests,
}
httpserver.Errorf(w, r, "%s", err)
}

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -7,11 +7,6 @@ import (
"strings"
)
func (up *URLPrefix) mergeURLs(requestURI *url.URL) *url.URL {
pu := up.getNextURL()
return mergeURLs(pu, requestURI)
}
func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
targetURL := *uiURL
targetURL.Path += requestURI.Path
@@ -35,12 +30,27 @@ func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
return &targetURL
}
func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
func (ui *UserInfo) getURLPrefixAndHeaders(u *url.URL) (*URLPrefix, []Header, error) {
for _, e := range ui.URLMaps {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix, e.Headers, nil
}
}
}
if ui.URLPrefix != nil {
return ui.URLPrefix, ui.Headers, nil
}
missingRouteRequests.Inc()
return nil, nil, fmt.Errorf("missing route for %q", u.String())
}
func normalizeURL(uOrig *url.URL) *url.URL {
u := *uOrig
// Prevent from attacks with using `..` in r.URL.Path
u.Path = path.Clean(u.Path)
if !strings.HasSuffix(u.Path, "/") && strings.HasSuffix(uOrig.Path, "/") {
// The path.Clean() removes traling slash.
// The path.Clean() removes trailing slash.
// Return it back if needed.
// This should fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1752
u.Path += "/"
@@ -52,16 +62,5 @@ func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1554
u.Path = ""
}
for _, e := range ui.URLMap {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix.mergeURLs(&u), e.Headers, nil
}
}
}
if ui.URLPrefix != nil {
return ui.URLPrefix.mergeURLs(&u), ui.Headers, nil
}
missingRouteRequests.Inc()
return nil, nil, fmt.Errorf("missing route for %q", u.String())
return &u
}

View File

@@ -13,10 +13,14 @@ func TestCreateTargetURLSuccess(t *testing.T) {
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target, headers, err := createTargetURL(ui, u)
u = normalizeURL(u)
up, headers, err := ui.getURLPrefixAndHeaders(u)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
bu := up.getLeastLoadedBackendURL()
target := mergeURLs(bu.url, u)
bu.put()
if target.String() != expectedTarget {
t.Fatalf("unexpected target; got %q; want %q", target, expectedTarget)
}
@@ -54,7 +58,7 @@ func TestCreateTargetURLSuccess(t *testing.T) {
// Complex routing with `url_map`
ui := &UserInfo{
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
@@ -86,7 +90,7 @@ func TestCreateTargetURLSuccess(t *testing.T) {
// Complex routing regexp paths in `url_map`
ui = &UserInfo{
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query(_range)?", "/api/v1/label/[^/]+/values"}),
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
@@ -119,20 +123,21 @@ func TestCreateTargetURLFailure(t *testing.T) {
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target, headers, err := createTargetURL(ui, u)
u = normalizeURL(u)
up, headers, err := ui.getURLPrefixAndHeaders(u)
if err == nil {
t.Fatalf("expecting non-nil error")
}
if target != nil {
t.Fatalf("unexpected target=%q; want empty string", target)
if up != nil {
t.Fatalf("unexpected non-empty up=%#v", up)
}
if headers != nil {
t.Fatalf("unexpected headers=%q; want empty string", headers)
t.Fatalf("unexpected non-empty headers=%q", headers)
}
}
f(&UserInfo{}, "/foo/bar")
f(&UserInfo{
URLMap: []URLMap{
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: mustParseURL("http://foobar/baz"),

View File

@@ -225,6 +225,8 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string

View File

@@ -3,6 +3,7 @@ package main
import (
"flag"
"fmt"
"net/url"
"os"
"path/filepath"
"strings"
@@ -41,25 +42,36 @@ func main() {
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
flagutil.RegisterSecretFlag("snapshot.createURL")
flagutil.RegisterSecretFlag("snapshot.deleteURL")
envflag.Parse()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
if len(*snapshotCreateURL) > 0 {
// create net/url object
createUrl, err := url.Parse(*snapshotCreateURL)
if err != nil {
logger.Fatalf("cannot parse snapshotCreateURL: %s", err)
}
if len(*snapshotName) > 0 {
logger.Fatalf("-snapshotName shouldn't be set if -snapshot.createURL is set, since snapshots are created automatically in this case")
}
logger.Infof("Snapshot create url %s", *snapshotCreateURL)
logger.Infof("Snapshot create url %s", createUrl.Redacted())
if len(*snapshotDeleteURL) <= 0 {
err := flag.Set("snapshot.deleteURL", strings.Replace(*snapshotCreateURL, "/create", "/delete", 1))
if err != nil {
logger.Fatalf("Failed to set snapshot.deleteURL flag: %v", err)
}
}
logger.Infof("Snapshot delete url %s", *snapshotDeleteURL)
deleteUrl, err := url.Parse(*snapshotDeleteURL)
if err != nil {
logger.Fatalf("cannot parse snapshotDeleteURL: %s", err)
}
logger.Infof("Snapshot delete url %s", deleteUrl.Redacted())
name, err := snapshot.Create(*snapshotCreateURL)
name, err := snapshot.Create(createUrl.String())
if err != nil {
logger.Fatalf("cannot create snapshot: %s", err)
}
@@ -69,7 +81,7 @@ func main() {
}
defer func() {
err := snapshot.Delete(*snapshotDeleteURL, name)
err := snapshot.Delete(deleteUrl.String(), name)
if err != nil {
logger.Fatalf("cannot delete snapshot: %s", err)
}
@@ -81,7 +93,7 @@ func main() {
logger.Fatalf("invalid -snapshotName=%q: %s", *snapshotName, err)
}
go httpserver.Serve(*httpListenAddr, nil)
go httpserver.Serve(*httpListenAddr, false, nil)
srcFS, err := newSrcFS()
if err != nil {
@@ -118,7 +130,7 @@ func main() {
func usage() {
const s = `
vmbackup performs backups for VictoriaMetrics data from instant snapshots to gcs, s3
vmbackup performs backups for VictoriaMetrics data from instant snapshots to gcs, s3, azblob
or local filesystem. Backed up data can be restored with vmrestore.
See the docs at https://docs.victoriametrics.com/vmbackup.html .

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -67,7 +67,7 @@ credentials.json
"project_id": "<project>",
"private_key_id": "",
"private_key": "-----BEGIN PRIVATE KEY-----\-----END PRIVATE KEY-----\n",
"client_email": test@<project>.iam.gserviceaccount.com",
"client_email": "test@<project>.iam.gserviceaccount.com",
"client_id": "",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
@@ -158,7 +158,7 @@ The result on the GCS bucket. We see only 3 daily backups:
* GET `/api/v1/backups` - returns list of backups in remote storage.
Example output:
```json
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
[{"name":"daily/2022-11-30","size_bytes":26664689,"size":"25.429Mi"},{"name":"daily/2022-12-01","size_bytes":40160965,"size":"38.300Mi"},{"name":"hourly/2022-11-30:12","size_bytes":5846529,"size":"5.576Mi"},{"name":"hourly/2022-11-30:13","size_bytes":17651847,"size":"16.834Mi"},{"name":"hourly/2022-11-30:13:22","size_bytes":8797831,"size":"8.390Mi"},{"name":"hourly/2022-11-30:14","size_bytes":10680454,"size":"10.186Mi"}]
```
* POST `/api/v1/restore` - saves backup name to restore when [performing restore](#restore-commands).
@@ -211,7 +211,7 @@ It can be changed by using flag:
`vmbackupmanager backup list` lists backups in remote storage:
```console
$ ./vmbackupmanager backup list
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
[{"name":"daily/2022-11-30","size_bytes":26664689,"size":"25.429Mi"},{"name":"daily/2022-12-01","size_bytes":40160965,"size":"38.300Mi"},{"name":"hourly/2022-11-30:12","size_bytes":5846529,"size":"5.576Mi"},{"name":"hourly/2022-11-30:13","size_bytes":17651847,"size":"16.834Mi"},{"name":"hourly/2022-11-30:13:22","size_bytes":8797831,"size":"8.390Mi"},{"name":"hourly/2022-11-30:14","size_bytes":10680454,"size":"10.186Mi"}]
```
### Restore commands
@@ -270,7 +270,15 @@ If restore mark doesn't exist at `storageDataPath`(restore wasn't requested) `vm
### How to restore in Kubernetes
1. Enter container running `vmbackupmanager`
1. Ensure there is an init container with `vmbackupmanager restore` in `vmstorage` or `vmsingle` pod.
For [VictoriaMetrics operator](https://docs.victoriametrics.com/operator/VictoriaMetrics-Operator.html) deployments it is required to add:
```yaml
vmbackup:
restore:
onStart: "true"
```
See operator `VMStorage` schema [here](https://docs.victoriametrics.com/operator/api.html#vmstorage) and `VMSingle` [here](https://docs.victoriametrics.com/operator/api.html#vmsinglespec).
2. Enter container running `vmbackupmanager`
2. Use `vmbackupmanager backup list` to get list of available backups:
```console
$ /vmbackupmanager-prod backup list
@@ -287,6 +295,43 @@ If restore mark doesn't exist at `storageDataPath`(restore wasn't requested) `vm
```
4. Restart pod
#### Restore cluster into another cluster
These steps are assuming that [VictoriaMetrics operator](https://docs.victoriametrics.com/operator/VictoriaMetrics-Operator.html) is used to manage `VMCluster`.
Clusters here are referred to as `source` and `destination`.
1. Create a new cluster with access to the *source* cluster's `vmbackupmanager` storage and the same number of storage nodes.
Add the following section in order to enable restore on start (the operator `VMStorage` schema can be found [here](https://docs.victoriametrics.com/operator/api.html#vmstorage)):
```yaml
vmbackup:
restore:
onStart: "true"
```
Note: it is safe to leave this section in the cluster configuration, since it will be ignored if restore mark doesn't exist.
> Important! Use a different `-dst` for the *destination* cluster to avoid overwriting backup data of the *source* cluster.
2. Enter container running `vmbackupmanager` in *source* cluster
2. Use `vmbackupmanager backup list` to get list of available backups:
```console
$ /vmbackupmanager-prod backup list
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
```
3. Use `vmbackupmanager restore create` to create restore mark at each pod of the *destination* cluster.
Each pod in the *destination* cluster should be restored from the backup of the respective pod in the *source* cluster.
For example: `vmstorage-destination-0` in the *destination* cluster should be restored from the backup of `vmstorage-source-0` in the *source* cluster.
```console
$ /vmbackupmanager-prod restore create s3://source_cluster/vmstorage-source-0/daily/2022-10-06
```
## Monitoring
`vmbackupmanager` exports various metrics in Prometheus exposition format at the `http://vmbackupmanager:8300/metrics` page. It is recommended to set up regular scraping of this page
either via [vmagent](https://docs.victoriametrics.com/vmagent.html) or via Prometheus, so the exported metrics could be analyzed later.
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/17798-victoriametrics-backupmanager/) for `vmbackupmanager` overview.
Graphs on this dashboard contain useful hints - hover the `i` icon in the top left corner of each graph in order to read it.
If you have suggestions for improvements or have found a bug, please open an issue on GitHub or
add a review to the dashboard.
## Configuration
### Flags
@@ -372,6 +417,8 @@ command-line flags:
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string

View File

@@ -0,0 +1,60 @@
package backoff
import (
"context"
"errors"
"fmt"
"math"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
const (
backoffRetries = 5
backoffFactor = 1.7
backoffMinDuration = time.Second
)
// retryableFunc describes a callback that is retried on errors
type retryableFunc func() error
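// ErrBadRequest marks an unrecoverable error; Retry fails fast instead of retrying when cb returns it.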
var ErrBadRequest = errors.New("bad request")
// Backoff describes object with backoff policy params
type Backoff struct {
retries int
factor float64
minDuration time.Duration
}
// New initializes a Backoff object with the default retry policy
func New() *Backoff {
return &Backoff{
retries: backoffRetries,
factor: backoffFactor,
minDuration: backoffMinDuration,
}
}
// Retry repeatedly calls cb until it succeeds or all retry attempts are exhausted
func (b *Backoff) Retry(ctx context.Context, cb retryableFunc) (uint64, error) {
var attempt uint64
for i := 0; i < b.retries; i++ {
// @TODO we should use context to cancel retries
err := cb()
if err == nil {
return attempt, nil
}
if errors.Is(err, ErrBadRequest) {
logger.Errorf("unrecoverable error: %s", err)
return attempt, err // fail fast if not recoverable
}
attempt++
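// Exponential backoff: wait minDuration * factor^i before the next attempt.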
backoff := float64(b.minDuration) * math.Pow(b.factor, float64(i))
dur := time.Duration(backoff)
logger.Errorf("got error: %s on attempt: %d; will retry in %v", err, attempt, dur)
time.Sleep(dur)
}
return attempt, fmt.Errorf("execution failed after %d retry attempts", b.retries)
}
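A minimal usage sketch for this package; the import path, the probed URL and the `ping` callback below are assumptions for illustration only:

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/backoff" // assumed import path
)

func main() {
	b := backoff.New()
	// ping is a hypothetical retryable operation: it fails on transient
	// errors and is retried with exponential backoff.
	ping := func() error {
		resp, err := http.Get("http://localhost:8428/health")
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
		}
		return nil
	}
	attempts, err := b.Retry(context.Background(), ping)
	if err != nil {
		fmt.Printf("gave up after %d retries: %s\n", attempts, err)
		return
	}
	fmt.Printf("succeeded after %d retries\n", attempts)
}
```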

View File

@@ -0,0 +1,96 @@
package backoff
import (
"context"
"fmt"
"testing"
"time"
)
func TestRetry_Do(t *testing.T) {
counter := 0
tests := []struct {
name string
backoffRetries int
backoffFactor float64
backoffMinDuration time.Duration
retryableFunc retryableFunc
ctx context.Context
withCancel bool
want uint64
wantErr bool
}{
{
name: "return bad request",
retryableFunc: func() error {
return ErrBadRequest
},
ctx: context.Background(),
want: 0,
wantErr: true,
},
{
name: "empty retries values",
retryableFunc: func() error {
time.Sleep(time.Millisecond * 100)
return nil
},
ctx: context.Background(),
want: 0,
wantErr: false,
},
{
name: "only one retry test",
backoffRetries: 5,
backoffFactor: 1.7,
backoffMinDuration: time.Millisecond * 10,
retryableFunc: func() error {
t := time.NewTicker(time.Millisecond * 5)
defer t.Stop()
for range t.C {
counter++
if counter%2 == 0 {
return fmt.Errorf("got some error")
}
if counter%3 == 0 {
return nil
}
}
return nil
},
ctx: context.Background(),
want: 1,
wantErr: false,
},
{
name: "all retries failed test",
backoffRetries: 5,
backoffFactor: 0.1,
backoffMinDuration: time.Millisecond * 10,
retryableFunc: func() error {
t := time.NewTicker(time.Millisecond * 5)
defer t.Stop()
for range t.C {
return fmt.Errorf("got some error")
}
return nil
},
ctx: context.Background(),
want: 5,
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
r := New()
got, err := r.Retry(tt.ctx, tt.retryableFunc)
if (err != nil) != tt.wantErr {
t.Errorf("Retry() error = %v, wantErr %v", err, tt.wantErr)
return
}
if got != tt.want {
t.Errorf("Retry() got = %v, want %v", got, tt.want)
}
})
}
}

View File

@@ -353,7 +353,7 @@ var (
},
&cli.StringFlag{
Name: vmNativeStepInterval,
Usage: fmt.Sprintf("Split export data into chunks. Requires setting --%s. Valid values are '%s','%s','%s'.", vmNativeFilterTimeStart, stepper.StepMonth, stepper.StepDay, stepper.StepHour),
Usage: fmt.Sprintf("Split export data into chunks. Requires setting --%s. Valid values are '%s','%s','%s','%s'.", vmNativeFilterTimeStart, stepper.StepMonth, stepper.StepDay, stepper.StepHour, stepper.StepMinute),
},
&cli.StringFlag{
Name: vmNativeSrcAddr,
@@ -410,19 +410,20 @@ var (
)
const (
remoteRead = "remote-read"
remoteReadUseStream = "remote-read-use-stream"
remoteReadConcurrency = "remote-read-concurrency"
remoteReadFilterTimeStart = "remote-read-filter-time-start"
remoteReadFilterTimeEnd = "remote-read-filter-time-end"
remoteReadFilterLabel = "remote-read-filter-label"
remoteReadFilterLabelValue = "remote-read-filter-label-value"
remoteReadStepInterval = "remote-read-step-interval"
remoteReadSrcAddr = "remote-read-src-addr"
remoteReadUser = "remote-read-user"
remoteReadPassword = "remote-read-password"
remoteReadHTTPTimeout = "remote-read-http-timeout"
remoteReadHeaders = "remote-read-headers"
remoteRead = "remote-read"
remoteReadUseStream = "remote-read-use-stream"
remoteReadConcurrency = "remote-read-concurrency"
remoteReadFilterTimeStart = "remote-read-filter-time-start"
remoteReadFilterTimeEnd = "remote-read-filter-time-end"
remoteReadFilterLabel = "remote-read-filter-label"
remoteReadFilterLabelValue = "remote-read-filter-label-value"
remoteReadStepInterval = "remote-read-step-interval"
remoteReadSrcAddr = "remote-read-src-addr"
remoteReadUser = "remote-read-user"
remoteReadPassword = "remote-read-password"
remoteReadHTTPTimeout = "remote-read-http-timeout"
remoteReadHeaders = "remote-read-headers"
remoteReadInsecureSkipVerify = "remote-read-insecure-skip-verify"
)
var (
@@ -493,6 +494,11 @@ var (
"For example, --remote-read-headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding remote source storage. \n" +
"Multiple headers must be delimited by '^^': --remote-read-headers='header1:value1^^header2:value2'",
},
&cli.BoolFlag{
Name: remoteReadInsecureSkipVerify,
Usage: "Whether to skip TLS certificate verification when connecting to the remote read address",
Value: false,
},
}
)

View File

@@ -20,7 +20,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native/stream"
)
func main() {
@@ -65,7 +65,7 @@ func main() {
// disable progress bars since openTSDB implementation
// does not use progress bar pool
vmCfg.DisableProgressBar = true
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -100,7 +100,7 @@ func main() {
}
vmCfg := initConfigVM(c)
importer, err = vm.NewImporter(vmCfg)
importer, err = vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -121,14 +121,15 @@ func main() {
Flags: mergeFlags(globalFlags, remoteReadFlags, vmFlags),
Action: func(c *cli.Context) error {
rr, err := remoteread.NewClient(remoteread.Config{
Addr: c.String(remoteReadSrcAddr),
Username: c.String(remoteReadUser),
Password: c.String(remoteReadPassword),
Timeout: c.Duration(remoteReadHTTPTimeout),
UseStream: c.Bool(remoteReadUseStream),
Headers: c.String(remoteReadHeaders),
LabelName: c.String(remoteReadFilterLabel),
LabelValue: c.String(remoteReadFilterLabelValue),
Addr: c.String(remoteReadSrcAddr),
Username: c.String(remoteReadUser),
Password: c.String(remoteReadPassword),
Timeout: c.Duration(remoteReadHTTPTimeout),
UseStream: c.Bool(remoteReadUseStream),
Headers: c.String(remoteReadHeaders),
LabelName: c.String(remoteReadFilterLabel),
LabelValue: c.String(remoteReadFilterLabelValue),
InsecureSkipVerify: c.Bool(remoteReadInsecureSkipVerify),
})
if err != nil {
return fmt.Errorf("error create remote read client: %s", err)
@@ -136,7 +137,7 @@ func main() {
vmCfg := initConfigVM(c)
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -162,7 +163,7 @@ func main() {
fmt.Println("Prometheus import mode")
vmCfg := initConfigVM(c)
importer, err = vm.NewImporter(vmCfg)
importer, err = vm.NewImporter(ctx, vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
@@ -246,7 +247,7 @@ func main() {
return cli.Exit(fmt.Errorf("cannot open exported block at path=%q err=%w", blockPath, err), 1)
}
var blocksCount uint64
if err := parser.ParseStream(f, isBlockGzipped, func(block *parser.Block) error {
if err := stream.Parse(f, isBlockGzipped, func(block *stream.Block) error {
atomic.AddUint64(&blocksCount, 1)
return nil
}); err != nil {

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -102,6 +102,7 @@ func (pp *prometheusProcessor) do(b tsdb.BlockReader) error {
if err != nil {
return fmt.Errorf("failed to read block: %s", err)
}
var it chunkenc.Iterator
for ss.Next() {
var name string
var labels []vm.LabelPair
@@ -123,7 +124,7 @@ func (pp *prometheusProcessor) do(b tsdb.BlockReader) error {
var timestamps []int64
var values []float64
it := series.Iterator()
it = series.Iterator(it)
for {
typ := it.Next()
if typ == chunkenc.ValNone {

View File

@@ -1,6 +1,7 @@
package main
import (
"context"
"os"
"testing"
"time"
@@ -58,7 +59,7 @@ func Test_prometheusProcessor_run(t *testing.T) {
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}
@@ -95,7 +96,7 @@ func Test_prometheusProcessor_run(t *testing.T) {
return client
},
im: func(vmCfg vm.Config) *vm.Importer {
importer, err := vm.NewImporter(vmCfg)
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
t.Fatalf("error init importer: %s", err)
}

View File

@@ -0,0 +1,319 @@
package main
import (
"context"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/remoteread"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/stepper"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/testdata/servers_integration_test"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/prometheus/prometheus/prompb"
)
func TestRemoteRead(t *testing.T) {
var testCases = []struct {
name string
remoteReadConfig remoteread.Config
vmCfg vm.Config
start string
end string
numOfSamples int64
numOfSeries int64
rrp remoteReadProcessor
chunk string
remoteReadSeries func(start, end, numOfSeries, numOfSamples int64) []*prompb.TimeSeries
expectedSeries []vm.TimeSeries
}{
{
name: "step minute on minute time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*"},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-11-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMinute,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{0, 0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{100, 100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{200, 200},
},
},
},
{
name: "step month on month time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*"},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-09-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMonth,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1664184185000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1664184185000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1664184185000},
Values: []float64{200},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1666819415000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1666819415000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1666819415000},
Values: []float64{200}},
},
},
}
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
ctx := context.Background()
remoteReadServer := remote_read_integration.NewRemoteReadServer(t)
defer remoteReadServer.Close()
remoteWriteServer := remote_read_integration.NewRemoteWriteServer(t)
defer remoteWriteServer.Close()
tt.remoteReadConfig.Addr = remoteReadServer.URL()
rr, err := remoteread.NewClient(tt.remoteReadConfig)
if err != nil {
t.Fatalf("error create remote read client: %s", err)
}
start, err := time.Parse(time.RFC3339, tt.start)
if err != nil {
t.Fatalf("Error parse start time: %s", err)
}
end, err := time.Parse(time.RFC3339, tt.end)
if err != nil {
t.Fatalf("Error parse end time: %s", err)
}
rrs := tt.remoteReadSeries(start.Unix(), end.Unix(), tt.numOfSeries, tt.numOfSamples)
remoteReadServer.SetRemoteReadSeries(rrs)
remoteWriteServer.ExpectedSeries(tt.expectedSeries)
tt.vmCfg.Addr = remoteWriteServer.URL()
importer, err := vm.NewImporter(ctx, tt.vmCfg)
if err != nil {
t.Fatalf("failed to create VM importer: %s", err)
}
defer importer.Close()
rmp := remoteReadProcessor{
src: rr,
dst: importer,
filter: remoteReadFilter{
timeStart: &start,
timeEnd: &end,
chunk: tt.chunk,
},
cc: 1,
}
err = rmp.run(ctx, true, false)
if err != nil {
t.Fatalf("failed to run remote read processor: %s", err)
}
})
}
}
func TestSteamRemoteRead(t *testing.T) {
var testCases = []struct {
name string
remoteReadConfig remoteread.Config
vmCfg vm.Config
start string
end string
numOfSamples int64
numOfSeries int64
rrp remoteReadProcessor
chunk string
remoteReadSeries func(start, end, numOfSeries, numOfSamples int64) []*prompb.TimeSeries
expectedSeries []vm.TimeSeries
}{
{
name: "step minute on minute time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*", UseStream: true},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-11-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMinute,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{0, 0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{100, 100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{200, 200},
},
},
},
{
name: "step month on month time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*", UseStream: true},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-09-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMonth,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1664184185000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1664184185000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1664184185000},
Values: []float64{200},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1666819415000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1666819415000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1666819415000},
Values: []float64{200},
},
},
},
}
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
ctx := context.Background()
remoteReadServer := remote_read_integration.NewRemoteReadStreamServer(t)
defer remoteReadServer.Close()
remoteWriteServer := remote_read_integration.NewRemoteWriteServer(t)
defer remoteWriteServer.Close()
tt.remoteReadConfig.Addr = remoteReadServer.URL()
rr, err := remoteread.NewClient(tt.remoteReadConfig)
if err != nil {
t.Fatalf("error create remote read client: %s", err)
}
start, err := time.Parse(time.RFC3339, tt.start)
if err != nil {
t.Fatalf("Error parse start time: %s", err)
}
end, err := time.Parse(time.RFC3339, tt.end)
if err != nil {
t.Fatalf("Error parse end time: %s", err)
}
rrs := tt.remoteReadSeries(start.Unix(), end.Unix(), tt.numOfSeries, tt.numOfSamples)
remoteReadServer.InitMockStorage(rrs)
remoteWriteServer.ExpectedSeries(tt.expectedSeries)
tt.vmCfg.Addr = remoteWriteServer.URL()
importer, err := vm.NewImporter(ctx, tt.vmCfg)
if err != nil {
t.Fatalf("failed to create VM importer: %s", err)
}
defer importer.Close()
rmp := remoteReadProcessor{
src: rr,
dst: importer,
filter: remoteReadFilter{
timeStart: &start,
timeEnd: &end,
chunk: tt.chunk,
},
cc: 1,
}
err = rmp.run(ctx, true, false)
if err != nil {
t.Fatalf("failed to run remote read processor: %s", err)
}
})
}
}


@@ -10,6 +10,7 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/gogo/protobuf/proto"
@@ -60,6 +61,8 @@ type Config struct {
// LabelName, LabelValue stands for label=~value pair used for read requests.
// Is optional.
LabelName, LabelValue string
// InsecureSkipVerify defines whether to skip TLS certificate verification when connecting to the remote read address.
InsecureSkipVerify bool
}
// Filter defines a list of filters applied to requested data
@@ -100,7 +103,7 @@ func NewClient(cfg Config) (*Client, error) {
c := &Client{
c: &http.Client{
Timeout: cfg.Timeout,
Transport: http.DefaultTransport.(*http.Transport).Clone(),
Transport: utils.Transport(cfg.Addr, cfg.InsecureSkipVerify),
},
addr: strings.TrimSuffix(cfg.Addr, "/"),
user: cfg.Username,
@@ -154,7 +157,7 @@ func (c *Client) do(req *http.Request) (*http.Response, error) {
// Ping checks the health of the read source
func (c *Client) Ping() error {
url := c.addr + healthPath
req, err := http.NewRequest("GET", url, nil)
req, err := http.NewRequest(http.MethodGet, url, nil)
if err != nil {
return fmt.Errorf("cannot create request to %q: %s", url, err)
}
@@ -171,7 +174,7 @@ func (c *Client) Ping() error {
func (c *Client) fetch(ctx context.Context, data []byte, streamCb StreamCallback) error {
r := bytes.NewReader(data)
url := c.addr + remoteReadPath
req, err := http.NewRequest("POST", url, r)
req, err := http.NewRequest(http.MethodPost, url, r)
if err != nil {
return fmt.Errorf("failed to create new HTTP request: %w", err)
}
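For context, here is a minimal usage sketch for the configuration change introduced above. The field names (Addr, Timeout, LabelName, LabelValue, InsecureSkipVerify) come from this diff; the concrete address, timeout value and the log call are illustrative placeholders only, not part of the change.

// Hypothetical example: build a remote read client that skips TLS certificate
// verification via the InsecureSkipVerify field added in this diff.
cfg := remoteread.Config{
	Addr:               "https://source-prometheus:9090", // placeholder address
	Timeout:            time.Minute,
	LabelName:          "__name__",
	LabelValue:         ".*",
	InsecureSkipVerify: true, // do not verify the remote TLS certificate
}
client, err := remoteread.NewClient(cfg)
if err != nil {
	log.Fatalf("failed to create remote read client: %s", err)
}
// client is then passed as the src of a remoteReadProcessor, as the tests below show.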


@@ -0,0 +1,368 @@
package remote_read_integration
import (
"context"
"fmt"
"io"
"net/http"
"net/http/httptest"
"strconv"
"strings"
"testing"
"github.com/gogo/protobuf/proto"
"github.com/golang/snappy"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/prompb"
"github.com/prometheus/prometheus/storage/remote"
"github.com/prometheus/prometheus/tsdb/chunks"
)
const (
maxBytesInFrame = 1024 * 1024
)
type RemoteReadServer struct {
server *httptest.Server
series []*prompb.TimeSeries
storage *MockStorage
}
// NewRemoteReadServer creates a remote read server. It exposes a single endpoint and responds with the
// passed series based on the request to the read endpoint. It returns a server which should be closed after
// being used.
func NewRemoteReadServer(t *testing.T) *RemoteReadServer {
rrs := &RemoteReadServer{
series: make([]*prompb.TimeSeries, 0),
}
rrs.server = httptest.NewServer(rrs.getReadHandler(t))
return rrs
}
// Close closes the server.
func (rrs *RemoteReadServer) Close() {
rrs.server.Close()
}
func (rrs *RemoteReadServer) URL() string {
return rrs.server.URL
}
func (rrs *RemoteReadServer) SetRemoteReadSeries(series []*prompb.TimeSeries) {
rrs.series = append(rrs.series, series...)
}
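Taken together with URL and Close, a typical non-streaming test wires the server up as in the sketch below; it condenses what the test functions earlier in this diff do, with series generation omitted for brevity.

// Illustrative wiring only; mirrors the non-streaming test above.
srv := remote_read_integration.NewRemoteReadServer(t)
defer srv.Close()
srv.SetRemoteReadSeries(series) // series := []*prompb.TimeSeries{...}

cfg := remoteread.Config{Addr: srv.URL(), LabelName: "__name__", LabelValue: ".*"}
rr, err := remoteread.NewClient(cfg)
if err != nil {
	t.Fatalf("failed to create remote read client: %s", err)
}
_ = rr // handed to remoteReadProcessor as its src in the tests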
func (rrs *RemoteReadServer) getReadHandler(t *testing.T) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if !validateReadHeaders(t, r) {
t.Fatalf("invalid read headers")
}
compressed, err := io.ReadAll(r.Body)
if err != nil {
t.Fatalf("error read body: %s", err)
}
reqBuf, err := snappy.Decode(nil, compressed)
if err != nil {
t.Fatalf("error decode compressed data:%s", err)
}
var req prompb.ReadRequest
if err := proto.Unmarshal(reqBuf, &req); err != nil {
t.Fatalf("error unmarshal read request: %s", err)
}
resp := &prompb.ReadResponse{
Results: make([]*prompb.QueryResult, len(req.Queries)),
}
for i, r := range req.Queries {
startTs := r.StartTimestampMs
endTs := r.EndTimestampMs
ts := make([]*prompb.TimeSeries, len(rrs.series))
for i, s := range rrs.series {
var samples []prompb.Sample
for _, sample := range s.Samples {
if sample.Timestamp >= startTs && sample.Timestamp < endTs {
samples = append(samples, sample)
}
}
var series prompb.TimeSeries
if len(samples) > 0 {
series.Labels = s.Labels
series.Samples = samples
}
ts[i] = &series
}
resp.Results[i] = &prompb.QueryResult{Timeseries: ts}
data, err := proto.Marshal(resp)
if err != nil {
t.Fatalf("error marshal response: %s", err)
}
compressed = snappy.Encode(nil, data)
w.Header().Set("Content-Type", "application/x-protobuf")
w.Header().Set("Content-Encoding", "snappy")
w.WriteHeader(http.StatusOK)
if _, err := w.Write(compressed); err != nil {
t.Fatalf("snappy encode error: %s", err)
}
}
})
}
func NewRemoteReadStreamServer(t *testing.T) *RemoteReadServer {
rrs := &RemoteReadServer{
series: make([]*prompb.TimeSeries, 0),
}
rrs.server = httptest.NewServer(rrs.getStreamReadHandler(t))
return rrs
}
func (rrs *RemoteReadServer) InitMockStorage(series []*prompb.TimeSeries) {
rrs.storage = NewMockStorage(series)
}
func (rrs *RemoteReadServer) getStreamReadHandler(t *testing.T) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if !validateStreamReadHeaders(t, r) {
t.Fatalf("invalid read headers")
}
f, ok := w.(http.Flusher)
if !ok {
t.Fatalf("internal http.ResponseWriter does not implement http.Flusher interface")
}
stream := remote.NewChunkedWriter(w, f)
data, err := io.ReadAll(r.Body)
if err != nil {
t.Fatalf("error read body: %s", err)
}
decodedData, err := snappy.Decode(nil, data)
if err != nil {
t.Fatalf("error decode compressed data:%s", err)
}
var req prompb.ReadRequest
if err := proto.Unmarshal(decodedData, &req); err != nil {
t.Fatalf("error unmarshal read request: %s", err)
}
var chks []prompb.Chunk
ctx := context.Background()
for idx, r := range req.Queries {
startTs := r.StartTimestampMs
endTs := r.EndTimestampMs
var matchers []*labels.Matcher
cb := func() (int64, error) { return 0, nil }
c := remote.NewSampleAndChunkQueryableClient(rrs.storage, nil, matchers, true, cb)
q, err := c.ChunkQuerier(ctx, startTs, endTs)
if err != nil {
t.Fatalf("error init chunk querier: %s", err)
}
ss := q.Select(false, nil, matchers...)
var iter chunks.Iterator
for ss.Next() {
series := ss.At()
iter = series.Iterator(iter)
labels := remote.MergeLabels(labelsToLabelsProto(series.Labels()), nil)
frameBytesLeft := maxBytesInFrame
for _, lb := range labels {
frameBytesLeft -= lb.Size()
}
isNext := iter.Next()
for isNext {
chunk := iter.At()
if chunk.Chunk == nil {
t.Fatalf("error found not populated chunk returned by SeriesSet at ref: %v", chunk.Ref)
}
chks = append(chks, prompb.Chunk{
MinTimeMs: chunk.MinTime,
MaxTimeMs: chunk.MaxTime,
Type: prompb.Chunk_Encoding(chunk.Chunk.Encoding()),
Data: chunk.Chunk.Bytes(),
})
frameBytesLeft -= chks[len(chks)-1].Size()
// We are fine with minor inaccuracy of max bytes per frame. The inaccuracy will be max of full chunk size.
isNext = iter.Next()
if frameBytesLeft > 0 && isNext {
continue
}
resp := &prompb.ChunkedReadResponse{
ChunkedSeries: []*prompb.ChunkedSeries{
{Labels: labels, Chunks: chks},
},
QueryIndex: int64(idx),
}
b, err := proto.Marshal(resp)
if err != nil {
t.Fatalf("error marshal response: %s", err)
}
if _, err := stream.Write(b); err != nil {
t.Fatalf("error write to stream: %s", err)
}
chks = chks[:0]
rrs.storage.Reset()
}
if err := iter.Err(); err != nil {
t.Fatalf("error iterate over chunk series: %s", err)
}
}
}
})
}
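In short, the streaming handler batches chunks into frames of at most roughly maxBytesInFrame (1 MiB): the size of the series labels and of each appended chunk is subtracted from the frame budget, and a ChunkedReadResponse is flushed whenever the budget is spent or the chunk iterator is exhausted. A small worked illustration of that budget arithmetic, with hypothetical sizes:

// Hypothetical numbers for illustration only.
// frameBytesLeft starts at maxBytesInFrame = 1024 * 1024 = 1048576 bytes.
// Suppose the series labels take 64 bytes and each chunk is ~4096 bytes:
//   after labels:  1048576 - 64   = 1048512
//   after chunk 1: 1048512 - 4096 = 1044416
//   ... (roughly 256 such chunks fit in one frame)
// Once frameBytesLeft drops to zero or below (or the iterator ends), the handler
// marshals a single prompb.ChunkedReadResponse for the accumulated chunks,
// writes it to the chunked stream, and resets the chunk slice for the next frame.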
func validateReadHeaders(t *testing.T, r *http.Request) bool {
if r.Method != http.MethodPost {
t.Fatalf("got %q method, expected %q", r.Method, http.MethodPost)
}
if r.Header.Get("Content-Encoding") != "snappy" {
t.Fatalf("got %q content encoding header, expected %q", r.Header.Get("Content-Encoding"), "snappy")
}
if r.Header.Get("Content-Type") != "application/x-protobuf" {
t.Fatalf("got %q content type header, expected %q", r.Header.Get("Content-Type"), "application/x-protobuf")
}
remoteReadVersion := r.Header.Get("X-Prometheus-Remote-Read-Version")
if remoteReadVersion == "" {
t.Fatalf("got empty prometheus remote read header")
}
if !strings.HasPrefix(remoteReadVersion, "0.1.") {
t.Fatalf("wrong remote version defined")
}
return true
}
func validateStreamReadHeaders(t *testing.T, r *http.Request) bool {
if r.Method != http.MethodPost {
t.Fatalf("got %q method, expected %q", r.Method, http.MethodPost)
}
if r.Header.Get("Content-Encoding") != "snappy" {
t.Fatalf("got %q content encoding header, expected %q", r.Header.Get("Content-Encoding"), "snappy")
}
if r.Header.Get("Content-Type") != "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse" {
t.Fatalf("got %q content type header, expected %q", r.Header.Get("Content-Type"), "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse")
}
remoteReadVersion := r.Header.Get("X-Prometheus-Remote-Read-Version")
if remoteReadVersion == "" {
t.Fatalf("got empty prometheus remote read header")
}
if !strings.HasPrefix(remoteReadVersion, "0.1.") {
t.Fatalf("wrong remote version defined")
}
return true
}
func GenerateRemoteReadSeries(start, end, numOfSeries, numOfSamples int64) []*prompb.TimeSeries {
var ts []*prompb.TimeSeries
j := 0
for i := 0; i < int(numOfSeries); i++ {
if i%3 == 0 {
j++
}
timeSeries := prompb.TimeSeries{
Labels: []prompb.Label{
{Name: labels.MetricName, Value: fmt.Sprintf("vm_metric_%d", j)},
{Name: "job", Value: strconv.Itoa(i)},
},
}
ts = append(ts, &timeSeries)
}
for i := range ts {
ts[i].Samples = generateRemoteReadSamples(i, start, end, numOfSamples)
}
return ts
}
func generateRemoteReadSamples(idx int, startTime, endTime, numOfSamples int64) []prompb.Sample {
samples := make([]prompb.Sample, 0)
delta := (endTime - startTime) / numOfSamples
t := startTime
// "<" rather than "!=" guards against an endless loop when the range is not evenly divisible by numOfSamples.
for t < endTime {
v := 100 * int64(idx)
samples = append(samples, prompb.Sample{
Timestamp: t * 1000,
Value: float64(v),
})
t = t + delta
}
return samples
}
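As a concrete illustration of the two generators above (all values follow directly from the code; timestamps are in milliseconds, as emitted):

// GenerateRemoteReadSeries(0, 60, 3, 2) produces three series, all named
// "vm_metric_1" (j is incremented exactly once, at i=0), with job labels "0", "1", "2":
//   job="0": {Timestamp: 0, Value: 0},   {Timestamp: 30000, Value: 0}
//   job="1": {Timestamp: 0, Value: 100}, {Timestamp: 30000, Value: 100}
//   job="2": {Timestamp: 0, Value: 200}, {Timestamp: 30000, Value: 200}
// These values (0, 100, 200) are exactly what the expectedSeries blocks in the tests above assert on.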
type MockStorage struct {
query *prompb.Query
store []*prompb.TimeSeries
}
func NewMockStorage(series []*prompb.TimeSeries) *MockStorage {
return &MockStorage{store: series}
}
func (ms *MockStorage) Read(_ context.Context, query *prompb.Query) (*prompb.QueryResult, error) {
if ms.query != nil {
return nil, fmt.Errorf("expected only one call to remote client got: %v", query)
}
ms.query = query
q := &prompb.QueryResult{Timeseries: make([]*prompb.TimeSeries, 0, len(ms.store))}
for _, s := range ms.store {
var samples []prompb.Sample
for _, sample := range s.Samples {
if sample.Timestamp >= query.StartTimestampMs && sample.Timestamp < query.EndTimestampMs {
samples = append(samples, sample)
}
}
var series prompb.TimeSeries
if len(samples) > 0 {
series.Labels = s.Labels
series.Samples = samples
}
q.Timeseries = append(q.Timeseries, &series)
}
return q, nil
}
func (ms *MockStorage) Reset() {
ms.query = nil
}
func labelsToLabelsProto(labels labels.Labels) []prompb.Label {
result := make([]prompb.Label, 0, len(labels))
for _, l := range labels {
result = append(result, prompb.Label{
Name: l.Name,
Value: l.Value,
})
}
return result
}


@@ -0,0 +1,86 @@
package remote_read_integration
import (
"bufio"
"net/http"
"net/http/httptest"
"reflect"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
)
type RemoteWriteServer struct {
server *httptest.Server
series []vm.TimeSeries
}
// NewRemoteWriteServer prepares test remote write server
func NewRemoteWriteServer(t *testing.T) *RemoteWriteServer {
rws := &RemoteWriteServer{series: make([]vm.TimeSeries, 0)}
mux := http.NewServeMux()
mux.Handle("/api/v1/import", rws.getWriteHandler(t))
mux.Handle("/health", rws.handlePing())
rws.server = httptest.NewServer(mux)
return rws
}
// Close closes the server.
func (rws *RemoteWriteServer) Close() {
rws.server.Close()
}
func (rws *RemoteWriteServer) ExpectedSeries(series []vm.TimeSeries) {
rws.series = append(rws.series, series...)
}
func (rws *RemoteWriteServer) URL() string {
return rws.server.URL
}
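As with the read server, the write server is plugged into a test by registering the expected series and pointing the importer at its URL; the sketch below condenses what the tests earlier in this diff do.

// Illustrative wiring only; mirrors the tests earlier in this diff.
rws := remote_read_integration.NewRemoteWriteServer(t)
defer rws.Close()
rws.ExpectedSeries(expectedSeries) // expectedSeries := []vm.TimeSeries{...}

vmCfg := vm.Config{Addr: rws.URL(), Concurrency: 1, DisableProgressBar: true}
importer, err := vm.NewImporter(context.Background(), vmCfg)
if err != nil {
	t.Fatalf("failed to create VM importer: %s", err)
}
defer importer.Close()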
func (rws *RemoteWriteServer) getWriteHandler(t *testing.T) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
var tss []vm.TimeSeries
scanner := bufio.NewScanner(r.Body)
var rows parser.Rows
for scanner.Scan() {
rows.Unmarshal(scanner.Text())
for _, row := range rows.Rows {
var labelPairs []vm.LabelPair
var ts vm.TimeSeries
nameValue := ""
for _, tag := range row.Tags {
if string(tag.Key) == "__name__" {
nameValue = string(tag.Value)
continue
}
labelPairs = append(labelPairs, vm.LabelPair{Name: string(tag.Key), Value: string(tag.Value)})
}
ts.Values = append(ts.Values, row.Values...)
ts.Timestamps = append(ts.Timestamps, row.Timestamps...)
ts.Name = nameValue
ts.LabelPairs = labelPairs
tss = append(tss, ts)
}
rows.Reset()
}
if !reflect.DeepEqual(tss, rws.series) {
w.WriteHeader(http.StatusInternalServerError)
t.Fatalf("datasets not equal, expected: %#v; \n got: %#v", rws.series, tss)
return
}
w.WriteHeader(http.StatusNoContent)
})
}
func (rws *RemoteWriteServer) handlePing() http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("OK"))
})
}
