Compare commits

...

204 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
9785b3f57f docs/CHANGELOG.md: cut v1.86.2 2023-01-18 12:01:07 -08:00
Aliaksandr Valialkin
9c62391a5c .github/workflows: remove obsolete make targets: install-goling and install-errcheck
These targets became obsolete after ec2c82e800
2023-01-18 11:48:29 -08:00
Max Golionko
59b97f26c0 CI: split js and go codeql, split test and build, enable matrix for test (#3670)
* split js and go codeql, split test and build, enable matrix for test

* checkout before go setup

* enable build for PRs as well

* update filter
2023-01-18 11:42:27 -08:00
Aliaksandr Valialkin
19e28ce7b6 docs/CHANGELOG.md: document 777038fe44
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3672
2023-01-18 11:39:48 -08:00
Tobias Jungel
777038fe44 app/vmbackup: prevent password leaks (#3672)
This prevents vmbackup from leaking passwords into logs like shown below.

2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:12   build version: vmbackup-20221214-211706-tags-v1.85.1-0-g09a70d3e9
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:13   command-line flags
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:20     -dst="fs:///vm-backups/latest"
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:20     -snapshot.createURL="http://user:super_sercret123@victoriametricspshot/create"
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/lib/logger/flag.go:20     -storageDataPath="/storage"
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/app/vmbackup/main.go:53 Snapshot create url http://user:super_sercret123@victoriametrics:8428/snapshot/create
2023-01-11T15:00:01.050Z        info    VictoriaMetrics/app/vmbackup/main.go:60 Snapshot delete url http://user:super_sercret123@victoriametrics:8428/snapshot/delete
2023-01-18 11:35:21 -08:00
Aliaksandr Valialkin
e867df5ef5 app/vmui: increase perceived performance by 2.5x by reducing the delay before the query execution from 0.8s to 0.3s
The delay cannot be removed, since it is used for limiting the rate of queries sent to VictoriaMetrics during graph scrolling.
2023-01-18 01:33:48 -08:00
Aliaksandr Valialkin
2ac530eb28 lib/{storage,mergeset}: wake up background merges as soon as there is a potential work for them
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
2023-01-18 01:10:18 -08:00
Aliaksandr Valialkin
b8409d6600 lib/{storage,mergeset}: do not run assisted merges when flushing pending samples to parts
Assisted merges are intended to be performed by goroutines, which accept the incoming samples,
in order to limit the data ingestion rate.

The worker, which converts pending samples to parts, shouldn't be penalized by assisted merges,
since this may result in increased number of pending rows as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647#issuecomment-1385039142
when the assisted merge takes too much time.
2023-01-18 00:20:58 -08:00
Aliaksandr Valialkin
1ac025bbc9 lib/storage: use better naming for a function returning new []rawRows - newRawRowsBlock() -> newRawRows() 2023-01-18 00:01:03 -08:00
Aliaksandr Valialkin
0c625185cb app/vmselect/promql: updates tests for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3664 2023-01-17 23:25:45 -08:00
Aliaksandr Valialkin
68463c9e87 lib/promscrape: follow-up for d79f1b106c
- Document the fix at docs/CHANGELOG.md
- Limit the concurrency for sendStaleMarkers() function in order to limit its memory usage
  when big number of targets disappear and staleness markers are sent
  for all the metrics exposed by these targets.
- Make sure that the writeRequestCtx is returned to the pool
  when there is no need to send staleness markers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3668
2023-01-17 23:11:56 -08:00
lzfhust
d79f1b106c using writeRequestCtxPool when delete kubernetes clusters from kubernetes_sd_configs (#3669) 2023-01-17 22:57:56 -08:00
Zakhar Bessarab
322d96bfe5 discovery/{consul,nomad}: fix cancelling serviceWatcher in-flight requests (#3658)
* lib/promscrape/discovery/{consul,nomad}: fix background service update watches not canceling requests on serviceWatcher stop

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/{consul,nomad}: fix closing serviseWatcher during scrape job restart

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-17 21:47:11 -08:00
Scott Kevill
46b3b76d6d lib/fs: use unix.Statfs() / unix.Statvfs() when using a path (#3663) 2023-01-17 21:19:26 -08:00
Roman Khavronenko
5cb8ce8174 ci: disable JS codeQL check (#3659)
We have limited amount of time used by Github CI runners
and JS analysis accounts for a half of it.
Since JS represents only a small fraction of the codebase
and is solely maintained by one person - I suggest to disable
the CodeQL check in order to save CI runners time.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-17 21:05:27 -08:00
Yury Molodov
fcef2ff6b2 vmui: correctly display range results in Table view (#3657)
* fix: properly display range results

* fix: set range values to empty array

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-17 21:03:28 -08:00
Aliaksandr Valialkin
1ffa793322 docs/CHANGELOG.md: fix the link to feature request implemented at e58921aa8f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3571
2023-01-17 20:27:43 -08:00
Yury Molodov
e58921aa8f vmui: give more visually different colors to graph lines (#3656)
* feat: make more different colors of graph lines

* docs/CHANGELOG.md: give more visually different colors to graph lines
2023-01-17 20:25:37 -08:00
Aliaksandr Valialkin
c94020f7dc app/vmui: do not round the number in formatPrettyNumber() if the range isn't set 2023-01-17 19:50:44 -08:00
Aliaksandr Valialkin
006af394ff vendor: update github.com/VictoriaMetrics/metricsql from v0.51.1 to v0.51.2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3664
2023-01-17 11:26:41 -08:00
Aliaksandr Valialkin
289af65071 lib/promscrape: properly apply series limit
Fix the following issues:

- Series limit wasn't applied when staleness tracking was disabled.
- Series limit didn't prevent from sending staleness markers for new series exceeding the limit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660

Thanks to @hagen1778 for the initial attempt to fix the issue
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3665
2023-01-17 10:14:49 -08:00
Aliaksandr Valialkin
e06168f489 docs/CHANGELOG.md: document the 74f196c37813a18c2c8f28831696ed7d9f535604 2023-01-17 09:10:41 -08:00
Aliaksandr Valialkin
09d7fa2737 lib/{mergeset,storage}: do not slow down concurrently executed queries during assisted merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2023-01-16 14:31:52 -08:00
Aliaksandr Valialkin
094ae82df5 vendor: make vendor-update 2023-01-15 14:16:34 -08:00
Aliaksandr Valialkin
092e2c8f2d github.com/VictoriaMetrics/metrics: update from v1.23.0 to v1.23.1
See https://github.com/VictoriaMetrics/metrics/issues/42
2023-01-15 14:06:43 -08:00
Yury Molodov
04c408e986 vmui: make the step input field global across all the tabs and views (#3644)
* feat: make the step input field global

* fix: correct get step from url

* fix: set minimumSignificantDigits to 1

* app/vmselect/vmui: `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-15 13:47:08 -08:00
Aliaksandr Valialkin
cdb6d651e9 app/vminsert: return 200 OK status code when importing data in pushgateway format
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1415
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
2023-01-15 13:29:53 -08:00
Aliaksandr Valialkin
207a62a3c2 app/vmselect/promql: reduce memory allocations when searching for time series pairs with identical labelsets in q1 op q2 queries
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
2023-01-15 13:03:23 -08:00
Aliaksandr Valialkin
27afe7bc38 app/vmselect/promql: reduce the number of memory allocations inside getCommonLabelFilters()
This should improve performance a bit for `q1 op q2` queries
2023-01-15 13:03:23 -08:00
Denys Holius
ffe6e6fe59 guides/guide-delete-or-replace-metrics.md: fixed wrong curl command (#3652) 2023-01-15 13:02:52 -08:00
Yury Molodov
7fd82c0d3a feat: make nav menu as links (#3646) 2023-01-13 09:45:07 +01:00
Corporte Gadfly
f32a79189b docs: fix typo (#3648) 2023-01-13 09:39:11 +01:00
Aliaksandr Valialkin
be8fba9b6a app/vmselect/netstorage: tune the number of blocks per series which should be unpacked by a single goroutine instead of spinning up multiple goroutines
This reduces overhead on time series data unpacking for typical cases,
this reducing CPU usage at vmselect
2023-01-12 09:31:44 -08:00
Aliaksandr Valialkin
98de06ff38 docs/CHANGELOG.md: update the description of the change at 20f28eb9d6 2023-01-12 08:58:52 -08:00
Nikolay
20f28eb9d6 /lib/promscrape: use correct err logger for scrape unmarshalling (#3645)
/lib/promscrape: use correct err logger for scrape unmarshalling
It correctly suppresses scrape errors and adds correct context for err msg
2023-01-12 17:40:42 +01:00
Roman Khavronenko
ec7c3f45ba dashboards: bump operator dash to v9 of Grafana (#3642)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-12 16:31:26 +01:00
Aliaksandr Valialkin
7067e8206c app/vmselect/promql: reduce memory allocations at getCommonLabelFilters() function
Intern tag keys and values there
2023-01-12 01:27:41 -08:00
Aliaksandr Valialkin
e2498af530 lib/promscrape: log the number of unsuccessful scrapes during the last -promscrape.suppressScrapeErrorsDelay
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3413
Thanks to @jelmd for the pull request.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2575
2023-01-12 01:09:32 -08:00
Aliaksandr Valialkin
9f5b5708ff app/vmselect: handle the /custom-dashboards request from /graph/ page in the same way as from the /vmui/ page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3322
2023-01-11 23:41:04 -08:00
Aliaksandr Valialkin
9fdd1a10c6 docs/Cluster-VictoriaMetrics.md: mention the -vmui.customDashboardsPath command-line flag at vmselect
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3322
2023-01-11 23:39:55 -08:00
Aliaksandr Valialkin
a194982117 app/vmselect: follow-up after 820312a2b1
- Move the feature description at the correct place at docs/CHANGELOG.md
- Run `make vmui-update`
- Various cosmetic fixes

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3322
2023-01-11 23:28:00 -08:00
Dmytro Kozlov
820312a2b1 app/vmui: define custom path for dashboards json file (#3545)
* app/vmui: define custom path for dashboards json file

* app/vmui: remove unneeded code

* app/vmui: move handler to own file, fix show dashboards,

* app/vmui: move flag to handler, add flag description

* app/vmauth: fix part of the comments

* feat: add store for dashboards

* fix: prevent fetch dashboards for app mode

* app/vmauth: use simple cache for predefined dashboards

* app/vmauth: update dashboards doc

* app/vmauth: fix ci

* app/vmui: decrease timeout

* app/vmselect: removed cache, fix comments

* app/vmselect: remove unused const

* app/vmselect: fix error log, use slice byte instead of struct

Co-authored-by: Yury Moladau <yurymolodov@gmail.com>
2023-01-11 23:06:07 -08:00
Aliaksandr Valialkin
ec23ab6bc2 lib/promscrape/discovery: missing changes after b4ad3a3b4c 2023-01-11 23:02:45 -08:00
Dmytro Kozlov
20af29294e app/vmctl: add remote read protocol integration tests (#3626) 2023-01-11 23:00:10 -08:00
Aliaksandr Valialkin
b4ad3a3b4c lib/promscrape: follow-up for 8537533beb
- Add a comment describing the purpose of the `role` field inside `apiConfig` struct
- Revert changes at lib/promscrape/discovery/dockerswarm/dockerswarm.go ,
  since they reduce code readability. E.g. the reader needs to look up the named string constants
  in order to get their values.
2023-01-11 22:54:18 -08:00
Zakhar Bessarab
8537533beb lib/promscrape/discovery/dockerswarm: fix discovery filters being applied to all objects (#3632)
* lib/promscrape/discovery/dockerswarm: fix discovery filters being applied to all objects

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/CHANGELOG.md

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-11 22:50:34 -08:00
Yury Molodov
5c8bb029b5 vmui: small changes on explore metrics page (#3634)
* fix: change issue link

* fix: remove legend toggle

* fix: move select graph size

* feat: save url params on explore metrics page

* wip

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-11 22:16:10 -08:00
Artem Navoiev
13a1a7f826 fix operator docs hugo schema
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-11 17:28:34 +01:00
Artem Navoiev
83c87f822a docs: prepare operator docs to migration
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-11 18:19:40 +02:00
Artem Navoiev
54aec2b1ba docs: operator *.MD -> *.md
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-11 16:29:29 +01:00
Roman Khavronenko
b3a70b8284 dasbhoards: fix the tooltip info for 1.86 (#3628)
See c63755c316 (diff-bba263a473e7fbc9d0fde075ebef6b3d4e32c322ee1210a3e07182292c7723aaR18)

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-11 11:30:12 +01:00
Aliaksandr Valialkin
351fc152e0 docs/CHANGELOG.md: cut v1.86.1 2023-01-11 01:26:43 -08:00
Aliaksandr Valialkin
975bb8722f lib/vmselectapi: properly calculate query timeout
vmselect passes query timeout to vmstorage in seconds.
The commit 20e9598254 treated it as timeout in nanoseconds.
Fix this in order to prevent from the following errors under vmstorage load:

cannot process vmselect request: cannot execute "search_v7": couldn't start executing the request in 0.000 seconds,
since -search.maxConcurrentRequests=... concurrent requests are already executed.
2023-01-11 01:22:33 -08:00
Roman Khavronenko
f73953a049 app/vmselect: fix typo (#3629)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-11 01:09:47 -08:00
Aliaksandr Valialkin
dff47c73b7 app/vmselect: improve logging when the incoming query cannot be executed because of timeout in the wait queue 2023-01-11 01:06:05 -08:00
Aliaksandr Valialkin
65b9dcfcca app/vmselect/promql: typo fix after 0771d57860 2023-01-11 01:05:31 -08:00
Aliaksandr Valialkin
0771d57860 app/vmselect/promql: make a copy of per-series timestamps before their modification
The per-series timestamps are usually shared among series, so it is unsafe modifying them.

The issue has been appeared after the optimization at 2f3ddd4884
2023-01-11 00:59:13 -08:00
Aliaksandr Valialkin
31fc29599f app/vmselect/promql: move the eval function args in parallel query trace outside the loop 2023-01-10 22:23:30 -08:00
Aliaksandr Valialkin
a0f9cb27f9 docs/CHANGELOG.md: document v1.79.7 LTS release 2023-01-10 21:24:28 -08:00
Aliaksandr Valialkin
4faf7ea41e vendor: make vendor-update 2023-01-10 18:58:34 -08:00
Aliaksandr Valialkin
c449714c0a deployment/docker: update Go builder from v1.19.4 to v1.19.5
See https://github.com/golang/go/issues?q=milestone%3AGo1.19.5+label%3ACherryPickApproved
2023-01-10 18:43:04 -08:00
Aliaksandr Valialkin
37fe04999c docs/CHANGELOG.md: add release date for v1.86.0 2023-01-10 17:45:57 -08:00
Aliaksandr Valialkin
eda940106a deployment/docker: update Docker tag for VictoriaMetrics components from v1.85.3 to v1.86.0 2023-01-10 17:26:37 -08:00
Aliaksandr Valialkin
28df7c2a96 docs/CHANGELOG.md: cut v1.86.0 2023-01-10 16:22:26 -08:00
Aliaksandr Valialkin
2d294cca59 deployment/docker: update Alpine base image from v3.17.0 to v3.17.1
See https://alpinelinux.org/posts/Alpine-3.17.1-released.html
2023-01-10 16:14:40 -08:00
Aliaksandr Valialkin
95ce1ba6ce lib/httpserver: directly pass flag value to CheckAuthFlag()
There is no sense in passing a pointer to flag value there.

This is a follow-up for 4225a0bd75
2023-01-10 15:52:23 -08:00
Zakhar Bessarab
4225a0bd75 Use httpAuth.* flags as a fallback for endpoints protected by *AuthKey flags (#3582)
* {lib/server, app/}: use `httpAuth.*` flag as fallback for `*AuthKey` if it is not set

* lib/ingestserver/opentsdbhttp: fix opentdb HTTP handler not respecting `httpAuth.*` flags

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-10 15:46:13 -08:00
Dmytro Kozlov
0811000bb0 app/vmctl: Add insecure skip verify flag for remote read protocol (#3611)
* app/vmctl: Add insecure skip verify flag for remote read protocol
2023-01-10 23:18:49 +01:00
Artem Navoiev
37c52ccaf4 vmagent: add minimal scrape file exampe for vmagent quick start (#3627)
* vmagent: add minimal scrape file exampe for vmagent quick start

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* replace example with link to your prometheus.yml in docker

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-10 23:16:10 +01:00
Aliaksandr Valialkin
cbe62f23ba lib/promscrape/discovery/gce: follow-up for b2ccdaaa2f
- Use promutils.Labels.GetLabels() instead of comparing promutils.Labels.Labels to nil.
  This make the code more consistent with other places.

- Mention the release where the issue has been introduced at docs/CHANGELOG.md.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3624
2023-01-10 13:51:03 -08:00
Aliaksandr Valialkin
53d871d0b1 app/vmselect/netstorage: reduce tail latency during query processing
Previously the selected time series were split evenly among available CPU cores
for further processing - e.g unpacking the data and applying the given rollup
function to the unpacked data.
Some time series could be processed slower than others.
This could result in uneven work distribution among available CPU cores,
e.g. some CPU cores could complete their work sooner than others.
This could slow down query execution.

The new algorithm allows stealing time series to process from other CPU cores
when all the local work is done. This should reduce the maximum time
needed for query execution (aka tail latency).

The new algorithm should also scale better on systems with many CPU cores,
since every CPU processes locally assigned time series without inter-CPU communications.

The inter-CPU communications are used only when all the local work is finished
and the pending work from other CPUs needs to be stealed.
2023-01-10 13:43:14 -08:00
Zakhar Bessarab
b2ccdaaa2f lib/promscrape/discovery/gce: fix crash in case instance does not have any labels set (#3625)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-10 11:07:11 +01:00
Denys Holius
b06e795a1e docs/Release-Guide.md: added missed link to rpm repository (#3623) 2023-01-10 10:14:56 +01:00
Aliaksandr Valialkin
e640ff72f1 app/vmselect/netstorage: reduce memory allocations when unpacking time series
Unpack time series with less than 400K samples in the currently running goroutine.
Previously a new goroutine was being started for unpacking the samples.
This was requiring additional memory allocations.
2023-01-09 23:18:17 -08:00
Aliaksandr Valialkin
9a563a6aef app/vmselect/promql: eliminate memory allocation when sorting values inside float64s 2023-01-09 23:06:46 -08:00
Aliaksandr Valialkin
30ed33fae0 app/vmselect/promql: pre-allocate memory for values to be merged in mergeTimeseries()
This should reduce the number of memory re-allocations
2023-01-09 22:51:17 -08:00
Aliaksandr Valialkin
645c24dc5f app/vmselect/promql: consistently intern series names obtained from marshalMetricNameSorted
This reduces memory allocations when the returned series names are used as map keys later
2023-01-09 22:45:40 -08:00
Aliaksandr Valialkin
2f3ddd4884 app/vmselect/promql: avoid memory allocations and copying from source timeseries to the returned result at timeseriesToResult() 2023-01-09 22:38:59 -08:00
Aliaksandr Valialkin
26cf680468 app/vmselect/promql: remove memory allocations from sortMetricTags() 2023-01-09 22:22:15 -08:00
Aliaksandr Valialkin
4f0c11ee93 app/vmselect/promql: intern output series names inside timeseriesToResult()
This reduces the number of memory allocations for repeated queries,
which return (almost) the same set of time series.
2023-01-09 22:19:56 -08:00
Aliaksandr Valialkin
562d6bca08 app/vmselect/promql: intern output series names during normal aggregation 2023-01-09 22:15:24 -08:00
Aliaksandr Valialkin
21ee9a1fab app/vmselect/promql: intern output series names during incremental aggregation
This should reduce the number of memory allocations for repeated queries
2023-01-09 22:11:36 -08:00
Aliaksandr Valialkin
df2a494a7c app/vmselect/netstorage: pre-allocate 4 block references per each time series during querying
Usually the number of blocks returned per each time series during queries is around 4.
So it is a good idea to pre-allocate 4 block references per time series
in order to reduce the number of memory allocations.
2023-01-09 22:03:23 -08:00
Aliaksandr Valialkin
c5e0f527bc app/vmselect/netstorage: cache canonical MetricName for time series returned from the storage
This reduces memory allocations for repeated queries, which return (almost) the same set of time series.
2023-01-09 21:53:10 -08:00
Aliaksandr Valialkin
7afcca0c51 all: use metricsql.CompileRegexp instead of regexp.Compile for compiling regexps used in graphite queries
This should speed up repeated queries, since metricsql.CompileRegexp returns regexps from the cache
on subsequent calls for the same input regexp.
2023-01-09 21:43:08 -08:00
Aliaksandr Valialkin
67ab49baa9 vendor: make vendor-update 2023-01-09 21:34:34 -08:00
Aliaksandr Valialkin
e5eca54951 lib/promscrape/discovery/nomad: sync nomad_sd_configs fields with the Prometheus implementation
See the list of configs supported by Prometheus at f88a0a7d83/discovery/nomad/nomad.go (L76-L84)

- Removed "token" option. In can be set either via NOMAD_TOKEN env var or via `bearer_token` config option.
- Removed "scheme" option. It is automatically detected depending on whether the `tls_config` is set.
- Removed "services" and "tags" options, since they aren't supported by Prometheus.
- Added "region" option. If it is missing, then the region is read from NOMAD_REGION env var.
  If this var is empty, then it is set to "global" in the same way as Nomad client does.
  See 865ee8d37c/api/api.go (L297)
  and 865ee8d37c/api/api.go (L555-L556)
- If the "server" option is missing, then it is read from NOMAD_ADDR in the same way
  as Nomad client does - see 865ee8d37c/api/api.go (L294-L296)

This is a follow-up for 8aee209c53

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367
2023-01-09 21:14:48 -08:00
Aliaksandr Valialkin
c38a10e143 app/vmselect/netstorage: eliminate memory allocation for sortBlocksHeap arg when calling mergeSortBlocks() 2023-01-09 21:08:51 -08:00
Aliaksandr Valialkin
1f9d605988 app/vmselect/netstorage: consistently select the sample with the biggest value out of samples with identical timestamps
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3333

This fix is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3620 ,
but doesn't slow down the common case with merging replicated data blocks so significantly.

Benchmark results:

Before the change:

BenchmarkMergeSortBlocks/replicationFactor-1-4         	   13968	     85643 ns/op	 956.53 MB/s	    1700 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-2-4         	   10806	    109171 ns/op	1500.77 MB/s	    2191 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-3-4         	    8887	    130623 ns/op	1881.45 MB/s	    2660 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-4-4         	    7440	    157348 ns/op	2082.52 MB/s	    3174 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-5-4         	    6534	    184473 ns/op	2220.38 MB/s	    3612 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4  	   13419	     85205 ns/op	 961.44 MB/s	    2213 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 	     579	   1894900 ns/op	  43.23 MB/s	   46760 B/op	       1 allocs/op

After the change:

BenchmarkMergeSortBlocks/replicationFactor-1-4         	   13832	     85298 ns/op	 960.40 MB/s	    1716 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-2-4         	    8833	    134222 ns/op	1220.66 MB/s	    2675 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-3-4         	    6487	    184830 ns/op	1329.65 MB/s	    3636 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-4-4         	    4977	    236318 ns/op	1386.61 MB/s	    4733 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/replicationFactor-5-4         	    4088	    296734 ns/op	1380.36 MB/s	    5761 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-bestcase-4  	   14083	     84067 ns/op	 974.47 MB/s	    2110 B/op	       1 allocs/op
BenchmarkMergeSortBlocks/overlapped-blocks-worstcase-4 	     536	   2043534 ns/op	  40.09 MB/s	   50511 B/op	       1 allocs/op
2023-01-09 13:01:48 -08:00
Denys Holius
fe0e199859 deployment/docker: update Alertmanager tag from v0.24.0 to v0.25.0 in docker-compose files (#3619)
deployment/docker: bump alertmanager to latest v0.25.0
2023-01-09 12:37:14 +01:00
Roman Khavronenko
8aee209c53 lib/promscrape: remove datacenter field from nomad_sd_config (#3612)
Looks like `datacenter` field isn't part of `/v1/services` API.
See https://developer.hashicorp.com/nomad/api-docs/services#list-services
and https://developer.hashicorp.com/nomad/api-docs/services#read-service

Related issues:
https://github.com/traefik/traefik/issues/9109
https://github.com/prometheus/prometheus/issues/11776

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-09 09:07:40 +01:00
Aliaksandr Valialkin
28f8dc41b0 lib/promscrape/discoveryutils: cleanup after 5df9fddaf2
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468
2023-01-07 01:26:54 -08:00
Zakhar Bessarab
5df9fddaf2 lib/promscrape/discoveryutils: use correct timeout for blocking requests (#3609)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-07 01:13:03 -08:00
Aliaksandr Valialkin
41e00a0df7 lib/storage: simplify the fix from 488940502c
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3566
2023-01-07 01:04:43 -08:00
Dmytro Kozlov
488940502c lib/storage: fix returning camelcase label names (#3608)
* lib/storage: fix returning camelcase label names

* doc: add change log

* Update docs/CHANGELOG.md

* Update docs/CHANGELOG.md

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-07 00:50:14 -08:00
Aliaksandr Valialkin
5fe7ff24c2 lib/streamaggr: limit the the number of concurrent flushes of the aggregate data to the exact number of available CPUs
This should reduce the maximum memory usage during concurrent flushes of the aggregate data
2023-01-07 00:18:51 -08:00
Aliaksandr Valialkin
ad5bfe3089 lib/promscrape: reduce the number of concurrently executed processScrapedData calls from 2x of the number of CPUs to the number of CPUs
This should reduce the maximum memory usage for processScrapedData() function by 2x.
The only part, which can be IO-bound in the processScrapedData() is pushData() call,
when it buffers data to persistent queue if the remote storage cannot keep up
with the data ingestion speed. In this case it is OK if the scrape pace will be limited.
2023-01-07 00:14:30 -08:00
Aliaksandr Valialkin
af263fe881 all: small improvements in error messages and command-line flag descriptions related to concurrency limiters 2023-01-07 00:11:44 -08:00
Aliaksandr Valialkin
45f39e291e lib/writeconcurrencylimiter: moved the error generation from incConcurrency() to the caller place 2023-01-06 23:45:58 -08:00
Aliaksandr Valialkin
986a05e18d lib/promscrape: limit the concurrency during parsing and relabeling the scraped samples
This should reduce memory usage when scraping big number of targets,
since this limits the summary memory usage during concurrent parsing and relabeling
by the number of available CPU cores.
2023-01-06 22:59:17 -08:00
Aliaksandr Valialkin
293e4dc77b app/{vminsert,vmstorage}: add comments on why storage.AddRows() is called without limiting the number of concurrent calls 2023-01-06 22:40:07 -08:00
Aliaksandr Valialkin
5c4bd4f7c1 lib/streamaggr: limit the number of concurrent flushes of aggregate metrics in order to limit memory usage 2023-01-06 22:39:13 -08:00
Aliaksandr Valialkin
c63755c316 lib/writeconcurrencylimiter: improve the logic behind -maxConcurrentInserts limit
Previously the -maxConcurrentInserts was limiting the number of established client connections,
which write data to VictoriaMetrics. Some of these connections could be idle.
Such connections do not consume big amounts of CPU and RAM, so there is a little sense in limiting
the number of such connections. So now the -maxConcurrentInserts command-line option
limits the number of concurrently executed insert requests, not including idle connections.

It is recommended removing -maxConcurrentInserts command-line option, since the default value
for this option should work good for most cases.
2023-01-06 22:20:19 -08:00
Aliaksandr Valialkin
f299d2ca1a lib/vmselectapi: limit the number of concurrently executed requests
This should prevent from out of memory errors when big number of vmselect
nodes send many concurrent requests to vmstorage

The limit can be controlled at vmstorage via the following command-line flags:
- search.maxConcurrentRequests
- search.maxQueueDuration

See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#resource-usage-limits
2023-01-06 22:11:34 -08:00
Aliaksandr Valialkin
e7637885a6 app/vmselect: improve error message when the request cannot be started because too many concurrent requests are already executed 2023-01-06 22:10:42 -08:00
Aliaksandr Valialkin
463b957e54 lib/promscrape/discovery/{consul,nomad}: wait until the deleted serviceWatchers are stopped inside updateServices() call
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367
2023-01-05 21:52:33 -08:00
Aliaksandr Valialkin
f392913d00 lib/promscrape: follow-up after bced9fb978
- Document the bugfix at docs/CHANGELOG.md
- Wait until all the worker goroutines are done in consulWatcher.mustStop()
- Do not log `context canceled` errors when discovering consul serviceNames
- Removed explicit handling of gzipped responses at lib/promscrape/discoveryutils.Client,
  since this handling is automatically performed by net/http.Transport.
  See DisableCompression option at https://pkg.go.dev/net/http#Transport .
- Remove explicit handling of the proxyURL, since it is automatically handled
  by net/http.Transport. See Proxy option at https://pkg.go.dev/net/http#Transport .
- Expliticly set MaxIdleConnsPerHost, since its default value equals to 2.
  Such a small value may result in excess tcp connection churn
  when more than 2 concurrent requests are processed by lib/promscrape/discoveryutils.Client.
- Do not set explicitly the `Host` request header, since it is automatically set by net/http.Client.
- Backport the bugfix to the recently added nomad_sd_configs - see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3468
2023-01-05 21:13:06 -08:00
Zakhar Bessarab
bced9fb978 lib/promscrape/discoveryutils: switch to native http client from fasthttp (#3568) 2023-01-05 19:34:47 -08:00
Roman Khavronenko
5bdd880142 vmstorage: add more context to the flock acquiring msg (#3584)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3578

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-05 18:30:42 -08:00
Aliaksandr Valialkin
9f348cf8a1 lib/promscrape/discovery/nomad: follow-up after 48f371a46c
- Remove undocumented `username` and `password` config options from `nomad_sd_config`.
  TODO: probably, remove these options from `consul_sd_config` too?
  These options exist there for backwards compatibility purposes.

- Add __meta_nomad_service_alloc_id and __meta_nomad_service_job_id meta-labels
  These labels contain AllocID and JobID fields for the discovered Nomad services.

- Various typo fixes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3367
2023-01-05 18:07:20 -08:00
Aliaksandr Valialkin
cad8553c01 Makefile: remove trailing space after golangci-lint run command
It is left after ec2c82e800
2023-01-05 16:59:07 -08:00
Aliaksandr Valialkin
1a28f0e5b3 lib/promrelabel: pass query args via query string at /metric-relabel-debug and /target-relabel-debug pages if their length doesnt exceed 1000
This allows copy-n-pasting the url to another browser window and seeing the same result.

The limit in 1000 chars is selected in order to prevent from potential issues with systems
which limit the url length such as Internet Explorer - see https://stackoverflow.com/questions/812925/what-is-the-maximum-possible-length-of-a-query-string

If the limit is exceeded, then query args are sent via POST method and aren't visible in the url.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3580
2023-01-05 16:48:04 -08:00
Karan Sharma
48f371a46c lib/promscrape: add Prometheus-compatible service discovery for Nomad (#3549)
Add nomad_sd_config support for service discovery
2023-01-05 23:03:58 +01:00
Denys Holius
043b28c725 .github/workflows/nightly-build.yml: added dockerhub login (#3594) 2023-01-05 16:54:14 +01:00
Luke Palmer
ec2c82e800 Lint and errcheck using golangci-lint (#3558) 2023-01-05 16:12:46 +01:00
Zakhar Bessarab
01bc0c94ab doc: add vmbackupmanager monitoring section (#3605)
* doc: add vmbackupmanager monitoring section

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-05 16:03:06 +01:00
Thomas Danielsson
9d1104d812 dashboards: fix operator datasource variable (#3604)
Got "Failed to upgrade legacy queries Datasource $ds was not found" in
Grafana on operator dashboard.
It's datasource variable was incorrectly named `datasource`.

Also made the rest of the dashboards have homogeneous datasource-variable
names and selections, matching vmagent dashboard.
2023-01-05 14:59:56 +01:00
Artem Navoiev
8b763175ff Add Understand Your Setup Size Guide (#3572)
docs: add Understand Your Setup Size Guide

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-05 14:56:50 +01:00
Aliaksandr Valialkin
2ee81a5dbb docs/CHANGELOG.md: add missing dot 2023-01-05 03:35:02 -08:00
Zakhar Bessarab
185cdcd813 lib/promscrape/discovery/dockerswarm: fix query encoding of filters (#3586)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-05 03:34:25 -08:00
Aliaksandr Valialkin
0dea3b71da lib/promscrape: pre-fetch metric_relabel_configs rules when debugging metric relabeling for a particular target
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3407
2023-01-05 03:26:49 -08:00
Aliaksandr Valialkin
a1076abcbf lib/promscrape: follow-up for a7e29c38bc
- Document the bugfix at docs/CHANGELOG.md
- Make the fix more durable against future changes when droppedTargetsMap.Register may be called from other places.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3580
2023-01-05 02:52:08 -08:00
Zakhar Bessarab
a7e29c38bc lib/promscrape/targetstatus: fix crash during droppedTarget registration (#3595)
* lib/promscrape/targetstatus: fix crash during droppedTarget registration in case original labels are not present

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/targetstatus: address review comment

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-01-05 02:39:31 -08:00
Yury Molodov
2460e0f51e vmui: improve Explore metrics (#3598)
* feat: add multiple select

* feat: improve explore interface

* app/vmselect/vmui: `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-01-05 02:23:04 -08:00
Aliaksandr Valialkin
0e1f0ade31 lib/streamaggr: sort by and without labels in the aggregate output metric name
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460
2023-01-05 02:08:44 -08:00
Aliaksandr Valialkin
04dff34de4 vendor: update github.com/VictoriaMetrics/metricsql from v0.50.0 to v0.51.0
Updates https://github.com/VictoriaMetrics/metricsql/pull/7
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3589
2023-01-05 01:50:10 -08:00
Aliaksandr Valialkin
66947ee5a2 lib/streamaggr: remove unused fields 2023-01-04 13:33:46 -08:00
Roman Khavronenko
9d0e1f8e68 dashboards: add backupmanager dashboard (#3599)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-04 17:26:15 +01:00
Roman Khavronenko
63bf583b3c github: rm plaintext render (#3597)
`render: plain text` makes fields not formattable and prevents
 from pasting screenshots.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-01-04 14:48:17 +01:00
Denys Holius
09fe346d18 docs/Release-Guide.md: adds release scenario for RPM LTS packages (#3588) 2023-01-04 14:36:14 +01:00
Zakhar Bessarab
59f20c1034 github: use github templates for filling in feature requests or bug reports (#3587)
github: use github templates for filling in feature requests or bug reports
2023-01-04 14:34:19 +01:00
Artem Navoiev
0ff85d00a4 update year in License
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-04 09:46:38 +01:00
Aliaksandr Valialkin
fafece1af8 vendor: make vendor-update 2023-01-03 23:36:42 -08:00
Aliaksandr Valialkin
5bca3a5be2 app/vmselect: remove dependency on lib/promscrape from app/vmselect 2023-01-03 23:28:27 -08:00
Aliaksandr Valialkin
fd175ad80b docs: update -help outputs for vm* tools 2023-01-03 23:27:06 -08:00
Aliaksandr Valialkin
fa13bbc48a app/{vmagent,vminsert}: add support for streaming aggregation
See https://docs.victoriametrics.com/stream-aggregation.html

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3460
2023-01-03 22:19:21 -08:00
Aliaksandr Valialkin
add2c4bf07 lib/bytesutil: add InternBytes() function as a shortcut to InternString(ToUnsafeString(..)) 2023-01-03 22:16:22 -08:00
Aliaksandr Valialkin
f33e687723 app/vmagent/remotewrite: improve descriptions for -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig
- Cross-reference these command-line flags.
- Add a link to https://docs.victoriametrics.com/vmagent.html#relabeling
2023-01-03 22:04:49 -08:00
Aliaksandr Valialkin
d3a1c36842 docs/Single-server-VictoriaMetrics.md: impromve formatting for prominent features chapter 2023-01-03 21:58:21 -08:00
Aliaksandr Valialkin
6db5c3801e app/vmbackup: remove superflouos whitespace after 8fe21ec707 2023-01-03 21:52:40 -08:00
Aliaksandr Valialkin
189f85b24c docs/vmbackupmanager.md: run make docs-sync after fa842d6534 2023-01-03 21:50:28 -08:00
Aliaksandr Valialkin
7b264b0c23 lib/promrelabel: allow calling Match on nil IfExpression
This simplifies the caller side of IfExpression
2023-01-03 21:44:03 -08:00
yanggang
8fe21ec707 Fix flag help message for the backups types. (#3577)
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-01-03 10:56:42 +01:00
yanggang
fa842d6534 fix typo for json values. (#3576)
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-01-03 10:55:38 +01:00
yanggang
93e935bcaa Fix vmctl command hint for vm-native-step-interval (#3575)
Signed-off-by: yanggang <gang.yang@daocloud.io>
2023-01-03 10:54:53 +01:00
Artem Navoiev
0a519c93ef run checks only for master/cluster branches (#3581)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-01-03 11:08:44 +04:00
Roman Khavronenko
4b3d8eb573 vmalert: mention specifics of Alertmanager HA mode (#3573)
Stress the importance of specifying of all Alertmanager
URLs in vmalert's `-notifier.url` or `notifier.config`
if it runs in cluster mode.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3547

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-30 09:54:03 +01:00
Aliaksandr Valialkin
05fa11b296 app/vmui: reuse the series color in line tooltip 2022-12-29 15:32:41 -08:00
Aliaksandr Valialkin
1794f3d46e app/vmui: small usability improvements
- Show in the line tooltip the number of the query which generates the given line.
  This simplifies comparison of lines generated by multiple queries.

- Show metric name as __name__ label in the line tooltip in the same way as other labels are shown there.
  This makes the label information in the tooltip more consistent.

- Properly quote label values with JSON.stringify(). This prevents from improper formatting
  when label values contain doublequote chars.

- Remove double curly braces artifact at graph legend for lines without names and labels.

- Properly use modifier for regular expressions across the code.
2022-12-29 14:52:51 -08:00
Aliaksandr Valialkin
59e1e84a92 docs/CHANGELOG.md: document 1720bddb4f
Updats https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3530
2022-12-29 12:31:48 -08:00
Aliaksandr Valialkin
b8bc62431a app/vmselect/vmui: make vmui-update after 1720bddb4f 2022-12-29 12:20:06 -08:00
Yury Molodov
1720bddb4f fix: display correct graph tooltip title (#3562) 2022-12-29 12:06:48 -08:00
Roman Khavronenko
2cedb3e883 csvimport: support empty values (#3565)
Before, if the imported line contained multiple metrics and one
or more of them had an empty values - the whole line was ignored.

Now, only metrics with empty values are ignored, and the rest
of the metrics are accepted successfully.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3540

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-12-29 11:52:10 -08:00
Roman Khavronenko
83870aeb8d tests: attempt to fix flaky graphite test (#3567)
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-29 11:48:47 -08:00
Aliaksandr Valialkin
7af8857b68 docs/CHANGELOG.md: add a link to the docs and the pull request for update_entries_limit option in vmalert
This is a follow up for 6588fcbfca
2022-12-29 10:38:26 -08:00
Aliaksandr Valialkin
c90752a8be vendor: update github.com/valyala/fastjson/fastfloat from v1.6.3 to v1.6.4
This should properly parse floating-point numbers with missing integer or fractional parts.
For example, 123. or .123

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3544
2022-12-29 10:34:11 -08:00
ChenyuanHu
0a9e1d64fe app/vmselect/prometheus: no need manually call queryDuration.UpdateDuration (#3564)
There is no need to manually call `queryDuration.UpdateDuration(startTime)`, because `defer queryDuration.UpdateDuration(startTime)` is executed at the beginning of the function(L660).
2022-12-29 14:18:00 +01:00
Roman Khavronenko
6588fcbfca vmalert: allow configuring the default number of stored rule's update states (#3556)
Allow configuring the default number of stored rule's update states in memory
 via global `-rule.updateEntriesLimit` command-line flag or per-rule via rule's
 `update_entries_limit` configuration param.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-29 12:36:44 +01:00
Aliaksandr Valialkin
3dc684634e app/vmselect/searchutils: accept partial RFC3339 values at time, start and end query args
This simplifies manual usage of the APIs. For example, the following query
would return the results over the 2022 year.

  /api/v1/query_range?start=2022&end=2023&step=1d&query=...

This is equivalent to:

  /api/v1/query_range?start=2022-01-01T00:00:00Z&end=2023-01-01T00:00:00Z&step=1d&query=...
2022-12-28 19:41:54 -08:00
Yury Molodov
d2f89b55b7 vmui: fix step field (#3561)
* feat: use a unit next to the step value

* app/vmselect/vmui: `make vmui-update`

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-12-28 16:00:51 -08:00
Aliaksandr Valialkin
3d082ed6db vendor: make vendor-update 2022-12-28 15:00:02 -08:00
Aliaksandr Valialkin
c4229a1bba lib/promscrape: log the actual response size in the error message when the response size exceeds -promscrape.maxScrapeSize
This is a follow-up for 7ad9fff7e5
2022-12-28 14:42:11 -08:00
Aliaksandr Valialkin
1b16118e17 lib/{storage,mergeset}: tune the threshold for assisted merge
The https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425#issuecomment-1359117221
reveals that CPU usage for incoming queries may significantly increase when the number
of in-memory parts becomes too big.

This commit reduces the maximum number of in-memory parts before starting the assisted merge
during data ingestion. This should reduce CPU usage for incoming queries,
since they need to inspect lower number of in-memory parts.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
2022-12-28 14:39:24 -08:00
Clément Nussbaumer
7ad9fff7e5 fix(promscrape): check MaxScrapeSize after gzip decompression (#3550) 2022-12-28 12:19:41 -08:00
Aliaksandr Valialkin
293dda7169 lib/snapshot: improve log message on unexpected status code during attempts to create or delete snapshots
Use "unexpected status code returned from %q: %d; expecting %d" log message format
instead of less clear format "unexpected status code returned from %q; expecting %d; got %d"

This is a follow-up for c612bb165e
2022-12-28 11:41:50 -08:00
Aliaksandr Valialkin
ed9161a04f docs: remove a case study for Dreamteam.gg, since it looks like the company no longer exists 2022-12-28 11:34:40 -08:00
Zakhar Bessarab
c612bb165e lib/snapshot: fix error message format for failed HTTP request (#3559) 2022-12-28 18:04:11 +01:00
Artem Navoiev
34c705988e fix broken links
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2022-12-27 13:17:22 +02:00
Artem Navoiev
7d9c4bebc0 update links to grafana dashboards (#3534)
docs: update links to grafana dashboards

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2022-12-25 17:36:20 +01:00
Aliaksandr Valialkin
b11c806d1c app/vmui: show min, max and avg lines at Explore metrics graphs when instance is selected in the same way as when only the job is selected
This improves consistency of the graphs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-23 23:21:11 -08:00
Aliaksandr Valialkin
2a7e392bb3 app/vmselect/vmui: make vmui-update after 0dca224ec3 2022-12-23 22:25:04 -08:00
Aliaksandr Valialkin
0dca224ec3 app/vmui: small improvements for header panel
- Rename `Custom panel` tab to more clear `Query` tab
- Rename `Cardinality` tab to `Explore cardinality`, so it becomes consistent with `Explore metrics` tab
- Move `Dashboards` tab to the end, since it isn't used too much
2022-12-23 22:24:31 -08:00
Aliaksandr Valialkin
8ef1fe2047 app/vmui: move the Explore metrics tab closer to Custom panel tab
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-23 22:16:21 -08:00
Aliaksandr Valialkin
f599e1bd34 app/vmui: show less lines at metrics explorer when the instance isn't selected
Show min, max and avg graphs across instances for the selected job.
This should improve usability of such a graphs when the job contains many instances.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-23 22:15:01 -08:00
Aliaksandr Valialkin
4deea604bf app/vmui: follow-up after f6d31f5216
- Document the feature at docs/CHANGELOG.md.
- Document the metrics explorer at https://docs.victoriametrics.com/#metrics-explorer .
- Properly set `start` and `end` args for the selected time range
  when performing the request, which returns metric names.
- Improve queries, so they return lower number of lines and labels.
  This should improve metrics' exploration.
- Properly encode label filters and query args before passing them to VictoriaMetrics.
- Various cosmetic fixes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3386
2022-12-22 17:17:01 -08:00
Yury Molodov
f6d31f5216 vmui: add explore tab for exploration of metrics, which belong to a particular job/instance (#3470)
* feat: add "Explore" page

* feat: add graphs for explore page

* vmui: add explore tab for exploration of metrics, which belong to a particular job/instance

* refactor: rename variables

* refactor: extract graph to ExploreMetricItemGraph.tsx

* feat: add searchable for Select.tsx

* feat: improve metrics explorer

* feat: set document title by page

* feat: add page to view icons

* fix: improve styles

* fix: add encodeURIComponent to query
2022-12-22 15:24:40 -08:00
Aliaksandr Valialkin
f6ac045933 docs/CHANGELOG.md: document 1df87807fd
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3513
2022-12-22 15:22:09 -08:00
Yury Molodov
1df87807fd vmui: step (#3521)
* feat: add step rounding

* fix: change step in URL parameters

* refactor: change comment for roundStep
2022-12-22 14:54:28 -08:00
Aliaksandr Valialkin
0076422350 lib/promscrape/discovery/azure: typo fix 2022-12-21 21:25:16 -08:00
Aliaksandr Valialkin
f1441a598f app/vmselect/promql: add tests for d3de110070 2022-12-21 20:25:21 -08:00
Aliaksandr Valialkin
fa236c5a84 lib/promrelabel: make fmt after d3de110070 2022-12-21 20:24:57 -08:00
Aliaksandr Valialkin
d3de110070 app/vmselect/promql: make sure that label_replace() doesn't create an empty dst_label if the src_label doesn't match regex 2022-12-21 20:20:01 -08:00
Aliaksandr Valialkin
31886aef3d lib/promrelabel: add support for keepequal and dropequal relabeling actions
These actions are supported by Prometheus starting from v2.41.0

See https://github.com/prometheus/prometheus/pull/11564 ,
https://github.com/prometheus/prometheus/issues/11556
and https://github.com/prometheus/prometheus/issues/3756

Side note:

It's a pity that Prometheus developers decided inventing `keepequal` and `dropequal`
relabeling actions instead of adding support for `keep_if_equal` and `drop_if_equal` relabeling
actions supported by VictoriaMetrics since June 2020 - see 2a39ba639d .
2022-12-21 20:04:55 -08:00
Aliaksandr Valialkin
3300546eab lib/bytesutil: make sure that the cleanup code is performed only by a single goroutine out of many concurrently running goroutines
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3466
2022-12-21 13:07:24 -08:00
Yury Molodov
0f8ffc7df9 vmui: fix change timezone (#3519)
vmui: fix time picker with changed time zone
2022-12-21 17:22:37 +01:00
Aliaksandr Valialkin
192db0f0b1 deployment/docker: update VictoriaMetrics tag from v1.85.2 to v1.85.3 in docker-compose files 2022-12-20 15:34:23 -08:00
Aliaksandr Valialkin
f443fad56d docs/CHANGELOG.md: cut v1.85.3 2022-12-20 14:51:23 -08:00
Aliaksandr Valialkin
77874d6055 app/vmselect/vmui: make vmui-update after 731d189fa9 2022-12-20 14:51:07 -08:00
Roman Khavronenko
7ca49290b6 docs: fix the image link (#3509)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3495
Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2022-12-20 14:36:25 -08:00
Roman Khavronenko
e4e9dfb785 vmagent: respect -usePromCompatibleNaming if no relabeling is set (#3511)
* vmagent: respect `-usePromCompatibleNaming` if no relabeling is set

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3493

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmagent: upd test

Signed-off-by: hagen1778 <roman@victoriametrics.com>

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2022-12-20 14:32:45 -08:00
Yury Molodov
731d189fa9 fix: change the logic for hide query (#3514) 2022-12-20 14:29:46 -08:00
Zakhar Bessarab
4be4645142 app/vmbackupmanager: add metrics for better observability (#488)
* app/vmbackupmanager: add metrics for better observability, include more information to `/api/v1/backups` API call response

* app/vmbackupmanager: drop old metrics before creating new ones

* app/vmbackupmanager: use `_total` postfix for counter metrics

* app/vmbackupmanager: remove `_total` postfix for gauge-like metrics

* app/vmbackupmanager: add `_last_run_failed` metrics for backups and retention

* app/vmbackupmanager: address review feedback

* app/vmbackupmanager: fix metric name

* app/vmbackupmanager: address review feedback, remove background updates of metrics, add restoring state of `_last_run_failed` metric from remote storage

* app/vmbackupmanager: improve performance for backup size calculation

* app/vmbackupmanager: refactor backup and retention runs to deduplicate each run logic

* {app/vmbackupmanager,lib/formatutil}: move HumanizeBytes into lib package

* app/vmbackupmanager: fix creating new metrics instead of reusing existing ones

* lit/formatutil: add comment to make linter happy

* app/vmbackupmanager: address review feedback
2022-12-20 14:18:06 -08:00
Aliaksandr Valialkin
4e55b67a44 lib/storage: clear the err if it is set to io.EOF when searching for the TSID by metricID
This is expected error after when recently added indexdb data isn't available for search yet
or wasn't flushed to disk after unclean shutdown of VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3515
2022-12-20 14:05:29 -08:00
Aliaksandr Valialkin
8f5e822565 Makefile: update golangci-lint version from v1.48.0 to v1.50.1 2022-12-20 13:09:40 -08:00
Aliaksandr Valialkin
cad90c7ac1 Makefile: publish release docker images at DockerHub with the stable tag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2911
2022-12-20 12:27:06 -08:00
Aliaksandr Valialkin
a48510573e Revert "docs/Release-Guide.md: add LATEST_TAG=stable env var for make publish-release in order to create stable tag for the published components at DockerHub"
This reverts commit 8afa7ef837.
2022-12-20 12:24:27 -08:00
Aliaksandr Valialkin
8afa7ef837 docs/Release-Guide.md: add LATEST_TAG=stable env var for make publish-release in order to create stable tag for the published components at DockerHub
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2911
2022-12-20 12:22:42 -08:00
Aliaksandr Valialkin
1d7b5cb83c docs/CHANGELOG.md: document the change at 547f07463b29c09c62c9af35eac9cee6764b3286
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2612
2022-12-20 10:22:07 -08:00
Zakhar Bessarab
18e55d14c6 app/vmbackupmanager: update doc to include cluster to cluster restore example (#3506) 2022-12-20 14:54:56 +01:00
Roman Khavronenko
8122191368 docs: fix link typo in operator docs (#3508) 2022-12-20 14:52:47 +01:00
Aliaksandr Valialkin
6bf46c7bf5 docs/CHANGELOG.md: formatting fix 2022-12-20 01:06:34 -08:00
Aliaksandr Valialkin
9fa3f1dc57 app/vmselect/promql: do not extend too short lookbehind window for rate() function if it is set explicitly
Previously too short lookbehind window d for rate(m[d]) could be automatically extended
if it didn't cover at least two raw samples. This was needed in order to guarantee
non-empty results from rate(m[d]) on short time ranges.

Now the lookbehind window isn't extended if it is set explicitly,
since it is expected that the user knows what he is doing.

The lookbehind window continues to be extended when needed if it isn't set explicitly.
For example, in the case of rate(m).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3483
2022-12-20 00:18:20 -08:00
Aliaksandr Valialkin
eabb8762ee docs/Articles.md: add a link to https://www.youtube.com/watch?v=Mesc6JBFNhQ 2022-12-19 21:40:51 -08:00
Michal Kralik
fd53f86c84 build: fix issue with missing docker scan (#3501) 2022-12-19 15:22:45 -08:00
525 changed files with 20887 additions and 6584 deletions

View File

@@ -1,46 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
It would be great to [upgrade](https://docs.victoriametrics.com/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
and verify whether the bug is reproducible there.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html).
**To Reproduce**
Steps to reproduce the behavior.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Logs**
Check if any warnings or errors were logged by VictoriaMetrics components
or components in communication with VictoriaMetrics (e.g. Prometheus, Grafana).
**Screenshots**
If applicable, add screenshots to help explain your problem.
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)
See how to setup monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
**Version**
The line returned when passing `--version` command line flag to the binary. For example:
```
$ ./victoria-metrics-prod --version
victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
```
**Used command-line flags**
Please provide the command-line flags used for running VictoriaMetrics and its components.

86
.github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file
View File

@@ -0,0 +1,86 @@
name: Bug report
description: Create a report to help us improve
labels: [bug]
body:
- type: markdown
attributes:
value: |
Before filling a bug report it would be great to [upgrade](https://docs.victoriametrics.com/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
and verify whether the bug is reproducible there.
It's also recommended to read the [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting.html) first.
- type: textarea
id: describe-the-bug
attributes:
label: Describe the bug
description: |
A clear and concise description of what the bug is.
placeholder: |
When I do `A` VictoriaMetrics does `B`. I expect it to do `C`.
validations:
required: true
- type: textarea
id: to-reproduce
attributes:
label: To Reproduce
description: |
Steps to reproduce the behavior.
If reproducing an issue requires some specific configuration file, please paste it here.
placeholder: |
Steps to reproduce the behavior.
validations:
required: true
- type: textarea
id: version
attributes:
label: Version
description: |
The line returned when passing `--version` command line flag to the binary. For example:
```
$ ./victoria-metrics-prod --version
victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
```
validations:
required: true
- type: textarea
id: logs
attributes:
label: Logs
description: |
Check if any warnings or errors were logged by VictoriaMetrics components
or components in communication with VictoriaMetrics (e.g. Prometheus, Grafana).
validations:
required: false
- type: textarea
id: screenshots
attributes:
label: Screenshots
description: |
If applicable, add screenshots to help explain your problem.
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/)
See how to setup monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [monitoring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
validations:
required: false
- type: textarea
id: flags
attributes:
label: Used command-line flags
description: |
Please provide the command-line flags used for running VictoriaMetrics and its components.
validations:
required: false
- type: textarea
id: additional-info
attributes:
label: Additional information
placeholder: |
Additional information that doesn't fit elsewhere
validations:
required: false

View File

@@ -0,0 +1,5 @@
blank_issues_enabled: true
contact_links:
- name: Ask on Slack
url: https://slack.victoriametrics.com/
about: You can ask for help here!

View File

@@ -1,20 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

View File

@@ -0,0 +1,43 @@
name: Feature request
description: Suggest an idea for this project
labels: [enhancement]
body:
- type: textarea
id: describe-the-problem
attributes:
label: Is your feature request related to a problem? Please describe
description: |
A clear and concise description of what the problem is.
placeholder: |
Ex. I'm always frustrated when [...]
validations:
required: false
- type: textarea
id: describe-the-solution
attributes:
label: Describe the solution you'd like
description: |
A clear and concise description of what you want to happen.
validations:
required: true
- type: textarea
id: alternative-solutions
attributes:
label: Describe alternatives you've considered
description: |
A clear and concise description of any alternative solutions or features you've considered.
placeholder: |
I have tried to do `A`, but that doesn't solve a problem completely.
I have tried to do `A` and `B`, but implementing this would be better.
validations:
required: false
- type: textarea
id: feature-additional-info
attributes:
label: Additional information
description: |
Additional information which you consider helpful for implementing this feature.
placeholder: |
Add any other context or screenshots about the feature request here.
validations:
required: false

View File

@@ -17,7 +17,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.4
go-version: 1.19.5
id: go
- name: Code checkout
uses: actions/checkout@master

View File

@@ -0,0 +1,42 @@
name: "CodeQL - JS"
on:
push:
branches: [master, cluster]
paths:
- "**.js"
pull_request:
# The branches below must be a subset of the branches above
branches: [master, cluster]
paths:
- "**.js"
schedule:
- cron: "30 18 * * 2"
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: ["javascript"]
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
with:
category: "javascript"

View File

@@ -13,12 +13,20 @@ name: "CodeQL"
on:
push:
branches: [ master, cluster ]
branches: [master, cluster]
paths-ignore:
- "**.md"
- "**.txt"
- "**.js"
pull_request:
# The branches below must be a subset of the branches above
branches: [ master, cluster ]
branches: [master, cluster]
paths-ignore:
- "**.md"
- "**.txt"
- "**.js"
schedule:
- cron: '30 18 * * 2'
- cron: "30 18 * * 2"
jobs:
analyze:
@@ -32,45 +40,47 @@ jobs:
strategy:
fail-fast: false
matrix:
language: [ 'go', 'javascript' ]
language: ["go"]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
# Learn more about CodeQL language support at https://git.io/codeql-language-support
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Checkout repository
uses: actions/checkout@v3
- name: Set up Go
uses: actions/setup-go@v2
with:
go-version: 1.19
if: ${{ matrix.language == 'go' }}
- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
if: ${{ matrix.language == 'go' }}
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
# ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
# and modify them (or add more) to build your code if your project
# uses a compiled language
# ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
# and modify them (or add more) to build your code if your project
# uses a compiled language
#- run: |
# make bootstrap
# make release
#- run: |
# make bootstrap
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2

66
.github/workflows/main-test.yml vendored Normal file
View File

@@ -0,0 +1,66 @@
name: main - test
on:
push:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
pull_request:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
permissions:
contents: read
jobs:
lint:
name: lint
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
- name: Dependencies
run: |
make install-golangci-lint
make check-all
git diff --exit-code
test:
needs: lint
strategy:
matrix:
scenario: ["test-full", "test-pure", "test-full-386"]
name: test
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
- name: run tests
run: |
make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.txt

View File

@@ -1,13 +1,10 @@
name: main
on:
push:
paths-ignore:
- 'docs/**'
- '**.md'
pull_request:
paths-ignore:
- 'docs/**'
- '**.md'
workflow_run:
workflows: ["main - test"]
types:
- completed
permissions:
contents: read
@@ -16,30 +13,17 @@ jobs:
name: Build
runs-on: ubuntu-latest
steps:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.4
id: go
- name: Code checkout
uses: actions/checkout@master
- name: Dependencies
run: |
make install-golint
make install-errcheck
make install-golangci-lint
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
- name: Build
run: |
export PATH=$PATH:$(go env GOPATH)/bin # temporary fix. See https://github.com/actions/setup-go/issues/14
make check-all
git diff --exit-code
make test-full
make test-pure
make test-full-386
make victoria-metrics-crossbuild
make vmuitils-crossbuild
- name: Publish coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.txt

View File

@@ -12,13 +12,28 @@ jobs:
name: Build
runs-on: ubuntu-latest
steps:
- name: Setup Go
-
name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.4
go-version: 1.19.5
id: go
- name: Code checkout
-
name: Setup docker scan
run: |
mkdir -p ~/.docker/cli-plugins && \
curl https://github.com/docker/scan-cli-plugin/releases/latest/download/docker-scan_linux_amd64 -L -s -S -o ~/.docker/cli-plugins/docker-scan &&\
chmod +x ~/.docker/cli-plugins/docker-scan
-
name: Code checkout
uses: actions/checkout@master
- name: Publish
-
name: Publish
run: |
LATEST_TAG=nightly PKG_TAG=nightly make publish

15
.golangci.yml Normal file
View File

@@ -0,0 +1,15 @@
run:
timeout: 2m
enable:
- revive
issues:
exclude-rules:
- linters:
- staticcheck
text: "SA(4003|1019|5011):"
linters-settings:
errcheck:
exclude: ./errcheck_excludes.txt

View File

@@ -175,7 +175,7 @@
END OF TERMS AND CONDITIONS
Copyright 2019-2022 VictoriaMetrics, Inc.
Copyright 2019-2023 VictoriaMetrics, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

View File

@@ -167,10 +167,10 @@ vmutils-crossbuild: \
vmutils-windows-amd64
publish-release:
git checkout $(TAG) && $(MAKE) release publish && \
git checkout $(TAG)-cluster && $(MAKE) release publish && \
git checkout $(TAG)-enterprise && $(MAKE) release publish && \
git checkout $(TAG)-enterprise-cluster && $(MAKE) release publish
git checkout $(TAG) && LATEST_TAG=stable $(MAKE) release publish && \
git checkout $(TAG)-cluster && LATEST_TAG=cluster-stable $(MAKE) release publish && \
git checkout $(TAG)-enterprise && LATEST_TAG=enterprise-stable $(MAKE) release publish && \
git checkout $(TAG)-enterprise-cluster && LATEST_TAG=enterprise-cluster-stable $(MAKE) release publish
release: \
release-victoria-metrics \
@@ -315,21 +315,7 @@ vet:
go vet ./lib/...
go vet ./app/...
lint: install-golint
golint lib/...
golint app/...
install-golint:
which golint || go install golang.org/x/lint/golint@latest
errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./lib/...
errcheck -exclude=errcheck_excludes.txt ./app/...
install-errcheck:
which errcheck || go install github.com/kisielk/errcheck@latest
check-all: fmt vet lint errcheck golangci-lint govulncheck
check-all: fmt vet golangci-lint govulncheck
test:
go test ./lib/... ./app/...
@@ -380,10 +366,10 @@ install-qtc:
golangci-lint: install-golangci-lint
golangci-lint run --exclude '(SA4003|SA1019|SA5011):' -D errcheck -D structcheck --timeout 2m
golangci-lint run
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.48.0
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.50.1
govulncheck: install-govulncheck
govulncheck ./...

125
README.md
View File

@@ -40,19 +40,36 @@ VictoriaMetrics has the following prominent features:
* It can be used as a drop-in replacement for Prometheus in Grafana, because it supports [Prometheus querying API](#prometheus-querying-api-usage).
* It can be used as a drop-in replacement for Graphite in Grafana, because it supports [Graphite API](#graphite-api-usage).
* It features easy setup and operation:
* VictoriaMetrics consists of a single [small executable](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d) without external dependencies.
* VictoriaMetrics consists of a single [small executable](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d)
without external dependencies.
* All the configuration is done via explicit command-line flags with reasonable defaults.
* All the data is stored in a single directory pointed by `-storageDataPath` command-line flag.
* Easy and fast backups from [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) to S3 or GCS can be done with [vmbackup](https://docs.victoriametrics.com/vmbackup.html) / [vmrestore](https://docs.victoriametrics.com/vmrestore.html) tools. See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
* It implements PromQL-based query language - [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html), which provides improved functionality on top of PromQL.
* Easy and fast backups from [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
can be done with [vmbackup](https://docs.victoriametrics.com/vmbackup.html) / [vmrestore](https://docs.victoriametrics.com/vmrestore.html) tools.
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
* It implements PromQL-like query language - [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html), which provides improved functionality on top of PromQL.
* It provides global query view. Multiple Prometheus instances or any other data sources may ingest data into VictoriaMetrics. Later this data may be queried via a single query.
* It provides high performance and good vertical and horizontal scalability for both [data ingestion](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b) and [data querying](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4). It [outperforms InfluxDB and TimescaleDB by up to 20x](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae).
* It [uses 10x less RAM than InfluxDB](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) and [up to 7x less RAM than Prometheus, Thanos or Cortex](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f) when dealing with millions of unique time series (aka [high cardinality](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality)).
* It provides high performance and good vertical and horizontal scalability for both
[data ingestion](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b)
and [data querying](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4).
It [outperforms InfluxDB and TimescaleDB by up to 20x](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae).
* It [uses 10x less RAM than InfluxDB](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893)
and [up to 7x less RAM than Prometheus, Thanos or Cortex](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f)
when dealing with millions of unique time series (aka [high cardinality](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality)).
* It is optimized for time series with [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate).
* It provides high data compression, so [up to 70x more data points](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4) may be crammed into limited storage comparing to TimescaleDB and [up to 7x less storage space is required compared to Prometheus, Thanos or Cortex](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f).
* It is optimized for storage with high-latency IO and low IOPS (HDD and network storage in AWS, Google Cloud, Microsoft Azure, etc). See [disk IO graphs from these benchmarks](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b).
* A single-node VictoriaMetrics may substitute moderately sized clusters built with competing solutions such as Thanos, M3DB, Cortex, InfluxDB or TimescaleDB. See [vertical scalability benchmarks](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae), [comparing Thanos to VictoriaMetrics cluster](https://medium.com/@valyala/comparing-thanos-to-victoriametrics-cluster-b193bea1683) and [Remote Write Storage Wars](https://promcon.io/2019-munich/talks/remote-write-storage-wars/) talk from [PromCon 2019](https://promcon.io/2019-munich/talks/remote-write-storage-wars/).
* It protects the storage from data corruption on unclean shutdown (i.e. OOM, hardware reset or `kill -9`) thanks to [the storage architecture](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
* It provides high data compression, so up to 70x more data points may be stored into limited storage comparing to TimescaleDB
according to [these benchmarks](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4)
and up to 7x less storage space is required compared to Prometheus, Thanos or Cortex
according to [this benchmark](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f).
* It is optimized for storage with high-latency IO and low IOPS (HDD and network storage in AWS, Google Cloud, Microsoft Azure, etc).
See [disk IO graphs from these benchmarks](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b).
* A single-node VictoriaMetrics may substitute moderately sized clusters built with competing solutions such as Thanos, M3DB, Cortex, InfluxDB or TimescaleDB.
See [vertical scalability benchmarks](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae),
[comparing Thanos to VictoriaMetrics cluster](https://medium.com/@valyala/comparing-thanos-to-victoriametrics-cluster-b193bea1683)
and [Remote Write Storage Wars](https://promcon.io/2019-munich/talks/remote-write-storage-wars/) talk
from [PromCon 2019](https://promcon.io/2019-munich/talks/remote-write-storage-wars/).
* It protects the storage from data corruption on unclean shutdown (i.e. OOM, hardware reset or `kill -9`) thanks to
[the storage architecture](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
* It supports metrics' scraping, ingestion and [backfilling](#backfilling) via the following protocols:
* [Metrics scraping from Prometheus exporters](#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* [Prometheus remote write API](#prometheus-setup).
@@ -65,11 +82,15 @@ VictoriaMetrics has the following prominent features:
* [Arbitrary CSV data](#how-to-import-csv-data).
* [Native binary format](#how-to-import-data-in-native-format).
* [DataDog agent or DogStatsD](#how-to-send-data-from-datadog-agent).
* It supports powerful [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html), which can be used as a [statsd](https://github.com/statsd/statsd) alternative.
* It supports metrics [relabeling](#relabeling).
* It can deal with [high cardinality issues](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality) and [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate) issues via [series limiter](#cardinality-limiter).
* It ideally works with big amounts of time series data from APM, Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data and various [Enterprise workloads](https://docs.victoriametrics.com/enterprise.html).
* It can deal with [high cardinality issues](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality) and
[high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate) issues via [series limiter](#cardinality-limiter).
* It ideally works with big amounts of time series data from APM, Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data
and various [Enterprise workloads](https://docs.victoriametrics.com/enterprise.html).
* It has open source [cluster version](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
* It can store data on [NFS-based storages](https://en.wikipedia.org/wiki/Network_File_System) such as [Amazon EFS](https://aws.amazon.com/efs/) and [Google Filestore](https://cloud.google.com/filestore).
* It can store data on [NFS-based storages](https://en.wikipedia.org/wiki/Network_File_System) such as [Amazon EFS](https://aws.amazon.com/efs/)
and [Google Filestore](https://cloud.google.com/filestore).
See also [various Articles about VictoriaMetrics](https://docs.victoriametrics.com/Articles.html).
@@ -84,7 +105,6 @@ Case studies:
* [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch)
* [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern)
* [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl)
* [Dreamteam](https://docs.victoriametrics.com/CaseStudies.html#dreamteam)
* [Fly.io](https://docs.victoriametrics.com/CaseStudies.html#flyio)
* [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence)
* [Grammarly](https://docs.victoriametrics.com/CaseStudies.html#grammarly)
@@ -271,6 +291,8 @@ Prometheus doesn't drop data during VictoriaMetrics restart. See [this article](
VictoriaMetrics provides UI for query troubleshooting and exploration. The UI is available at `http://victoriametrics:8428/vmui`.
The UI allows exploring query results via graphs and tables.
It also provides the following features:
- [metrics explorer](#metrics-explorer)
- [cardinality explorer](#cardinality-explorer)
- [query tracer](#query-tracing)
- [top queries explorer](#top-queries)
@@ -306,9 +328,21 @@ See the [example VMUI at VictoriaMetrics playground](https://play.victoriametric
* queries with the biggest average execution duration;
* queries that took the most summary time for execution.
## Metrics explorer
[VMUI](#vmui) provides an ability to explore metrics exported by a particular `job` / `instance` in the following way:
1. Open the `vmui` at `http://victoriametrics:8428/vmui/`.
2. Click the `Explore metrics` tab.
3. Select the `job` you want to explore.
4. Optionally select the `instance` for the selected job to explore.
5. Select metrics you want to explore and compare.
It is possible to change the selected time range for the graphs in the top right corner.
## Cardinality explorer
VictoriaMetrics provides an ability to explore time series cardinality at `cardinality` tab in [vmui](#vmui) in the following ways:
VictoriaMetrics provides an ability to explore time series cardinality at `Explore cardinality` tab in [vmui](#vmui) in the following ways:
- To identify metric names with the highest number of series.
- To identify labels with the highest number of series.
@@ -363,7 +397,7 @@ DataDog agent allows configuring destinations for metrics sending via ENV variab
or via [configuration file](https://docs.datadoghq.com/agent/guide/agent-configuration-files/) in section `dd_url`.
<p align="center">
<img src="Single-server-VictoriaMetrics-sending_DD_metrics_to_VM.png" width="800">
<img src="docs/Single-server-VictoriaMetrics-sending_DD_metrics_to_VM.png" width="800">
</p>
To configure DataDog agent via ENV variable add the following prefix:
@@ -397,7 +431,7 @@ DataDog allows configuring [Dual Shipping](https://docs.datadoghq.com/agent/guid
sending via ENV variable `DD_ADDITIONAL_ENDPOINTS` or via configuration file `additional_endpoints`.
<p align="center">
<img src="Single-server-VictoriaMetrics-sending_DD_metrics_to_VM_and_DD.png" width="800">
<img src="docs/Single-server-VictoriaMetrics-sending_DD_metrics_to_VM_and_DD.png" width="800">
</p>
Run DataDog using the following ENV variable with VictoriaMetrics as additional metrics receiver:
@@ -701,8 +735,7 @@ VictoriaMetrics accepts optional `extra_label=<label_name>=<label_value>` query
VictoriaMetrics accepts optional `extra_filters[]=series_selector` query arg, which can be used for enforcing arbitrary label filters for queries. For example,
`/api/v1/query_range?extra_filters[]={env=~"prod|staging",user="xyz"}&query=<query>` would automatically add `{env=~"prod|staging",user="xyz"}` label filters to the given `<query>`. This functionality can be used for limiting the scope of time series visible to the given tenant. It is expected that the `extra_filters[]` query args are automatically set by auth proxy sitting in front of VictoriaMetrics. See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args additionally to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
VictoriaMetrics accepts multiple formats for `time`, `start` and `end` query args - see [these docs](#timestamp-formats).
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
@@ -727,6 +760,18 @@ Additionally, VictoriaMetrics provides the following handlers:
For example, request to `/api/v1/status/top_queries?topN=5&maxLifetime=30s` would return up to 5 queries per list, which were executed during the last 30 seconds.
VictoriaMetrics tracks the last `-search.queryStats.lastQueriesCount` queries with durations at least `-search.queryStats.minQueryDuration`.
### Timestamp formats
VictoriaMetrics accepts the following formats for `time`, `start` and `end` query args
in [query APIs](https://docs.victoriametrics.com/#prometheus-querying-api-usage) and
in [export APIs](https://docs.victoriametrics.com/#how-to-export-time-series).
- Unix timestamps in seconds with optional milliseconds after the point. For example, `1562529662.678`.
- [RFC3339](https://www.ietf.org/rfc/rfc3339.txt). For example, '2022-03-29T01:02:03Z`.
- Partial RFC3339. Examples: `2022`, `2022-03`, `2022-03-29`, `2022-03-29T01`, `2022-03-29T01:02`.
- Relative duration comparing to the current time. For example, `1h5m` means `one hour and five minutes ago`.
## Graphite API usage
VictoriaMetrics supports data ingestion in Graphite protocol - see [these docs](#how-to-send-data-from-graphite-compatible-agents-such-as-statsd) for details.
@@ -946,8 +991,9 @@ Each JSON line contains samples for a single time series. An example output:
{"metric":{"__name__":"up","job":"prometheus","instance":"localhost:9090"},"values":[1,1,1],"timestamps":[1549891461511,1549891476511,1549891491511]}
```
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data. These args may contain either
unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
@@ -994,8 +1040,9 @@ where:
* `<timeseries_selector_for_export>` may contain any [time series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors)
for metrics to export.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data. These args may contain either
unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
@@ -1021,8 +1068,9 @@ wget -O- -q 'http://your_victoriametrics_instance:8428/api/v1/series/count' | jq
# relaunch victoriametrics with search.maxExportSeries more than value from previous command
```
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data. These args may contain either
unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
Optional `start` and `end` args may be added to the request in order to limit the time frame for the exported data.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
@@ -1258,7 +1306,8 @@ VictoriaMetrics exports [Prometheus-compatible federation data](https://promethe
at `http://<victoriametrics-addr>:8428/federate?match[]=<timeseries_selector_for_federation>`.
Optional `start` and `end` args may be added to the request in order to scrape the last point for each selected time series on the `[start ... end]` interval.
`start` and `end` may contain either unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
@@ -1426,8 +1475,8 @@ This increases overhead during data querying, since VictoriaMetrics needs to rea
bigger number of parts per each request. That's why it is recommended to have at least 20%
of free disk space under directory pointed by `-storageDataPath` command-line flag.
Information about merging process is available in [the dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
and [the dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176).
Information about merging process is available in [the dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
and [the dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/).
See more details in [monitoring docs](#monitoring).
See [this article](https://valyala.medium.com/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) for more details.
@@ -1594,8 +1643,8 @@ Alternatively, single-node VictoriaMetrics can self-scrape the metrics when `-se
set to duration greater than 0. For example, `-selfScrapeInterval=10s` would enable self-scraping of `/metrics` page
with 10 seconds interval.
Official Grafana dashboards available for [single-node](https://grafana.com/dashboards/10229)
and [clustered](https://grafana.com/grafana/dashboards/11176) VictoriaMetrics.
Official Grafana dashboards available for [single-node](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
and [clustered](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/) VictoriaMetrics.
See an [alternative dashboard for clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11831)
created by community.
@@ -1844,8 +1893,8 @@ The following metrics for each type of cache are exported at [`/metrics` page](#
* `vm_cache_misses_total` - the number of cache misses
* `vm_cache_entries` - the number of entries in the cache
Both Grafana dashboards for [single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
and [clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11176)
Both Grafana dashboards for [single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics/)
and [clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/)
contain `Caches` section with cache metrics visualized. The panels show the current
memory usage by each type of cache, and also a cache hit rate. If hit rate is close to 100%
then cache efficiency is already very high and does not need any tuning.
@@ -2175,6 +2224,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
@@ -2184,7 +2235,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 33554432)
@@ -2281,6 +2332,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-promscrape.minResponseSizeForStreamParse size
The minimum target response size for automatic switching to stream parsing mode, which can reduce memory usage. See https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 1000000)
-promscrape.nomad.waitTime duration
Wait time used by Nomad service discovery. Default value is used if not set
-promscrape.nomadSDCheckInterval duration
Interval for checking for changes in Nomad. This works only if nomad_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#nomad_sd_configs for details (default 30s)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.openstackSDCheckInterval duration
@@ -2427,6 +2482,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 10000000)
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-streamAggr.config string
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html .See also -remoteWrite.streamAggr.keepInput
-streamAggr.keepInput
Whether to keep input samples after the aggregation with -streamAggr.config .By default the input is dropped after the aggregation, so only the aggregate data is stored. See https://docs.victoriametrics.com/stream-aggregation.html
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
@@ -2444,4 +2503,6 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Show VictoriaMetrics version
-vmalert.proxyURL string
Optional URL for proxying requests to vmalert. For example, if -vmalert.proxyURL=http://vmalert:8880 , then alerting API requests such as /api/v1/rules from Grafana will be proxied to http://vmalert:8880/api/v1/rules
-vmui.customDashboardsPath string
Optional path to vmui dashboards. See https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmui/packages/vmui/public/dashboards
```

View File

@@ -261,7 +261,7 @@ func testRead(t *testing.T) {
for _, q := range test.Query {
q = testutil.PopulateTimeTplString(q, insertionTime)
if test.Issue != "" {
test.Issue = "Regression in " + test.Issue
test.Issue = "\nRegression in " + test.Issue
}
switch true {
case strings.HasPrefix(q, "/api/v1/export"):
@@ -284,7 +284,7 @@ func testRead(t *testing.T) {
queryResult := Query{}
httpReadStruct(t, testReadHTTPPath, q, &queryResult)
if err := checkQueryResult(queryResult, test.ResultQuery); err != nil {
t.Fatalf("Query. %s fails with error %s.%s", q, err, test.Issue)
t.Fatalf("Query. %s fails with error: %s.%s", q, err, test.Issue)
}
default:
t.Fatalf("unsupported read query %s", q)

View File

@@ -12,7 +12,7 @@ func TestPopulateTimeTplString(t *testing.T) {
}
f := func(s, resultExpected string) {
t.Helper()
result := PopulateTimeTplString(s, now)
result := PopulateTimeTplString(s, now.UTC())
if result != resultExpected {
t.Fatalf("unexpected result; got %q; want %q", result, resultExpected)
}

View File

@@ -6,7 +6,7 @@
"forms_daily_count;item=x 2 {TIME_S-2m}",
"forms_daily_count;item=y 3 {TIME_S-1m}",
"forms_daily_count;item=y 4 {TIME_S-2m}"],
"query": ["/api/v1/query?query=min%20by%20(item)%20(min_over_time(forms_daily_count[10m:1m]))&time={TIME_S-1m}"],
"query": ["/api/v1/query?query=min%20by%20(item)%20(min_over_time(forms_daily_count[10m:1m]))&time={TIME_S-1m}&latency_offset=1ms"],
"result_query": {
"status":"success",
"data":{"resultType":"vector","result":[{"metric":{"item":"x"},"value":["{TIME_S-1m}","2"]},{"metric":{"item":"y"},"value":["{TIME_S-1m}","4"]}]}

View File

@@ -24,8 +24,8 @@ additionally to [discovering Prometheus-compatible targets and scraping metrics
see [these docs](https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter).
* Can add, remove and modify labels (aka tags) via Prometheus relabeling. Can filter data before sending it to remote storage. See [these docs](#relabeling) for details.
* Can accept data via all the ingestion protocols supported by VictoriaMetrics - see [these docs](#how-to-push-data-to-vmagent).
* Can replicate collected metrics simultaneously to multiple remote storage systems -
see [these docs](#replication-and-high-availability).
* Can aggregate incoming samples by time and by labels before sending them to remote storage - see [these docs](https://docs.victoriametrics.com/stream-aggregation.html).
* Can replicate collected metrics simultaneously to multiple remote storage systems - see [these docs](#replication-and-high-availability).
* Works smoothly in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
@@ -60,6 +60,8 @@ Example command line:
/path/to/vmagent -promscrape.config=/path/to/prometheus.yml -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
Example of scrape configuration for `-promscrape.config` argument you can find [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/prometheus.yml)
See [how to scrape Prometheus-compatible targets](#how-to-collect-metrics-in-prometheus-format) for more details.
If you use single-node VictoriaMetrics, then you can discover and scrape Prometheus-compatible targets directly from VictoriaMetrics
@@ -126,6 +128,12 @@ If you use Prometheus only for scraping metrics from various targets and forward
then `vmagent` can replace Prometheus. Typically, `vmagent` requires lower amounts of RAM, CPU and network bandwidth compared with Prometheus.
See [these docs](#how-to-collect-metrics-in-prometheus-format) for details.
### Statsd alternative
`vmagent` can be used as an alternative to [statsd](https://github.com/statsd/statsd)
when [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) is enabled.
See [these docs](https://docs.victoriametrics.com/stream-aggregation.html#statsd-alternative) for details.
### Flexible metrics relay
`vmagent` can accept metrics in [various popular data ingestion protocols](#how-to-push-data-to-vmagent), apply [relabeling](#relabeling)
@@ -598,8 +606,8 @@ provide the following tools for debugging target-level and metric-level relabeli
- Target-level debugging (e.g. `relabel_configs` section at [scrape_configs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs))
can be performed by navigating to `http://vmagent:8429/targets` page (`http://victoriametrics:8428/targets` page for single-node VictoriaMetrics)
and clicking the `debug` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual relabeling rules applied to the target labels.
and clicking the `debug target relabeling` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual target relabeling rules applied to the discovered target labels.
The `http://vmagent:8429/targets` page shows only active targets. If you need to understand why some target
is dropped during the relabeling, then navigate to `http://vmagent:8428/service-discovery` page
@@ -608,11 +616,9 @@ provide the following tools for debugging target-level and metric-level relabeli
which result to target drop.
- Metric-level debugging (e.g. `metric_relabel_configs` section at [scrape_configs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs)
and all the relabeling, which can be set up via `-relabelConfig`, `-remoteWrite.relabelConfig` and `-remoteWrite.urlRelabelConfig`
command-line flags) can be performed by navigating to `http://vmagent:8429/metric-relabel-debug` page
(`http://victoriametrics:8428/metric-relabel-debug` page for single-node VictoriaMetrics)
and submitting there relabeling rules together with the metric to be relabeled.
The page will show step-by-step results for the entered relabeling rules executed against the entered metric.
can be performed by navigating to `http://vmagent:8429/targets` page (`http://victoriametrics:8428/targets` page for single-node VictoriaMetrics)
and clicking the `debug metrics relabeling` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual metric relabeling rules applied to the given target labels.
## Prometheus staleness markers
@@ -816,7 +822,7 @@ See also [cardinality explorer docs](https://docs.victoriametrics.com/#cardinali
We recommend setting up regular scraping of this page either through `vmagent` itself or by Prometheus
so that the exported metrics may be analyzed later.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for `vmagent` state overview.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683-victoriametrics-vmagent/) for `vmagent` state overview.
Graphs on this dashboard contain useful hints - hover the `i` icon at the top left corner of each graph in order to read it.
If you have suggestions for improvements or have found a bug - please open an issue on github or add a review to the dashboard.
@@ -1232,6 +1238,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
@@ -1241,7 +1249,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
The maximum number of concurrent insert requests. Default value should work for most cases, since it minimizes the memory usage. The default value can be increased when clients send data over slow networks. See also -insert.maxQueueDuration (default 8)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 33554432)
@@ -1332,6 +1340,10 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-promscrape.minResponseSizeForStreamParse size
The minimum target response size for automatic switching to stream parsing mode, which can reduce memory usage. See https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 1000000)
-promscrape.nomad.waitTime duration
Wait time used by Nomad service discovery. Default value is used if not set
-promscrape.nomadSDCheckInterval duration
Interval for checking for changes in Nomad. This works only if nomad_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#nomad_sd_configs for details (default 30s)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
-promscrape.openstackSDCheckInterval duration
@@ -1443,7 +1455,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.relabelConfig string
Optional path to file with relabel_config entries. The path can point either to local file or to http url. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details
Optional path to file with relabeling configs, which are applied to all the metrics before sending them to -remoteWrite.url. See also -remoteWrite.urlRelabelConfig. The path can point either to local file or to http url. See https://docs.victoriametrics.com/vmagent.html#relabeling
-remoteWrite.roundDigits array
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
Supports array of values separated by comma or specified via multiple flags.
@@ -1455,6 +1467,12 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.significantFigures array
The number of significant figures to leave in metric values before writing them to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. This option may be used for improving data compression for the stored metrics. See also -remoteWrite.roundDigits
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.config array
Optional path to file with stream aggregation config. See https://docs.victoriametrics.com/stream-aggregation.html . See also -remoteWrite.streamAggr.keepInput
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.keepInput array
Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. See https://docs.victoriametrics.com/stream-aggregation.html
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. By default system CA is used
Supports an array of values separated by comma or specified via multiple flags.
@@ -1476,7 +1494,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelConfig array
Optional path to relabel config for the corresponding -remoteWrite.url. The path can point either to local file or to http url
Optional path to relabel configs for the corresponding -remoteWrite.url. See also -remoteWrite.relabelConfig. The path can point either to local file or to http url. See https://docs.victoriametrics.com/vmagent.html#relabeling
Supports an array of values separated by comma or specified via multiple flags.
-sortLabels
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}Enabled sorting for labels can slow down ingestion performance a bit

View File

@@ -10,7 +10,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -26,10 +25,8 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -10,7 +10,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -28,11 +27,9 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
ce := req.Header.Get("Content-Encoding")
return parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return insertRows(at, series, extraLabels)
})
ce := req.Header.Get("Content-Encoding")
return parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return insertRows(at, series, extraLabels)
})
}

View File

@@ -7,7 +7,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -20,9 +19,7 @@ var (
//
// See https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol
func InsertHandler(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, insertRows)
})
return parser.ParseStream(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -16,7 +16,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -37,10 +36,8 @@ var (
//
// See https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener/
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return insertRows(nil, db, rows, nil)
})
return parser.ParseStream(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return insertRows(nil, db, rows, nil)
})
}
@@ -52,15 +49,13 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
q := req.URL.Query()
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(at, db, rows, extraLabels)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
q := req.URL.Query()
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(at, db, rows, extraLabels)
})
}

View File

@@ -38,7 +38,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -104,7 +103,6 @@ func main() {
startTime := time.Now()
remotewrite.Init()
common.StartUnmarshalWorkers()
writeconcurrencylimiter.Init()
if len(*influxListenAddr) > 0 {
influxServer = influxserver.MustStart(*influxListenAddr, func(r io.Reader) error {
return influx.InsertHandlerForReader(r, false)
@@ -349,12 +347,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/prometheus/config", "/config":
if *configAuthKey != "" && r.FormValue("authKey") != *configAuthKey {
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("The provided authKey doesn't match -configAuthKey"),
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
return true
}
promscrapeConfigRequests.Inc()
@@ -363,12 +356,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
case "/prometheus/api/v1/status/config", "/api/v1/status/config":
// See https://prometheus.io/docs/prometheus/latest/querying/api/#config
if *configAuthKey != "" && r.FormValue("authKey") != *configAuthKey {
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("The provided authKey doesn't match -configAuthKey"),
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
return true
}
promscrapeStatusConfigRequests.Inc()

View File

@@ -12,7 +12,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,10 +30,8 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzip := req.Header.Get("Content-Encoding") == "gzip"
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req.Body, isGzip, func(block *parser.Block) error {
return insertRows(at, block, extraLabels)
})
return parser.ParseStream(req.Body, isGzip, func(block *parser.Block) error {
return insertRows(at, block, extraLabels)
})
}

View File

@@ -7,7 +7,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -20,9 +19,7 @@ var (
//
// See http://opentsdb.net/docs/build/html/api_telnet/put.html
func InsertHandler(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, insertRows)
})
return parser.ParseStream(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -9,7 +9,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -25,10 +24,8 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -11,7 +11,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,21 +30,17 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
}, nil)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
}, nil)
}
// InsertHandlerForReader processes metrics from given reader with optional gzip format
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, 0, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
}, nil)
})
return parser.ParseStream(r, 0, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
}, nil)
}
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {

View File

@@ -13,7 +13,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -29,19 +28,15 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
}
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(at *auth.Token, r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, nil)
})
return parser.ParseStream(r, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, nil)
})
}

View File

@@ -15,11 +15,13 @@ import (
var (
unparsedLabelsGlobal = flagutil.NewArrayString("remoteWrite.label", "Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. "+
"Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage")
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabel_config entries. "+
"The path can point either to local file or to http url. These entries are applied to all the metrics "+
"before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details")
relabelConfigPaths = flagutil.NewArrayString("remoteWrite.urlRelabelConfig", "Optional path to relabel config for the corresponding -remoteWrite.url. "+
"The path can point either to local file or to http url")
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabeling configs, which are applied "+
"to all the metrics before sending them to -remoteWrite.url. See also -remoteWrite.urlRelabelConfig. "+
"The path can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmagent.html#relabeling")
relabelConfigPaths = flagutil.NewArrayString("remoteWrite.urlRelabelConfig", "Optional path to relabel configs for the corresponding -remoteWrite.url. "+
"See also -remoteWrite.relabelConfig. The path can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmagent.html#relabeling")
usePromCompatibleNaming = flag.Bool("usePromCompatibleNaming", false, "Whether to replace characters unsupported by Prometheus with underscores "+
"in the ingested metric names and label names. For example, foo.bar{a.b='c'} is transformed into foo_bar{a_b='c'} during data ingestion if this flag is set. "+
@@ -86,7 +88,7 @@ func initLabelsGlobal() {
}
func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, extraLabels []prompbmarshal.Label, pcs *promrelabel.ParsedConfigs) []prompbmarshal.TimeSeries {
if len(extraLabels) == 0 && pcs.Len() == 0 {
if len(extraLabels) == 0 && pcs.Len() == 0 && !*usePromCompatibleNaming {
// Nothing to change.
return tss
}

View File

@@ -0,0 +1,49 @@
package remotewrite
import (
"reflect"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
func TestApplyRelabeling(t *testing.T) {
f := func(extraLabels []prompbmarshal.Label, pcs *promrelabel.ParsedConfigs, sTss, sExpTss string) {
rctx := &relabelCtx{}
tss, expTss := parseSeries(sTss), parseSeries(sExpTss)
gotTss := rctx.applyRelabeling(tss, extraLabels, pcs)
if !reflect.DeepEqual(gotTss, expTss) {
t.Fatalf("expected to have: \n%v;\ngot: \n%v", expTss, gotTss)
}
}
f(nil, nil, "up", "up")
f([]prompbmarshal.Label{{Name: "foo", Value: "bar"}}, nil, "up", `up{foo="bar"}`)
f([]prompbmarshal.Label{{Name: "foo", Value: "bar"}}, nil, `up{foo="baz"}`, `up{foo="bar"}`)
pcs, err := promrelabel.ParseRelabelConfigsData([]byte(`
- target_label: "foo"
replacement: "aaa"
- action: labeldrop
regex: "env.*"
`))
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
f(nil, pcs, `up{foo="baz", env="prod"}`, `up{foo="aaa"}`)
oldVal := *usePromCompatibleNaming
*usePromCompatibleNaming = true
f(nil, nil, `foo.bar`, `foo_bar`)
*usePromCompatibleNaming = oldVal
}
func parseSeries(data string) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
tss = append(tss, prompbmarshal.TimeSeries{
Labels: promutils.MustNewLabelsFromString(data).GetLabels(),
})
return tss
}

View File

@@ -21,6 +21,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
@@ -58,6 +59,13 @@ var (
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
maxDailySeries = flag.Int("remoteWrite.maxDailySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
streamAggrConfig = flagutil.NewArrayString("remoteWrite.streamAggr.config", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/stream-aggregation.html . "+
"See also -remoteWrite.streamAggr.keepInput")
streamAggrKeepInput = flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput", "Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. "+
"By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. "+
"See https://docs.victoriametrics.com/stream-aggregation.html")
)
var (
@@ -140,6 +148,7 @@ func Init() {
logger.Fatalf("cannot load relabel configs: %s", err)
}
allRelabelConfigs.Store(rcs)
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
@@ -435,9 +444,13 @@ var (
)
type remoteWriteCtx struct {
idx int
fq *persistentqueue.FastQueue
c *client
idx int
fq *persistentqueue.FastQueue
c *client
sas *streamaggr.Aggregators
streamAggrKeepInput bool
pss []*pendingSeries
pssNextIdx uint64
@@ -469,6 +482,7 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
}
c.init(argIdx, *queues, sanitizedURL)
// Initialize pss
sf := significantFigures.GetOptionalArgOrDefault(argIdx, 0)
rd := roundDigits.GetOptionalArgOrDefault(argIdx, 100)
pssLen := *queues
@@ -481,7 +495,8 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
for i := range pss {
pss[i] = newPendingSeries(fq.MustWriteBlock, sf, rd)
}
return &remoteWriteCtx{
rwctx := &remoteWriteCtx{
idx: argIdx,
fq: fq,
c: c,
@@ -490,6 +505,19 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
rowsPushedAfterRelabel: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_rows_pushed_after_relabel_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
rowsDroppedByRelabel: metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_relabel_metrics_dropped_total{path=%q, url=%q}`, queuePath, sanitizedURL)),
}
// Initialize sas
sasFile := streamAggrConfig.GetOptionalArg(argIdx)
if sasFile != "" {
sas, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternal)
if err != nil {
logger.Fatalf("cannot initialize stream aggregators from -remoteWrite.streamAggrFile=%q: %s", sasFile, err)
}
rwctx.sas = sas
rwctx.streamAggrKeepInput = streamAggrKeepInput.GetOptionalArg(argIdx)
}
return rwctx
}
func (rwctx *remoteWriteCtx) MustStop() {
@@ -501,6 +529,8 @@ func (rwctx *remoteWriteCtx) MustStop() {
rwctx.fq.UnblockAllReaders()
rwctx.c.MustStop()
rwctx.c = nil
rwctx.sas.MustStop()
rwctx.sas = nil
rwctx.fq.MustClose()
rwctx.fq = nil
@@ -509,6 +539,7 @@ func (rwctx *remoteWriteCtx) MustStop() {
}
func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
// Apply relabeling
var rctx *relabelCtx
var v *[]prompbmarshal.TimeSeries
rcs := allRelabelConfigs.Load().(*relabelConfigs)
@@ -526,11 +557,17 @@ func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
rowsCountAfterRelabel := getRowsCount(tss)
rwctx.rowsDroppedByRelabel.Add(rowsCountBeforeRelabel - rowsCountAfterRelabel)
}
pss := rwctx.pss
idx := atomic.AddUint64(&rwctx.pssNextIdx, 1) % uint64(len(pss))
rowsCount := getRowsCount(tss)
rwctx.rowsPushedAfterRelabel.Add(rowsCount)
pss[idx].Push(tss)
// Apply stream aggregation if any
rwctx.sas.Push(tss)
if rwctx.sas == nil || rwctx.streamAggrKeepInput {
// Push samples to the remote storage
rwctx.pushInternal(tss)
}
// Return back relabeling contexts to the pool
if rctx != nil {
*v = prompbmarshal.ResetTimeSeries(tss)
tssRelabelPool.Put(v)
@@ -538,6 +575,12 @@ func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
}
}
func (rwctx *remoteWriteCtx) pushInternal(tss []prompbmarshal.TimeSeries) {
pss := rwctx.pss
idx := atomic.AddUint64(&rwctx.pssNextIdx, 1) % uint64(len(pss))
pss[idx].Push(tss)
}
var tssRelabelPool = &sync.Pool{
New: func() interface{} {
a := []prompbmarshal.TimeSeries{}

View File

@@ -13,7 +13,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,20 +30,16 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
})
return parser.ParseStream(r, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
})
}

View File

@@ -69,16 +69,17 @@ Then configure `vmalert` accordingly:
-external.label=replica=a # Multiple external labels may be set
```
Note there's a separate `remoteWrite.url` to allow writing results of
Note there's a separate `-remoteWrite.url` command-line flag to allow writing results of
alerting/recording rules into a different storage than the initial data that's
queried. This allows using `vmalert` to aggregate data from a short-term,
high-frequency, high-cardinality storage into a long-term storage with
decreased cardinality and a bigger interval between samples.
See also [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html).
See the full list of configuration flags in [configuration](#configuration) section.
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
to specify different `-external.label` command-line flags in order to define which `vmalert` generated rules or alerts.
Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
@@ -191,6 +192,11 @@ expr: <string>
# Is applicable to alerting rules only.
[ debug: <bool> | default = false ]
# Defines the number of rule's updates entries stored in memory
# and available for view on rule's Details page.
# Overrides `rule.updateEntriesLimit` value for this specific rule.
[ update_entries_limit: <integer> | default 0 ]
# Labels to add or overwrite for each alert.
labels:
[ <labelname>: <tmpl_string> ]
@@ -319,6 +325,12 @@ expr: <string>
# Labels to add or overwrite before storing the result.
labels:
[ <labelname>: <labelvalue> ]
# Defines the number of rule's updates entries stored in memory
# and available for view on rule's Details page.
# Overrides `rule.updateEntriesLimit` value for this specific rule.
[ update_entries_limit: <integer> | default 0 ]
```
For recording rules to work `-remoteWrite.url` must be specified.
@@ -469,7 +481,8 @@ Alertmanager will automatically deduplicate alerts with identical labels, so ens
all `vmalert`s are having the same config.
Don't forget to configure [cluster mode](https://prometheus.io/docs/alerting/latest/alertmanager/)
for Alertmanagers for better reliability.
for Alertmanagers for better reliability. List all Alertmanager URLs in vmalert's `-notifier.url`
to ensure [high availability](https://github.com/prometheus/alertmanager#high-availability).
This example uses single-node VM server for the sake of simplicity.
Check how to replace it with [cluster VictoriaMetrics](#cluster-victoriametrics) if needed.
@@ -502,8 +515,8 @@ groups:
expr: avg_over_time(http_requests[5m])
```
Ability of `vmalert` to be configured with different `datasource.url` and `remoteWrite.url` allows
reading data from one data source and backfilling results to another. This helps to build a system
Ability of `vmalert` to be configured with different `-datasource.url` and `-remoteWrite.url` command-line flags
allows reading data from one data source and backfilling results to another. This helps to build a system
for aggregating and downsampling the data.
The following example shows how to build a topology where `vmalert` will process data from one cluster
@@ -527,7 +540,7 @@ Please note, [replay](#rules-backfilling) feature may be used for transforming h
Flags `-remoteRead.url` and `-notifier.url` are omitted since we assume only recording rules are used.
See also [downsampling docs](https://docs.victoriametrics.com/#downsampling).
See also [stream aggregation](https://docs.victoriametrics.com/stream-aggregation.html) and [downsampling](https://docs.victoriametrics.com/#downsampling).
#### Multiple remote writes
@@ -678,7 +691,7 @@ The default list of alerting rules for these metric can be found [here](https://
We recommend setting up regular scraping of this page either through `vmagent` or by Prometheus so that the exported
metrics may be analyzed later.
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950) for `vmalert` overview.
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950-victoriametrics-vmalert/) for `vmalert` overview.
Graphs on this dashboard contain useful hints - hover the `i` icon in the top left corner of each graph in order to read it.
If you have suggestions for improvements or have found a bug - please open an issue on github or add
a review to the dashboard.
@@ -695,7 +708,7 @@ may get empty response from datasource and produce empty recording rules or rese
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
By default recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
This behavior is controlled by `-search.latencyOffset` command-line flag and the `latency_offset` query ag at `vmselect`.
Usually, this results into a 30s shift for recording rules results.
Note that too small value passed to `-search.latencyOffset` or to `latency_offest` query arg may lead to incomplete query results.
@@ -721,8 +734,9 @@ If `-remoteWrite.url` command-line flag is configured, vmalert will persist aler
[vmui](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmui) or Grafana to track how alerts state
changed in time.
vmalert also stores last N state updates for each rule. To check updates, click on `Details` link next to rule's name
on `/vmalert/groups` page and check the `Last updates` section:
vmalert stores last `-rule.updateEntriesLimit` (or `update_entries_limit` [per-rule config](https://docs.victoriametrics.com/vmalert.html#alerting-rules))
state updates for each rule. To check updates, click on `Details` link next to rule's name on `/vmalert/groups` page
and check the `Last updates` section:
<img alt="vmalert state" src="vmalert_state.png">
@@ -731,7 +745,7 @@ HTTP request sent by vmalert to the `-datasource.url` during evaluation. If spec
no samples returned and curl command returns data - then it is very likely there was no data in datasource on the
moment when rule was evaluated.
vmalert also alows configuring more detailed logging for specific rule. Just set `debug: true` in rule's configuration
vmalert allows configuring more detailed logging for specific alerting rule. Just set `debug: true` in rule's configuration
and vmalert will start printing additional log messages:
```terminal
2022-09-15T13:35:41.155Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:35:41+02:00: query returned 0 samples (elapsed: 5.896041ms)
@@ -890,6 +904,8 @@ The shortlist of configuration flags is the following:
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
@@ -955,7 +971,7 @@ The shortlist of configuration flags is the following:
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
Prometheus alertmanager URL, e.g. http://127.0.0.1:9093
Prometheus Alertmanager URL, e.g. http://127.0.0.1:9093. List all Alertmanager URLs if it runs in the cluster mode to ensure high availability.
Supports an array of values separated by comma or specified via multiple flags.
-pprofAuthKey string
Auth key for /debug/pprof/* endpoints. It must be passed via authKey query arg. It overrides httpAuth.* settings
@@ -1102,6 +1118,8 @@ The shortlist of configuration flags is the following:
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
absolute path to all .tpl files in root.
Supports an array of values separated by comma or specified via multiple flags.
-rule.updateEntriesLimit int
Defines the max number of rule's state updates stored in-memory. Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overriden per rule via update_entries_limit param. (default 20)
-rule.validateExpressions
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
@@ -1185,6 +1203,8 @@ dns_sd_configs:
```
The list of configured or discovered Notifiers can be explored via [UI](#Web).
If Alertmanager runs in cluster mode then all its URLs needs to be available during discovery
to ensure [high availability](https://github.com/prometheus/alertmanager#high-availability).
The configuration file [specification](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/notifier/config.go)
is the following:

View File

@@ -74,10 +74,15 @@ func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
Debug: cfg.Debug,
}),
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
metrics: &alertingRuleMetrics{},
}
if cfg.UpdateEntriesLimit != nil {
ar.state = newRuleState(*cfg.UpdateEntriesLimit)
} else {
ar.state = newRuleState(*ruleUpdateEntriesLimit)
}
labels := fmt.Sprintf(`alertname=%q, group=%q, id="%d"`, ar.Name, group.Name, ar.ID())
ar.metrics.pending = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_alerts_pending{%s}`, labels),
func() float64 {
@@ -491,6 +496,7 @@ func (ar *AlertingRule) ToAPI() APIRule {
State: "inactive",
Alerts: ar.AlertsToAPI(),
LastSamples: lastState.samples,
MaxUpdates: ar.state.size(),
Updates: ar.state.getAll(),
// encode as strings to avoid rounding in JSON

View File

@@ -709,7 +709,6 @@ func TestAlertingRule_Template(t *testing.T) {
"summary": `{{ $labels.alertname }}: Too high connection number for "{{ $labels.instance }}"`,
},
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
},
[]datasource.Metric{
metricWithValueAndLabels(t, 1, "instance", "foo"),
@@ -749,7 +748,6 @@ func TestAlertingRule_Template(t *testing.T) {
"description": `{{ $labels.alertname}}: It is {{ $value }} connections for "{{ $labels.instance }}"`,
},
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
},
[]datasource.Metric{
metricWithValueAndLabels(t, 2, "__name__", "first", "instance", "foo", alertNameLabel, "override"),
@@ -789,7 +787,6 @@ func TestAlertingRule_Template(t *testing.T) {
"summary": `Alert "{{ $labels.alertname }}({{ $labels.alertgroup }})" for instance {{ $labels.instance }}`,
},
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
},
[]datasource.Metric{
metricWithValueAndLabels(t, 1,
@@ -820,6 +817,7 @@ func TestAlertingRule_Template(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
tc.rule.state = newRuleState(10)
fq.add(tc.metrics...)
if _, err := tc.rule.Exec(context.TODO(), time.Now(), 0); err != nil {
t.Fatalf("unexpected err: %s", err)
@@ -936,6 +934,6 @@ func newTestAlertingRule(name string, waitFor time.Duration) *AlertingRule {
For: waitFor,
EvalInterval: waitFor,
alerts: make(map[uint64]*notifier.Alert),
state: newRuleState(),
state: newRuleState(10),
}
}

View File

@@ -114,6 +114,9 @@ type Rule struct {
Labels map[string]string `yaml:"labels,omitempty"`
Annotations map[string]string `yaml:"annotations,omitempty"`
Debug bool `yaml:"debug,omitempty"`
// UpdateEntriesLimit defines max number of rule's state updates stored in memory.
// Overrides `-rule.updateEntriesLimit`.
UpdateEntriesLimit *int `yaml:"update_entries_limit,omitempty"`
// Catches all undefined fields and must be empty after parsing.
XXX map[string]interface{} `yaml:",inline"`

View File

@@ -550,6 +550,20 @@ rules:
- alert: foo
expr: sum by(job) (up == 1)
debug: true
`)
})
t.Run("`update_entries_limit` change", func(t *testing.T) {
f(t, `
name: TestGroup
rules:
- alert: foo
expr: sum by(job) (up == 1)
`, `
name: TestGroup
rules:
- alert: foo
expr: sum by(job) (up == 1)
update_entries_limit: 33
`)
})
}

View File

@@ -12,6 +12,7 @@ groups:
expr: vm_tcplistener_conns > 0
for: 3m
debug: true
update_entries_limit: 40
annotations:
labels: "Available labels: {{ $labels }}"
summary: Too high connection number for {{ $labels.instance }}
@@ -20,6 +21,7 @@ groups:
{{ end }}
description: "It is {{ $value }} connections for {{$labels.instance}}"
- alert: ExampleAlertAlwaysFiring
update_entries_limit: -1
expr: sum by(job)
(up == 1)
labels:

View File

@@ -7,6 +7,7 @@ groups:
- alert: Conns
expr: filterSeries(sumSeries(host.receiver.interface.cons),'last','>', 500)
for: 3m
annotations:
summary: Too high connection number for {{$labels.instance}}
description: "It is {{ $value }} connections for {{$labels.instance}}"

View File

@@ -460,7 +460,7 @@ func TestFaultyRW(t *testing.T) {
r := &RecordingRule{
Name: "test",
state: newRuleState(),
state: newRuleState(10),
q: fq,
}

View File

@@ -56,7 +56,9 @@ absolute path to all .tpl files in root.`)
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
"which is by default equal to 3 evaluation intervals of the parent group.")
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
ruleUpdateEntriesLimit = flag.Int("rule.updateEntriesLimit", 20, "Defines the max number of rule's state updates stored in-memory. "+
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overriden per rule via update_entries_limit param.")
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager `+

View File

@@ -17,7 +17,8 @@ var (
configPath = flag.String("notifier.config", "", "Path to configuration file for notifiers")
suppressDuplicateTargetErrors = flag.Bool("notifier.suppressDuplicateTargetErrors", false, "Whether to suppress 'duplicate target' errors during discovery")
addrs = flagutil.NewArrayString("notifier.url", "Prometheus alertmanager URL, e.g. http://127.0.0.1:9093")
addrs = flagutil.NewArrayString("notifier.url", "Prometheus Alertmanager URL, e.g. http://127.0.0.1:9093. "+
"List all Alertmanager URLs if it runs in the cluster mode to ensure high availability.")
basicAuthUsername = flagutil.NewArrayString("notifier.basicAuth.username", "Optional basic auth username for -notifier.url")
basicAuthPassword = flagutil.NewArrayString("notifier.basicAuth.password", "Optional basic auth password for -notifier.url")

View File

@@ -58,7 +58,6 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
Labels: cfg.Labels,
GroupID: group.ID(),
metrics: &recordingRuleMetrics{},
state: newRuleState(),
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: group.Type.String(),
EvaluationInterval: group.Interval,
@@ -67,6 +66,12 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
}),
}
if cfg.UpdateEntriesLimit != nil {
rr.state = newRuleState(*cfg.UpdateEntriesLimit)
} else {
rr.state = newRuleState(*ruleUpdateEntriesLimit)
}
labels := fmt.Sprintf(`recording=%q, group=%q, id="%d"`, rr.Name, group.Name, rr.ID())
rr.metrics.errors = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_error{%s}`, labels),
func() float64 {
@@ -212,6 +217,7 @@ func (rr *RecordingRule) ToAPI() APIRule {
EvaluationTime: lastState.duration.Seconds(),
Health: "ok",
LastSamples: lastState.samples,
MaxUpdates: rr.state.size(),
Updates: rr.state.getAll(),
// encode as strings to avoid rounding

View File

@@ -19,7 +19,7 @@ func TestRecordingRule_Exec(t *testing.T) {
expTS []prompbmarshal.TimeSeries
}{
{
&RecordingRule{Name: "foo", state: newRuleState()},
&RecordingRule{Name: "foo"},
[]datasource.Metric{metricWithValueAndLabels(t, 10,
"__name__", "bar",
)},
@@ -30,7 +30,7 @@ func TestRecordingRule_Exec(t *testing.T) {
},
},
{
&RecordingRule{Name: "foobarbaz", state: newRuleState()},
&RecordingRule{Name: "foobarbaz"},
[]datasource.Metric{
metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "foo"),
metricWithValueAndLabels(t, 2, "__name__", "bar", "job", "bar"),
@@ -53,8 +53,7 @@ func TestRecordingRule_Exec(t *testing.T) {
},
{
&RecordingRule{
Name: "job:foo",
state: newRuleState(),
Name: "job:foo",
Labels: map[string]string{
"source": "test",
}},
@@ -80,6 +79,7 @@ func TestRecordingRule_Exec(t *testing.T) {
fq := &fakeQuerier{}
fq.add(tc.metrics...)
tc.rule.q = fq
tc.rule.state = newRuleState(10)
tss, err := tc.rule.Exec(context.TODO(), time.Now(), 0)
if err != nil {
t.Fatalf("unexpected Exec err: %s", err)
@@ -198,7 +198,7 @@ func TestRecordingRuleLimit(t *testing.T) {
metricWithValuesAndLabels(t, []float64{2, 3}, "__name__", "bar", "job", "bar"),
metricWithValuesAndLabels(t, []float64{4, 5, 6}, "__name__", "baz", "job", "baz"),
}
rule := &RecordingRule{Name: "job:foo", state: newRuleState(), Labels: map[string]string{
rule := &RecordingRule{Name: "job:foo", state: newRuleState(10), Labels: map[string]string{
"source": "test_limit",
}}
var err error
@@ -216,7 +216,7 @@ func TestRecordingRuleLimit(t *testing.T) {
func TestRecordingRule_ExecNegative(t *testing.T) {
rr := &RecordingRule{
Name: "job:foo",
state: newRuleState(),
state: newRuleState(10),
Labels: map[string]string{
"job": "test",
},

View File

@@ -37,6 +37,8 @@ type ruleState struct {
sync.RWMutex
entries []ruleStateEntry
cur int
// disabled defines whether ruleState tracks ruleStateEntry
disabled bool
}
type ruleStateEntry struct {
@@ -57,21 +59,36 @@ type ruleStateEntry struct {
curl string
}
const defaultStateEntriesLimit = 20
func newRuleState() *ruleState {
func newRuleState(size int) *ruleState {
if size < 1 {
return &ruleState{disabled: true}
}
return &ruleState{
entries: make([]ruleStateEntry, defaultStateEntriesLimit),
entries: make([]ruleStateEntry, size),
}
}
func (s *ruleState) getLast() ruleStateEntry {
if s.disabled {
return ruleStateEntry{}
}
s.RLock()
defer s.RUnlock()
return s.entries[s.cur]
}
func (s *ruleState) size() int {
s.RLock()
defer s.RUnlock()
return len(s.entries)
}
func (s *ruleState) getAll() []ruleStateEntry {
if s.disabled {
return nil
}
entries := make([]ruleStateEntry, 0)
s.RLock()
@@ -94,6 +111,10 @@ func (s *ruleState) getAll() []ruleStateEntry {
}
func (s *ruleState) add(e ruleStateEntry) {
if s.disabled {
return
}
s.Lock()
defer s.Unlock()

View File

@@ -6,8 +6,27 @@ import (
"time"
)
func TestRule_stateDisabled(t *testing.T) {
state := newRuleState(-1)
e := state.getLast()
if !e.at.IsZero() {
t.Fatalf("expected entry to be zero")
}
state.add(ruleStateEntry{at: time.Now()})
if !e.at.IsZero() {
t.Fatalf("expected entry to be zero")
}
if len(state.getAll()) != 0 {
t.Fatalf("expected for state to have %d entries; got %d",
0, len(state.getAll()),
)
}
}
func TestRule_state(t *testing.T) {
state := newRuleState()
stateEntriesN := 20
state := newRuleState(stateEntriesN)
e := state.getLast()
if !e.at.IsZero() {
t.Fatalf("expected entry to be zero")
@@ -39,7 +58,7 @@ func TestRule_state(t *testing.T) {
}
var last time.Time
for i := 0; i < defaultStateEntriesLimit*2; i++ {
for i := 0; i < stateEntriesN*2; i++ {
last = time.Now()
state.add(ruleStateEntry{at: last})
}
@@ -50,9 +69,9 @@ func TestRule_state(t *testing.T) {
e.at, last)
}
if len(state.getAll()) != defaultStateEntriesLimit {
if len(state.getAll()) != stateEntriesN {
t.Fatalf("expected for state to have %d entries only; got %d",
defaultStateEntriesLimit, len(state.getAll()),
stateEntriesN, len(state.getAll()),
)
}
}
@@ -61,7 +80,7 @@ func TestRule_state(t *testing.T) {
// execution of state updates.
// Should be executed with -race flag
func TestRule_stateConcurrent(t *testing.T) {
state := newRuleState()
state := newRuleState(20)
const workers = 50
const iterations = 100

View File

@@ -27,11 +27,11 @@ import (
"strconv"
"strings"
"sync"
textTpl "text/template"
"time"
textTpl "text/template"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/formatutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
@@ -350,15 +350,7 @@ func templateFuncs() textTpl.FuncMap {
if math.Abs(v) <= 1 || math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v), nil
}
prefix := ""
for _, p := range []string{"ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"} {
if math.Abs(v) < 1024 {
break
}
prefix = p
v /= 1024
}
return fmt.Sprintf("%.4g%s", v, prefix), nil
return formatutil.HumanizeBytes(v), nil
},
// humanizeDuration converts given seconds to a human-readable duration

View File

@@ -1,6 +1,7 @@
package templates
import (
"math"
"strings"
"testing"
textTpl "text/template"
@@ -50,6 +51,31 @@ func TestTemplateFuncs(t *testing.T) {
if !ok {
t.Fatalf("unexpected mismatch")
}
formatting := func(funcName string, p interface{}, resultExpected string) {
t.Helper()
v := funcs[funcName]
fLocal := v.(func(s interface{}) (string, error))
result, err := fLocal(p)
if err != nil {
t.Fatalf("unexpected error for %s(%f): %s", funcName, p, err)
}
if result != resultExpected {
t.Fatalf("unexpected result for %s(%f); got\n%s\nwant\n%s", funcName, p, result, resultExpected)
}
}
formatting("humanize1024", float64(0), "0")
formatting("humanize1024", math.Inf(0), "+Inf")
formatting("humanize1024", math.NaN(), "NaN")
formatting("humanize1024", float64(127087), "124.1ki")
formatting("humanize1024", float64(130137088), "124.1Mi")
formatting("humanize1024", float64(133260378112), "124.1Gi")
formatting("humanize1024", float64(136458627186688), "124.1Ti")
formatting("humanize1024", float64(139733634239168512), "124.1Pi")
formatting("humanize1024", float64(143087241460908556288), "124.1Ei")
formatting("humanize1024", float64(146521335255970361638912), "124.1Zi")
formatting("humanize1024", float64(150037847302113650318245888), "124.1Yi")
formatting("humanize1024", float64(153638755637364377925883789312), "1.271e+05Yi")
}
func mkTemplate(current, replacement interface{}) textTemplate {

View File

@@ -440,7 +440,7 @@
</div>
<br>
<div class="display-6 pb-3">Last {%d len(rule.Updates) %} updates</span>:</div>
<div class="display-6 pb-3">Last {%d len(rule.Updates) %}/{%d rule.MaxUpdates %} updates</span>:</div>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>

View File

@@ -1345,6 +1345,10 @@ func StreamRuleDetails(qw422016 *qt422016.Writer, r *http.Request, rule APIRule)
<div class="display-6 pb-3">Last `)
//line app/vmalert/web.qtpl:443
qw422016.N().D(len(rule.Updates))
//line app/vmalert/web.qtpl:443
qw422016.N().S(`/`)
//line app/vmalert/web.qtpl:443
qw422016.N().D(rule.MaxUpdates)
//line app/vmalert/web.qtpl:443
qw422016.N().S(` updates</span>:</div>
<table class="table table-striped table-hover table-sm">

View File

@@ -17,7 +17,7 @@ func TestHandler(t *testing.T) {
alerts: map[uint64]*notifier.Alert{
0: {State: notifier.StateFiring},
},
state: newRuleState(),
state: newRuleState(10),
}
g := &Group{
Name: "group",

View File

@@ -121,6 +121,8 @@ type APIRule struct {
// GroupID is an unique Group's ID
GroupID string `json:"group_id"`
// MaxUpdates is the max number of recorded ruleStateEntry objects
MaxUpdates int `json:"max_updates_entries"`
// Updates contains the ordered list of recorded ruleStateEntry objects
Updates []ruleStateEntry `json:"updates"`
}

View File

@@ -270,6 +270,8 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string

View File

@@ -60,9 +60,7 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/-/reload":
authKey := r.FormValue("authKey")
if authKey != *reloadAuthKey {
httpserver.Errorf(w, r, "invalid authKey %q. It must match the value from -reloadAuthKey command line flag", authKey)
if !httpserver.CheckAuthFlag(w, r, *reloadAuthKey, "reloadAuthKey") {
return true
}
configReloadRequests.Inc()

View File

@@ -225,6 +225,8 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string

View File

@@ -3,6 +3,7 @@ package main
import (
"flag"
"fmt"
"net/url"
"os"
"path/filepath"
"strings"
@@ -41,25 +42,36 @@ func main() {
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
flagutil.RegisterSecretFlag("snapshot.createURL")
flagutil.RegisterSecretFlag("snapshot.deleteURL")
envflag.Parse()
buildinfo.Init()
logger.Init()
pushmetrics.Init()
if len(*snapshotCreateURL) > 0 {
// create net/url object
createUrl, err := url.Parse(*snapshotCreateURL)
if err != nil {
logger.Fatalf("cannot parse snapshotCreateURL: %s", err)
}
if len(*snapshotName) > 0 {
logger.Fatalf("-snapshotName shouldn't be set if -snapshot.createURL is set, since snapshots are created automatically in this case")
}
logger.Infof("Snapshot create url %s", *snapshotCreateURL)
logger.Infof("Snapshot create url %s", createUrl.Redacted())
if len(*snapshotDeleteURL) <= 0 {
err := flag.Set("snapshot.deleteURL", strings.Replace(*snapshotCreateURL, "/create", "/delete", 1))
if err != nil {
logger.Fatalf("Failed to set snapshot.deleteURL flag: %v", err)
}
}
logger.Infof("Snapshot delete url %s", *snapshotDeleteURL)
deleteUrl, err := url.Parse(*snapshotCreateURL)
if err != nil {
logger.Fatalf("cannot parse snapshotDeleteURL: %s", err)
}
logger.Infof("Snapshot delete url %s", deleteUrl.Redacted())
name, err := snapshot.Create(*snapshotCreateURL)
name, err := snapshot.Create(createUrl.String())
if err != nil {
logger.Fatalf("cannot create snapshot: %s", err)
}
@@ -69,7 +81,7 @@ func main() {
}
defer func() {
err := snapshot.Delete(*snapshotDeleteURL, name)
err := snapshot.Delete(deleteUrl.String(), name)
if err != nil {
logger.Fatalf("cannot delete snapshot: %s", err)
}
@@ -118,7 +130,7 @@ func main() {
func usage() {
const s = `
vmbackup performs backups for VictoriaMetrics data from instant snapshots to gcs, s3
vmbackup performs backups for VictoriaMetrics data from instant snapshots to gcs, s3, azblob
or local filesystem. Backed up data can be restored with vmrestore.
See the docs at https://docs.victoriametrics.com/vmbackup.html .

View File

@@ -67,7 +67,7 @@ credentials.json
"project_id": "<project>",
"private_key_id": "",
"private_key": "-----BEGIN PRIVATE KEY-----\-----END PRIVATE KEY-----\n",
"client_email": test@<project>.iam.gserviceaccount.com",
"client_email": "test@<project>.iam.gserviceaccount.com",
"client_id": "",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
@@ -158,7 +158,7 @@ The result on the GCS bucket. We see only 3 daily backups:
* GET `/api/v1/backups` - returns list of backups in remote storage.
Example output:
```json
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
[{"name":"daily/2022-11-30","size_bytes":26664689,"size":"25.429Mi"},{"name":"daily/2022-12-01","size_bytes":40160965,"size":"38.300Mi"},{"name":"hourly/2022-11-30:12","size_bytes":5846529,"size":"5.576Mi"},{"name":"hourly/2022-11-30:13","size_bytes":17651847,"size":"16.834Mi"},{"name":"hourly/2022-11-30:13:22","size_bytes":8797831,"size":"8.390Mi"},{"name":"hourly/2022-11-30:14","size_bytes":10680454,"size":"10.186Mi"}]
```
* POST `/api/v1/restore` - saves backup name to restore when [performing restore](#restore-commands).
@@ -211,7 +211,7 @@ It can be changed by using flag:
`vmbackupmanager backup list` lists backups in remote storage:
```console
$ ./vmbackupmanager backup list
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
[{"name":"daily/2022-11-30","size_bytes":26664689,"size":"25.429Mi"},{"name":"daily/2022-12-01","size_bytes":40160965,"size":"38.300Mi"},{"name":"hourly/2022-11-30:12","size_bytes":5846529,"size":"5.576Mi"},{"name":"hourly/2022-11-30:13","size_bytes":17651847,"size":"16.834Mi"},{"name":"hourly/2022-11-30:13:22","size_bytes":8797831,"size":"8.390Mi"},{"name":"hourly/2022-11-30:14","size_bytes":10680454,"size":"10.186Mi"}]
```
### Restore commands
@@ -270,7 +270,15 @@ If restore mark doesn't exist at `storageDataPath`(restore wasn't requested) `vm
### How to restore in Kubernetes
1. Enter container running `vmbackupmanager`
1. Ensure there is an init container with `vmbackupmanager restore` in `vmstorage` or `vmsingle` pod.
For [VictoriaMetrics operator](https://docs.victoriametrics.com/operator/VictoriaMetrics-Operator.html) deployments it is required to add:
```yaml
vmbackup:
restore:
onStart: "true"
```
See operator `VMStorage` schema [here](https://docs.victoriametrics.com/operator/api.html#vmstorage) and `VMSingle` [here](https://docs.victoriametrics.com/operator/api.html#vmsinglespec).
2. Enter container running `vmbackupmanager`
2. Use `vmbackupmanager backup list` to get list of available backups:
```console
$ /vmbackupmanager-prod backup list
@@ -287,6 +295,43 @@ If restore mark doesn't exist at `storageDataPath`(restore wasn't requested) `vm
```
4. Restart pod
#### Restore cluster into another cluster
These steps are assuming that [VictoriaMetrics operator](https://docs.victoriametrics.com/operator/VictoriaMetrics-Operator.html) is used to manage `VMCluster`.
Clusters here are referred to as `source` and `destination`.
1. Create a new cluster with access to *source* cluster `vmbackupmanager` storage and same number of storage nodes.
Add the following section in order to enable restore on start (operator `VMStorage` schema can be found [here](https://docs.victoriametrics.com/operator/api.html#vmstorage):
```yaml
vmbackup:
restore:
onStart: "true"
```
Note: it is safe to leave this section in the cluster configuration, since it will be ignored if restore mark doesn't exist.
> Important! Use different `-dst` for *destination* cluster to avoid overwriting backup data of the *source* cluster.
2. Enter container running `vmbackupmanager` in *source* cluster
2. Use `vmbackupmanager backup list` to get list of available backups:
```console
$ /vmbackupmanager-prod backup list
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
```
3. Use `vmbackupmanager restore create` to create restore mark at each pod of the *destination* cluster.
Each pod in *destination* cluster should be restored from backup of respective pod in *source* cluster.
For example: `vmstorage-source-0` in *source* cluster should be restored from `vmstorage-destination-0` in *destination* cluster.
```console
$ /vmbackupmanager-prod restore create s3://source_cluster/vmstorage-source-0/daily/2022-10-06
```
## Monitoring
`vmbackupmanager` exports various metrics in Prometheus exposition format at `http://vmbackupmanager:8300/metrics` page. It is recommended setting up regular scraping of this page
either via [vmagent](https://docs.victoriametrics.com/vmagent.html) or via Prometheus, so the exported metrics could be analyzed later.
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/17798-victoriametrics-backupmanager/) for `vmbackupmanager` overview.
Graphs on this dashboard contain useful hints - hover the `i` icon in the top left corner of each graph in order to read it.
If you have suggestions for improvements or have found a bug - please open an issue on github or add
a review to the dashboard.
## Configuration
### Flags
@@ -372,6 +417,8 @@ command-line flags:
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string

View File

@@ -353,7 +353,7 @@ var (
},
&cli.StringFlag{
Name: vmNativeStepInterval,
Usage: fmt.Sprintf("Split export data into chunks. Requires setting --%s. Valid values are '%s','%s','%s'.", vmNativeFilterTimeStart, stepper.StepMonth, stepper.StepDay, stepper.StepHour),
Usage: fmt.Sprintf("Split export data into chunks. Requires setting --%s. Valid values are '%s','%s','%s','%s'.", vmNativeFilterTimeStart, stepper.StepMonth, stepper.StepDay, stepper.StepHour, stepper.StepMinute),
},
&cli.StringFlag{
Name: vmNativeSrcAddr,
@@ -410,19 +410,20 @@ var (
)
const (
remoteRead = "remote-read"
remoteReadUseStream = "remote-read-use-stream"
remoteReadConcurrency = "remote-read-concurrency"
remoteReadFilterTimeStart = "remote-read-filter-time-start"
remoteReadFilterTimeEnd = "remote-read-filter-time-end"
remoteReadFilterLabel = "remote-read-filter-label"
remoteReadFilterLabelValue = "remote-read-filter-label-value"
remoteReadStepInterval = "remote-read-step-interval"
remoteReadSrcAddr = "remote-read-src-addr"
remoteReadUser = "remote-read-user"
remoteReadPassword = "remote-read-password"
remoteReadHTTPTimeout = "remote-read-http-timeout"
remoteReadHeaders = "remote-read-headers"
remoteRead = "remote-read"
remoteReadUseStream = "remote-read-use-stream"
remoteReadConcurrency = "remote-read-concurrency"
remoteReadFilterTimeStart = "remote-read-filter-time-start"
remoteReadFilterTimeEnd = "remote-read-filter-time-end"
remoteReadFilterLabel = "remote-read-filter-label"
remoteReadFilterLabelValue = "remote-read-filter-label-value"
remoteReadStepInterval = "remote-read-step-interval"
remoteReadSrcAddr = "remote-read-src-addr"
remoteReadUser = "remote-read-user"
remoteReadPassword = "remote-read-password"
remoteReadHTTPTimeout = "remote-read-http-timeout"
remoteReadHeaders = "remote-read-headers"
remoteReadInsecureSkipVerify = "remote-read-insecure-skip-verify"
)
var (
@@ -493,6 +494,11 @@ var (
"For example, --remote-read-headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding remote source storage. \n" +
"Multiple headers must be delimited by '^^': --remote-read-headers='header1:value1^^header2:value2'",
},
&cli.BoolFlag{
Name: remoteReadInsecureSkipVerify,
Usage: "Whether to skip TLS certificate verification when connecting to the remote read address",
Value: false,
},
}
)

View File

@@ -121,14 +121,15 @@ func main() {
Flags: mergeFlags(globalFlags, remoteReadFlags, vmFlags),
Action: func(c *cli.Context) error {
rr, err := remoteread.NewClient(remoteread.Config{
Addr: c.String(remoteReadSrcAddr),
Username: c.String(remoteReadUser),
Password: c.String(remoteReadPassword),
Timeout: c.Duration(remoteReadHTTPTimeout),
UseStream: c.Bool(remoteReadUseStream),
Headers: c.String(remoteReadHeaders),
LabelName: c.String(remoteReadFilterLabel),
LabelValue: c.String(remoteReadFilterLabelValue),
Addr: c.String(remoteReadSrcAddr),
Username: c.String(remoteReadUser),
Password: c.String(remoteReadPassword),
Timeout: c.Duration(remoteReadHTTPTimeout),
UseStream: c.Bool(remoteReadUseStream),
Headers: c.String(remoteReadHeaders),
LabelName: c.String(remoteReadFilterLabel),
LabelValue: c.String(remoteReadFilterLabelValue),
InsecureSkipVerify: c.Bool(remoteReadInsecureSkipVerify),
})
if err != nil {
return fmt.Errorf("error create remote read client: %s", err)

View File

@@ -0,0 +1,319 @@
package main
import (
"context"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/remoteread"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/stepper"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/testdata/servers_integration_test"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/prometheus/prometheus/prompb"
)
func TestRemoteRead(t *testing.T) {
var testCases = []struct {
name string
remoteReadConfig remoteread.Config
vmCfg vm.Config
start string
end string
numOfSamples int64
numOfSeries int64
rrp remoteReadProcessor
chunk string
remoteReadSeries func(start, end, numOfSeries, numOfSamples int64) []*prompb.TimeSeries
expectedSeries []vm.TimeSeries
}{
{
name: "step minute on minute time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*"},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-11-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMinute,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{0, 0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{100, 100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{200, 200},
},
},
},
{
name: "step month on month time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*"},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-09-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMonth,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1664184185000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1664184185000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1664184185000},
Values: []float64{200},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1666819415000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1666819415000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1666819415000},
Values: []float64{200}},
},
},
}
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
remoteReadServer := remote_read_integration.NewRemoteReadServer(t)
defer remoteReadServer.Close()
remoteWriteServer := remote_read_integration.NewRemoteWriteServer(t)
defer remoteWriteServer.Close()
tt.remoteReadConfig.Addr = remoteReadServer.URL()
rr, err := remoteread.NewClient(tt.remoteReadConfig)
if err != nil {
t.Fatalf("error create remote read client: %s", err)
}
start, err := time.Parse(time.RFC3339, tt.start)
if err != nil {
t.Fatalf("Error parse start time: %s", err)
}
end, err := time.Parse(time.RFC3339, tt.end)
if err != nil {
t.Fatalf("Error parse end time: %s", err)
}
rrs := tt.remoteReadSeries(start.Unix(), end.Unix(), tt.numOfSeries, tt.numOfSamples)
remoteReadServer.SetRemoteReadSeries(rrs)
remoteWriteServer.ExpectedSeries(tt.expectedSeries)
tt.vmCfg.Addr = remoteWriteServer.URL()
importer, err := vm.NewImporter(tt.vmCfg)
if err != nil {
t.Fatalf("failed to create VM importer: %s", err)
}
defer importer.Close()
rmp := remoteReadProcessor{
src: rr,
dst: importer,
filter: remoteReadFilter{
timeStart: &start,
timeEnd: &end,
chunk: tt.chunk,
},
cc: 1,
}
ctx := context.Background()
err = rmp.run(ctx, true, false)
if err != nil {
t.Fatalf("failed to run remote read processor: %s", err)
}
})
}
}
func TestSteamRemoteRead(t *testing.T) {
var testCases = []struct {
name string
remoteReadConfig remoteread.Config
vmCfg vm.Config
start string
end string
numOfSamples int64
numOfSeries int64
rrp remoteReadProcessor
chunk string
remoteReadSeries func(start, end, numOfSeries, numOfSamples int64) []*prompb.TimeSeries
expectedSeries []vm.TimeSeries
}{
{
name: "step minute on minute time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*", UseStream: true},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-11-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMinute,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{0, 0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{100, 100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1669454585000, 1669454615000},
Values: []float64{200, 200},
},
},
},
{
name: "step month on month time range",
remoteReadConfig: remoteread.Config{Addr: "", LabelName: "__name__", LabelValue: ".*", UseStream: true},
vmCfg: vm.Config{Addr: "", Concurrency: 1, DisableProgressBar: true},
start: "2022-09-26T11:23:05+02:00",
end: "2022-11-26T11:24:05+02:00",
numOfSamples: 2,
numOfSeries: 3,
chunk: stepper.StepMonth,
remoteReadSeries: remote_read_integration.GenerateRemoteReadSeries,
expectedSeries: []vm.TimeSeries{
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1664184185000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1664184185000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1664184185000},
Values: []float64{200},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "0"}},
Timestamps: []int64{1666819415000},
Values: []float64{0},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "1"}},
Timestamps: []int64{1666819415000},
Values: []float64{100},
},
{
Name: "vm_metric_1",
LabelPairs: []vm.LabelPair{{Name: "job", Value: "2"}},
Timestamps: []int64{1666819415000},
Values: []float64{200}},
},
},
}
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
remoteReadServer := remote_read_integration.NewRemoteReadStreamServer(t)
defer remoteReadServer.Close()
remoteWriteServer := remote_read_integration.NewRemoteWriteServer(t)
defer remoteWriteServer.Close()
tt.remoteReadConfig.Addr = remoteReadServer.URL()
rr, err := remoteread.NewClient(tt.remoteReadConfig)
if err != nil {
t.Fatalf("error create remote read client: %s", err)
}
start, err := time.Parse(time.RFC3339, tt.start)
if err != nil {
t.Fatalf("Error parse start time: %s", err)
}
end, err := time.Parse(time.RFC3339, tt.end)
if err != nil {
t.Fatalf("Error parse end time: %s", err)
}
rrs := tt.remoteReadSeries(start.Unix(), end.Unix(), tt.numOfSeries, tt.numOfSamples)
remoteReadServer.InitMockStorage(rrs)
remoteWriteServer.ExpectedSeries(tt.expectedSeries)
tt.vmCfg.Addr = remoteWriteServer.URL()
importer, err := vm.NewImporter(tt.vmCfg)
if err != nil {
t.Fatalf("failed to create VM importer: %s", err)
}
defer importer.Close()
rmp := remoteReadProcessor{
src: rr,
dst: importer,
filter: remoteReadFilter{
timeStart: &start,
timeEnd: &end,
chunk: tt.chunk,
},
cc: 1,
}
ctx := context.Background()
err = rmp.run(ctx, true, false)
if err != nil {
t.Fatalf("failed to run remote read processor: %s", err)
}
})
}
}

View File

@@ -10,6 +10,7 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/gogo/protobuf/proto"
@@ -60,6 +61,8 @@ type Config struct {
// LabelName, LabelValue stands for label=~value pair used for read requests.
// Is optional.
LabelName, LabelValue string
// TLSSkipVerify defines whether to skip TLS certificate verification when connecting to the remote read address.
InsecureSkipVerify bool
}
// Filter defines a list of filters applied to requested data
@@ -100,7 +103,7 @@ func NewClient(cfg Config) (*Client, error) {
c := &Client{
c: &http.Client{
Timeout: cfg.Timeout,
Transport: http.DefaultTransport.(*http.Transport).Clone(),
Transport: utils.Transport(cfg.Addr, cfg.InsecureSkipVerify),
},
addr: strings.TrimSuffix(cfg.Addr, "/"),
user: cfg.Username,

View File

@@ -0,0 +1,366 @@
package remote_read_integration
import (
"context"
"fmt"
"io"
"net/http"
"net/http/httptest"
"strconv"
"strings"
"testing"
"github.com/gogo/protobuf/proto"
"github.com/golang/snappy"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/prompb"
"github.com/prometheus/prometheus/storage/remote"
)
const (
maxBytesInFrame = 1024 * 1024
)
type RemoteReadServer struct {
server *httptest.Server
series []*prompb.TimeSeries
storage *MockStorage
}
// NewRemoteReadServer creates a remote read server. It exposes a single endpoint and responds with the
// passed series based on the request to the read endpoint. It returns a server which should be closed after
// being used.
func NewRemoteReadServer(t *testing.T) *RemoteReadServer {
rrs := &RemoteReadServer{
series: make([]*prompb.TimeSeries, 0),
}
rrs.server = httptest.NewServer(rrs.getReadHandler(t))
return rrs
}
// Close closes the server.
func (rrs *RemoteReadServer) Close() {
rrs.server.Close()
}
func (rrs *RemoteReadServer) URL() string {
return rrs.server.URL
}
func (rrs *RemoteReadServer) SetRemoteReadSeries(series []*prompb.TimeSeries) {
rrs.series = append(rrs.series, series...)
}
func (rrs *RemoteReadServer) getReadHandler(t *testing.T) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if !validateReadHeaders(t, r) {
t.Fatalf("invalid read headers")
}
compressed, err := io.ReadAll(r.Body)
if err != nil {
t.Fatalf("error read body: %s", err)
}
reqBuf, err := snappy.Decode(nil, compressed)
if err != nil {
t.Fatalf("error decode compressed data:%s", err)
}
var req prompb.ReadRequest
if err := proto.Unmarshal(reqBuf, &req); err != nil {
t.Fatalf("error unmarshal read request: %s", err)
}
resp := &prompb.ReadResponse{
Results: make([]*prompb.QueryResult, len(req.Queries)),
}
for i, r := range req.Queries {
startTs := r.StartTimestampMs
endTs := r.EndTimestampMs
ts := make([]*prompb.TimeSeries, len(rrs.series))
for i, s := range rrs.series {
var samples []prompb.Sample
for _, sample := range s.Samples {
if sample.Timestamp >= startTs && sample.Timestamp < endTs {
samples = append(samples, sample)
}
}
var series prompb.TimeSeries
if len(samples) > 0 {
series.Labels = s.Labels
series.Samples = samples
}
ts[i] = &series
}
resp.Results[i] = &prompb.QueryResult{Timeseries: ts}
data, err := proto.Marshal(resp)
if err != nil {
t.Fatalf("error marshal response: %s", err)
}
compressed = snappy.Encode(nil, data)
w.Header().Set("Content-Type", "application/x-protobuf")
w.Header().Set("Content-Encoding", "snappy")
w.WriteHeader(http.StatusOK)
if _, err := w.Write(compressed); err != nil {
t.Fatalf("snappy encode error: %s", err)
}
}
})
}
func NewRemoteReadStreamServer(t *testing.T) *RemoteReadServer {
rrs := &RemoteReadServer{
series: make([]*prompb.TimeSeries, 0),
}
rrs.server = httptest.NewServer(rrs.getStreamReadHandler(t))
return rrs
}
func (rrs *RemoteReadServer) InitMockStorage(series []*prompb.TimeSeries) {
rrs.storage = NewMockStorage(series)
}
func (rrs *RemoteReadServer) getStreamReadHandler(t *testing.T) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if !validateStreamReadHeaders(t, r) {
t.Fatalf("invalid read headers")
}
f, ok := w.(http.Flusher)
if !ok {
t.Fatalf("internal http.ResponseWriter does not implement http.Flusher interface")
}
stream := remote.NewChunkedWriter(w, f)
data, err := io.ReadAll(r.Body)
if err != nil {
t.Fatalf("error read body: %s", err)
}
decodedData, err := snappy.Decode(nil, data)
if err != nil {
t.Fatalf("error decode compressed data:%s", err)
}
var req prompb.ReadRequest
if err := proto.Unmarshal(decodedData, &req); err != nil {
t.Fatalf("error unmarshal read request: %s", err)
}
var chunks []prompb.Chunk
ctx := context.Background()
for idx, r := range req.Queries {
startTs := r.StartTimestampMs
endTs := r.EndTimestampMs
var matchers []*labels.Matcher
cb := func() (int64, error) { return 0, nil }
c := remote.NewSampleAndChunkQueryableClient(rrs.storage, nil, matchers, true, cb)
q, err := c.ChunkQuerier(ctx, startTs, endTs)
if err != nil {
t.Fatalf("error init chunk querier: %s", err)
}
ss := q.Select(false, nil, matchers...)
for ss.Next() {
series := ss.At()
iter := series.Iterator()
labels := remote.MergeLabels(labelsToLabelsProto(series.Labels()), nil)
frameBytesLeft := maxBytesInFrame
for _, lb := range labels {
frameBytesLeft -= lb.Size()
}
isNext := iter.Next()
for isNext {
chunk := iter.At()
if chunk.Chunk == nil {
t.Fatalf("error found not populated chunk returned by SeriesSet at ref: %v", chunk.Ref)
}
chunks = append(chunks, prompb.Chunk{
MinTimeMs: chunk.MinTime,
MaxTimeMs: chunk.MaxTime,
Type: prompb.Chunk_Encoding(chunk.Chunk.Encoding()),
Data: chunk.Chunk.Bytes(),
})
frameBytesLeft -= chunks[len(chunks)-1].Size()
// We are fine with minor inaccuracy of max bytes per frame. The inaccuracy will be max of full chunk size.
isNext = iter.Next()
if frameBytesLeft > 0 && isNext {
continue
}
resp := &prompb.ChunkedReadResponse{
ChunkedSeries: []*prompb.ChunkedSeries{
{Labels: labels, Chunks: chunks},
},
QueryIndex: int64(idx),
}
b, err := proto.Marshal(resp)
if err != nil {
t.Fatalf("error marshal response: %s", err)
}
if _, err := stream.Write(b); err != nil {
t.Fatalf("error write to stream: %s", err)
}
chunks = chunks[:0]
rrs.storage.Reset()
}
if err := iter.Err(); err != nil {
t.Fatalf("error iterate over chunk series: %s", err)
}
}
}
})
}
func validateReadHeaders(t *testing.T, r *http.Request) bool {
if r.Method != http.MethodPost {
t.Fatalf("got %q method, expected %q", r.Method, http.MethodPost)
}
if r.Header.Get("Content-Encoding") != "snappy" {
t.Fatalf("got %q content encoding header, expected %q", r.Header.Get("Content-Encoding"), "snappy")
}
if r.Header.Get("Content-Type") != "application/x-protobuf" {
t.Fatalf("got %q content type header, expected %q", r.Header.Get("Content-Type"), "application/x-protobuf")
}
remoteReadVersion := r.Header.Get("X-Prometheus-Remote-Read-Version")
if remoteReadVersion == "" {
t.Fatalf("got empty prometheus remote read header")
}
if !strings.HasPrefix(remoteReadVersion, "0.1.") {
t.Fatalf("wrong remote version defined")
}
return true
}
func validateStreamReadHeaders(t *testing.T, r *http.Request) bool {
if r.Method != http.MethodPost {
t.Fatalf("got %q method, expected %q", r.Method, http.MethodPost)
}
if r.Header.Get("Content-Encoding") != "snappy" {
t.Fatalf("got %q content encoding header, expected %q", r.Header.Get("Content-Encoding"), "snappy")
}
if r.Header.Get("Content-Type") != "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse" {
t.Fatalf("got %q content type header, expected %q", r.Header.Get("Content-Type"), "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse")
}
remoteReadVersion := r.Header.Get("X-Prometheus-Remote-Read-Version")
if remoteReadVersion == "" {
t.Fatalf("got empty prometheus remote read header")
}
if !strings.HasPrefix(remoteReadVersion, "0.1.") {
t.Fatalf("wrong remote version defined")
}
return true
}
func GenerateRemoteReadSeries(start, end, numOfSeries, numOfSamples int64) []*prompb.TimeSeries {
var ts []*prompb.TimeSeries
j := 0
for i := 0; i < int(numOfSeries); i++ {
if i%3 == 0 {
j++
}
timeSeries := prompb.TimeSeries{
Labels: []prompb.Label{
{Name: labels.MetricName, Value: fmt.Sprintf("vm_metric_%d", j)},
{Name: "job", Value: strconv.Itoa(i)},
},
}
ts = append(ts, &timeSeries)
}
for i := range ts {
ts[i].Samples = generateRemoteReadSamples(i, start, end, numOfSamples)
}
return ts
}
func generateRemoteReadSamples(idx int, startTime, endTime, numOfSamples int64) []prompb.Sample {
samples := make([]prompb.Sample, 0)
delta := (endTime - startTime) / numOfSamples
t := startTime
for t != endTime {
v := 100 * int64(idx)
samples = append(samples, prompb.Sample{
Timestamp: t * 1000,
Value: float64(v),
})
t = t + delta
}
return samples
}
type MockStorage struct {
query *prompb.Query
store []*prompb.TimeSeries
}
func NewMockStorage(series []*prompb.TimeSeries) *MockStorage {
return &MockStorage{store: series}
}
func (ms *MockStorage) Read(_ context.Context, query *prompb.Query) (*prompb.QueryResult, error) {
if ms.query != nil {
return nil, fmt.Errorf("expected only one call to remote client got: %v", query)
}
ms.query = query
q := &prompb.QueryResult{Timeseries: make([]*prompb.TimeSeries, 0, len(ms.store))}
for _, s := range ms.store {
var samples []prompb.Sample
for _, sample := range s.Samples {
if sample.Timestamp >= query.StartTimestampMs && sample.Timestamp < query.EndTimestampMs {
samples = append(samples, sample)
}
}
var series prompb.TimeSeries
if len(samples) > 0 {
series.Labels = s.Labels
series.Samples = samples
}
q.Timeseries = append(q.Timeseries, &series)
}
return q, nil
}
func (ms *MockStorage) Reset() {
ms.query = nil
}
func labelsToLabelsProto(labels labels.Labels) []prompb.Label {
result := make([]prompb.Label, 0, len(labels))
for _, l := range labels {
result = append(result, prompb.Label{
Name: l.Name,
Value: l.Value,
})
}
return result
}

View File

@@ -0,0 +1,86 @@
package remote_read_integration
import (
"bufio"
"net/http"
"net/http/httptest"
"reflect"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
)
type RemoteWriteServer struct {
server *httptest.Server
series []vm.TimeSeries
}
// NewRemoteWriteServer prepares test remote write server
func NewRemoteWriteServer(t *testing.T) *RemoteWriteServer {
rws := &RemoteWriteServer{series: make([]vm.TimeSeries, 0)}
mux := http.NewServeMux()
mux.Handle("/api/v1/import", rws.getWriteHandler(t))
mux.Handle("/health", rws.handlePing())
rws.server = httptest.NewServer(mux)
return rws
}
// Close closes the server.
func (rws *RemoteWriteServer) Close() {
rws.server.Close()
}
func (rws *RemoteWriteServer) ExpectedSeries(series []vm.TimeSeries) {
rws.series = append(rws.series, series...)
}
func (rws *RemoteWriteServer) URL() string {
return rws.server.URL
}
func (rws *RemoteWriteServer) getWriteHandler(t *testing.T) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
var tss []vm.TimeSeries
scanner := bufio.NewScanner(r.Body)
var rows parser.Rows
for scanner.Scan() {
rows.Unmarshal(scanner.Text())
for _, row := range rows.Rows {
var labelPairs []vm.LabelPair
var ts vm.TimeSeries
nameValue := ""
for _, tag := range row.Tags {
if string(tag.Key) == "__name__" {
nameValue = string(tag.Value)
continue
}
labelPairs = append(labelPairs, vm.LabelPair{Name: string(tag.Key), Value: string(tag.Value)})
}
ts.Values = append(ts.Values, row.Values...)
ts.Timestamps = append(ts.Timestamps, row.Timestamps...)
ts.Name = nameValue
ts.LabelPairs = labelPairs
tss = append(tss, ts)
}
rows.Reset()
}
if !reflect.DeepEqual(tss, rws.series) {
w.WriteHeader(http.StatusInternalServerError)
t.Fatalf("datasets not equal, expected: %#v; \n got: %#v", rws.series, tss)
return
}
w.WriteHeader(http.StatusNoContent)
})
}
func (rws *RemoteWriteServer) handlePing() http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("OK"))
})
}

25
app/vmctl/utils/tls.go Normal file
View File

@@ -0,0 +1,25 @@
package utils
import (
"crypto/tls"
"net/http"
"strings"
)
// Transport creates http.Transport object based on provided URL.
// Returns Transport with TLS configuration if URL contains `https` prefix
func Transport(URL string, insecureSkipVerify bool) *http.Transport {
t := http.DefaultTransport.(*http.Transport).Clone()
if !strings.HasPrefix(URL, "https") {
return t
}
t.TLSClientConfig = TLSConfig(insecureSkipVerify)
return t
}
// TLSConfig creates tls.Config object from provided arguments
func TLSConfig(insecureSkipVerify bool) *tls.Config {
return &tls.Config{
InsecureSkipVerify: insecureSkipVerify,
}
}

View File

@@ -311,6 +311,8 @@ The shortlist of configuration flags include the following:
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string

View File

@@ -19,7 +19,10 @@ type InsertCtx struct {
mrs []storage.MetricRow
metricNamesBuf []byte
relabelCtx relabel.Ctx
relabelCtx relabel.Ctx
streamAggrCtx streamAggrCtx
skipStreamAggr bool
}
// Reset resets ctx for future fill with rowsLen rows.
@@ -42,6 +45,8 @@ func (ctx *InsertCtx) Reset(rowsLen int) {
ctx.mrs = ctx.mrs[:0]
ctx.metricNamesBuf = ctx.metricNamesBuf[:0]
ctx.relabelCtx.Reset()
ctx.streamAggrCtx.Reset()
ctx.skipStreamAggr = false
}
func (ctx *InsertCtx) marshalMetricNameRaw(prefix []byte, labels []prompb.Label) []byte {
@@ -132,6 +137,16 @@ func (ctx *InsertCtx) ApplyRelabeling() {
// FlushBufs flushes buffered rows to the underlying storage.
func (ctx *InsertCtx) FlushBufs() error {
if sa != nil && !ctx.skipStreamAggr {
ctx.streamAggrCtx.push(ctx.mrs)
if !*streamAggrKeepInput {
ctx.Reset(0)
return nil
}
}
// There is no need in limiting the number of concurrent calls to vmstorage.AddRows() here,
// since the number of concurrent FlushBufs() calls should be already limited via writeconcurrencylimiter
// used at every ParseStream() call under lib/protoparser/*/streamparser.go
err := vmstorage.AddRows(ctx.mrs)
ctx.Reset(0)
if err == nil {

View File

@@ -0,0 +1,119 @@
package common
import (
"flag"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
)
var (
streamAggrConfig = flag.String("streamAggr.config", "", "Optional path to file with stream aggregation config. "+
"See https://docs.victoriametrics.com/stream-aggregation.html . "+
"See also -remoteWrite.streamAggr.keepInput")
streamAggrKeepInput = flag.Bool("streamAggr.keepInput", false, "Whether to keep input samples after the aggregation with -streamAggr.config. "+
"By default the input is dropped after the aggregation, so only the aggregate data is stored. "+
"See https://docs.victoriametrics.com/stream-aggregation.html")
)
// InitStreamAggr must be called after flag.Parse and before using the common package.
//
// MustStopStreamAggr must be called when stream aggr is no longer needed.
func InitStreamAggr() {
if *streamAggrConfig == "" {
// Nothing to initialize
return
}
a, err := streamaggr.LoadFromFile(*streamAggrConfig, pushAggregateSeries)
if err != nil {
logger.Fatalf("cannot load -streamAggr.config=%q: %s", *streamAggrConfig, err)
}
sa = a
}
// MustStopStreamAggr stops stream aggregators.
func MustStopStreamAggr() {
sa.MustStop()
sa = nil
}
var sa *streamaggr.Aggregators
type streamAggrCtx struct {
mn storage.MetricName
tss [1]prompbmarshal.TimeSeries
}
func (ctx *streamAggrCtx) Reset() {
ctx.mn.Reset()
ts := &ctx.tss[0]
promrelabel.CleanLabels(ts.Labels)
}
func (ctx *streamAggrCtx) push(mrs []storage.MetricRow) {
mn := &ctx.mn
tss := ctx.tss[:]
ts := &tss[0]
labels := ts.Labels
samples := ts.Samples
for _, mr := range mrs {
if err := mn.UnmarshalRaw(mr.MetricNameRaw); err != nil {
logger.Panicf("BUG: cannot unmarshal recently marshaled MetricName: %s", err)
}
labels = append(labels[:0], prompbmarshal.Label{
Name: "__name__",
Value: bytesutil.ToUnsafeString(mn.MetricGroup),
})
for _, tag := range mn.Tags {
labels = append(labels, prompbmarshal.Label{
Name: bytesutil.ToUnsafeString(tag.Key),
Value: bytesutil.ToUnsafeString(tag.Value),
})
}
samples = append(samples[:0], prompbmarshal.Sample{
Timestamp: mr.Timestamp,
Value: mr.Value,
})
ts.Labels = labels
ts.Samples = samples
sa.Push(tss)
}
}
func pushAggregateSeries(tss []prompbmarshal.TimeSeries) {
currentTimestamp := int64(fasttime.UnixTimestamp()) * 1000
var ctx InsertCtx
ctx.Reset(len(tss))
ctx.skipStreamAggr = true
for _, ts := range tss {
labels := ts.Labels
for _, label := range labels {
name := label.Name
if name == "__name__" {
name = ""
}
ctx.AddLabel(name, label.Value)
}
value := ts.Samples[0].Value
if err := ctx.WriteDataPoint(nil, ctx.Labels, currentTimestamp, value); err != nil {
logger.Errorf("cannot store aggregate series: %s", err)
// Do not continue pushing the remaining samples, since it is likely they will return the same error.
return
}
}
// There is no need in limiting the number of concurrent calls to vmstorage.AddRows() here,
// since the number of concurrent pushAggregateSeries() calls should be already limited by lib/streamaggr.
if err := vmstorage.AddRows(ctx.mrs); err != nil {
logger.Errorf("cannot flush aggregate series: %s", err)
}
}

View File

@@ -8,7 +8,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -23,10 +22,8 @@ func InsertHandler(req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
}

View File

@@ -1,7 +1,6 @@
package datadog
import (
"fmt"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
@@ -9,7 +8,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -26,15 +24,9 @@ func InsertHandlerForHTTP(req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
ce := req.Header.Get("Content-Encoding")
err := parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return insertRows(series, extraLabels)
})
if err != nil {
return fmt.Errorf("headers: %q; err: %w", req.Header, err)
}
return nil
ce := req.Header.Get("Content-Encoding")
return parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return insertRows(series, extraLabels)
})
}

View File

@@ -6,7 +6,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -19,9 +18,7 @@ var (
//
// See https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol
func InsertHandler(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, insertRows)
})
return parser.ParseStream(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -15,7 +15,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -35,10 +34,8 @@ var (
//
// See https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener/
func InsertHandlerForReader(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, false, "", "", func(db string, rows []parser.Row) error {
return insertRows(db, rows, nil)
})
return parser.ParseStream(r, false, "", "", func(db string, rows []parser.Row) error {
return insertRows(db, rows, nil)
})
}
@@ -50,15 +47,13 @@ func InsertHandlerForHTTP(req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
q := req.URL.Query()
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(db, rows, extraLabels)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
q := req.URL.Query()
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(db, rows, extraLabels)
})
}

View File

@@ -9,6 +9,7 @@ import (
"sync/atomic"
"time"
vminsertCommon "github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/graphite"
@@ -34,7 +35,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promscrape"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -66,10 +66,10 @@ var staticServer = http.FileServer(http.FS(staticFiles))
// Init initializes vminsert.
func Init() {
relabel.Init()
vminsertCommon.InitStreamAggr()
storage.SetMaxLabelsPerTimeseries(*maxLabelsPerTimeseries)
storage.SetMaxLabelValueLen(*maxLabelValueLen)
common.StartUnmarshalWorkers()
writeconcurrencylimiter.Init()
if len(*graphiteListenAddr) > 0 {
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, graphite.InsertHandler)
}
@@ -103,6 +103,7 @@ func Stop() {
opentsdbhttpServer.MustStop()
}
common.StopUnmarshalWorkers()
vminsertCommon.MustStopStreamAggr()
}
// RequestHandler is a handler for Prometheus remote storage write API
@@ -127,7 +128,14 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
statusCode := http.StatusNoContent
if strings.HasPrefix(path, "/prometheus/api/v1/import/prometheus/metrics/job/") ||
strings.HasPrefix(path, "/api/v1/import/prometheus/metrics/job/") {
// Return 200 status code for pushgateway requests.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3636
statusCode = http.StatusOK
}
w.WriteHeader(statusCode)
return true
}
if strings.HasPrefix(path, "/datadog/") {
@@ -245,12 +253,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/prometheus/config", "/config":
if *configAuthKey != "" && r.FormValue("authKey") != *configAuthKey {
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("The provided authKey doesn't match -configAuthKey"),
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
return true
}
promscrapeConfigRequests.Inc()
@@ -259,12 +262,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
case "/prometheus/api/v1/status/config", "/api/v1/status/config":
// See https://prometheus.io/docs/prometheus/latest/querying/api/#config
if *configAuthKey != "" && r.FormValue("authKey") != *configAuthKey {
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("The provided authKey doesn't match -configAuthKey"),
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
return true
}
promscrapeStatusConfigRequests.Inc()

View File

@@ -12,7 +12,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -28,10 +27,8 @@ func InsertHandler(req *http.Request) error {
return err
}
isGzip := req.Header.Get("Content-Encoding") == "gzip"
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req.Body, isGzip, func(block *parser.Block) error {
return insertRows(block, extraLabels)
})
return parser.ParseStream(req.Body, isGzip, func(block *parser.Block) error {
return insertRows(block, extraLabels)
})
}

View File

@@ -6,7 +6,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -19,9 +18,7 @@ var (
//
// See http://opentsdb.net/docs/build/html/api_telnet/put.html
func InsertHandler(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, insertRows)
})
return parser.ParseStream(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -9,7 +9,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -28,10 +27,8 @@ func InsertHandler(req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
default:
return fmt.Errorf("unexpected path requested on HTTP OpenTSDB server: %q", path)

View File

@@ -8,7 +8,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -27,12 +26,10 @@ func InsertHandler(req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
}, nil)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
}, nil)
}
func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {

View File

@@ -9,7 +9,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -24,10 +23,8 @@ func InsertHandler(req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
return insertRows(tss, extraLabels)
})
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
return insertRows(tss, extraLabels)
})
}

View File

@@ -12,7 +12,6 @@ import (
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
@@ -29,11 +28,9 @@ func InsertHandler(req *http.Request) error {
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
}

View File

@@ -129,6 +129,8 @@ i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/q
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerJSONFields string
Allows renaming fields in JSON formatted logs. Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message". Supported fields: ts, level, caller, msg
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string

View File

@@ -16,6 +16,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/metricsql"
)
var maxTagValueSuffixes = flag.Int("search.maxTagValueSuffixesPerSearch", 100e3, "The maximum number of tag value suffixes returned from /metrics/find")
@@ -349,7 +350,7 @@ func getRegexpForQuery(query string, delimiter byte) (*regexp.Regexp, error) {
if len(tail) > 0 {
return nil, fmt.Errorf("unexpected tail left after parsing query %q; tail: %q", query, tail)
}
re, err := regexp.Compile(rs)
re, err := metricsql.CompileRegexp(rs)
regexpCache[k] = &regexpCacheEntry{
re: re,
err: err,

View File

@@ -62,7 +62,7 @@ func Init() {
netstorage.InitTmpBlocksDir(tmpDirPath)
promql.InitRollupResultCache(*vmstorage.DataPath + "/cache/rollupResult")
concurrencyCh = make(chan struct{}, *maxConcurrentRequests)
concurrencyLimitCh = make(chan struct{}, *maxConcurrentRequests)
initVMAlertProxy()
}
@@ -71,17 +71,17 @@ func Stop() {
promql.StopRollupResultCache()
}
var concurrencyCh chan struct{}
var concurrencyLimitCh chan struct{}
var (
concurrencyLimitReached = metrics.NewCounter(`vm_concurrent_select_limit_reached_total`)
concurrencyLimitTimeout = metrics.NewCounter(`vm_concurrent_select_limit_timeout_total`)
_ = metrics.NewGauge(`vm_concurrent_select_capacity`, func() float64 {
return float64(cap(concurrencyCh))
return float64(cap(concurrencyLimitCh))
})
_ = metrics.NewGauge(`vm_concurrent_select_current`, func() float64 {
return float64(len(concurrencyCh))
return float64(len(concurrencyLimitCh))
})
)
@@ -99,8 +99,8 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
// Limit the number of concurrent queries.
select {
case concurrencyCh <- struct{}{}:
defer func() { <-concurrencyCh }()
case concurrencyLimitCh <- struct{}{}:
defer func() { <-concurrencyLimitCh }()
default:
// Sleep for a while until giving up. This should resolve short bursts in requests.
concurrencyLimitReached.Inc()
@@ -110,18 +110,18 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
t := timerpool.Get(d)
select {
case concurrencyCh <- struct{}{}:
qt.Printf("wait in queue because -search.maxConcurrentRequests=%d concurrent requests are executed", *maxConcurrentRequests)
case concurrencyLimitCh <- struct{}{}:
timerpool.Put(t)
defer func() { <-concurrencyCh }()
qt.Printf("wait in queue because -search.maxConcurrentRequests=%d concurrent requests are executed", *maxConcurrentRequests)
defer func() { <-concurrencyLimitCh }()
case <-t.C:
timerpool.Put(t)
concurrencyLimitTimeout.Inc()
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot handle more than %d concurrent search requests during %s; possible solutions: "+
"increase `-search.maxQueueDuration`; increase `-search.maxQueryDuration`; increase `-search.maxConcurrentRequests`; "+
"increase server capacity",
*maxConcurrentRequests, d),
Err: fmt.Errorf("couldn't start executing the request in %.3f seconds, since -search.maxConcurrentRequests=%d concurrent requests "+
"are executed. Possible solutions: to reduce query load; to add more compute resources to the server; "+
"to increase -search.maxQueueDuration=%s; to increase -search.maxQueryDuration; to increase -search.maxConcurrentRequests",
d.Seconds(), *maxConcurrentRequests, maxQueueDuration),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
@@ -145,8 +145,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
path := strings.Replace(r.URL.Path, "//", "/", -1)
if path == "/internal/resetRollupResultCache" {
if len(*resetCacheAuthKey) > 0 && r.FormValue("authKey") != *resetCacheAuthKey {
sendPrometheusError(w, r, fmt.Errorf("invalid authKey=%q for %q", r.FormValue("authKey"), path))
if !httpserver.CheckAuthFlag(w, r, *resetCacheAuthKey, "resetCacheAuthKey") {
return true
}
promql.ResetRollupResultCache()
@@ -162,8 +161,10 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
case strings.HasPrefix(path, "/graphite/"):
path = path[len("/graphite"):]
}
// vmui access.
if path == "/vmui" || path == "/graph" {
switch {
case path == "/vmui" || path == "/graph":
// VMUI access via incomplete url without `/` in the end. Redirect to complete url.
// Use relative redirect, since, since the hostname and path prefix may be incorrect if VictoriaMetrics
// is hidden behind vmauth or similar proxy.
@@ -172,18 +173,31 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
newURL := path + "/?" + r.Form.Encode()
httpserver.Redirect(w, newURL)
return true
}
if strings.HasPrefix(path, "/vmui/") {
case strings.HasPrefix(path, "/vmui/"):
if path == "/vmui/custom-dashboards" {
if err := handleVMUICustomDashboards(w); err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
}
r.URL.Path = path
vmuiFileServer.ServeHTTP(w, r)
return true
}
if strings.HasPrefix(path, "/graph/") {
case strings.HasPrefix(path, "/graph/"):
// This is needed for serving /graph URLs from Prometheus datasource in Grafana.
if path == "/graph/custom-dashboards" {
if err := handleVMUICustomDashboards(w); err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
}
r.URL.Path = strings.Replace(path, "/graph/", "/vmui/", 1)
vmuiFileServer.ServeHTTP(w, r)
return true
}
if strings.HasPrefix(path, "/api/v1/label/") {
s := path[len("/api/v1/label/"):]
if strings.HasSuffix(s, "/values") {
@@ -412,12 +426,10 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/tags/delSeries":
graphiteTagsDelSeriesRequests.Inc()
authKey := r.FormValue("authKey")
if authKey != *deleteAuthKey {
httpserver.Errorf(w, r, "invalid authKey %q. It must match the value from -deleteAuthKey command line flag", authKey)
if !httpserver.CheckAuthFlag(w, r, *deleteAuthKey, "deleteAuthKey") {
return true
}
graphiteTagsDelSeriesRequests.Inc()
if err := graphite.TagsDelSeriesHandler(startTime, w, r); err != nil {
graphiteTagsDelSeriesErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
@@ -474,12 +486,10 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
fmt.Fprintf(w, "%s", `{"status":"success","data":[]}`)
return true
case "/api/v1/admin/tsdb/delete_series":
deleteRequests.Inc()
authKey := r.FormValue("authKey")
if authKey != *deleteAuthKey {
httpserver.Errorf(w, r, "invalid authKey %q. It must match the value from -deleteAuthKey command line flag", authKey)
if !httpserver.CheckAuthFlag(w, r, *deleteAuthKey, "deleteAuthKey") {
return true
}
deleteRequests.Inc()
if err := prometheus.DeleteHandler(startTime, r); err != nil {
deleteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)

View File

@@ -5,8 +5,7 @@ import (
"errors"
"flag"
"fmt"
"math/rand"
"regexp"
"runtime"
"sort"
"sync"
"sync/atomic"
@@ -17,11 +16,10 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/metrics"
"github.com/valyala/fastrand"
"github.com/VictoriaMetrics/metricsql"
)
var (
@@ -133,16 +131,66 @@ func (tsw *timeseriesWork) do(r *Result, workerID uint) error {
return nil
}
func timeseriesWorker(tsws []*timeseriesWork, workerID uint) {
func timeseriesWorker(qt *querytracer.Tracer, workChs []chan *timeseriesWork, workerID uint) {
tmpResult := getTmpResult()
// Perform own work at first.
rowsProcessed := 0
seriesProcessed := 0
ch := workChs[workerID]
for tsw := range ch {
tsw.err = tsw.do(&tmpResult.rs, workerID)
rowsProcessed += tsw.rowsProcessed
seriesProcessed++
}
qt.Printf("own work processed: series=%d, samples=%d", seriesProcessed, rowsProcessed)
// Then help others with the remaining work.
rowsProcessed = 0
seriesProcessed = 0
idx := int(workerID)
for {
tsw, idxNext := stealTimeseriesWork(workChs, idx)
if tsw == nil {
// There is no more work
break
}
tsw.err = tsw.do(&tmpResult.rs, workerID)
rowsProcessed += tsw.rowsProcessed
seriesProcessed++
idx = idxNext
}
qt.Printf("others work processed: series=%d, samples=%d", seriesProcessed, rowsProcessed)
putTmpResult(tmpResult)
}
func stealTimeseriesWork(workChs []chan *timeseriesWork, startIdx int) (*timeseriesWork, int) {
for i := startIdx; i < startIdx+len(workChs); i++ {
// Give a chance other goroutines to perform their work
runtime.Gosched()
idx := i % len(workChs)
ch := workChs[idx]
// It is expected that every channel in the workChs is already closed,
// so the next line should return immediately.
tsw, ok := <-ch
if ok {
return tsw, idx
}
}
return nil, startIdx
}
func getTmpResult() *result {
v := resultPool.Get()
if v == nil {
v = &result{}
}
r := v.(*result)
for _, tsw := range tsws {
err := tsw.do(&r.rs, workerID)
tsw.err = err
}
return v.(*result)
}
func putTmpResult(r *result) {
currentTime := fasttime.UnixTimestamp()
if cap(r.rs.Values) > 1024*1024 && 4*len(r.rs.Values) < cap(r.rs.Values) && currentTime-r.lastResetTime > 10 {
// Reset r.rs in order to preseve memory usage after processing big time series with millions of rows.
@@ -170,87 +218,113 @@ func (rss *Results) RunParallel(qt *querytracer.Tracer, f func(rs *Result, worke
qt = qt.NewChild("parallel process of fetched data")
defer rss.mustClose()
// Prepare work for workers.
tsws := make([]*timeseriesWork, len(rss.packedTimeseries))
rowsProcessedTotal, err := rss.runParallel(qt, f)
seriesProcessedTotal := len(rss.packedTimeseries)
rss.packedTimeseries = rss.packedTimeseries[:0]
rowsReadPerQuery.Update(float64(rowsProcessedTotal))
seriesReadPerQuery.Update(float64(seriesProcessedTotal))
qt.Donef("series=%d, samples=%d", seriesProcessedTotal, rowsProcessedTotal)
return err
}
func (rss *Results) runParallel(qt *querytracer.Tracer, f func(rs *Result, workerID uint) error) (int, error) {
tswsLen := len(rss.packedTimeseries)
if tswsLen == 0 {
// Nothing to process
return 0, nil
}
var mustStop uint32
for i := range rss.packedTimeseries {
tsw := getTimeseriesWork()
initTimeseriesWork := func(tsw *timeseriesWork, pts *packedTimeseries) {
tsw.rss = rss
tsw.pts = &rss.packedTimeseries[i]
tsw.pts = pts
tsw.f = f
tsw.mustStop = &mustStop
}
if gomaxprocs == 1 || tswsLen == 1 {
// It is faster to process time series in the current goroutine.
tsw := getTimeseriesWork()
tmpResult := getTmpResult()
rowsProcessedTotal := 0
var err error
for i := range rss.packedTimeseries {
initTimeseriesWork(tsw, &rss.packedTimeseries[i])
err = tsw.do(&tmpResult.rs, 0)
rowsReadPerSeries.Update(float64(tsw.rowsProcessed))
rowsProcessedTotal += tsw.rowsProcessed
if err != nil {
break
}
tsw.reset()
}
putTmpResult(tmpResult)
putTimeseriesWork(tsw)
return rowsProcessedTotal, err
}
// Slow path - spin up multiple local workers for parallel data processing.
// Do not use global workers pool, since it increases inter-CPU memory ping-poing,
// which reduces the scalability on systems with many CPU cores.
// Prepare the work for workers.
tsws := make([]*timeseriesWork, len(rss.packedTimeseries))
for i := range rss.packedTimeseries {
tsw := getTimeseriesWork()
initTimeseriesWork(tsw, &rss.packedTimeseries[i])
tsws[i] = tsw
}
// Shuffle tsws for providing the equal amount of work among workers.
r := getRand()
r.Shuffle(len(tsws), func(i, j int) {
tsws[i], tsws[j] = tsws[j], tsws[i]
})
putRand(r)
// Spin up up to gomaxprocs local workers and split work equally among them.
// This guarantees linear scalability with the increase of gomaxprocs
// (e.g. the number of available CPU cores).
itemsPerWorker := 1
if len(rss.packedTimeseries) > gomaxprocs {
itemsPerWorker = 1 + len(rss.packedTimeseries)/gomaxprocs
// Prepare worker channels.
workers := len(tsws)
if workers > gomaxprocs {
workers = gomaxprocs
}
var start int
var i uint
itemsPerWorker := (len(tsws) + workers - 1) / workers
workChs := make([]chan *timeseriesWork, workers)
for i := range workChs {
workChs[i] = make(chan *timeseriesWork, itemsPerWorker)
}
// Spread work among workers.
for i, tsw := range tsws {
idx := i % len(workChs)
workChs[idx] <- tsw
}
// Mark worker channels as closed.
for _, workCh := range workChs {
close(workCh)
}
// Start workers and wait until they finish the work.
var wg sync.WaitGroup
for start < len(tsws) {
end := start + itemsPerWorker
if end > len(tsws) {
end = len(tsws)
}
chunk := tsws[start:end]
for i := range workChs {
wg.Add(1)
go func(tswsChunk []*timeseriesWork, workerID uint) {
defer wg.Done()
timeseriesWorker(tswsChunk, workerID)
}(chunk, i)
start = end
i++
qtChild := qt.NewChild("worker #%d", i)
go func(workerID uint) {
timeseriesWorker(qtChild, workChs, workerID)
qtChild.Done()
wg.Done()
}(uint(i))
}
// Wait until work is complete.
wg.Wait()
// Collect results.
var firstErr error
rowsProcessedTotal := 0
for _, tsw := range tsws {
if err := tsw.err; err != nil && firstErr == nil {
if tsw.err != nil && firstErr == nil {
// Return just the first error, since other errors are likely duplicate the first error.
firstErr = err
firstErr = tsw.err
}
rowsReadPerSeries.Update(float64(tsw.rowsProcessed))
rowsProcessedTotal += tsw.rowsProcessed
putTimeseriesWork(tsw)
}
seriesProcessedTotal := len(rss.packedTimeseries)
rss.packedTimeseries = rss.packedTimeseries[:0]
rowsReadPerQuery.Update(float64(rowsProcessedTotal))
seriesReadPerQuery.Update(float64(seriesProcessedTotal))
qt.Donef("series=%d, samples=%d", seriesProcessedTotal, rowsProcessedTotal)
return firstErr
}
var randPool sync.Pool
func getRand() *rand.Rand {
v := randPool.Get()
if v == nil {
v = rand.New(rand.NewSource(int64(fasttime.UnixTimestamp())))
}
return v.(*rand.Rand)
}
func putRand(r *rand.Rand) {
randPool.Put(r)
return rowsProcessedTotal, firstErr
}
var (
@@ -266,48 +340,30 @@ type packedTimeseries struct {
brs []blockRef
}
type unpackWorkItem struct {
br blockRef
tr storage.TimeRange
}
type unpackWork struct {
tbf *tmpBlocksFile
ws []unpackWorkItem
sbs []*sortBlock
doneCh chan error
tbf *tmpBlocksFile
br blockRef
tr storage.TimeRange
sb *sortBlock
err error
}
func (upw *unpackWork) reset() {
upw.tbf = nil
ws := upw.ws
for i := range ws {
w := &ws[i]
w.br = blockRef{}
w.tr = storage.TimeRange{}
}
upw.ws = upw.ws[:0]
sbs := upw.sbs
for i := range sbs {
sbs[i] = nil
}
upw.sbs = upw.sbs[:0]
if n := len(upw.doneCh); n > 0 {
logger.Panicf("BUG: upw.doneCh must be empty; it contains %d items now", n)
}
upw.br = blockRef{}
upw.tr = storage.TimeRange{}
upw.sb = nil
upw.err = nil
}
func (upw *unpackWork) unpack(tmpBlock *storage.Block) {
for _, w := range upw.ws {
sb := getSortBlock()
if err := sb.unpackFrom(tmpBlock, upw.tbf, w.br, w.tr); err != nil {
putSortBlock(sb)
upw.doneCh <- fmt.Errorf("cannot unpack block: %w", err)
return
}
upw.sbs = append(upw.sbs, sb)
sb := getSortBlock()
if err := sb.unpackFrom(tmpBlock, upw.tbf, upw.br, upw.tr); err != nil {
putSortBlock(sb)
upw.err = fmt.Errorf("cannot unpack block: %w", err)
return
}
upw.doneCh <- nil
upw.sb = sb
}
func getUnpackWork() *unpackWork {
@@ -315,9 +371,7 @@ func getUnpackWork() *unpackWork {
if v != nil {
return v.(*unpackWork)
}
return &unpackWork{
doneCh: make(chan error, 1),
}
return &unpackWork{}
}
func putUnpackWork(upw *unpackWork) {
@@ -327,49 +381,60 @@ func putUnpackWork(upw *unpackWork) {
var unpackWorkPool sync.Pool
func scheduleUnpackWork(workChs []chan *unpackWork, uw *unpackWork) {
if len(workChs) == 1 {
// Fast path for a single worker
workChs[0] <- uw
return
}
attempts := 0
for {
idx := fastrand.Uint32n(uint32(len(workChs)))
select {
case workChs[idx] <- uw:
return
default:
attempts++
if attempts >= len(workChs) {
workChs[idx] <- uw
return
}
}
}
}
func unpackWorker(workChs []chan *unpackWork, workerID uint) {
tmpBlock := getTmpStorageBlock()
func unpackWorker(ch <-chan *unpackWork) {
v := tmpBlockPool.Get()
if v == nil {
v = &storage.Block{}
}
tmpBlock := v.(*storage.Block)
// Deal with own work at first.
ch := workChs[workerID]
for upw := range ch {
upw.unpack(tmpBlock)
}
tmpBlockPool.Put(v)
// Then help others with their work.
idx := int(workerID)
for {
upw, idxNext := stealUnpackWork(workChs, idx)
if upw == nil {
// There is no more work
break
}
upw.unpack(tmpBlock)
idx = idxNext
}
putTmpStorageBlock(tmpBlock)
}
var tmpBlockPool sync.Pool
func stealUnpackWork(workChs []chan *unpackWork, startIdx int) (*unpackWork, int) {
for i := startIdx; i < startIdx+len(workChs); i++ {
// Give a chance other goroutines to perform their work
runtime.Gosched()
// unpackBatchSize is the maximum number of blocks that may be unpacked at once by a single goroutine.
//
// It is better to load a single goroutine for up to one second on a system with many CPU cores
// in order to reduce inter-CPU memory ping-pong.
// A single goroutine can unpack up to 40 millions of rows per second, while a single block contains up to 8K rows.
// So the batch size should be 40M / 8K = 5K.
var unpackBatchSize = 5000
idx := i % len(workChs)
ch := workChs[idx]
// It is expected that every channel in the workChs is already closed,
// so the next line should return immediately.
upw, ok := <-ch
if ok {
return upw, idx
}
}
return nil, startIdx
}
func getTmpStorageBlock() *storage.Block {
v := tmpStorageBlockPool.Get()
if v == nil {
v = &storage.Block{}
}
return v.(*storage.Block)
}
func putTmpStorageBlock(sb *storage.Block) {
tmpStorageBlockPool.Put(sb)
}
var tmpStorageBlockPool sync.Pool
// Unpack unpacks pts to dst.
func (pts *packedTimeseries) Unpack(dst *Result, tbf *tmpBlocksFile, tr storage.TimeRange) error {
@@ -377,93 +442,131 @@ func (pts *packedTimeseries) Unpack(dst *Result, tbf *tmpBlocksFile, tr storage.
if err := dst.MetricName.Unmarshal(bytesutil.ToUnsafeBytes(pts.metricName)); err != nil {
return fmt.Errorf("cannot unmarshal metricName %q: %w", pts.metricName, err)
}
sbh := getSortBlocksHeap()
var err error
sbh.sbs, err = pts.unpackTo(sbh.sbs[:0], tbf, tr)
pts.brs = pts.brs[:0]
if err != nil {
putSortBlocksHeap(sbh)
return err
}
dedupInterval := storage.GetDedupInterval()
mergeSortBlocks(dst, sbh, dedupInterval)
putSortBlocksHeap(sbh)
return nil
}
// Spin up local workers.
func (pts *packedTimeseries) unpackTo(dst []*sortBlock, tbf *tmpBlocksFile, tr storage.TimeRange) ([]*sortBlock, error) {
upwsLen := len(pts.brs)
if upwsLen == 0 {
// Nothing to do
return nil, nil
}
initUnpackWork := func(upw *unpackWork, br blockRef) {
upw.tbf = tbf
upw.br = br
upw.tr = tr
}
if gomaxprocs == 1 || upwsLen <= 1000 {
// It is faster to unpack all the data in the current goroutine.
upw := getUnpackWork()
samples := 0
tmpBlock := getTmpStorageBlock()
var err error
for _, br := range pts.brs {
initUnpackWork(upw, br)
upw.unpack(tmpBlock)
if upw.err != nil {
return dst, upw.err
}
samples += len(upw.sb.Timestamps)
if *maxSamplesPerSeries > 0 && samples > *maxSamplesPerSeries {
putSortBlock(upw.sb)
err = fmt.Errorf("cannot process more than %d samples per series; either increase -search.maxSamplesPerSeries "+
"or reduce time range for the query", *maxSamplesPerSeries)
break
}
dst = append(dst, upw.sb)
upw.reset()
}
putTmpStorageBlock(tmpBlock)
putUnpackWork(upw)
return dst, err
}
// Slow path - spin up multiple local workers for parallel data unpacking.
// Do not use global workers pool, since it increases inter-CPU memory ping-poing,
// which reduces the scalability on systems with many CPU cores.
brsLen := len(pts.brs)
workers := brsLen / unpackBatchSize
// Prepare the work for workers.
upws := make([]*unpackWork, upwsLen)
for i, br := range pts.brs {
upw := getUnpackWork()
initUnpackWork(upw, br)
upws[i] = upw
}
// Prepare worker channels.
workers := len(upws)
if workers > gomaxprocs {
workers = gomaxprocs
}
if workers < 1 {
workers = 1
}
itemsPerWorker := (len(upws) + workers - 1) / workers
workChs := make([]chan *unpackWork, workers)
var workChsWG sync.WaitGroup
for i := range workChs {
workChs[i] = make(chan *unpackWork, itemsPerWorker)
}
// Spread work among worker channels.
for i, upw := range upws {
idx := i % len(workChs)
workChs[idx] <- upw
}
// Mark worker channels as closed.
for _, workCh := range workChs {
close(workCh)
}
// Start workers and wait until they finish the work.
var wg sync.WaitGroup
for i := 0; i < workers; i++ {
// Use unbuffered channel on purpose, since there are high chances
// that only a single unpackWork is needed to unpack.
// The unbuffered channel should reduce inter-CPU ping-pong in this case,
// which should improve the performance in a system with many CPU cores.
workChs[i] = make(chan *unpackWork)
workChsWG.Add(1)
go func(workerID int) {
defer workChsWG.Done()
unpackWorker(workChs[workerID])
}(i)
wg.Add(1)
go func(workerID uint) {
unpackWorker(workChs, workerID)
wg.Done()
}(uint(i))
}
wg.Wait()
// Feed workers with work
upws := make([]*unpackWork, 0, 1+brsLen/unpackBatchSize)
upw := getUnpackWork()
upw.tbf = tbf
for _, br := range pts.brs {
if len(upw.ws) >= unpackBatchSize {
scheduleUnpackWork(workChs, upw)
upws = append(upws, upw)
upw = getUnpackWork()
upw.tbf = tbf
}
upw.ws = append(upw.ws, unpackWorkItem{
br: br,
tr: tr,
})
}
scheduleUnpackWork(workChs, upw)
upws = append(upws, upw)
pts.brs = pts.brs[:0]
// Wait until work is complete
// Collect results.
samples := 0
sbs := make([]*sortBlock, 0, brsLen)
var firstErr error
for _, upw := range upws {
if err := <-upw.doneCh; err != nil && firstErr == nil {
if upw.err != nil && firstErr == nil {
// Return the first error only, since other errors are likely the same.
firstErr = err
firstErr = upw.err
}
if firstErr == nil {
for _, sb := range upw.sbs {
samples += len(sb.Timestamps)
}
if *maxSamplesPerSeries <= 0 || samples < *maxSamplesPerSeries {
sbs = append(sbs, upw.sbs...)
} else {
sb := upw.sb
samples += len(sb.Timestamps)
if *maxSamplesPerSeries > 0 && samples > *maxSamplesPerSeries {
putSortBlock(sb)
firstErr = fmt.Errorf("cannot process more than %d samples per series; either increase -search.maxSamplesPerSeries "+
"or reduce time range for the query", *maxSamplesPerSeries)
} else {
dst = append(dst, sb)
}
}
if firstErr != nil {
for _, sb := range upw.sbs {
putSortBlock(sb)
}
} else {
putSortBlock(upw.sb)
}
putUnpackWork(upw)
}
// Shut down local workers
for _, workCh := range workChs {
close(workCh)
}
workChsWG.Wait()
if firstErr != nil {
return firstErr
}
dedupInterval := storage.GetDedupInterval()
mergeSortBlocks(dst, sbs, dedupInterval)
return nil
return dst, firstErr
}
func getSortBlock() *sortBlock {
@@ -483,24 +586,25 @@ var sbPool sync.Pool
var metricRowsSkipped = metrics.NewCounter(`vm_metric_rows_skipped_total{name="vmselect"}`)
func mergeSortBlocks(dst *Result, sbh sortBlocksHeap, dedupInterval int64) {
func mergeSortBlocks(dst *Result, sbh *sortBlocksHeap, dedupInterval int64) {
// Skip empty sort blocks, since they cannot be passed to heap.Init.
src := sbh
sbh = sbh[:0]
for _, sb := range src {
sbs := sbh.sbs[:0]
for _, sb := range sbh.sbs {
if len(sb.Timestamps) == 0 {
putSortBlock(sb)
continue
}
sbh = append(sbh, sb)
sbs = append(sbs, sb)
}
if len(sbh) == 0 {
sbh.sbs = sbs
if sbh.Len() == 0 {
return
}
heap.Init(&sbh)
heap.Init(sbh)
for {
top := sbh[0]
if len(sbh) == 1 {
sbs := sbh.sbs
top := sbs[0]
if len(sbs) == 1 {
dst.Timestamps = append(dst.Timestamps, top.Timestamps[top.NextIdx:]...)
dst.Values = append(dst.Values, top.Values[top.NextIdx:]...)
putSortBlock(top)
@@ -508,21 +612,20 @@ func mergeSortBlocks(dst *Result, sbh sortBlocksHeap, dedupInterval int64) {
}
sbNext := sbh.getNextBlock()
tsNext := sbNext.Timestamps[sbNext.NextIdx]
topTimestamps := top.Timestamps
topNextIdx := top.NextIdx
if n := equalTimestampsPrefix(topTimestamps[topNextIdx:], sbNext.Timestamps[sbNext.NextIdx:]); n > 0 && dedupInterval > 0 {
if n := equalSamplesPrefix(top, sbNext); n > 0 && dedupInterval > 0 {
// Skip n replicated samples at top if deduplication is enabled.
top.NextIdx = topNextIdx + n
} else {
// Copy samples from top to dst with timestamps not exceeding tsNext.
top.NextIdx = topNextIdx + binarySearchTimestamps(topTimestamps[topNextIdx:], tsNext)
dst.Timestamps = append(dst.Timestamps, topTimestamps[topNextIdx:top.NextIdx]...)
top.NextIdx = topNextIdx + binarySearchTimestamps(top.Timestamps[topNextIdx:], tsNext)
dst.Timestamps = append(dst.Timestamps, top.Timestamps[topNextIdx:top.NextIdx]...)
dst.Values = append(dst.Values, top.Values[topNextIdx:top.NextIdx]...)
}
if top.NextIdx < len(topTimestamps) {
heap.Fix(&sbh, 0)
if top.NextIdx < len(top.Timestamps) {
heap.Fix(sbh, 0)
} else {
heap.Pop(&sbh)
heap.Pop(sbh)
putSortBlock(top)
}
}
@@ -535,6 +638,14 @@ func mergeSortBlocks(dst *Result, sbh sortBlocksHeap, dedupInterval int64) {
var dedupsDuringSelect = metrics.NewCounter(`vm_deduplicated_samples_total{type="select"}`)
func equalSamplesPrefix(a, b *sortBlock) int {
n := equalTimestampsPrefix(a.Timestamps[a.NextIdx:], b.Timestamps[b.NextIdx:])
if n == 0 {
return 0
}
return equalValuesPrefix(a.Values[a.NextIdx:a.NextIdx+n], b.Values[b.NextIdx:b.NextIdx+n])
}
func equalTimestampsPrefix(a, b []int64) int {
for i, v := range a {
if i >= len(b) || v != b[i] {
@@ -544,6 +655,15 @@ func equalTimestampsPrefix(a, b []int64) int {
return len(a)
}
func equalValuesPrefix(a, b []float64) int {
for i, v := range a {
if i >= len(b) || v != b[i] {
return i
}
}
return len(a)
}
func binarySearchTimestamps(timestamps []int64, ts int64) int {
// The code has been adapted from sort.Search.
n := len(timestamps)
@@ -588,48 +708,73 @@ func (sb *sortBlock) unpackFrom(tmpBlock *storage.Block, tbf *tmpBlocksFile, br
return nil
}
type sortBlocksHeap []*sortBlock
type sortBlocksHeap struct {
sbs []*sortBlock
}
func (sbh sortBlocksHeap) getNextBlock() *sortBlock {
if len(sbh) < 2 {
func (sbh *sortBlocksHeap) getNextBlock() *sortBlock {
sbs := sbh.sbs
if len(sbs) < 2 {
return nil
}
if len(sbh) < 3 {
return sbh[1]
if len(sbs) < 3 {
return sbs[1]
}
a := sbh[1]
b := sbh[2]
a := sbs[1]
b := sbs[2]
if a.Timestamps[a.NextIdx] <= b.Timestamps[b.NextIdx] {
return a
}
return b
}
func (sbh sortBlocksHeap) Len() int {
return len(sbh)
func (sbh *sortBlocksHeap) Len() int {
return len(sbh.sbs)
}
func (sbh sortBlocksHeap) Less(i, j int) bool {
a := sbh[i]
b := sbh[j]
func (sbh *sortBlocksHeap) Less(i, j int) bool {
sbs := sbh.sbs
a := sbs[i]
b := sbs[j]
return a.Timestamps[a.NextIdx] < b.Timestamps[b.NextIdx]
}
func (sbh sortBlocksHeap) Swap(i, j int) {
sbh[i], sbh[j] = sbh[j], sbh[i]
func (sbh *sortBlocksHeap) Swap(i, j int) {
sbs := sbh.sbs
sbs[i], sbs[j] = sbs[j], sbs[i]
}
func (sbh *sortBlocksHeap) Push(x interface{}) {
*sbh = append(*sbh, x.(*sortBlock))
sbh.sbs = append(sbh.sbs, x.(*sortBlock))
}
func (sbh *sortBlocksHeap) Pop() interface{} {
a := *sbh
v := a[len(a)-1]
*sbh = a[:len(a)-1]
sbs := sbh.sbs
v := sbs[len(sbs)-1]
sbs[len(sbs)-1] = nil
sbh.sbs = sbs[:len(sbs)-1]
return v
}
func getSortBlocksHeap() *sortBlocksHeap {
v := sbhPool.Get()
if v == nil {
return &sortBlocksHeap{}
}
return v.(*sortBlocksHeap)
}
func putSortBlocksHeap(sbh *sortBlocksHeap) {
sbs := sbh.sbs
for i := range sbs {
sbs[i] = nil
}
sbh.sbs = sbs[:0]
sbhPool.Put(sbh)
}
var sbhPool sync.Pool
// DeleteSeries deletes time series matching the given tagFilterss.
func DeleteSeries(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline searchutils.Deadline) (int, error) {
qt = qt.NewChild("delete series: %s", sq)
@@ -1012,7 +1157,8 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
maxSeriesCount := sr.Init(qt, vmstorage.Storage, tfss, tr, sq.MaxMetrics, deadline.Deadline())
indexSearchDuration.UpdateDuration(startTime)
type blockRefs struct {
brs []blockRef
brsPrealloc [4]blockRef
brs []blockRef
}
m := make(map[string]*blockRefs, maxSeriesCount)
orderedMetricNames := make([]string, 0, maxSeriesCount)
@@ -1041,20 +1187,19 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
putStorageSearch(sr)
return nil, fmt.Errorf("cannot write %d bytes to temporary file: %w", len(buf), err)
}
metricName := sr.MetricBlockRef.MetricName
brs := m[string(metricName)]
metricName := bytesutil.InternBytes(sr.MetricBlockRef.MetricName)
brs := m[metricName]
if brs == nil {
brs = &blockRefs{}
brs.brs = brs.brsPrealloc[:0]
}
brs.brs = append(brs.brs, blockRef{
partRef: br.PartRef(),
addr: addr,
})
if len(brs.brs) == 1 {
// An optimization for big number of time series with long metricName values:
// use only a single copy of metricName for both orderedMetricNames and m.
orderedMetricNames = append(orderedMetricNames, string(metricName))
m[orderedMetricNames[len(orderedMetricNames)-1]] = brs
orderedMetricNames = append(orderedMetricNames, metricName)
m[metricName] = brs
}
}
if err := sr.Error(); err != nil {
@@ -1127,7 +1272,7 @@ func applyGraphiteRegexpFilter(filter string, ss []string) ([]string, error) {
// Anchor filter regexp to the beginning of the string as Graphite does.
// See https://github.com/graphite-project/graphite-web/blob/3ad279df5cb90b211953e39161df416e54a84948/webapp/graphite/tags/localdatabase.py#L157
filter = "^(?:" + filter + ")"
re, err := regexp.Compile(filter)
re, err := metricsql.CompileRegexp(filter)
if err != nil {
return nil, fmt.Errorf("cannot parse regexp filter=%q: %w", filter, err)
}

View File

@@ -9,7 +9,10 @@ func TestMergeSortBlocks(t *testing.T) {
f := func(blocks []*sortBlock, dedupInterval int64, expectedResult *Result) {
t.Helper()
var result Result
mergeSortBlocks(&result, blocks, dedupInterval)
sbh := getSortBlocksHeap()
sbh.sbs = append(sbh.sbs[:0], blocks...)
mergeSortBlocks(&result, sbh, dedupInterval)
putSortBlocksHeap(sbh)
if !reflect.DeepEqual(result.Values, expectedResult.Values) {
t.Fatalf("unexpected values;\ngot\n%v\nwant\n%v", result.Values, expectedResult.Values)
}
@@ -102,21 +105,35 @@ func TestMergeSortBlocks(t *testing.T) {
Values: []float64{9, 4.2, 2.1, 5.2, 42, 6.1},
})
// Multiple blocks with identical timestamps.
// Multiple blocks with identical timestamps and identical values.
f([]*sortBlock{
{
Timestamps: []int64{1, 2, 4, 5},
Values: []float64{9, 5.2, 6.1, 9},
},
{
Timestamps: []int64{1, 2, 4},
Values: []float64{9, 5.2, 6.1},
},
}, 1, &Result{
Timestamps: []int64{1, 2, 4, 5},
Values: []float64{9, 5.2, 6.1, 9},
})
// Multiple blocks with identical timestamps.
f([]*sortBlock{
{
Timestamps: []int64{1, 2, 4, 5},
Values: []float64{9, 5.2, 6.1, 9},
},
{
Timestamps: []int64{1, 2, 4},
Values: []float64{4.2, 2.1, 42},
},
}, 1, &Result{
Timestamps: []int64{1, 2, 4},
Values: []float64{4.2, 2.1, 42},
Timestamps: []int64{1, 2, 4, 5},
Values: []float64{9, 5.2, 42, 9},
})
// Multiple blocks with identical timestamps, disabled deduplication.
f([]*sortBlock{
{

View File

@@ -90,16 +90,18 @@ func benchmarkMergeSortBlocks(b *testing.B, blocks []*sortBlock) {
b.ReportAllocs()
b.RunParallel(func(pb *testing.PB) {
var result Result
sbs := make(sortBlocksHeap, len(blocks))
sbh := getSortBlocksHeap()
for pb.Next() {
result.reset()
for i, b := range blocks {
sbs := sbh.sbs[:0]
for _, b := range blocks {
sb := getSortBlock()
sb.Timestamps = b.Timestamps
sb.Values = b.Values
sbs[i] = sb
sbs = append(sbs, sb)
}
mergeSortBlocks(&result, sbs, dedupInterval)
sbh.sbs = sbs
mergeSortBlocks(&result, sbh, dedupInterval)
}
})
}

View File

@@ -59,7 +59,7 @@ textarea { margin: 1em }
<h3>Tutorial for WITH expressions in <a href="https://docs.victoriametrics.com/MetricsQL.html">MetricsQL</a></h3>
<p>
Let's look at the following real query from <a href="https://grafana.com/dashboards/1860">Node Exporter Full</a> dashboard:
Let's look at the following real query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
</p>
<pre>
@@ -123,7 +123,7 @@ my_resource_utilization(
</p>
<p>
Let's take another nice query from <a href="https://grafana.com/dashboards/1860">Node Exporter Full</a> dashboard:
Let's take another nice query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
</p>
<pre>

View File

@@ -131,7 +131,7 @@ func streamwithExprsTutorial(qw422016 *qt422016.Writer) {
<h3>Tutorial for WITH expressions in <a href="https://docs.victoriametrics.com/MetricsQL.html">MetricsQL</a></h3>
<p>
Let's look at the following real query from <a href="https://grafana.com/dashboards/1860">Node Exporter Full</a> dashboard:
Let's look at the following real query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
</p>
<pre>
@@ -195,7 +195,7 @@ my_resource_utilization(
</p>
<p>
Let's take another nice query from <a href="https://grafana.com/dashboards/1860">Node Exporter Full</a> dashboard:
Let's take another nice query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
</p>
<pre>

View File

@@ -716,7 +716,6 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
if err := exportHandler(qt, w, cp, "promapi", 0, false); err != nil {
return fmt.Errorf("error when exporting data for query=%q on the time range (start=%d, end=%d): %w", childQuery, start, end, err)
}
queryDuration.UpdateDuration(startTime)
return nil
}
if childQuery, windowExpr, stepExpr, offsetExpr := promql.IsRollup(query); childQuery != "" {
@@ -732,7 +731,6 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
if err := queryRangeHandler(qt, startTime, w, childQuery, start, end, step, r, ct, etfs); err != nil {
return fmt.Errorf("error when executing query=%q on the time range (start=%d, end=%d, step=%d): %w", childQuery, start, end, step, err)
}
queryDuration.UpdateDuration(startTime)
return nil
}
@@ -768,10 +766,14 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
}
if queryOffset > 0 {
for i := range result {
timestamps := result[i].Timestamps
r := &result[i]
// Do not modify r.Timestamps, since they may be shared among multiple series.
// Make a copy instead.
timestamps := append([]int64{}, r.Timestamps...)
for j := range timestamps {
timestamps[j] += queryOffset
}
r.Timestamps = timestamps
}
}
@@ -912,7 +914,9 @@ func removeEmptyValuesAndTimeseries(tss []netstorage.Result) []netstorage.Result
// Slow path: remove NaNs.
srcTimestamps := ts.Timestamps
dstValues := ts.Values[:0]
dstTimestamps := ts.Timestamps[:0]
// Do not re-use ts.Timestamps for dstTimestamps, since ts.Timestamps
// may be shared among multiple time series.
dstTimestamps := make([]int64, 0, len(ts.Timestamps))
for j, v := range ts.Values {
if math.IsNaN(v) {
continue

View File

@@ -7,6 +7,7 @@ import (
"strconv"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/metrics"
@@ -115,16 +116,17 @@ func aggrFuncExt(afe func(tss []*timeseries, modifier *metricsql.ModifierExpr) [
for i, ts := range arg {
removeGroupTags(&ts.MetricName, modifier)
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
k := bytesutil.InternBytes(bb.B)
if keepOriginal {
ts = argOrig[i]
}
tss := m[string(bb.B)]
tss := m[k]
if tss == nil && maxSeries > 0 && len(m) >= maxSeries {
// We already reached time series limit after grouping. Skip other time series.
continue
}
tss = append(tss, ts)
m[string(bb.B)] = tss
m[k] = tss
}
bbPool.Put(bb)
@@ -817,7 +819,7 @@ func lastValue(values []float64) float64 {
// quantiles calculates the given phis from originValues without modifying originValues, appends them to qs and returns the result.
func quantiles(qs, phis []float64, originValues []float64) []float64 {
a := getFloat64s()
a.A = prepareForQuantileFloat64(a.A[:0], originValues)
a.prepareForQuantileFloat64(originValues)
qs = quantilesSorted(qs, phis, a.A)
putFloat64s(a)
return qs
@@ -826,22 +828,38 @@ func quantiles(qs, phis []float64, originValues []float64) []float64 {
// quantile calculates the given phi from originValues without modifying originValues
func quantile(phi float64, originValues []float64) float64 {
a := getFloat64s()
a.A = prepareForQuantileFloat64(a.A[:0], originValues)
a.prepareForQuantileFloat64(originValues)
q := quantileSorted(phi, a.A)
putFloat64s(a)
return q
}
// prepareForQuantileFloat64 copies items from src to dst but removes NaNs and sorts the dst
func prepareForQuantileFloat64(dst, src []float64) []float64 {
// prepareForQuantileFloat64 copies items from src to a but removes NaNs and sorts items in a.
func (a *float64s) prepareForQuantileFloat64(src []float64) {
dst := a.A[:0]
for _, v := range src {
if math.IsNaN(v) {
continue
}
dst = append(dst, v)
}
sort.Float64s(dst)
return dst
a.A = dst
// Use sort.Sort instead of sort.Float64s in order to avoid a memory allocation
sort.Sort(a)
}
func (a *float64s) Len() int {
return len(a.A)
}
func (a *float64s) Swap(i, j int) {
x := a.A
x[i], x[j] = x[j], x[i]
}
func (a *float64s) Less(i, j int) bool {
x := a.A
return x[i] < x[j]
}
// quantilesSorted calculates the given phis over a sorted list of values, appends them to qs and returns the result.

View File

@@ -5,6 +5,7 @@ import (
"strings"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/metricsql"
)
@@ -98,7 +99,8 @@ func (iafc *incrementalAggrFuncContext) updateTimeseries(tsOrig *timeseries, wor
removeGroupTags(&ts.MetricName, &iafc.ae.Modifier)
bb := bbPool.Get()
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
iac := m[string(bb.B)]
k := bytesutil.InternBytes(bb.B)
iac := m[k]
if iac == nil {
if iafc.ae.Limit > 0 && len(m) >= iafc.ae.Limit {
// Skip this time series, since the limit on the number of output time series has been already reached.
@@ -117,7 +119,7 @@ func (iafc *incrementalAggrFuncContext) updateTimeseries(tsOrig *timeseries, wor
ts: tsAggr,
values: make([]float64, len(ts.Values)),
}
m[string(bb.B)] = iac
m[k] = iac
}
bbPool.Put(bb)
iafc.callbacks.updateAggrFunc(iac, ts.Values)

View File

@@ -5,6 +5,7 @@ import (
"math"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/metricsql"
@@ -248,7 +249,8 @@ func groupJoin(singleTimeseriesSide string, be *metricsql.BinaryOpExpr, rvsLeft,
bb.B = marshalMetricTagsSorted(bb.B[:0], &tsCopy.MetricName)
pair, ok := m[string(bb.B)]
if !ok {
m[string(bb.B)] = &tsPair{
k := bytesutil.InternBytes(bb.B)
m[k] = &tsPair{
left: &tsCopy,
right: tsRight,
}
@@ -504,7 +506,8 @@ func createTimeseriesMapByTagSet(be *metricsql.BinaryOpExpr, left, right []*time
logger.Panicf("BUG: unexpected binary op modifier %q", groupOp)
}
bb.B = marshalMetricTagsSorted(bb.B[:0], mn)
m[string(bb.B)] = append(m[string(bb.B)], ts)
k := bytesutil.InternBytes(bb.B)
m[k] = append(m[k], ts)
}
storage.PutMetricName(mn)
bbPool.Put(bb)

View File

@@ -505,26 +505,54 @@ func execBinaryOpArgs(qt *querytracer.Tracer, ec *EvalConfig, exprFirst, exprSec
}
func getCommonLabelFilters(tss []*timeseries) []metricsql.LabelFilter {
m := make(map[string][]string)
if len(tss) == 0 {
return nil
}
type valuesCounter struct {
values map[string]struct{}
count int
}
m := make(map[string]*valuesCounter, len(tss[0].MetricName.Tags))
for _, ts := range tss {
for _, tag := range ts.MetricName.Tags {
m[string(tag.Key)] = append(m[string(tag.Key)], string(tag.Value))
vc, ok := m[string(tag.Key)]
if !ok {
k := bytesutil.InternBytes(tag.Key)
v := bytesutil.InternBytes(tag.Value)
m[k] = &valuesCounter{
values: map[string]struct{}{
v: {},
},
count: 1,
}
continue
}
if len(vc.values) > 100 {
// Too many unique values found for the given tag.
// Do not make a filter on such values, since it may slow down
// search for matching time series.
continue
}
vc.count++
if _, ok := vc.values[string(tag.Value)]; !ok {
v := bytesutil.InternBytes(tag.Value)
vc.values[v] = struct{}{}
}
}
}
lfs := make([]metricsql.LabelFilter, 0, len(m))
for key, values := range m {
if len(values) != len(tss) {
var values []string
for k, vc := range m {
if vc.count != len(tss) {
// Skip the tag, since it doesn't belong to all the time series.
continue
}
values = getUniqueValues(values)
if len(values) > 1000 {
// Skip the filter on the given tag, since it needs to enumerate too many unique values.
// This may slow down the search for matching time series.
continue
values = values[:0]
for s := range vc.values {
values = append(values, s)
}
lf := metricsql.LabelFilter{
Label: key,
Label: k,
}
if len(values) == 1 {
lf.Value = values[0]
@@ -541,18 +569,6 @@ func getCommonLabelFilters(tss []*timeseries) []metricsql.LabelFilter {
return lfs
}
func getUniqueValues(a []string) []string {
m := make(map[string]struct{}, len(a))
results := make([]string, 0, len(a))
for _, s := range a {
if _, ok := m[s]; !ok {
results = append(results, s)
m[s] = struct{}{}
}
}
return results
}
func joinRegexpValues(a []string) string {
var b []byte
for i, s := range a {
@@ -652,9 +668,9 @@ func evalExprsInParallel(qt *querytracer.Tracer, ec *EvalConfig, es []metricsql.
}
rvs := make([][]*timeseries, len(es))
errs := make([]error, len(es))
qt.Printf("eval function args in parallel")
var wg sync.WaitGroup
for i, e := range es {
qt.Printf("eval function args in parallel")
wg.Add(1)
qtChild := qt.NewChild("eval arg %d", i)
go func(e metricsql.Expr, i int) {

View File

@@ -12,6 +12,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/querystats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
@@ -121,15 +122,18 @@ func timeseriesToResult(tss []*timeseries, maySort bool) ([]netstorage.Result, e
bb := bbPool.Get()
for i, ts := range tss {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
if _, ok := m[string(bb.B)]; ok {
k := bytesutil.InternBytes(bb.B)
if _, ok := m[k]; ok {
return nil, fmt.Errorf(`duplicate output timeseries: %s`, stringMetricName(&ts.MetricName))
}
m[string(bb.B)] = struct{}{}
m[k] = struct{}{}
rs := &result[i]
rs.MetricName.CopyFrom(&ts.MetricName)
rs.Values = append(rs.Values[:0], ts.Values...)
rs.Timestamps = append(rs.Timestamps[:0], ts.Timestamps...)
rs.MetricName.MoveFrom(&ts.MetricName)
rs.Values = ts.Values
ts.Values = nil
rs.Timestamps = ts.Timestamps
ts.Timestamps = nil
}
bbPool.Put(bb)

View File

@@ -86,6 +86,61 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("duration-constant", func(t *testing.T) {
t.Parallel()
q := `1h23m5S`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{4985, 4985, 4985, 4985, 4985, 4985},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("num-with-suffix-1", func(t *testing.T) {
t.Parallel()
q := `123M`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{123e6, 123e6, 123e6, 123e6, 123e6, 123e6},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("num-with-suffix-2", func(t *testing.T) {
t.Parallel()
q := `1.23TB`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1.23e12, 1.23e12, 1.23e12, 1.23e12, 1.23e12, 1.23e12},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("num-with-suffix-3", func(t *testing.T) {
t.Parallel()
q := `1.23Mib`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20)},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("num-with-suffix-4", func(t *testing.T) {
t.Parallel()
q := `1.23mib`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20), 1.23 * (1 << 20)},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("simple-arithmetic", func(t *testing.T) {
t.Parallel()
q := `-1+2 *3 ^ 4+5%6`
@@ -935,7 +990,7 @@ func TestExecSuccess(t *testing.T) {
))`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{nan, nan, nan, 1, nan, nan},
Values: []float64{nan, nan, 1, 1, nan, nan},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
@@ -2149,6 +2204,34 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`label_replace(nonexisting_src_match)`, func(t *testing.T) {
t.Parallel()
q := `label_replace(time(), "foo", "x", "bar", "")`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1000, 1200, 1400, 1600, 1800, 2000},
Timestamps: timestampsExpected,
}
r.MetricName.Tags = []storage.Tag{
{
Key: []byte("foo"),
Value: []byte("x"),
},
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`label_replace(nonexisting_src_mismatch)`, func(t *testing.T) {
t.Parallel()
q := `label_replace(time(), "foo", "x", "bar", "y")`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1000, 1200, 1400, 1600, 1800, 2000},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`label_replace(mismatch)`, func(t *testing.T) {
t.Parallel()
q := `label_replace(label_set(time(), "foo", "foobar"), "__name__", "x${1}y", "foo", "bar(.+)")`
@@ -6692,7 +6775,7 @@ func TestExecSuccess(t *testing.T) {
q := `rate((2000-time())[100s])`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{5.5, 4.5, 3.5, 2.5, 1.5, 0.5},
Values: []float64{5, 4, 3, 2, 1, 0},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
@@ -6703,7 +6786,7 @@ func TestExecSuccess(t *testing.T) {
q := `rate((2000-time())[100s:])`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{5.5, 4.5, 3.5, 2.5, 1.5, 0.5},
Values: []float64{5, 4, 3, 2, 1, 0},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
@@ -6714,7 +6797,7 @@ func TestExecSuccess(t *testing.T) {
q := `rate((2000-time())[100s:100s])`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{0, 0, 6.5, 4.5, 2.5, 0.5},
Values: []float64{0, 0, 6, 4, 2, 0},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
@@ -6725,7 +6808,7 @@ func TestExecSuccess(t *testing.T) {
q := `rate((2000-time())[100s:100s] offset 100s)`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{0, 0, 3.5, 5.5, 3.5, 1.5},
Values: []float64{0, 0, 7, 5, 3, 1},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
@@ -6736,7 +6819,7 @@ func TestExecSuccess(t *testing.T) {
q := `rate((2000-time())[100s:100s] offset 100s)[:] offset 100s`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{0, 0, 0, 3.5, 5.5, 3.5},
Values: []float64{0, 0, 0, 7, 5, 3},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
@@ -7329,7 +7412,7 @@ func TestExecSuccess(t *testing.T) {
})
t.Run(`rollup_scrape_interval()`, func(t *testing.T) {
t.Parallel()
q := `sort_by_label(rollup_scrape_interval(1[5m:10s]), "rollup")`
q := `sort_by_label(rollup_scrape_interval(1[5m:10S]), "rollup")`
r1 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{10, 10, 10, 10, 10, 10},

View File

@@ -145,11 +145,11 @@ var rollupAggrFuncs = map[string]rollupFunc{
"zscore_over_time": rollupZScoreOverTime,
}
// VictoriaMetrics can increase lookbehind window in square brackets for these functions
// if the given window doesn't contain enough samples for calculations.
// VictoriaMetrics can extends lookbehind window for these functions
// in order to make sure it contains enough points for returning non-empty results.
//
// This is needed in order to return the expected non-empty graphs when zooming in the graph in Grafana,
// which is built with `func_name(metric[$__interval])` query.
// This is needed for returning the expected non-empty graphs when zooming in the graph in Grafana,
// which is built with `func_name(metric)` query.
var rollupFuncsCanAdjustWindow = map[string]bool{
"default_rollup": true,
"deriv": true,
@@ -584,15 +584,23 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
window := rc.Window
if window <= 0 {
window = rc.Step
if rc.MayAdjustWindow && window < maxPrevInterval {
// Adjust lookbehind window only if it isn't set explicilty, e.g. rate(foo).
// In the case of missing lookbehind window it should be adjusted in order to return non-empty graph
// when the window doesn't cover at least two raw samples (this is what most users expect).
//
// If the user explicitly sets the lookbehind window to some fixed value, e.g. rate(foo[1s]),
// then it is expected he knows what he is doing. Do not adjust the lookbehind window then.
//
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3483
window = maxPrevInterval
}
if rc.isDefaultRollup && rc.LookbackDelta > 0 && window > rc.LookbackDelta {
// Implicit window exceeds -search.maxStalenessInterval, so limit it to -search.maxStalenessInterval
// according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/784
window = rc.LookbackDelta
}
}
if rc.MayAdjustWindow && window < maxPrevInterval {
window = maxPrevInterval
}
rfa := getRollupFuncArg()
rfa.idx = 0
rfa.window = window

View File

@@ -488,7 +488,8 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
defer bbPool.Put(bb)
for _, ts := range a {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
m[string(bb.B)] = ts
k := bytesutil.InternBytes(bb.B)
m[k] = ts
}
rvs := make([]*timeseries, 0, len(a))
@@ -497,12 +498,11 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
tmp.denyReuse = true
tmp.Timestamps = sharedTimestamps
tmp.Values = make([]float64, 0, len(tmp.Timestamps))
// Do not use MetricName.CopyFrom for performance reasons.
// It is safe to make shallow copy, since tsB must no longer used.
tmp.MetricName = tsB.MetricName
tmp.MetricName.MoveFrom(&tsB.MetricName)
bb.B = marshalMetricNameSorted(bb.B[:0], &tsB.MetricName)
tsA := m[string(bb.B)]
bb.B = marshalMetricNameSorted(bb.B[:0], &tmp.MetricName)
k := bytesutil.InternBytes(bb.B)
tsA := m[k]
if tsA == nil {
tStart := ec.Start
for tStart < bStart {
@@ -511,7 +511,7 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
}
} else {
tmp.Values = append(tmp.Values, tsA.Values...)
delete(m, string(bb.B))
delete(m, k)
}
tmp.Values = append(tmp.Values, tsB.Values...)
if len(tmp.Values) != len(tmp.Timestamps) {
@@ -525,9 +525,8 @@ func mergeTimeseries(a, b []*timeseries, bStart int64, ec *EvalConfig) []*timese
var tmp timeseries
tmp.denyReuse = true
tmp.Timestamps = sharedTimestamps
// Do not use MetricName.CopyFrom for performance reasons.
// It is safe to make shallow copy, since tsA must no longer used.
tmp.MetricName = tsA.MetricName
tmp.Values = make([]float64, 0, len(tmp.Timestamps))
tmp.MetricName.MoveFrom(&tsA.MetricName)
tmp.Values = append(tmp.Values, tsA.Values...)
tStart := bStart

View File

@@ -320,26 +320,16 @@ func marshalMetricTagsFast(dst []byte, tags []storage.Tag) []byte {
func marshalMetricNameSorted(dst []byte, mn *storage.MetricName) []byte {
dst = marshalBytesFast(dst, mn.MetricGroup)
sortMetricTags(mn.Tags)
sortMetricTags(mn)
dst = marshalMetricTagsFast(dst, mn.Tags)
return dst
}
func marshalMetricTagsSorted(dst []byte, mn *storage.MetricName) []byte {
sortMetricTags(mn.Tags)
sortMetricTags(mn)
return marshalMetricTagsFast(dst, mn.Tags)
}
func sortMetricTags(tags []storage.Tag) {
less := func(i, j int) bool {
return string(tags[i].Key) < string(tags[j].Key)
}
if sort.SliceIsSorted(tags, less) {
return
}
sort.Slice(tags, less)
}
func marshalBytesFast(dst []byte, s []byte) []byte {
dst = encoding.MarshalUint16(dst, uint16(len(s)))
dst = append(dst, s...)
@@ -361,14 +351,14 @@ func unmarshalBytesFast(src []byte) ([]byte, []byte, error) {
func stringMetricName(mn *storage.MetricName) string {
var dst []byte
dst = append(dst, mn.MetricGroup...)
sortMetricTags(mn.Tags)
sortMetricTags(mn)
dst = appendStringMetricTags(dst, mn.Tags)
return string(dst)
}
func stringMetricTags(mn *storage.MetricName) string {
var dst []byte
sortMetricTags(mn.Tags)
sortMetricTags(mn)
dst = appendStringMetricTags(dst, mn.Tags)
return string(dst)
}
@@ -428,3 +418,26 @@ func assertIdenticalTimestamps(tss []*timeseries, step int64) {
}
}
}
func sortMetricTags(mn *storage.MetricName) {
mts := (*metricTagsSorter)(mn)
if !sort.IsSorted(mts) {
sort.Sort(mts)
}
}
type metricTagsSorter storage.MetricName
func (mts *metricTagsSorter) Len() int {
return len(mts.Tags)
}
func (mts *metricTagsSorter) Less(i, j int) bool {
a := mts.Tags
return string(a[i].Key) < string(a[j].Key)
}
func (mts *metricTagsSorter) Swap(i, j int) {
a := mts.Tags
a[i], a[j] = a[j], a[i]
}

View File

@@ -411,7 +411,8 @@ func transformBucketsLimit(tfa *transformFuncArg) ([]*timeseries, error) {
mn.CopyFrom(&ts.MetricName)
mn.RemoveTag("le")
b = marshalMetricNameSorted(b[:0], &mn)
m[string(b)] = append(m[string(b)], x{
k := bytesutil.InternBytes(b)
m[k] = append(m[k], x{
le: le,
ts: ts,
})
@@ -513,7 +514,8 @@ func vmrangeBucketsToLE(tss []*timeseries) []*timeseries {
ts.MetricName.RemoveTag("le")
ts.MetricName.RemoveTag("vmrange")
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
m[string(bb.B)] = append(m[string(bb.B)], x{
k := bytesutil.InternBytes(bb.B)
m[k] = append(m[k], x{
startStr: startStr,
endStr: endStr,
start: start,
@@ -1002,7 +1004,8 @@ func groupLeTimeseries(tss []*timeseries) map[string][]leTimeseries {
ts.MetricName.ResetMetricGroup()
ts.MetricName.RemoveTag("le")
bb.B = marshalMetricTagsSorted(bb.B[:0], &ts.MetricName)
m[string(bb.B)] = append(m[string(bb.B)], leTimeseries{
k := bytesutil.InternBytes(bb.B)
m[k] = append(m[k], leTimeseries{
le: le,
ts: ts,
})
@@ -1532,10 +1535,11 @@ func transformUnion(tfa *transformFuncArg) ([]*timeseries, error) {
for _, arg := range args {
for _, ts := range arg {
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
if m[string(bb.B)] {
k := bytesutil.InternBytes(bb.B)
if m[k] {
continue
}
m[string(bb.B)] = true
m[k] = true
rvs = append(rvs, ts)
}
}
@@ -1873,12 +1877,12 @@ func labelReplace(tss []*timeseries, srcLabel string, r *regexp.Regexp, dstLabel
replacementBytes := []byte(replacement)
for _, ts := range tss {
mn := &ts.MetricName
dstValue := getDstValue(mn, dstLabel)
srcValue := mn.GetTagValue(srcLabel)
if !r.Match(srcValue) {
continue
}
b := r.ReplaceAll(srcValue, replacementBytes)
dstValue := getDstValue(mn, dstLabel)
*dstValue = append((*dstValue)[:0], b...)
if len(b) == 0 {
mn.RemoveTag(dstLabel)

View File

@@ -48,30 +48,18 @@ func GetTime(r *http.Request, argKey string, defaultMs int64) (int64, error) {
if len(argValue) == 0 {
return roundToSeconds(defaultMs), nil
}
secs, err := strconv.ParseFloat(argValue, 64)
// Handle Prometheus'-provided minTime and maxTime.
// See https://github.com/prometheus/client_golang/issues/614
switch argValue {
case prometheusMinTimeFormatted:
return minTimeMsecs, nil
case prometheusMaxTimeFormatted:
return maxTimeMsecs, nil
}
// Parse argValue
secs, err := parseTime(argValue)
if err != nil {
// Try parsing string format
t, err := time.Parse(time.RFC3339, argValue)
if err != nil {
// Handle Prometheus'-provided minTime and maxTime.
// See https://github.com/prometheus/client_golang/issues/614
switch argValue {
case prometheusMinTimeFormatted:
return minTimeMsecs, nil
case prometheusMaxTimeFormatted:
return maxTimeMsecs, nil
}
// Try parsing duration relative to the current time
d, err1 := promutils.ParseDuration(argValue)
if err1 != nil {
return 0, fmt.Errorf("cannot parse %q=%q: %w", argKey, argValue, err)
}
if d > 0 {
d = -d
}
t = time.Now().Add(d)
}
secs = float64(t.UnixNano()) / 1e9
return 0, fmt.Errorf("cannot parse %s=%s: %w", argKey, argValue, err)
}
msecs := int64(secs * 1e3)
if msecs < minTimeMsecs {
@@ -83,6 +71,78 @@ func GetTime(r *http.Request, argKey string, defaultMs int64) (int64, error) {
return msecs, nil
}
func parseTime(s string) (float64, error) {
if len(s) > 0 && (s[len(s)-1] != 'Z' && s[len(s)-1] > '9' || s[0] == '-') {
// Parse duration relative to the current time
d, err := promutils.ParseDuration(s)
if err != nil {
return 0, err
}
if d > 0 {
d = -d
}
t := time.Now().Add(d)
return float64(t.UnixNano()) / 1e9, nil
}
if len(s) == 4 {
// Parse YYYY
t, err := time.Parse("2006", s)
if err != nil {
return 0, err
}
return float64(t.UnixNano()) / 1e9, nil
}
if !strings.Contains(s, "-") {
// Parse the timestamp in milliseconds
return strconv.ParseFloat(s, 64)
}
if len(s) == 7 {
// Parse YYYY-MM
t, err := time.Parse("2006-01", s)
if err != nil {
return 0, err
}
return float64(t.UnixNano()) / 1e9, nil
}
if len(s) == 10 {
// Parse YYYY-MM-DD
t, err := time.Parse("2006-01-02", s)
if err != nil {
return 0, err
}
return float64(t.UnixNano()) / 1e9, nil
}
if len(s) == 13 {
// Parse YYYY-MM-DDTHH
t, err := time.Parse("2006-01-02T15", s)
if err != nil {
return 0, err
}
return float64(t.UnixNano()) / 1e9, nil
}
if len(s) == 16 {
// Parse YYYY-MM-DDTHH:MM
t, err := time.Parse("2006-01-02T15:04", s)
if err != nil {
return 0, err
}
return float64(t.UnixNano()) / 1e9, nil
}
if len(s) == 19 {
// Parse YYYY-MM-DDTHH:MM:SS
t, err := time.Parse("2006-01-02T15:04:05", s)
if err != nil {
return 0, err
}
return float64(t.UnixNano()) / 1e9, nil
}
t, err := time.Parse(time.RFC3339, s)
if err != nil {
return 0, err
}
return float64(t.UnixNano()) / 1e9, nil
}
var (
// These constants were obtained from https://github.com/prometheus/prometheus/blob/91d7175eaac18b00e370965f3a8186cc40bf9f55/web/api/v1/api.go#L442
// See https://github.com/prometheus/client_golang/issues/614 for details.

View File

@@ -11,6 +11,69 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
)
func TestGetDurationSuccess(t *testing.T) {
f := func(s string, dExpected int64) {
t.Helper()
urlStr := fmt.Sprintf("http://foo.bar/baz?s=%s", url.QueryEscape(s))
r, err := http.NewRequest("GET", urlStr, nil)
if err != nil {
t.Fatalf("unexpected error in NewRequest: %s", err)
}
// Verify defaultValue
d, err := GetDuration(r, "foo", 123456)
if err != nil {
t.Fatalf("unexpected error when obtaining default time from GetDuration(%q): %s", s, err)
}
if d != 123456 {
t.Fatalf("unexpected default value for GetDuration(%q); got %d; want %d", s, d, 123456)
}
// Verify dExpected
d, err = GetDuration(r, "s", 123)
if err != nil {
t.Fatalf("unexpected error in GetDuration(%q): %s", s, err)
}
if d != dExpected {
t.Fatalf("unexpected timestamp for GetDuration(%q); got %d; want %d", s, d, dExpected)
}
}
f("1.234", 1234)
f("1.23ms", 1)
f("1.23s", 1230)
f("2s56ms", 2056)
f("2s-5ms", 1995)
f("5m3.5s", 303500)
f("2h", 7200000)
f("1d", 24*3600*1000)
f("7d5h4m3s534ms", 623043534)
}
func TestGetDurationError(t *testing.T) {
f := func(s string) {
t.Helper()
urlStr := fmt.Sprintf("http://foo.bar/baz?s=%s", url.QueryEscape(s))
r, err := http.NewRequest("GET", urlStr, nil)
if err != nil {
t.Fatalf("unexpected error in NewRequest: %s", err)
}
if _, err := GetDuration(r, "s", 123); err == nil {
t.Fatalf("expecting non-nil error in GetDuration(%q)", s)
}
}
// Negative durations aren't supported
f("-1.234")
// Invalid duration
f("foo")
// Invalid suffix
f("1md")
}
func TestGetTimeSuccess(t *testing.T) {
f := func(s string, timestampExpected int64) {
t.Helper()
@@ -39,6 +102,17 @@ func TestGetTimeSuccess(t *testing.T) {
}
}
f("2019", 1546300800000)
f("2019-01", 1546300800000)
f("2019-02", 1548979200000)
f("2019-02-01", 1548979200000)
f("2019-02-02", 1549065600000)
f("2019-02-02T00", 1549065600000)
f("2019-02-02T01", 1549069200000)
f("2019-02-02T01:00", 1549069200000)
f("2019-02-02T01:01", 1549069260000)
f("2019-02-02T01:01:00", 1549069260000)
f("2019-02-02T01:01:01", 1549069261000)
f("2019-07-07T20:01:02Z", 1562529662000)
f("2019-07-07T20:47:40+03:00", 1562521660000)
f("-292273086-05-16T16:47:06Z", minTimeMsecs)
@@ -58,27 +132,26 @@ func TestGetTimeError(t *testing.T) {
t.Fatalf("unexpected error in NewRequest: %s", err)
}
// Verify defaultValue
ts, err := GetTime(r, "foo", 123456)
if err != nil {
t.Fatalf("unexpected error when obtaining default time from GetTime(%q): %s", s, err)
}
if ts != 123000 {
t.Fatalf("unexpected default value for GetTime(%q); got %d; want %d", s, ts, 123000)
}
// Verify timestampExpected
_, err = GetTime(r, "s", 123)
if err == nil {
if _, err := GetTime(r, "s", 123); err == nil {
t.Fatalf("expecting non-nil error in GetTime(%q)", s)
}
}
f("foo")
f("foo1")
f("1245-5")
f("2022-x7")
f("2022-02-x7")
f("2022-02-02Tx7")
f("2022-02-02T00:x7")
f("2022-02-02T00:00:x7")
f("2022-02-02T00:00:00a")
f("2019-07-07T20:01:02Zisdf")
f("2019-07-07T20:47:40+03:00123")
f("-292273086-05-16T16:47:07Z")
f("292277025-08-18T07:12:54.999999998Z")
f("123md")
f("-12.3md")
}
func TestGetExtraTagFilters(t *testing.T) {

Some files were not shown because too many files have changed in this diff Show More