Compare commits

...

571 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
73dfb030dd deployment/docker/Makefile: use alpine 3.17.3 instead of alpine 3.18.0 for certs image, since alpine 3.18.0 doesnt work for cross-platform builds 2023-05-18 14:10:29 -07:00
Aliaksandr Valialkin
52b5498165 docs/CHANGELOG.md: cut v1.91.0 2023-05-18 12:37:12 -07:00
Aliaksandr Valialkin
b6dda0fefe vendor: make vendor-update 2023-05-18 12:22:09 -07:00
Aliaksandr Valialkin
d9b3a92348 app/vmselect/vmui: run make vmui-update after 39c1b0f8d1 2023-05-18 12:15:12 -07:00
Aliaksandr Valialkin
1f28b46ae9 lib/storage: revert the migration from global to per-day index for (MetricName -> TSID)
This reverts the following commits:
- e0e16a2d36
- 2ce02a7fe6

The reason for revert: the updated logic breaks assumptions made
when fixing https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698 .
For example, if a time series stop receiving new samples during the first
day after the indexdb rotation, there are chances that the time series
won't be registered in the new indexdb. This is OK until the next indexdb
rotation, since the time series is registered in the previous indexdb,
so it can be found during queries. But the time series will become invisible
for search after the next indexdb rotation, while its data is still there.

There is also incompletely solved issue with the increased CPU and disk IO resource
usage just after the indexdb rotation. There was an attempt to fix it, but it didn't fix
it in full, while introducing the issue mentioned above. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401

TODO: to find out the solution, which simultaneously solves the following issues:
- increased memory usage for setups high churn rate and long retention (e.g. what the reverted commit does)
- increased CPU and disk IO usage during indexdb rotation ( https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401 )
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2698

Possible solution - to create the new indexdb in one hour before the indexdb rotation
and to gradually pre-populate it with the needed index data during the last hour before indexdb rotation.
Then the new indexdb will contain all the needed data just after the rotation,
so it won't trigger increased CPU and disk IO.
2023-05-18 11:30:49 -07:00
Aliaksandr Valialkin
4f7f750850 lib/handshake: do not pollute logs with cannot read hello messages on TCP health checks
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1762
2023-05-18 10:41:34 -07:00
Aliaksandr Valialkin
0645688f32 app/vmauth: allow -auth.config without users section of unauthorized_user section is present here
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4083
2023-05-18 10:41:34 -07:00
Dmytro Kozlov
7cbda6796c app/vmctl: set default value for --vm-native-step-interval flag (#4327)
* app/vmctl: set default value for `--vm-native-step-interval` flag

* app/vmctl: update CHANGELOG.md

* app/vmctl: update CHANGELOG.md, fix docs

* app/vmctl: fix typo

* app/vmctl: fix typo
2023-05-18 13:43:35 +02:00
Haleygo
1531d757ea fix lint check 2023-05-17 13:51:36 +02:00
Denys Holius
c605d64a95 deployment/docker/Makefile: updated docker compose commands regarding migration from V1 to V2 (#4314)
deployment/docker/Makefile: updated docker compose commands regarding migration from V1 to V2
2023-05-17 13:14:24 +02:00
Aliaksandr Valialkin
63b1cab454 app/vmauth: simplify the code after 4a1d29126c
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4242
2023-05-17 00:37:05 -07:00
Nikolay
4a1d29126c app/vmauth: retry common network dial errors (#4280)
with tracking request body read calls
it allows us to retry POST and PUT requests
2023-05-17 00:19:33 -07:00
Nikolay
16df18ec14 app/vmauth: do not return invalid credentials (#4288)
at http response by default
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4188

based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4190
Thanks @raj-kumar-j  for init implementation
2023-05-17 00:09:47 -07:00
Aliaksandr Valialkin
e0e16a2d36 lib/storage: follow-up after 2ce02a7fe6
- Document the change at docs/CHANGELOG.md
- Clarify comments for non-trivial code touched by the commit
- Improve the logic behind maybeCreateIndexes():
  - Correctly create per-day indexes if the indexdb rotation is performed during
    the first hour or the last hour of the day by UTC.
    Previously there was a possibility of missing index entries on that day.
  - Increase the duration for creating new indexes in the current indexdb for up to 22 hours
    after indexdb rotation. This should reduce the increased resource usage
    after indexdb rotation.
    It is safe to postpone index creation for the current day until the last hour
    of the current day after indexdb rotation by UTC, since the corresponding (date, ...)
    entries exist in the previous indexdb.
- Search for TSID by (date, MetricName) in both the current and the previous indexdb.
  Previously the search was performed only in the current indexdb. This could lead
  to excess creation of per-day indexes for the current day just after indexdb rotation.
- Search for (date, metricID) entries in both the current and the previous indexdb.
  Previously the search was performed only in the current indexdb. This could lead
  to excess creation of per-day indexes for the current day just after indexdb rotation.
2023-05-16 23:19:27 -07:00
Roman Khavronenko
2ce02a7fe6 lib/storage: introduce per-day MetricName=>TSID index (#4252)
The new index substitutes global MetricName=>TSID index
used for locating TSIDs on ingestion path.
For installations with high ingestion and churn rate, global
MetricName=>TSID index can grow enormously making
index lookups too expensive. This also results into bigger
than expected cache growth for indexdb blocks.

New per-day index supposed to be much smaller and more efficient.
This should improve ingestion speed and reliability during
re-routings in cluster.

The negative outcome could be occupied disk size, since
per-day index is more expensive comparing to global index.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-16 15:46:42 -07:00
Artem Navoiev
39ba4fc1c4 update logo width in cluster doc to 300
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-05-16 15:33:15 -07:00
Aliaksandr Valialkin
278278af95 lib/storage: reduce the unimportant logging during Storage start / stop
This should improve the visibility of potentially important logs
2023-05-16 15:14:21 -07:00
Aliaksandr Valialkin
d330c7e6fc lib/mergeset: remove superflouos logging when opening and closing the Table
The logged messages had little useful info, while they were polluting log output during VictoriaMetrics start/stop
2023-05-16 15:01:25 -07:00
Aliaksandr Valialkin
3cbc0975f6 lib/mergeset: close and open the table before making snapshots at TestTableCreateSnapshotAt()
This gives guarantees that all the in-memory data is written to disk at the snapshot time.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4272
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4316
2023-05-16 14:55:11 -07:00
Aliaksandr Valialkin
09b403d38a lib/{mergeset,storage}: make it clear that DebugFlush() doesn't store all the recently ingested data to disk
DebugFlush() makes sure that the recently ingested data becomes visible to search.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4272
2023-05-16 11:50:17 -07:00
Aliaksandr Valialkin
664db964ca vendor: update github.com/VictoriaMetrics/metrics from v1.23.1 to v1.24.0
This change adds process_* metrics to VictoriaMetrics components under Windows OS

See https://github.com/VictoriaMetrics/metrics/pull/47
2023-05-16 11:37:07 -07:00
Aliaksandr Valialkin
60bdbdaa70 docs/vmbackupmanager.md: run make docs-sync after c7d8dda39225b716ea44df7223db5e4a125d407b 2023-05-16 11:27:42 -07:00
Alexander Marshalov
3b2dc2b098 backup metadata are written in separate file (#560)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-05-16 11:24:54 -07:00
Alexander Marshalov
7b15834cbe added backup locking/unlocking against retention policy to vmbackupmanager (#558)
* added backup locking/unlocking against retention policy to vmbackupmanager

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* added docs for new commands

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* fix review comments

Signed-off-by: Alexander Marshalov <_@marshalov.org>

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-05-16 11:23:36 -07:00
Roman Khavronenko
f68d93cca2 vmalert: follow-up after 669becd011 (#4318)
* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: follow-up after 669becd011

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-16 18:51:38 +02:00
Zakhar Bessarab
242050ba94 lib/storage: follow-up after a50d63c376 (#4289)
* lib/storage: follow-up after a50d63c376

- ensure retentionMsecs is rounded to day
- remove localTimeOffset in test as localOffset is ignored when using `UnixMilli`

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/storage: restore retention timezone offset effect on retention deadline

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-05-16 17:14:08 +02:00
Michael Hoffmann
3a65f4a733 vmalert: improve retry logic for remote write (#4134)
vmalert should not retry on 4xx status codes
according to https://prometheus.io/docs/concepts/remote_write_spec/
2023-05-16 16:30:03 +02:00
Yury Molodov
39c1b0f8d1 vmui: refactor code using custom hooks (#4145)
* refactor: replace boolean useState with useBoolean

* refactor: replace useResize with useWindowSize/useElementSize

* refactor: replace addEventListener with useEventListener

* refactor: replace navigator.clipboard.writeText with useCopyToClipboard

* fix: prevent redirect loop
2023-05-16 16:41:06 +03:00
Roman Khavronenko
686bf6c5ad vmctl: update VictoriaMetrics migration section (#4310)
Remove unnecessary information to simplify the description and tips.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-15 16:15:01 +02:00
Artem Navoiev
478258bc4d fix link in operator quick start docs.2
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-05-14 21:20:57 +02:00
Artem Navoiev
c19fc52676 fix link in operator quick start docs
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-05-14 19:54:08 +02:00
Aliaksandr Valialkin
1c47acda11 lib/promutils: add ParseTimeAt() function 2023-05-13 20:12:31 -07:00
Aliaksandr Valialkin
2613110f75 deployment/docker: update base docker image from 3.17.3 to 3.18.0
See https://www.alpinelinux.org/posts/Alpine-3.18.0-released.html
2023-05-12 17:31:21 -07:00
Aliaksandr Valialkin
616175b1ce lib/promutils: properly return error when incorrect Prometheus label names are passed to NewLabelsFromString()
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4284
See also https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4304
2023-05-12 16:52:29 -07:00
Aliaksandr Valialkin
318a87c36f Revert "lib/promrelabel: show error message if labels not in prometheus exposition format (#4304)"
This reverts commit 193a9c3328.

Reason for revert: the commit doesn't fix the real issue with promutils.NewLabelsFromString()
function, which must return error when improperly formatted Prometheus metric with labels is passed to it.
See https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md#text-format-example

E.g. the promutils.NewLabelsFromString() must return error when the following strings are passed to it:

- `{foo:"bar"}`, since `:` is disallowed in Prometheus text exposition format. The corect value is `{foo="bar"}`
- `{"foo":"bar"}`, since label name shouldn't be quoted. The correct value is `{foo="bar"}`.

The reverted commit introduces another set of bugs, which happily accept the following invalid input:

- `{foo=~"bar"}`
- `{foo!="bar"}`
- `{foo!~"bar"}`

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4284
See also https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4304
2023-05-12 16:07:37 -07:00
Aliaksandr Valialkin
160453b86c lib/protoparser/csvimport: properly parse the last empty column in CSV line
Do not ignore the last empty column in CSV line.
While at it, properly parse CSV columns in single quotes, e.g. `'foo,bar',baz` is parsed as two columns - `foo,bar` and `baz`

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048

See also https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4298
2023-05-12 15:51:41 -07:00
Aliaksandr Valialkin
b7fe7b801c Revert "lib/protoparser: fix skip csv line when metric can be collect from the line (#4298)"
This reverts commit 410ae99c2e.

Reason for revert: the commit masks the real issue instead of fixing it.
The real issue is that the scanner.NextColumn() skips the last column if it is empty.

The commit also introduces two bugs:

- a panic if all the metric values in CSV line are empty
- silent import of CSV lines with too small number of columns

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4048
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4298
2023-05-12 15:22:27 -07:00
Yury Molodov
f0fad01e8a vmui: add notification for non-matching queries (#4301)
vmui: add notification for non-matching queries (#4211)

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4211
2023-05-12 13:55:47 +02:00
Roman Khavronenko
5a7159ab2e docs: mention link to public relabeling playground (#4306)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-12 10:58:22 +02:00
Roman Khavronenko
ad2d079ba5 docs: update docs about VMUI pages (#4305)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-12 10:51:44 +02:00
Dmytro Kozlov
193a9c3328 lib/promrelabel: show error message if labels not in prometheus exposition format (#4304)
lib/promrelabel: show error message if labels not in prometheus exposition format

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4284
2023-05-12 10:42:56 +02:00
Dmytro Kozlov
410ae99c2e lib/protoparser: fix skip csv line when metric can be collect from the line (#4298)
* lib/protoparser: fix skip csv line when metric can be collect from the line

* lib/protoparser: fix comment
2023-05-12 11:04:16 +03:00
Yury Molodov
1e4a9a8dfe vmui: enhancements to top queries page (#4299)
* feat: improvement of the top queries page

* vmui/docs: enhancements to top queries page

* Apply suggestions from code review

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-05-11 13:47:32 -07:00
Alexander Marshalov
9855b38da2 fixed error with double slash in vmbackupmanager (#557)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-05-11 13:38:07 -07:00
Aliaksandr Valialkin
5d22c36904 docs/CHANGELOG.md: improve the description of the change at 7ea2531db0
Move the change description to the group of vmui changes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4213
2023-05-11 13:30:33 -07:00
Aliaksandr Valialkin
73812c71a5 lib/promutils: properly parse time strings with timezones at ParseTime() 2023-05-11 13:24:00 -07:00
Roman Khavronenko
adc5635a07 vmalert: add hints to filter buttons (#4296)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-11 16:38:08 +02:00
Denys Holius
bb7a295146 app/vmctl/vm_native.go: fixed a typo in error message 2023-05-11 16:36:13 +02:00
Yury Molodov
a55d3f6882 vmui: increase font-size and fix the text display (#4273)
vmui: change default font size to 14px for better readability
vmui: fix bug with missing text on buttons in safari

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-05-11 14:50:09 +02:00
Dmytro Kozlov
7ea2531db0 app/vmui: added table where Labels with the highest number of unique values show (#4271)
* app/vmui: added Labels with the highest number of unique values

* app/vmui: cleanup

* app/vmui: cleanup

* app/vmui: add table description

* app/vmui: fix comment, updated CHANGELOG.md

* app/vmui: disable links

* app/vmui: added actions to the table, it will show values for selected label with the highest number of series

* app/vmui: fix comment
2023-05-11 15:19:36 +03:00
Aliaksandr Valialkin
da037cafc5 lib/bytesutil: go fmt after 2ec17bed2c 2023-05-10 20:29:03 -07:00
Aliaksandr Valialkin
86424e079e docs/CHANGELOG.md: fix typo after 2caf0b05c6 2023-05-10 13:03:20 -07:00
Aliaksandr Valialkin
2ec17bed2c lib/bytesutil: add benchmarks for ToUnsafeString() and ToUnsafeBytes() 2023-05-10 12:59:26 -07:00
Artem Navoiev
0ccb7f51dd fix typo in changelog
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-05-10 19:23:24 +02:00
Roman Khavronenko
2caf0b05c6 vmalert: correctly update seriesFetched metric for const exprs (#4287)
Previously, metric `vmalert_alerting_rules_last_evaluation_series_fetched`
would be set to 0 for const expressions, because const expression do not match
any series. This may result into a confusion: no series were matched but response isn't empty.
The change updates the logic behind metric: if no series were matched but there are samples
in response - use amount of samples as number of series.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-10 15:04:05 +02:00
Roman Khavronenko
51196739af Vmalert UI updates (#4276)
* vmalert: expand rule groups on anchor click

before, anchor click was only updating the URL.
To expand the group, user had to click on rule's block.
Now, group will toggle automatically.

* vmalert: allow filtering group in web UI

The new filter allows to filter groups and rules within
groups by: errors only or noMatch only.

The filtering supposed to help navigating big numbers of groups/rules.
Filtering is reflected in URL, so can be shared as a link.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-10 14:38:13 +02:00
Aliaksandr Valialkin
c8799a5d97 vendor: update github.com/valyala/gozstd from v1.19.1 to v1.20.1 2023-05-10 02:16:08 -07:00
Dmytro Kozlov
060a0cdbf4 docs: Add information about datasource plugin (#4266)
docs: Add information about datasource plugin
2023-05-10 10:26:57 +02:00
Alexander Marshalov
2e494e2375 fixed typos in documentation and commandline flags descriptions (#4275) 2023-05-10 09:50:41 +02:00
Aliaksandr Valialkin
9eb1abdefe vendor: make vendor-update 2023-05-09 23:13:50 -07:00
Aliaksandr Valialkin
4e0345a5ef docs/CHANGELOG.md: add a link to docs about never-firing alerts
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039
2023-05-09 21:58:36 -07:00
Aliaksandr Valialkin
78b23c9a83 docs/CHANGELOG.md: document 8f4de6fa47 2023-05-09 21:39:25 -07:00
Roman Khavronenko
491831df49 vminsert: properly reset labels object on aggregation (#4278)
Without reset, labels duplicates could have been added during stream aggregation.
Since `ctx.Labels` is reused during processing of many series, each series will
add its labels to the context. Even if the same labels were already addeded on prev
iteration. Now, we reset `ctx.Labels` on each iteration to contain so labels from
different series didn't interfere.

This could have cause exceeding of the limit on number of labels per pushed time series.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4277

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-09 08:33:58 -07:00
Aliaksandr Valialkin
b9bb64ce55 lib/promscrape/discovery/consulagent: substitute metaPrefix with the __meta_consulagent_ plaintext string
This simplifies future code navigation and search for the specific meta-label starting from __meta_consulagent_* prefix.
For example, `grep __meta_consulagent_namespace` finds the exact place where this label is defined.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3953
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4217
2023-05-08 23:40:13 -07:00
Aliaksandr Valialkin
7db647e924 lib/fs: move common code outside arch-specific implementations of mustRemoveDirAtomic()
This is a follow-up for 73b6c23271
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-05-08 23:10:20 -07:00
Aliaksandr Valialkin
83a1c2484c docs/CHANGELOG.md: group changelog lines for tip release according to VictoriaMetrics apps 2023-05-08 22:57:23 -07:00
Aliaksandr Valialkin
11eb94d3bc docs/CHANGELOG.md: document baf456978d
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4223
2023-05-08 22:26:04 -07:00
Aliaksandr Valialkin
ff3c90d305 docs/managed-victoriametrics/overview.md: typo fix after 51fbf58d89 2023-05-08 21:58:40 -07:00
Aliaksandr Valialkin
fc42617ecd docs/CHANGELOG.md: refer to the author and the pull request of the notifier_headers feature at vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3260
2023-05-08 17:18:48 -07:00
Aliaksandr Valialkin
887555669e Revert "lib/streamaggr: discard samples with timestamps outside of aggregation interval (#4199)"
This reverts commit 9e99f2f5b3.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4068

Reason for revert: this breaks valid use cases:

- If timestamps aren't specified in the incoming samples on purpose. For example, if stream aggregation is used
  as StatsD replacement. StatsD protocol has no timestamp concept for incoming samples.
  See https://github.com/b/statsd_spec

- If all the samples must be aggregated, even if they contain stale timestamps.
  for example, if the stream aggregation produces some counter of some events,
  it may be better to count all the events even if they were delayed before
  being ingested into VictoriaMetrics.

Is is also unclear how to determine whether the sample becomes stale.
For example, if the aggregation interval equals to 1h, and the previous
aggregation cycle just finished 10 minutes ago, what to do with the newly
incoming sample with the timestamp 30 minutes older than the current time?
The answer highly depends on the context, so it is unsafe to uncoditionally
use a single logic for dropping the old samples here.
2023-05-08 16:52:27 -07:00
Aliaksandr Valialkin
5a02bc56fb docs/CHANGELOG.md: document 03150c8973
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4204
2023-05-08 16:30:28 -07:00
Aliaksandr Valialkin
3c0470f91e docs/vmalert.md: clarify docs regarding the support of recursive globs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4041
2023-05-08 16:21:44 -07:00
Aliaksandr Valialkin
74155afb71 docs: clarify docs after 5ee344824f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183
2023-05-08 16:11:44 -07:00
Aliaksandr Valialkin
185894fc5a app/vmagent/remotewrite: make more user-friendly the warning message about too small -remoteWrite.maxdiskUsagePerURL value
This is a follow-up for bc17f4828c .
While at it, document the change at docs/CHANGELOG.md .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4195
2023-05-08 15:42:30 -07:00
Aliaksandr Valialkin
39530903bf docs/managed-victoriametrics: consistently use setup-notifications prefix for images used at docs/managed-victoriametrics/setup-notifications.md
This should simplify managment of the images belonging to the given docs.
See docs/assets/README.md for details

This is a follow-up for 4052c44ac1
2023-05-08 15:29:53 -07:00
Aliaksandr Valialkin
d906e83e5e app/vmauth: merge default_url example into multi-url example in order to reduce the amounts of text to read for the user
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4084

This is a follow-up for 041e188df8
2023-05-08 15:12:23 -07:00
Aliaksandr Valialkin
ec3943d14a app/vmselect: small cleanup after 4f3f9950d0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3807
2023-05-08 14:57:11 -07:00
Aliaksandr Valialkin
1db9b78b88 app/vmselect: small cleanup after 68e31a6000
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3811
2023-05-08 14:34:37 -07:00
Aliaksandr Valialkin
80946f06c2 app/{vmselect,vmctl}: move ParseTime() to lib/promutils
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4091

This is a follow-up for e2053baf32
2023-05-08 14:17:57 -07:00
Aliaksandr Valialkin
6601fa3e9d docs/CHANGELOG.md: typo fix after 45a551df9c: 'this doc' -> 'this feature request' 2023-05-08 13:41:58 -07:00
Aliaksandr Valialkin
92a549bccb app/vmauth/README.md: mention about ip filters and concurrency limiter at Security chapter 2023-05-08 13:35:58 -07:00
Aliaksandr Valialkin
23595465b8 app/vmauth: refer ip_filters option in example auth config
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3491
2023-05-08 13:29:18 -07:00
Aliaksandr Valialkin
8f43f496d7 docs: document IP filters functionality in vmauth
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3491

This is a follow-up for 2f08ed3be2
2023-05-08 12:12:16 -07:00
Aliaksandr Valialkin
09268d41ed app/vmauth: remove duplicate mentioning of -auth.config value in error message in logs on usuccessful load of -auth.config
This is a follow-up for 25759082f4
2023-05-08 10:16:16 -07:00
Aliaksandr Valialkin
1b288e0a05 all: update Go builder from Go1.20.3 to Go1.20.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.20.4+label%3ACherryPickApproved
2023-05-08 09:40:55 -07:00
Aliaksandr Valialkin
24eeb8d9af docs/CHANGELOG.md: document c77385e78f 2023-05-08 08:50:30 -07:00
Alexander Marshalov
8225a48b56 fixed vm_promscrape_config_last_reload_successful metric value recovery after successful reloading with unchanged content (#4260) (#4268)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-05-08 13:32:51 +02:00
Roman Khavronenko
fa3a17938e vmalert: follow-up after cae87da (#4269)
* vmalert: follow-up after cae87da

cae87da4bb
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: update struct comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: rm typo

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 13:31:54 +02:00
Alexander Marshalov
c0cb3e4f98 update generated docs for operator (#4267)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-05-08 12:39:44 +02:00
Haleygo
cae87da4bb vmalert: support reading rule from http url (#4212)
vmalert: support reading rule's config from HTTP URL
2023-05-08 09:52:57 +02:00
Roman Khavronenko
5b8450fc1b app/vmalert: detect alerting rules which don't match any series at all (#4198)
app/vmalert: detect alerting rules which don't match any series at all

vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4056

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-08 09:36:39 +02:00
Zakhar Bessarab
d71a2605d1 deployment/docker: allow overriding docker namespace (#4265)
It makes it easier for users who build and self-host images to publish their images without changing tags manually.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-05-08 10:34:05 +04:00
Roman Khavronenko
01520d3e5d alerts: update TooHighMemoryUsage threshold (#4256)
It appears that 90% usage for anonymous mem usage
is already concerning. So we lowering the threshold to 80%.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-07 22:18:56 +02:00
Nikolay
8f4de6fa47 lib/storage: properly update link for entry at dateMetricID cache (#4258)
previously during sync for mutable and immutable cache parts, link for hotEntry with current date may be not properly updated
it corrupts cache for backfilling metrics and increased cpu load
2023-05-05 21:45:47 -07:00
Zakhar Bessarab
4e71003620 lib/promscrape/discovery/kubernetes: follow-up for d5e94721db (#4255)
- add changelog reference to an author
- fix tests
- add metadata to match Prometheus behavior

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-05-05 14:41:17 +02:00
Vasilchenko Anton
22e65402af Add endpoint labels for pod targets discovered form endpoint but has different ports (#4253)
Signed-off-by: Vasilchenko Anton <vasilchenko-as@yandex.ru>
2023-05-05 15:46:07 +04:00
Zakhar Bessarab
aca256735c lib/storage: fix indexdb rotation infinite loop (#4249)
When using `retentionTimezoneOffset` and having local timezone being more than 4 hours different from UTC indexdb retention calculation could return negative value. This caused indexdb rotation to get in loop.
Fix calculation of offset to use `retentionTimezoneOffset` value properly and add test to cover all legit timezone configs.
See:
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4207
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4206

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-05-04 17:16:48 +02:00
Alexander Marshalov
56b84140a9 added new consulagent service discovery (#3953) (#4217) 2023-05-04 11:36:21 +02:00
Alexander Marshalov
2eb27ddb22 max value for memory.allowedPercent changed from 200 to 100 (#4171) (#4251)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-05-04 11:34:57 +02:00
Zakhar Bessarab
ddcc3b1e9d docs: changelog follow-up for 49b77ec01a (#4250)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-05-03 16:42:22 +04:00
justcompile
49b77ec01a squash commits (#4166) 2023-05-03 10:51:08 +02:00
Nikolay
4786f036de lib/backup: fixes path generation for windows (#4133)
replaces custom fsync function with standard Fsync methods for files.
fixes pattern matching for parts and properly generate backup path for local fs.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-05-03 10:48:53 +02:00
Nikolay
73b6c23271 lib/fs: do not panic at windows at dir deletion (#4132)
Windows doesn't allow to remove dir with opened files. Usually it's a case for snapshots, hard cannot be removed if file is openned.
With this change, dir will be renamed and properly deleted at the next process start.
It's recommended to restart vmstorage/vmsingle for snapshots deletion completion periodically.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-05-03 10:47:02 +02:00
Roman Khavronenko
baf456978d vmselect: exit early from queue on context cancel (#4223)
* vmselect: exit early from queue on context cancel

When `-search.maxConcurrentRequests` is reached, vmselect puts
request in the queue. It is expected, that requests in the queue
will be processed as soon as it would be enough capacity to do so.

However, it could happen that while request was waiting its turn,
the client could have already cancel it (close the connection,
or just close the tab with UI). In this case, we should de-queue
such requests to avoid spending extra resources on them.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmselect: address review comments

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-05-03 10:42:17 +02:00
Zakhar Bessarab
bf3b6732bd lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints (#4235)
* lib/promscrape/discovery/kubernetes: add common labels to all ports discovered from endpoints

Sets
`__meta_kubernetes_endpoints_name` and `__meta_kubernetes_namespace` labels to all ports of pod.
Prometheus sets those labels to all ports in pod (0ab9553611/discovery/kubernetes/endpoints.go (L267C15-L269)) even if port is not matching any service.

See: #4154

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape/discovery/kubernetes: fix test for updated discovery logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-05-03 02:17:33 +02:00
Artem Navoiev
4ddfc67d54 remove information of releasing graphite render api from tip section as we released it in 1.90
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-05-02 15:30:22 +02:00
Max Golionko
f3b829125e add vmsingle filter for health alerts (#4238) 2023-05-02 20:54:42 +08:00
Artem Navoiev
5dde81259c prepare operator docs to migration
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-05-01 12:32:58 +02:00
Artem Navoiev
51fbf58d89 update managed docs - prepare for migration
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-05-01 12:19:57 +02:00
Roman Khavronenko
3d955d1078 docs: note automatic conversion to ms for influx protocol (#4224)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-30 17:13:35 +04:00
Artem Navoiev
a5fb8b93a8 add wight do trobubleshooting docs
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-04-30 12:33:36 +02:00
Artem Navoiev
ee1da35071 prepare static docs to migration
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-04-30 12:33:36 +02:00
Roman Khavronenko
3383f12a4b vmalert: fix API to return non-nil values (#4222)
Properly return empty slices instead of nil for `/api/v1/rules` and `/api/v1/alerts` API handlers.
This improves compatibility with Grafana.

 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-28 10:08:29 +02:00
Roman Khavronenko
eb746a4dab Revert "http server: limit max concurrent requests (#4185)" (#4215)
This reverts commit 77f76371

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-27 13:02:47 +02:00
Roman Khavronenko
29e059e49c app/vmalert: follow-up after 6c322b4a00 (#4214)
6c322b4a00

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-27 13:02:21 +02:00
Haleygo
6c322b4a00 vmalert: allow configuring custom notifier headers per group (#4088)
vmalert: allow configuring custom notifier headers per group
2023-04-27 12:17:26 +02:00
Zakhar Bessarab
9e99f2f5b3 lib/streamaggr: discard samples with timestamps outside of aggregation interval (#4199)
* lib/streamaggr: discard samples with timestamps not matching aggregation interval

Samples with timestamps lower than `now - aggregation_interval` are likely to be written via backfilling and should not be used for calculation of aggregation.
See #4068

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/streamaggr: make log message more descriptive, fix imports

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-27 11:59:49 +02:00
Haleygo
03150c8973 lib/opentsdbhttp: fix a typo preventing from using writeconcurrencylimiter (#4208) 2023-04-27 09:22:42 +02:00
Zakhar Bessarab
b21a55febf app/vmalert: add support of recursive path globs for rules and templates (#4148)
Supports using `**` for `-rule` and `-rule.templates`: `dir/**/*.tpl` loads contents of dir and all subdirectories recursively.

See: #4041

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-04-26 19:20:22 +02:00
Nikolay
5ee344824f lib/promscrape: adds filter for consul_sd_configs: (#4184)
* lib/promscrape: adds filter for consul_sd_configs:
it allows advanced filtering for consul service discovery requests
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4183

* typo fix

* removes deprecation mentions since it's not relevant

* Update docs/CHANGELOG.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-04-26 19:16:27 +02:00
Max Golionko
5c955dd876 alerts: relax job filter to support job names created by VMOperator (#4203) 2023-04-26 15:32:25 +02:00
Zakhar Bessarab
89a1c941c2 app/vmalert: return an error when using query function in -external.alert.source flag (#4191)
Templating of `-external.alert.source` is not expected to have access to the query which was causing runtime error when query function was passed as nil.
See: #4181

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-26 15:31:14 +02:00
7840vz
43ededf3a1 alerts: decrease severity to info for RecordingRulesNoData (#4089) 2023-04-26 12:57:41 +02:00
Dmytro Kozlov
bc17f4828c app/vmagent,lib/persistentqueue: show warning message if --remoteWrite.maxDiskUsagePerURL flag lower than 500MB (#4196)
* app/vmagent,lib/persistentqueue: show warning message if `--remoteWrite.maxDiskUsagePerURL` flag lower than 500MB

* app/vmagent,lib/persistentqueue: linter fix

* app/vmagent,lib/persistentqueue: fix comment
2023-04-26 13:23:01 +03:00
dmitryk-dk
368571be00 docs: fix type 2023-04-26 11:50:17 +02:00
dmitryk-dk
0660cc7128 app/vmctl: fix comments 2023-04-26 11:50:17 +02:00
dmitryk-dk
cafad95790 app/vmctl: fix comments 2023-04-26 11:50:17 +02:00
dmitryk-dk
2b34c01f6d docs/managed-victoriametrics: change emails 2023-04-26 11:50:17 +02:00
dmitryk-dk
4052c44ac1 docs/managed-victoriametrics: add notifications setup 2023-04-26 11:50:17 +02:00
Alexander Marshalov
041e188df8 added default_url field in vmauth users config (#4084) (#4156)
* added default url field in vmauth users config (#4084)

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-04-26 11:04:35 +02:00
Yury Molodov
4f3f9950d0 vmui: add metric relabel debug (#3889)
* feat: add metric relabel debug (#3807)

* fix: add link to relabeling cookbook

* lib/promrelabel: merge, fix conflicts

* lib/promrelabel: fix diff

* docs/vmui: add metric relabel playground

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-04-26 11:53:29 +03:00
Alexander Marshalov
45a551df9c changelog for issue #4083 (#4197)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-04-26 10:50:44 +02:00
Yury Molodov
cf567badcf vmui: display heatmap in the Explore Metrics (#4124)
* feat: display heatmap in the explore metrics (#4111)

* fix: correct calc step for heatmap

* fix: remove spaces in the result of getDurationFromMilliseconds
2023-04-25 13:14:57 +03:00
Yury Molodov
a80b0aebe8 vmui: add a comparison of data to the Cardinality Explorer (#4123)
* feat: add button "show today" to date picker

* feat: add comparison with the prev day (#3967)

* vmui/docs: add comparison of data to cardinality page
2023-04-25 12:21:57 +03:00
Yury Molodov
752895d1ee docs/vmui: fix CHANGELOG.md about WITH templates (#4194) 2023-04-25 12:05:14 +03:00
Yury Molodov
68e31a6000 vmui: Integrate WITH template playground (#3831)
* feat: add WithTemplate page

* app/vmselect/prometheus: enable json mode for expand with expr API

* app/vmselect/prometheus: enable CORS and add content type

* feat: add api for expand with templates

* fix: remove console from useExpandWithExprs

* app/vmselect/prometheus: fix escaping

* vmui:  integrate WITH template

* app/vmctl: check content type instead of form param

* fix: add content-type for fetch with-exprs

* fix: add a header to the server's response that allows the "Content-Type" header

* app/vmctl: added comment and cleanup

* app/vmctl: use format query param

---------

Co-authored-by: dmitryk-dk <kozlovdmitriyy@gmail.com>
2023-04-25 11:40:01 +03:00
Dmytro Kozlov
e2053baf32 app/vmctl: add support for the different time format in the native binary protocol (#4189)
* app/vmctl: add support for the different time format in the native binary protocol

* app/vmctl: update flag description, update CHANGELOG.md

* app/vmctl: add comment to exported function
2023-04-24 18:33:30 +02:00
Alexander Marshalov
73e22dcf81 added unauthorized_user field in vmauth users config (#4083) (#4157)
added `unauthorized_user` field in vmauth users config (#4083)

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-04-24 14:57:13 +02:00
Roman Khavronenko
77f76371d0 http server: limit max concurrent requests (#4185)
* lib/httpserver: introduce `-http.maxConcurrentRequests` command-line flag

Introduce `-http.maxConcurrentRequests` command-line flag to protect
VM components from resource exhaustion during unexpected spikes of HTTP requests.
By default, the new flag's value is set to 0 which means no limits are applied.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/httpserver: mention http.maxConcurrentRequests in docs

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-24 14:52:06 +02:00
Alexander Marshalov
31e174977e doc improvements (#4172) (#4186)
- added info about metric `vm_vminsert_metrics_read_total`,
- small doc refactoring
- and added make-command for running docs in docker.

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-04-24 13:38:00 +02:00
Yury Molodov
05ab34f2c8 vmui: fix freeze when query regular with heatmap query (#4093)
* fix: fix freeze when query regular with heatmap query

* vmui/docs: fix freeze when query regular with heatmap query
2023-04-21 11:59:09 +03:00
Yury Molodov
3140aa34de vmui: fix bug where tenant list was not displayed (#4162)
* fix: modify the condition for querying tenants

* fix: change getTenantIdFromUrl output to string
2023-04-21 11:56:08 +03:00
Alexander Marshalov
25759082f4 vmauth ip filters (refactoring) (#4059)
Added ip filters (allow_list and deny_list) for enterprise-version of vmauth (#3491)

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-04-20 19:08:27 +02:00
Artem Navoiev
87b925afb2 move note about opensource of graphite in v1.90 release note
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-04-19 16:34:00 +02:00
Roman Khavronenko
de61a73c63 vmalert: retry datasource requests with EOF or unexpected EOF errors (#4146)
* vmalert: retry datasource requests with EOF or unexpected EOF errors

Retry failed read request on the closed connection one more time.
This may improve rules execution reliability when connection
between vmalert and datasource closes unexpectedly.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: fix old tests

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-04-19 10:18:32 +02:00
Zakhar Bessarab
472fe3fd03 lib/httpserver: add handler to serve /robots.txt and deny search indexing (#4143)
This handler will instruct search engines that indexing is not allowed for the content exposed to the internet. This should help to address issues like #4128 when instances are exposed to the internet without authentication.
2023-04-18 16:47:26 +04:00
Balamurugan Krishnamoorthy (Bala)
06aefcec81 Removed duplicate third-party article reference (#4142)
"How do We Keep Metrics for a Long Time in VictoriaMetrics" article is referenced twice in "Third-party articles and slides about VictoriaMetrics" section
2023-04-18 14:03:06 +04:00
Artem Navoiev
413701454c add graphite render api opensource to changelog
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-04-18 11:09:02 +02:00
Aliaksandr Valialkin
2a4c48c59d lib/{mergeset,storage}: make mustReadPartNames() code more clear 2023-04-14 23:16:59 -07:00
Aliaksandr Valialkin
52006149b2 lib/storage: replace OpenStorage() with MustOpenStorage()
Callers of OpenStorage() log the returned error and exit.
The error logging and exit can be performed inside MustOpenStorage()
alongside with printing the stack trace for better debuggability.
This simplifies the code at caller side.
2023-04-14 23:02:40 -07:00
Aliaksandr Valialkin
2a2036160d lib/storage: fix a bug, which prevents from reading pre-v1.90.0 parts
The bug has been introduced in c0b852d50d
2023-04-14 22:33:08 -07:00
Aliaksandr Valialkin
3727251910 lib/fs: add MustReadDir() function
Use fs.MustReadDir() instead of os.ReadDir() across the code in order to reduce the code verbosity.
The fs.MustReadDir() logs the error with the directory name and the call stack on error
before exit. This information should be enough for debugging the cause of the error.
2023-04-14 22:10:46 -07:00
Aliaksandr Valialkin
60d92894c5 lib/storage: validate rows in partition.AddRows() only during tests 2023-04-14 20:52:36 -07:00
Aliaksandr Valialkin
df619bdff0 all: consistently use fs.MustClose() for closing lock files 2023-04-14 20:14:21 -07:00
Aliaksandr Valialkin
2a3b19e1d2 lib/fs: convert CreateFlockFile to MustCreateFlockFile
Callers of CreateFlockFile log the returned err and exit.
It is better to log the error inside the MustCreateFlockFile together with the path
to the specified directory and the call stack. This simplifies
the code at the callers' side while leaving the debuggability at the same level.
2023-04-14 19:50:01 -07:00
Aliaksandr Valialkin
c0b852d50d lib/{storage,mergeset}: convert InitFromFilePart to MustInitFromFilePart
Callers of InitFromFilePart log the error and exit.
It is better to log the error with the path to the part and the call stack
directly inside the MustInitFromFilePart() function.
This simplifies the code at callers' side while leaving the same level of debuggability.
2023-04-14 15:46:12 -07:00
Aliaksandr Valialkin
9183a439c7 lib/filestream: change Create() to MustCreate()
Callers of this function log the returned error and exit.
It is better logging the error together with the path to the filename
and call stack directly inside the function. This simplifies
the code at callers' side without reducing the level of debuggability
2023-04-14 15:12:48 -07:00
Aliaksandr Valialkin
5eb163a08a lib/filestream: transform Open() -> MustOpen()
Callers of this function log the returned error and exit.
Let's log the error with the path to the filename and call stack
inside the function. This simplifies the code at callers' side
without reducing the level of debuggability.
2023-04-14 15:03:42 -07:00
Aliaksandr Valialkin
fda1a54343 lib/fs: improve error logging at ReaderAt.MustReadAt()
- Add 'BUG:' prefix to error messages related to programming errors aka bugs.
- Consistently log the path to the file in all the messages in order to improve debuggability.
2023-04-14 14:51:06 -07:00
Aliaksandr Valialkin
f341b7b3f8 lib/fs: substitute ReadFullData with MustReadData
Callers of ReadFullData() log the error and then exit.
So let's log the error with the path to the filename and the call stack
inside MustReadData(). This simplifies the code at callers' side,
while leaving the debuggability at the same level.
2023-04-14 14:39:29 -07:00
Aliaksandr Valialkin
bd6de6406a lib/fs: improve error logging inside MustWriteData
Log the path to file on errors inside MustWriteData().
This improves debuggability of errors, which may occur inside MustWriteData().
2023-04-14 14:32:45 -07:00
Aliaksandr Valialkin
4b43c91f8c vendor: update github.com/VictoriaMetrics/metricsql from v0.56.1 to v0.56.2
This fixes panic when the duration in the query contains `M` suffix.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4120
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3589
2023-04-14 14:05:56 -07:00
Aliaksandr Valialkin
e0595af2bf lib/{mergeset,storage}: remove isInMerge flag from parts only when they werent removed yet from the list of active parts
This prevents from possible panic during access to pw.p when it is set to nil at partWrapper.decRef() called inside swapSrcWithDstParts()
2023-04-14 00:08:11 -07:00
Aliaksandr Valialkin
88a4b9c313 docs/vmctl.md: run make docs-sync after 2a5b9ff782 2023-04-13 23:54:41 -07:00
Aliaksandr Valialkin
5e87e03409 docs/CHANGELOG.md: move the bugfix description into the correct place
This is a follow-up for 2a5b9ff782

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4092
2023-04-13 23:49:53 -07:00
Aliaksandr Valialkin
9f8209d593 docs/CHANGELOG.md: run at least 4 background mergers on systems with less than 4 CPU cores
This reduces the probability of sudden spike in the number of small parts when all the background mergers
are busy with big merges.
2023-04-13 23:43:17 -07:00
Aliaksandr Valialkin
550d5c7ea4 lib/{mergeset,storage}: make sure that getFlushToDiskDeadline() takes into account only in-memory parts 2023-04-13 23:43:17 -07:00
Dmytro Kozlov
2a5b9ff782 app/vmctl: fix performance degradation, add flag to disable backoff policy (#4097)
* app/vmctl: change api for getting metric names

* app/vmctl: fix tests

* app/vmctl: add flag to enable backoff policy, fix test, performance improvements

* app/vmctl: use one http client

* app/vmctl: made linter happy

* app/vmctl: updated documentation and CHANGELOG.md

* app/vmctl: cleanup

* app/vmctl: rename flag

* app/vmctl: cleanup

* app/vmctl: fix comments

* app/vmctl: fix metrics parser problem, improve tests
2023-04-14 09:34:54 +03:00
Aliaksandr Valialkin
809fbaeaac lib/fs: add Must prefix to CopyDirectory and CopyFile functions
Callers of these functions log the returned error and then exit.
Let's log the error with the call stack inside the function itself.
This simplifies the code at callers' side, while leaving the same
level of debuggability in case of errors.
2023-04-13 23:02:59 -07:00
Aliaksandr Valialkin
780abc3b3b lib/fs: rename SymlinkRelative to MustSymlinkRelative
Callers of this function log the returned error and then exit.
Let's log the error with the call stack inside the function itself.
This simplifies the code at callers' side, while leaving the same
level of debuggability in case of errors.
2023-04-13 22:52:55 -07:00
Aliaksandr Valialkin
5f487ed996 lib/fs: rename HardLinkFiles to MustHardLinkFiles
Callers of this function log the returned error and then exit.
Let's log the error with the call stack inside the function itself.
This simplifies the code at callers' side, while leaving the same
level of debuggability in case of errors.
2023-04-13 22:48:07 -07:00
Aliaksandr Valialkin
30425ca81a lib/fs: rename WriteFileAtomically to MustWriteAtomic
Callers of this function log the returned error and exit.
So let's just log the error with the given filepath and the call stack
inside the function itself and then exit. This simplifies the code
at callers' place while leaves the same level of debuggability in case of errors.
2023-04-13 22:41:15 -07:00
Aliaksandr Valialkin
036a7b7365 lib/fs: replace MkdirAllIfNotExist->MustMkdirIfNotExist and MkdirAllFailIfExist->MustMkdirFailIfExist
Callers of these functions log the returned error and then exit. The returned error already contains the path
to directory, which was failed to be created. So let's just log the error together with the call stack
inside these functions. This leaves the debuggability of the returned error at the same level
while allows simplifying the code at callers' side.

While at it, properly use MustMkdirFailIfExist instead of MustMkdirIfNotExist inside inmemoryPart.MustStoreToDisk().
It is expected that the inmemoryPart.MustStoreToDick() must fail if there is already a directory under the given path.
2023-04-13 22:11:59 -07:00
Aliaksandr Valialkin
344209e5e6 lib/fs: rename MustWriteFileAndSync to MustWriteSync in order to improve readability a bit
This is a follow-up for 2a8395be05
2023-04-13 21:43:32 -07:00
Aliaksandr Valialkin
b15c5961ab lib/{mergeset,storage}: remove unused path field from blockStreamWriter
This is a follow-up after 42bba64aa7
2023-04-13 21:39:59 -07:00
Aliaksandr Valialkin
2a8395be05 lib/fs: replace WriteFileAndSync with MustWriteAndSync
When WriteFileAndSync fails, then the caller eventually logs the error message
and exits. The error message returned by WriteFileAndSync already contains the path
to the file, which couldn't be created. This information alongside the call stack
is enough for debugging the issue. So just use log.Panicf("FATAL: ...") inside MustWriteAndSync().
This simplifies error handling at caller side a bit.
2023-04-13 21:33:19 -07:00
Aliaksandr Valialkin
25f089de9d lib/{mergeset,storage}: properly fsync part directory listing after writing in-memory part to disk
This is a follow-up after 42bba64aa7

Previously the part directory listing was fsync'ed implicitly inside partHeader.WriteMetadata()
by calling fs.WriteFileAtomically(). Now it must be fsync'ed explicitly.

There is no need in fsync'ing the parent directory, since it is fsync'ed by the caller
when updating parts.json file.
2023-04-13 21:19:04 -07:00
Aliaksandr Valialkin
42bba64aa7 lib/{mergeset,storage}: explicitly fsync the created part directory listing
Previously the created part directory listing was fsynced implicitly
when storing metadata.json file in it.

Also remove superflouous fsync for part directory listing,
which was called at blockStreamWriter.MustClose().
After that the metadata.json file is created, so an additional fsync
for the directory contents is needed.
2023-04-13 21:03:08 -07:00
Aliaksandr Valialkin
e1211a1187 app/vmstorage: deprecate -bigMergeConcurrency command-line flag
Improperly configured -bigMergeConcurrency command-line flag usually leads to uncontrolled
growth of unmerged parts, which, in turn, increases CPU usage and query durations.

So it is better deprecating this flag. In rare cases -smallMergeConcurrency command-line flag
can be used instead for controlling the concurrency of background merges.
2023-04-13 20:40:24 -07:00
Aliaksandr Valialkin
ca54e58c1f lib/{fs,persistentqueue}: use filepath.Join() instead of concatenating path parts with /
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-04-13 20:13:45 -07:00
Aliaksandr Valialkin
90b876cd1e app/vmbackupmanager: sync with enterprise-single-node branch after 41a54c775891c87e3d5ed59ff0769c869dd2fe71 2023-04-13 19:29:06 -07:00
Alexander Marshalov
47e16594dd Added extra information to docs about total output in stream aggregation. (#4130)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-04-13 16:08:07 +02:00
Denys Holius
174a70369c deleted unnecessary file (#4126) 2023-04-12 20:55:39 +04:00
Zakhar Bessarab
81f28f0f1f lib/backup/actions: store metadata(creation and completion time) in backup files (#4117)
This makes it easier to understand exact point in time which is included in this backup.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-12 18:51:27 +02:00
Denys Holius
51466ce379 Fix/snap build (#4125)
* snap/snapcraft.yaml: bump go-channel to latest 1.20/stable

* snap/local/Makefile: fixed typo in output snap file
2023-04-12 18:50:56 +02:00
Aliaksandr Valialkin
b9ab07ced9 vendor: update github.com/valyala/gozstd from v1.19.0 to v1.19.1 2023-04-10 11:30:13 -07:00
Aliaksandr Valialkin
1fe87691ec app/vmbackupmanager/README.md: sync with docs/vmbackupmanager.md after 4b2cc1b32c 2023-04-10 10:51:49 -07:00
Aliaksandr Valialkin
7849545e78 docs/Single-server-VictoriaMetrics.md: document automatic switch from graph view to heatmap view for histogram buckets
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3384
2023-04-10 10:49:44 -07:00
Aliaksandr Valialkin
ddb8883817 docs/Troubleshooting.md: mention that it is recommended to use default command-line flag values for VictoriaMetrics components 2023-04-10 10:48:25 -07:00
Haleygo
0ad6010c91 fix sort pendingDateMetricsIDs (#4102) 2023-04-10 10:23:12 -07:00
Aliaksandr Valialkin
b7cce552da vendor: make vendor-update 2023-04-10 10:20:54 -07:00
Aliaksandr Valialkin
0e1e0b036a deployment: update VictoriaMetrics from v1.89.1 to v1.90.0
See https://docs.victoriametrics.com/CHANGELOG.html#v1900
2023-04-06 19:08:11 -07:00
Aliaksandr Valialkin
9586dabd94 docs/guides: update VictoriaMetrics from v1.89.1 to v1.90.0 2023-04-06 19:07:38 -07:00
Aliaksandr Valialkin
05d44525b7 docs/CHANGELOG.md: formatting fix 2023-04-06 19:04:13 -07:00
Aliaksandr Valialkin
b5d18c0d28 app/vmctl/terminal: fix builds for GOOS=freebsd and GOOS=openbsd
This is a follow-up for 8da9502df6
2023-04-06 17:09:07 -07:00
Aliaksandr Valialkin
28975067c6 docs/CHANGELOG.md: cut v1.90.0 release 2023-04-06 16:16:42 -07:00
Aliaksandr Valialkin
a3eebf118e app/vmselect/vmui: run make vmui-update after 01fc228fb0 2023-04-06 15:07:41 -07:00
Dmytro Kozlov
244c18fa38 app/vmctl: add multiple filters defined in --vm-native-filter-match flag to discovered metric names (#4063)
* app/vmctl: add multiple filters defined in `--vm-native-filter-match` flag to discovered metric names

* app/vmctl: fix comments

* app/vmctl: move function buildMatchWithFilter to the correct place

* app/vmctl: update CHANGELOG.md

* app/vmctl: fix CI, remove error wrapping

* app/vmctl: fix CI, simplify `Set()`
2023-04-06 15:06:52 -07:00
Yury Molodov
01fc228fb0 vmui: heatmap fixes (#4086)
* fix: correct display of errors for query

* fix: change the logic of histogram detection

* feat: hide empty buckets from the graph

* fix: revert server url
2023-04-06 15:02:44 -07:00
Aliaksandr Valialkin
ee80e71d17 docs/CHANGELOG.md: document the bugfix, which remove unneeded logger.Errorf() call during stream aggregation with the enabled deduplication
This is a follow-up for ff72ca14b9
2023-04-06 15:00:42 -07:00
Aliaksandr Valialkin
4770377fb3 app/vmselect/vmui: run make vmui-update after a1601929ec 2023-04-06 03:20:13 -07:00
Aliaksandr Valialkin
44aad84a53 docs/CHANGELOG.md: document that VictoriaMetrics for Windows cannot delete snapshots
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70#issuecomment-1491529183
2023-04-06 03:16:06 -07:00
Aliaksandr Valialkin
608d87273d docs/CHANGELOG.md: document v1.79.12 2023-04-06 03:10:01 -07:00
Aliaksandr Valialkin
7a65329e65 docs/CHANGELOG.md: document v1.87.5 2023-04-06 00:44:07 -07:00
Timur Bakeyev
37a7627254 Fix cut-n-paste error (#4079)
It seems that VMServiceScrape description was c-n-p from vmselect one into all other resources.
2023-04-06 11:02:12 +04:00
Yury Molodov
a1601929ec fix: correct the description of shortcut keys (#4057) 2023-04-05 22:19:36 -07:00
Zakhar Bessarab
4b2cc1b32c docs: fix example operator spec for vmbackupmanager restore usage (#4074)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-05 22:16:39 -07:00
Yury Molodov
74eea53dee vmui: implement heatmap improvements (#4078)
* fix: disabled limits for histogram

* fix: add sorted buckets by upper bound

* refactor: move line chart components to folder

* feat: implement heatmap improvements (https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3384#issuecomment-1484023162)

* app/vmselect/vmui: `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-04-05 22:13:57 -07:00
Aliaksandr Valialkin
593c151831 lib/encoding: fix test after 4725549cb2 2023-04-05 21:38:37 -07:00
Aliaksandr Valialkin
4725549cb2 vendor: update github.com/klauspost/compress from v1.16.3 to v1.16.4
See https://github.com/klauspost/compress/releases/tag/v1.16.4
2023-04-05 21:25:35 -07:00
Aliaksandr Valialkin
29a692f278 vendor: update github.com/valyala/gozstd from v1.18.0 to v1.19.0 2023-04-05 20:53:30 -07:00
Timur Bakeyev
d87a700528 Fix reference to the imagepullsecrets description (#4080)
Looks like the original document has moved to the https://kubernetes.io/docs/concepts/containers/images/#referring-to-an-imagepullsecrets-on-a-pod.
Alternatively, it could be that https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ describes the meaning of the parameter in more detail.
2023-04-05 19:56:28 -07:00
Aliaksandr Valialkin
8249451dbb docs/Troubleshooting.md: add missing help word after c7bcf750c2d031b1259cd8115d7464b67f40cb9eg 2023-04-05 14:28:09 -07:00
Aliaksandr Valialkin
13668fc935 docs/Troubleshooting.md: another typo fixes after c7bcf750c2d031b1259cd8115d7464b67f40cb9eg 2023-04-05 14:14:41 -07:00
Aliaksandr Valialkin
052160dcdc docs/Troubleshooting.md: fix formatting after c7bcf750c2 2023-04-05 13:46:19 -07:00
Aliaksandr Valialkin
a265da4f53 docs/Troubleshooting.md: fix a typo in the link after c7bcf750c2 2023-04-05 13:40:24 -07:00
Aliaksandr Valialkin
5074cc672a all: update Go builder from Go1.20.2 to Go1.20.3
See https://github.com/golang/go/issues?q=milestone%3AGo1.20.3+label%3ACherryPickApproved
2023-04-05 13:37:22 -07:00
Aliaksandr Valialkin
c7bcf750c2 docs/Troubleshooting.md: add General troubleshooting checklist
This checklist helps searching for the infromation related to some issue / question
about VictoriaMetrics
2023-04-05 13:29:54 -07:00
Artem Navoiev
59102db4cf update changelog
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-04-05 15:45:38 +02:00
Zakhar Bessarab
8a6acce7d3 deployment/docker: update Grafana URLs to match latest format (#4060)
See: #4019

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-04 15:26:08 +04:00
Zakhar Bessarab
a8d1497024 app/vmalert: update Grafana URLs to match latest format (#4061)
See: #4019

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-04-04 15:25:29 +04:00
Artem Navoiev
33c6cc2530 fix closing divs in docs
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-04-04 13:00:22 +02:00
Aliaksandr Valialkin
19b189e9b7 lib/storage: use shorter code after 03bde173b7 2023-04-02 21:35:52 -07:00
faceair
38fc55976e lib/storage: fix reuse pendingMetricRow (#4049) 2023-04-02 21:35:50 -07:00
Aliaksandr Valialkin
55b5276b70 docs/CHANGELOG.md: document edb45d7fc1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4013
2023-04-02 21:26:12 -07:00
faceair
f3af8331ec lib/storage: remove unused code (#4050) 2023-04-02 21:24:42 -07:00
Aliaksandr Valialkin
de0fe02f6e app/vmselect/vmui: run make vmui-update after edb45d7fc1 2023-04-02 21:21:51 -07:00
Yury Molodov
edb45d7fc1 feat: add accept/cancel buttons for settings (#4013) (#4052) 2023-04-02 21:20:10 -07:00
Aliaksandr Valialkin
f638496298 lib/promscrape: do not re-use previously loaded scrape targets on failed attempt to load updated scrape targets at file_sd_configs
The logic employed for re-using the previously loaded scrape target was broken initially.
The commit cc0427897c tried to fix it, but the new logic
became too complex and fragile. So it is better to just remove this logic,
since the targets from temporarily broken file should be eventually loaded on next
attempts every -promscrape.fileSDCheckInterval

This also allows removing fragile hacks around __vm_filepath label.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3989
2023-04-02 21:05:28 -07:00
Dmytro Kozlov
cc0427897c lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read (#4027)
* lib/promscrape: fix the problem with scrape work duplicates when file_sd_config can't be read

* lib/promscrape: clarified comment

* lib/promscrape: made better approach to handle a problem with growing []*ScrapeWork on each error when loading config

* lib/promscrape: added CHANGELOG.md

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-04-02 20:26:13 -07:00
Aliaksandr Valialkin
06b721dd07 app/vmselect/vmui: run make vmui-update after 42087518ba 2023-04-01 00:40:49 -07:00
Yury Molodov
42087518ba vmui: tips for working with the graph and legend (#4045)
* feat: add tips for working with the graph and legend

* feat: add the ability to collapse the legend

* vmui/docs: add the ability to collapse the legend

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-04-01 00:38:18 -07:00
Aliaksandr Valialkin
02b714c110 vendor: make vendor-update 2023-03-31 23:59:34 -07:00
Yury Molodov
dd200409d9 vmui: add a tip for JSON and Table tabs (#4000)
* feat: add a tip for JSON and Table tabs

* feat: add Hyperlink component

* fix: update Hyperlink

* fix: update link to instant query
2023-03-31 23:53:06 -07:00
Roman Khavronenko
27b958ba8b lib/storage: check for free disk space before opening tables (#4035)
* lib/storage: check for free disk space before opening tables

We check for free disk space before call to `openTable`,
so `Storage` can be set to ReadOnly before mergeWorkers start.

Before the change, there was a chance that merges will start
even if Storage has to start in ReadOnly mode because of
`-storage.minFreeDiskSpaceBytes` limit.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4023
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* lib/storage: chore

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update lib/storage/storage.go

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-31 23:50:27 -07:00
Aliaksandr Valialkin
ffdf430be0 app/vmselect/graphite: open source Graphite Render API 2023-03-31 23:25:04 -07:00
Aliaksandr Valialkin
cddfc4d3f8 deployment/docker: update base Docker image from Alpine 3.17.2 to Alpine 3.17.3
This fixes security issues from https://alpinelinux.org/posts/Alpine-3.17.3-released.html

This is a follow-up for 59c350d0d2
2023-03-31 22:46:27 -07:00
Aliaksandr Valialkin
4d00107b92 lib/fs: follow-up for ec45f1bc5f
Properly close response body before checking for the response code.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4034
2023-03-31 22:42:10 -07:00
Aliaksandr Valialkin
d577657fb7 lib/streamaggr: follow-up for ff72ca14b9
- Make sure that the last successfully loaded config is used on hot-reload failure
- Properly cleanup resources occupied by already initialized aggregators
  when the current aggregator fails to be initialized
- Expose distinct vmagent_streamaggr_config_reload* metrics per each -remoteWrite.streamAggr.config
  This should simplify monitoring and debugging failed reloads
- Remove race condition at app/vminsert/common.MustStopStreamAggr when calling sa.MustStop() while sa
  could be in use at realoadSaConfig()
- Remove lib/streamaggr.aggregator.hasState global variable, since it may negatively impact scalability
  on system with big number of CPU cores at hasState.Store(true) call inside aggregator.Push().
- Remove fine-grained aggregator reload - reload all the aggregators on config change instead.
  This simplifies the code a bit. The fine-grained aggregator reload may be returned back
  if there will be demand from real users for it.
- Check -relabelConfig and -streamAggr.config files when single-node VictoriaMetrics runs with -dryRun flag
- Return back accidentally removed changelog for v1.87.4 at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3639
2023-03-31 22:30:38 -07:00
Max Golionko
59c350d0d2 fix: app/vmui/Dockerfile-web to reduce vulnerabilities (#4044)
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-ALPINE317-OPENSSL-3368755
- https://snyk.io/vuln/SNYK-ALPINE317-OPENSSL-3368755
- https://snyk.io/vuln/SNYK-ALPINE317-OPENSSL-5291795
- https://snyk.io/vuln/SNYK-ALPINE317-OPENSSL-5291795

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
2023-03-31 16:29:44 +02:00
Zakhar Bessarab
5e5fc66e3b docs/vmctl: add examples of URLs used for migration in different modes (#4042)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-30 17:21:36 +02:00
Roman Khavronenko
4a49577028 vmalert: use missingkey=zero for templating (#4040)
Replace empty labels with "" instead of "<no value>"
during templating, as Prometheus does.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4012

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-30 16:57:00 +04:00
Zakhar Bessarab
ec45f1bc5f lib/fs: verify response code when reading configuration over HTTP (#4036)
Verifying status code helps to avoid misleading errors caused by attempt to parse unsuccessful response.

Related issue: #4034

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-30 13:18:00 +02:00
Daria Karavaieva
0945a03843 Vmanomaly guide index fix (#4029)
* name and scrutture change

* fix indexing

* index fix

* name change

* line separator fix
2023-03-29 20:24:06 +02:00
Alexander Marshalov
ff72ca14b9 added hot reload support for stream aggregation configs (#3969) (#3970)
added hot reload support for stream aggregation configs (#3969)

Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-29 18:05:58 +02:00
Eliran Barnoy
9199c23720 Fix operator links to include VMPrometheusConverter for added visibility 2023-03-29 11:44:49 +02:00
Aliaksandr Valialkin
94cabf29b0 lib/flagutil: ArrayString: support commas inside quoted strings and inside [], {} and () braces
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3915
2023-03-28 21:22:55 -07:00
Aliaksandr Valialkin
e094c8e214 docs: mention that VictoriaMetrics rounds time range to UTC days at /api/v1/labels, /api/v1/label/.../values and /api/v1/series handlers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3107
2023-03-28 17:00:17 -07:00
Aliaksandr Valialkin
7048a316aa lib/persistentqueue: typo fix after aea6df8197 2023-03-27 20:06:04 -07:00
Aliaksandr Valialkin
aea6df8197 app/vmagent/remotewrite: cosmetic updates after f3a51e8b1d
- Compare directory names instead of paths to directory when determining which persistent queues must be deleted
  This is less error-prone solution, since paths to the same directory can differ, which could lead
  to accidental directory removal for the existing -remoteWrite.url

- Log the `removed %d dangling queues` message when at least a single queue has been removed

- Consistently use filepath.Join() for creating paths to persistent queues.
  This is needed for Windows support (see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70 )

- Clarify the description of the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
2023-03-27 18:33:07 -07:00
Zakhar Bessarab
f3a51e8b1d app/vmagent: add -remoteWrite.removeDanglingQueues flag (#4017)
* app/vmagent: add `-remoteWrite.removeDanglingQueues` flag which allows to automatically remove dangling persistent queue contents

Related issue: #4014

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmagent: address review feedback

- remove persistent queues files by default
- rename `remoteWrite.removeDanglingQueues` to `remoteWrite.keepDanglingQueues`
- update docs to reflect changed behaviour

Related issue: #4014

* Apply suggestions from code review

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 18:15:28 -07:00
Nikolay
9b1e002287 app/vmselect: properly remove temp files at windows system (#4020)
With non-posix compliant systems it's not possible to remove unclosed files.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-27 18:10:15 -07:00
Aliaksandr Valialkin
02ee4ffd4d app/vmselect/promql: follow-up for 79e1c6a6fc
- Document the fix at docs/CHANGELOG.md
- Add tests with multiple adjancent zero buckets
- Simplify the fix a bit

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/296
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4021
2023-03-27 18:03:36 -07:00
Ze'ev Klapow
79e1c6a6fc fix le buckets when adjacent vmrange is empty (#4021)
There is a bug here where if you have a single bucket like:

foo{vmrange="4.084e+02...4.642e+02"} 2 123

The expected output is three le encoded buckets like:

foo{le="4.084e+02"} 0 123
foo{le="4.642e+02"} 2 123
foo{le="+Inf"} 2 123

This correctly encodes the start and end of the vmrange.
If however, the input contains the previous bucket, and that bucket is
empty then you only get the end le and +Inf out currently, i.e:

foo{vmrange="7.743e+05...8.799e+05"} 5 123
foo{vmrange="6.813e+05...7.743e+05"} 0 123

results in:

foo{le="8.799e+05"} 5 123
foo{le="+Inf"} 5 123

This causes issues when you go to compute a quantile because this means
that the assumed lower bound of the buckets is 0 and this we interpolate
between 0->end rather than the vmrange start->end as expected.
2023-03-27 17:54:19 -07:00
Aliaksandr Valialkin
9e02b3d48a vendor: make vendor-update 2023-03-27 15:28:02 -07:00
Aliaksandr Valialkin
622000797a app/vmselect: follow-up for 10ab086366
- Expose stats.seriesFetched at `/api/v1/query_range` responses too
  for the sake of consistency.

- Initialize QueryStats when it is needed and pass it to EvalConfig then.
  This guarantees that the QueryStats is properly collected when the query
  contains some subqueries.
2023-03-27 15:22:00 -07:00
Roman Khavronenko
4021aa11b5 app/vmselect: export seriesFetched stat for /query responses (#3925)
The change adds a new field `seriesFetched` to EvalConfig object.
Since EvalConfig object can be copied inside `Exec`,
`seriesFetched` is a pointer which can be updated by all copied
objects.

The reason for having stats is that other components, like vmalert,
could benefit from this information.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-27 15:18:25 -07:00
Yury Molodov
3214b1c315 vmui: heatmap (#3780)
* fix: add stroke and font for all axes

* feat: add util for generate gradient

* feat: add heatmap plugin

* feat: add heatmap legend

* feat: add heatmap graph (#3384)

* vmui: add heatmap graph (#3384)

* feat: add convert Prometheus to VictoriaMetrics histogram

* fix: prevent re-render graph

* feat: reset step for heatmap

* feat: normalize heatmap data

* fix: format heatmap legend

* wip

* app/vmselect/vmui: run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-26 00:30:02 -07:00
Aliaksandr Valialkin
92199c964c docs/MetricsQL.md: quote min, max and avg args for rollup_*() functions in order to reduce the level of confusion when users try to pass the second argument to these functions 2023-03-26 00:01:51 -07:00
Aliaksandr Valialkin
72a0b49330 docs/CHANGELOG.md: document v1.87.4 LTS release 2023-03-25 22:43:59 -07:00
Aliaksandr Valialkin
5832242b44 app/vmselect/netstorage: reduce the contention at fs.ReaderAt stats collection on systems with big number of CPU cores
This optimization is based on the profile provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966#issuecomment-1483208419
2023-03-25 16:37:07 -07:00
Aliaksandr Valialkin
a1e496ced6 app/vmselect/netstorage: document why runtime.Gosched() is removed at 28f054bb00
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-25 16:36:51 -07:00
Zakhar Bessarab
28f054bb00 vmselect/netstorage: remove direct calls to Gosched to reduce amount of locks for global scope
using `runtime.Gosched` requires acquiring global lock to check if there are any other goroutines to perform tasks. with the latest versions of runtime it can pause running goroutines automatically without requiring to call `Gosched` directly.

Updates #3966

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-25 16:34:03 -07:00
Aliaksandr Valialkin
811f4a9380 app/{vmbackup,vmrestore}: publish vmbackup and vmrestore binaries for Windows
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 15:08:21 -07:00
Aliaksandr Valialkin
c8f2febaa1 lib/storage: consistently use OS-independent separator in file paths
This is needed for Windows support, which uses `\` instead of `/` as file separator

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 14:33:58 -07:00
Aliaksandr Valialkin
36bbdd7d4b lib/mergeset: consistently use OS-independent separator in file paths
This is needed for Windows support, which uses `\` instead of `/` as file separator

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 13:39:41 -07:00
Aliaksandr Valialkin
b14d96618c all: follow-up after 34634ec357
- Use windows.FlushFileBuffers() instead of windows.Fsync() at streamTracker.adviseDontNeed()
  for consistency with implementations for other architectures.
- Use filepath.Base() instead of filepath.Split(), since the dir part isn't used.
  This simplifies the code a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 11:57:39 -07:00
Nikolay
34634ec357 lib/fs: adds memory map for windows (#3988)
This is a follow-up for 43b24164ef

* lib/fs: adds memory map for windows
it should improve performance for file reading

* lib/storage: replace '/' with os specific separator
it must fix an errors for windows

* lib/fs: mention windows fsync support

* lib/filestream: adds fdatasync for windows writes

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-25 11:43:19 -07:00
Aliaksandr Valialkin
2b851e69d2 app/vmselect/promql: typo fix after e7f46a0aab 2023-03-24 23:46:30 -07:00
Aliaksandr Valialkin
e7f46a0aab app/vmselect/promql: follow-up for 7205c79c5a
- Allocate and initialize seriesByWorkerID slice in a single go instead
  of initializing every item in the list separately.
  This should reduce CPU usage a bit.
- Properly set anti-false sharing padding at timeseriesWithPadding structure
- Document the change at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-24 23:34:37 -07:00
Zakhar Bessarab
7205c79c5a app/vmselect/promql: use lock-less approach to gather results of parallel processing for evalRollup* funcs (#4004)
* vmselect/promql: refactor `evalRollupNoIncrementalAggregate` to use lock-less approach for parallel workers computation

Locking there is causing issues when running on highly multi-core system as it introduces lock contention during results merge.

New implementation uses lock less approach to store results per workerID and merges final result in the end, this is expected to significantly reduce lock contention and CPU usage for systems with high number of cores.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: add pooling for `timeseriesWithPadding` to reduce allocations

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* vmselect/promql: refactor `evalRollupFuncWithSubquery` to avoid using locks

Uses same approach as `evalRollupNoIncrementalAggregate` to remove locking between workers and reduce lock contention.

Related: #3966
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-03-24 23:07:12 -07:00
Aliaksandr Valialkin
9436ae3b07 app/vmbackup: simplify code a bit after 5ba347bd2c
Unconditionally call deleteSnapshot() func just after making the snapshot, either successful or unsuccessful

Related issue: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2055
2023-03-24 22:03:14 -07:00
Zakhar Bessarab
5ba347bd2c app/vmbackup: delete created snapshot in case of error during backup (#4008)
Related issue: #2055

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-24 21:49:58 -07:00
Aliaksandr Valialkin
25446a7933 vendor: run make vendor-update 2023-03-24 18:08:06 -07:00
Aliaksandr Valialkin
ebc1caa5dc app/vmselect/vmui: run make vmui-update after dc2c712a29 2023-03-24 18:01:39 -07:00
Aliaksandr Valialkin
27f9a1eda2 docs/CHANGELOG.md: cosmetic fixes: remove trailing whitespace and consistently use -flag instead of --flag 2023-03-24 15:44:33 -07:00
Alexander Marshalov
7c86dcc4fa allowed using dashes and dots in environment variables names (#4009)
* allowed using dashes and dots in environment variables names for templating config files with envtemplate (#3999)

Signed-off-by: Alexander Marshalov <_@marshalov.org>

* Apply suggestions from code review

---------

Signed-off-by: Alexander Marshalov <_@marshalov.org>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-24 15:43:05 -07:00
Aliaksandr Valialkin
c1d871a45a docs/vmauth.md: follow-up for 36edba9bfb
- Document `-configCheckInterval` command-line flag in `quick start` section
- Clarify the addition of `-configCheckInterval` at docs/CHANGELOG.md

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3990
2023-03-24 13:22:37 -07:00
Aliaksandr Valialkin
54796f69db docs/vmagent.md: clarify that there is no need to specify multiple -remoteWrite.url options when writing data to a single VictoriaMetrics cluster when data replication is needed
Also add a link to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format from `getting started` section,
so users could quickly find how to write data to VictoriaMetrics cluster
2023-03-24 13:07:33 -07:00
Roman Khavronenko
365d2ff0bf vmalert: add anchor char to Group's link (#4006)
This should help users to see that Group's name is clickable
and used for anchoring.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 09:48:43 +01:00
Roman Khavronenko
8db6a71f83 vmalert: mention VMUI example for alert's source (#4005)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-24 09:40:55 +01:00
Roman Khavronenko
edeb56d208 docs: mention cluster URL for exporting series (#4002)
docs: mention cluster URL for exporting series

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-23 20:55:47 +01:00
Daria Karavaieva
24938872a6 Vmanomaly guide (#3834)
Setting up VMAnomaly on NodeExporter metrics with VictoriaMetrics and AlertManager.


* vmanomaly-guide-draft

* aletr graphs and description

* readme vmanomaly tutorial

* Added back fit_every param for performance

* vmanomaly guide fixes

* added spaces div

* spaces + resize image

* alert example grammar

* quotation marks

* docker link

* typo fixed

* more links

* reader section rephrased

* label change

* lower case for grafana service

* lower case for vm service

* yaml markdown

---------

Co-authored-by: Dima Lazerka <dima@victoriametrics.com>
2023-03-23 21:04:44 +02:00
Dmytro Kozlov
ba505dd357 docs: follow up after dc2c712a29 (#4001) 2023-03-23 18:27:55 +01:00
Dmytro Kozlov
dc2c712a29 app/vmui: update cardinality page (#3986)
vmui: update cardinality page

---------

Co-authored-by: Yury Moladau <yurymolodov@gmail.com>
2023-03-23 18:18:02 +01:00
Yury Molodov
023c65968f vmui: display errors for each query individually (#3987) (#3994) 2023-03-23 13:10:59 +01:00
Alexander Marshalov
36edba9bfb added configCheckInterval flag for vmauth (#3990) (#3991)
* added configCheckInterval flag for vmauth (#3990)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-23 09:34:12 +01:00
Nikolay
a2f716b6cc lib/netutil: log only parsing errors for proxy-protocol (#3985)
* lib/netutil: log only parsing errors for proxy-protocol

Previosly every error was logged. With configured TCP health checks at load-balancer or kubernetes, vmauth spams a lot of false positive error message into logs

* Update docs/CHANGELOG.md

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Update lib/netutil/tcplistener.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-03-21 10:22:39 -07:00
Dmytro Kozlov
f0b09a1382 app/vmctl: follow up after aed59b9029 (#3983) 2023-03-21 15:53:53 +01:00
Aliaksandr Valialkin
6a78755b66 docs/vmagent.md: mention in docs that the target relabel debug page shows target url now 2023-03-20 22:20:03 -07:00
Dmytro Kozlov
e79cd24807 lib/promrelabel: make target url from labels on target relabel page (#3882)
* lib/promrelabel: make target url from labels on target relabel page

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-20 22:07:52 -07:00
Aliaksandr Valialkin
e480b9881e app/vmselect/promql: pass workerID to the callback inside doParallel()
This opens the possibility to remove tssLock from evalRollupFuncWithSubquery()
in the follow-up commit from @zekker6 in order to speed up the code
for systems with many CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:54:57 -07:00
Aliaksandr Valialkin
9e16329b2f app/vmselect/promql: fix TestIncrementalAggr test on systems less than 3 CPU cores
This is a follow-up for 4856a4cf5a
2023-03-20 20:37:18 -07:00
Aliaksandr Valialkin
70959d5dab app/vmselect/netstorage: reduce the number of calls to runtime.Gosched() at timeseriesWorker() and unpackWorker()
Call runtime.Gosched() only when there is a work to steal from other workers.
Simplify the timeseriesWorker() and unpackWroker() code a bit by inlining stealTimeseriesWork() and stealUnpackWork().

This should reduce CPU usage when processing queries on systems with big number of CPU cores.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
2023-03-20 20:31:02 -07:00
Aliaksandr Valialkin
4856a4cf5a app/vmselect: optimize incremental aggregates a bit
Substitute sync.Map with an ordinary slice indexed by workerID.
This should reduce the overhead when updating the incremental aggregate state
2023-03-20 15:37:06 -07:00
Aliaksandr Valialkin
8622dee4b5 app/vmselect/vmui: make vmui-update after d4525bd2d0 2023-03-20 14:35:03 -07:00
Aliaksandr Valialkin
8d709f3483 docs/CHANGELOG.md: cosmetic fixes 2023-03-20 14:14:20 -07:00
Aliaksandr Valialkin
91533531f5 docs/Troubleshooting.md: document an additional case, which could result in slow inserts
If `-cacheExpireDuration` is lower than the interval between ingested samples for the same time series,
then vm_slow_row_inserts_total` metric is increased.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183
2023-03-20 13:28:36 -07:00
Roman Khavronenko
3283f0dae4 vmalert: support logs suppressing during config reloads (#3973)
* vmalert: support logs suppressing during config reloads

The change is mostly required for ENT version of vmalert,
since it supports object-storage for config files.
Reading data from object storage could be time-consuming,
so vmalert emits logs to track the progress.

However, these logs are mostly needed on start or on
manual config reload. Printing these logs each time
`rule.configCheckInterval` is triggered would too verbose.
So the change allows to control logs emitting during
config reloads.

Now, logs are emitted during start up or when SIGHUP is receieved.
For periodicall config checks logs emitted by config pkg are suppressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: review fixes

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-20 16:08:30 +01:00
Dmytro Kozlov
8da9502df6 app/vmctl: automatically check tty (#3938)
app/vmctl: automatically detect if TTY is available
2023-03-20 11:16:08 +01:00
Yury Molodov
d4525bd2d0 vmui: support for drag'n'drop in the "Trace analyzer" page (#3971)
vmui: add drag-and-drop support for the trace analyzer page
2023-03-20 11:07:18 +01:00
dependabot[bot]
cc67eb4ff3 build(deps): bump actions/setup-go from 3 to 4 (#3962)
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3 to 4.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](https://github.com/actions/setup-go/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-20 09:23:24 +01:00
Yury Molodov
a2af2e5a1b vmui: improve usability of date/time picker (#3968)
* vmui: allow manually set input date and time
* vmui/docs: improve usability of date/time picker
2023-03-20 09:22:49 +01:00
Dmytro Kozlov
5c92022cc6 lib/storage: fix collect downsampling metrics (#489)
* lib/storage: fix downsampling

* lib/storage: update logic

* lib/storage: fix comments, removed unneeded check
2023-03-19 23:34:46 -07:00
Aliaksandr Valialkin
43b24164ef all: add Windows build for VictoriaMetrics
This commit changes background merge algorithm, so it becomes compatible with Windows file semantics.

The previous algorithm for background merge:

1. Merge source parts into a destination part inside tmp directory.
2. Create a file in txn directory with instructions on how to atomically
   swap source parts with the destination part.
3. Perform instructions from the file.
4. Delete the file with instructions.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since the remaining files with instructions is replayed on the next restart,
after that the remaining contents of the tmp directory is deleted.

Unfortunately this algorithm doesn't work under Windows because
it disallows removing and moving files, which are in use.

So the new algorithm for background merge has been implemented:

1. Merge source parts into a destination part inside the partition directory itself.
   E.g. now the partition directory may contain both complete and incomplete parts.
2. Atomically update the parts.json file with the new list of parts after the merge,
   e.g. remove the source parts from the list and add the destination part to the list
   before storing it to parts.json file.
3. Remove the source parts from disk when they are no longer used.

This algorithm guarantees that either source parts or destination part
is visible in the partition after unclean shutdown at any step above,
since incomplete partitions from step 1 or old source parts from step 3 are removed
on the next startup by inspecting parts.json file.

This algorithm should work under Windows, since it doesn't remove or move files in use.
This algorithm has also the following benefits:

- It should work better for NFS.
- It fits object storage semantics.

The new algorithm changes data storage format, so it is impossible to downgrade
to the previous versions of VictoriaMetrics after upgrading to this algorithm.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3236
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3821
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2023-03-19 01:36:51 -07:00
Aliaksandr Valialkin
6460475e3b lib/{mergeset,storage}: prevent from long wait time when creating a snapshot under high data ingestion rate
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3873
2023-03-19 00:15:30 -07:00
Aliaksandr Valialkin
6d56149b9f deployment/docker/Makefile: properly add amd64 suffix to windows binary names 2023-03-18 23:29:44 -07:00
Aliaksandr Valialkin
4ea27d6f6a deployment/docker/Makefile: build CGO-enabled vmagent for GOARCH=arm64
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2271

This is a follow-up for 565497fb074321caedea38d5151044d98d92d759
2023-03-18 23:15:31 -07:00
Aliaksandr Valialkin
f3c7302772 SECURITY.md: update the list of VictoriaMetrics versions, which support security updates 2023-03-18 12:28:17 -07:00
Aliaksandr Valialkin
a26c6628fd lib/{fs,mergeset,storage}: substitute os.Open()+os.File.Readdir() with os.ReadDir()
This simplifies code a bit
2023-03-17 21:03:37 -07:00
Roman Khavronenko
8fdd613f25 Vmalert tests (#3975)
* vmalert: add tests for notifier pkg

* vmalert: add tests for remotewrite pkg

* vmalert: add tests for template functions

* vmalert: add tests for web pages

* vmalert: fix int overflow in tests

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-17 15:57:24 +01:00
Alexander Marshalov
57b00bafc9 updated api doc for operator (#3972)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-17 10:42:30 +01:00
Zakhar Bessarab
ac3043ff74 doc/vmgateway-grafana-openid-guide: fix formatting, add reproducible example and example results (#3964) 2023-03-17 09:57:10 +01:00
Roman Khavronenko
d3608be313 alerts: add TooManyTSIDMisses alerting rule (#3959)
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3502#issuecomment-1358374954

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-17 09:46:51 +01:00
Aliaksandr Valialkin
fdbb819195 all: typo fix in the same way as in e566d49e3a: 8248 -> 8428 2023-03-16 22:06:38 -07:00
oliverpool
fbefc940ef app/vmselect/promql: add test to ensure 8-byte alignment (#3948)
See 0af9e2b693
2023-03-16 09:01:42 -07:00
Pavel Skuratovich
e566d49e3a Fix a typo in README.md so reload scrape config command can be copy-pasted 2023-03-15 22:41:31 +01:00
Artem Navoiev
f8a2a3784b managed quickstart fix typo
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-15 22:38:59 +01:00
Artem Navoiev
20aa707979 fix anchor after chaning manager quick start
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-15 22:32:20 +01:00
Aliaksandr Valialkin
ddbbc9a86d vendor: make vendor-update 2023-03-15 13:24:12 -07:00
Nikolay
91cbb9063d Vmagent kafka updates (#535)
* app/vmagent: allow vm proto for kafka consumer and producer
it should reduce network usage up to 50%.
According to benchmarks without any encoding at kafka topic, it reduces traffic up to 50%.
With enabled zstd at kafka topic, it shows no diffence in traffic. So it
doesn't make much sense to use it.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225

* mention eb61a7dd68b834b08d01727a918f207700348ada at changelog

* app/vmagent: bumps kafka lib version
it allows compiling vmagent for arm64 machines
fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2271

* mention d19b1a888248c96cfd7ccee00ba6f596d89be1d7 at change log

* app/vmagent: adds natural concurrency for kafka consumer
it should improve performance for data consumption
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1957

* mention change 0c143bb22ca2e7e0b7eec9bc84a94ee2b41626ca

* Update app/vmagent/kafka/consumer.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* Update app/vmagent/kafka/consumer_cgo.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-03-15 13:03:44 -07:00
Alexander Marshalov
55afae8641 updated vars doc for operator (#3960)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-15 17:28:01 +01:00
dmitryk-dk
0691e115b1 docs: cleanup 2023-03-15 11:54:59 +01:00
dmitryk-dk
32266aaea2 app/vmctl: update managed quickstart guide 2023-03-15 11:54:59 +01:00
Aliaksandr Valialkin
a11ac9648c vendor: make vendor-update 2023-03-14 16:19:43 -07:00
Aliaksandr Valialkin
90e1818068 vendor: update github.com/klauspost/compress from v1.16.0 to v1.16.3 2023-03-14 16:14:25 -07:00
Zakhar Bessarab
8f6d5217d1 doc: add guide for vmgateway configuration with OpenID and Grafana (#3951)
docs: add guide for vmgateway configuration with OpenID and Grafana
2023-03-14 16:19:29 +01:00
Zakhar Bessarab
6a5d236245 lib/storage: log original labels set when label value is truncated (#3952)
lib/storage: log original labels set when label value is truncated
2023-03-14 10:59:40 +01:00
Dmytro Kozlov
4d68f5b1fc app/vmctl: integration test for native protocol (#3947)
* app/vmctl: integration test for native protocol

* app/vmctl: implemented two integration tests

* app/vmctl: cleanup

* app/vmctl: split storage init and filling data logic

* app/vmctl: cleanup

* app/vmctl: remove storage from server, used initialization process

* app/vmctl: prepare for parallel run, code cleanup

* app/vmctl: code cleanup

* app/vmctl: remove unused field
2023-03-14 09:55:49 +01:00
Aliaksandr Valialkin
6f6333831e docs/Single-server-VictoriaMetrics.md: clarify that the cache directory can be removed manually when VictoriaMetrics is stopped 2023-03-13 00:23:40 -07:00
Aliaksandr Valialkin
3e7bfe1200 docs/CHANGELOG.md: document v1.87.3 2023-03-13 00:20:51 -07:00
Aliaksandr Valialkin
02ffe05750 docs/CHANGELOG.md: document v1.79.11 LTS release 2023-03-12 23:22:53 -07:00
Aliaksandr Valialkin
b9e79250b3 deployment: update VictoriaMetrics release from v1.88.0 to v1.89.1
See https://docs.victoriametrics.com/CHANGELOG.html#v1891
2023-03-12 20:05:03 -07:00
Aliaksandr Valialkin
388d6ee16e docs/CHANGELOG.md: cut v1.89.1 2023-03-12 19:14:19 -07:00
Aliaksandr Valialkin
e8225d7d6b app/vmselect/promql: prevent from cannot unmarshal timeseries from rollupResultCache panic after the upgrade to v1.89.0
The issue has been introduced in 0af9e2b693
2023-03-12 19:09:39 -07:00
Aliaksandr Valialkin
911bab4f6a docs/CHANGELOG.md: cut v1.89.0 2023-03-12 17:29:44 -07:00
Aliaksandr Valialkin
1428aa2c22 app/vmselect/vmui: make vmui-update after 00a0816ab1 2023-03-12 17:19:19 -07:00
Yury Molodov
00a0816ab1 vmui: predefined dashboards docs (#3895)
* fix: correct display predefined panels

* docs: update the documentation for predefined dashboards
2023-03-12 17:16:26 -07:00
Aliaksandr Valialkin
be68a6a1ee Makefile: update golangci-lint from v1.51.1 to v1.51.2
See https://github.com/golangci/golangci-lint/releases/tag/v1.51.2
2023-03-12 17:08:19 -07:00
Aliaksandr Valialkin
468de76e9a app/vmselect: remove data race on updating EvalConfig.IsPartialResponse from concurrently running goroutines
This properly returns `is_partial: true` for partial responses.
2023-03-12 16:54:08 -07:00
Aliaksandr Valialkin
0af9e2b693 app/vmselect/promql: prevent from SIGBUS crash on architecures, which deny unaligned access to 8-byte words (e.g. ARM)
Thanks to @oliverpool for nailing down the root cause of the issue and for the initial attempt to fix it
at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3927
2023-03-12 16:32:08 -07:00
Artem Navoiev
7257a2a97f fix typos on image
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-12 17:12:57 +01:00
Aliaksandr Valialkin
c28c25ed2e vendor: make vendor-update 2023-03-12 03:13:53 -07:00
Yury Molodov
01367faa39 vmui: remove send step param for instant queries (#3931)
* fix: remove step param for instant queries (#3896)

* vmui: remove send step param for instant queries

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-03-12 03:09:56 -07:00
Aliaksandr Valialkin
a52413ce0a docs/CHANGELOG.md: document 113a89904d 2023-03-12 01:58:18 -08:00
Aliaksandr Valialkin
b19de3fa12 docs/CHANGELOG.md: yet another typo fix 2023-03-12 01:06:40 -08:00
Aliaksandr Valialkin
2f1d24fccf docs/CHANGELOG.md: typo fix 2023-03-12 01:04:14 -08:00
Aliaksandr Valialkin
b5db69fe05 app/vmselect/netstorage: do not intern string representation of MetricName for time series received from vmstorage
It has been appeared that this interning may lead to increased memory usage and increased CPU usage
when vmselect performs queries, which select big number of time series.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
2023-03-12 00:52:35 -08:00
Aliaksandr Valialkin
babc9e9815 docs/CHANGELOG.md: document 927d9da270 2023-03-12 00:25:00 -08:00
Aliaksandr Valialkin
d95037a175 app/vmctl/README.md: remove trailing space from the line added at 4c3bc04efa 2023-03-12 00:11:51 -08:00
Aliaksandr Valialkin
e3488c6cbc docs/CHANGELOG.md: typo fixes 2023-03-12 00:09:26 -08:00
Aliaksandr Valialkin
48e32b325e docs/CHANGELOG.md: document c9f44daaee8f4282d9ed41e3ba799c7a33841313 2023-03-11 23:55:13 -08:00
Roman Khavronenko
856c2db144 vmalert: support concurrent reading from object storage (#532)
* vmalert: support concurrent reading from object storage

Config reading from GCS or S3 can be slow if object storage
contains a big number of files. Object storages are usually
fast for downloading and are slow for individual operations.
If there would be thousands of files to read, vmalert could
spend significant time for retrieving those because it is
done sequentially.

The change introduces ability to read configs from object
storage concurrently. By default, both GCS and S3 are now
read with 50 concurrent readers. This significantly reduces
the load time:
* loading 500 files with concurrency=1 takes 27s
* loading 500 files with concurrency=50 takes <1s

* vmalert: add note to Changelog

* vmalert: cleanup

* vmalert: use ticker properly

* app/vmalert: improve status reporting during config loading

* vmalert: support concurrent reading from object storage

Config reading from GCS or S3 can be slow if object storage
contains a big number of files. Object storages are usually
fast for downloading and are slow for individual operations.
If there would be thousands of files to read, vmalert could
spend significant time for retrieving those because it is
done sequentially.

The change introduces ability to read configs from object
storage concurrently. By default, both GCS and S3 are now
read with 50 concurrent readers. This significantly reduces
the load time:
* loading 500 files with concurrency=1 takes 27s
* loading 500 files with concurrency=50 takes <1s

* app/vmalert: make linter happy
2023-03-11 23:51:23 -08:00
Nikolay
927d9da270 lib/storage: correctly handle io.EOF error for pre-fetched metrics (#3946)
io.EOF shouldn't be returned from this function. It breaks all search
API logic and may result in empty query results.
2023-03-11 23:29:43 -08:00
Alexander Marshalov
c0c3dc02cf Stream aggregation doc improvements based on users feedback (#3934)
docs: stream aggregation doc improvements based on users feedback
2023-03-10 21:39:58 +01:00
Roman Khavronenko
3eebe52a06 Dashboards upd (#3942)
* dashboards/cluser: use `quantile` since `median` isn't supported by PromQL

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/*: add `restarts` annotation to show when there were restarts

The cluster's annotation query is aggregated `by job`,
while vmagent/vmalert are aggregated `by job, instance`.
This is because cluster dashboard can contains too many instances
and annotation could become too noisy.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* dashboards/*: support instance filter in Version annotation

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-10 17:13:19 +01:00
Zakhar Bessarab
f7a7eb8f3e docs: add a note about cache reset for vmalert backfilling docs (#3940)
docs: add a note about cache reset for vmalert backfilling docs
2023-03-10 13:45:11 +01:00
Dmytro Kozlov
4c3bc04efa app/vmctl: update importing tips when migrating data with overlapping time range (#3941)
app/vmctl: update importing tips when migrating data with overlapping time range
2023-03-10 11:29:08 +01:00
Dmytro Kozlov
3c9058c168 app/vmctl: add support of basic auth and barer token (#3921)
app/vmctl: add support of basic auth and bearer token
2023-03-09 14:53:29 +01:00
Roman Khavronenko
d66bae212b app/vmalert: log number of configration files found for each specified -rule (#3936)
The change also introduces `List` method to `FS` interface.
The `List` method can be used for wildcard support in object storage FS.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Nikolay <nik@victoriametrics.com>
2023-03-09 14:46:19 +01:00
Dmytro Kozlov
7f54c181bb app/vmctl: follow up after 09e3742a82 (#3937)
app/vmctl: follow up after 09e3742a82
2023-03-09 13:28:55 +01:00
Roman Khavronenko
3de7fc5c71 security: bump go version to 1.20.2 (#3935)
upgrade Go builder from Go1.20.1 to Go1.20.2
See the list of issues addressed in Go1.20.2 here (https://github.com/golang/go/issues?q=milestone%3AGo1.20.2+label%3ACherryPickApproved).

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-09 13:20:54 +01:00
Gowtam Lal
09e3742a82 app/vmctl: Allow vmnative exports to skip HTTP keepalive. (#3909)
app/vmctl: support HTTP keepalive disabling for vm-native mode
2023-03-09 09:47:46 +01:00
Denys Holius
cbba6bd3db Adds snap badge to README.md (#3930)
docs: adds snap badge to README.md
2023-03-08 16:31:58 +01:00
Aliaksandr Valialkin
1b5dc9f91d all: follow-up for 7a3e16e774
- Sync the description for -httpListenAddr.useProxyProtocol command-line flag at vmagent and vmauth,
  so it is consistent with the description at vmauth and victoria-metrics
- Add a sample of panic text to docs/CHANGELOG.md, so it could be googled
- Mention the -httpListenAddr.useProxyProtocol command-line flag in the description for the bugfix

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-08 01:26:55 -08:00
Aliaksandr Valialkin
05709bdfae app/vmselect/vmui: make vmui-update after bbf8e459a0 2023-03-08 01:15:52 -08:00
Aliaksandr Valialkin
ed73317622 docs/CHANGELOG.md: improve description for 4b136abff8 2023-03-08 01:05:33 -08:00
Aliaksandr Valialkin
70b831e684 docs/CHANGELOG.md: improve the description of the bugfix at 62beea23f7
- Make the description easier to read by humans :)
- Add a link to VictoriaMetrics datasource plugin for Grafana, so users could easily discover it
2023-03-08 00:59:42 -08:00
Aliaksandr Valialkin
6e8e64f695 docs/CHANGELOG.md: clarify the description for 6bfe9cc733
- Add the panic message to the description, so it is easier to google
- Add a link to the corresponding bugreport

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
2023-03-08 00:39:34 -08:00
Aliaksandr Valialkin
138757629b app/vmctl/README.md: remove trailing space after cc5b916237 2023-03-08 00:32:00 -08:00
Aliaksandr Valialkin
9e71462cee all: typo fixes of the same type as in the d056be710b 2023-03-08 00:30:21 -08:00
Aliaksandr Valialkin
884e58d58d docs/CHANGELOG.md: clarify the description for the change at 8bab50dc29
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3600
2023-03-08 00:23:35 -08:00
Aliaksandr Valialkin
1bb529e23e app/vmagent/remotewrite: follow-up for e3a756d82869f8c357b072f6e635ebfc7d65dd2c
- Document the fix
- Move the detection of VictoriaMetrics remoteWrite protocol from client.init() to newHTTPClient()
  This simplifies the fix to the following diff:

diff --git a/app/vmagent/remotewrite/client.go b/app/vmagent/remotewrite/client.go
index 099899c19..70b904af4 100644
--- a/app/vmagent/remotewrite/client.go
+++ b/app/vmagent/remotewrite/client.go
@@ -151,10 +151,6 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
        }
        c.sendBlock = c.sendBlockHTTP

-       return c
-}
-
-func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
        useVMProto := forceVMProto.GetOptionalArg(argIdx)
        usePromProto := forcePromProto.GetOptionalArg(argIdx)
        if useVMProto && usePromProto {
@@ -173,6 +169,10 @@ func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
        }
        c.useVMProto = useVMProto

+       return c
+}
+
+func (c *client) init(argIdx, concurrency int, sanitizedURL string) {
2023-03-07 23:54:24 -08:00
Dmytro Kozlov
66bf1987bf app/vmagent: fix panic if auth config not defined (#530) 2023-03-07 23:51:30 -08:00
Aliaksandr Valialkin
202083f38c docs/CHANGELOG.md: document ec2abf9b69 2023-03-07 23:33:19 -08:00
Alexander Marshalov
ccb08fc28b added documentation about new templates field of vmalertmanager specification in operator (https://github.com/VictoriaMetrics/operator/issues/592) (#3924)
Signed-off-by: Alexander Marshalov <_@marshalov.org>
2023-03-07 17:51:26 +01:00
Nikolay
7a3e16e774 lib/netutil: fixes panic at proxy protocol (#3905)
it may occur if non proxy protocol message received by tcp server.
Listener Accept method must return only non-recoverable errors.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3335
2023-03-07 08:50:18 -08:00
Yury Molodov
bbf8e459a0 vmui: fix display of selected value in the selector (#3919)
vmui: fix selected value in dropdowns for Explore page
2023-03-07 16:23:02 +01:00
Nikolay
02f13d5681 docs: updates operator api.md (#3922) 2023-03-07 09:09:05 +01:00
Roman Khavronenko
2472baa934 app/vmalert: do not wait for group start on removal (#3891)
Each group in vmalert starts with an artifical delay to avoid
thundering herd problem. For some groups with high evaluation
intervals, the delay could be significant.
If during this delay user will remove the group from the config
and hot-reload it - vmalert will have to wait until the delay
ends. This results into slow config reloading and UI hang.

The change moves the start-delay logic back to the group's
`start` method. Now, group can immediately exit from the
delay when `group.close()` method is called.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-06 14:04:43 +01:00
Craig Rodrigues
1b194bc6de docs: move installation methods further up in README (#3904) 2023-03-06 12:44:33 +01:00
Dmytro Kozlov
cc5b916237 docs: follow up after 4b136abff8 (#3918)
docs: follow up after 4b136abff8
2023-03-06 12:41:48 +01:00
Alexander Marshalov
75977a4d02 added doc for placeholder support in vmagent specification for operator (https://github.com/VictoriaMetrics/operator/issues/592) (#3916) 2023-03-06 11:58:31 +01:00
Gowtam Lal
4b136abff8 app/vmctl: Add ability to set headers for vm-native HTTP requests. (#3906)
app/vmctl: Add ability to set headers for vm-native HTTP requests
2023-03-06 11:22:31 +01:00
Roman Khavronenko
95dc65e7b3 docs: follow-up after 62beea23f7 (#3907)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 22:45:19 +01:00
Roman Khavronenko
a0cbef1c46 github: fix validation errors (#3903)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 16:28:47 +01:00
Roman Khavronenko
4e8de26fec docs: follow-up e781e22c9c (#3902)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 16:15:15 +01:00
Roman Khavronenko
0d8f80ce90 github: add a Question issue type (#3901)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-03 16:11:06 +01:00
Yury Molodov
62beea23f7 vmui: show query error (#3890)
* add links support with old query params
* show error after execute query
2023-03-03 16:07:47 +01:00
Artem Makhortov
943243321a doc: vmctl vm-native-step-interval supported values (#3899) 2023-03-03 16:05:33 +01:00
Nikolay
6bfe9cc733 lib{mergset,storage}: prevent possible race condition with logging st… (#3900)
lib{mergset,storage}: prevent possible race condition with logging stats for merges

Previously partwrapper could be release by background process and reference for part may be invalid 
during logging stats. It will lead to panic at vmstorage
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3897
2023-03-03 12:33:42 +01:00
Haleygo
d056be710b fix some typo (#3898) 2023-03-03 11:02:13 +01:00
Artem Navoiev
de621c0cb7 remove image width for vmalert managed vm guide
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-02 21:51:34 +01:00
Artem Navoiev
14ef5311f0 change title
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-02 18:06:43 +01:00
Artem Navoiev
5872b5711d add vmalert managed vm integration guide
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-03-02 18:04:17 +01:00
Dmytro Kozlov
8bab50dc29 app/vmctl: add backoff retries to native protocol (#3859)
app/vmctl: vm-native - split migration on per-metric basis

`vm-native` mode now splits the migration process on per-metric basis. 
This allows to migrate metrics one-by-one according to the specified filter. 
This change allows to retry export/import requests for a specific metric and provides a better 
understanding of the migration progress.

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: hagen1778 <roman@victoriametrics.com>
2023-03-02 13:19:45 +01:00
Roman Khavronenko
d6fa4da712 vmalert: cancel in-flight requests on group's update or close (#3886)
When group's update() or close() method is called, the group
still need to wait for its current evaluation to finish.
Sometimes, evaluation could take a significant amount of time
which slows configuration update or vmalert's graceful shutdown.

The change interrupts current evaluation in order to speed up
the graceful shutdown or config update procedures.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-03-01 15:48:20 +01:00
Roman Khavronenko
2e153b68cd dashboards: account for indexdb size in Bytes-per-Point panel (#3884)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-28 17:47:52 +01:00
Roman Khavronenko
3814747e94 deployment/docker: fix typo (#3883)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-28 14:29:28 +01:00
Dmytro Kozlov
ec2abf9b69 app/vmctl: Increase http request timeout made by remote read client, add importing tips (#3879)
app/vmctl: Increase default http request timeout made by remote read client
2023-02-28 09:50:33 +01:00
Aliaksandr Valialkin
06854738b6 .github/workflows/check-licenses.yml: use the correct version of Go - 1.20.1 - instead of 1.21.0 2023-02-27 19:25:13 -08:00
Aliaksandr Valialkin
95e1173423 docs/CHANGELOG.md: document v1.79.10 release 2023-02-27 17:35:48 -08:00
Aliaksandr Valialkin
8937de5f99 vendor: make vendor-update 2023-02-27 15:32:45 -08:00
Aliaksandr Valialkin
8288e327ee docs/CHANGELOG.md: cut v1.88.1 2023-02-27 15:28:18 -08:00
Aliaksandr Valialkin
dfe3939665 docs/CHANGELOG.md: link to the issue, which may benefit from -internStringDisableCache command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3863
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
2023-02-27 14:55:40 -08:00
Aliaksandr Valialkin
46127b432d lib/bytesutil: add -internStringDisableCache and -internStringCacheExpireDuration command-line flags
This commit is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3872
2023-02-27 14:16:49 -08:00
Aliaksandr Valialkin
0d3f31f60e lib/storage: follow-up for 39cdc546dd
- Use flag.Duration instead of flagutil.Duration for -snapshotCreateTimeout,
  since the flagutil.Duration is intended mostly for big durations, e.g. days, months and years,
  while the -snapshotCreateTimeout is usually smaller than one hour.
- Add links to https://docs.victoriametrics.com/#how-to-work-with-snapshots in docs/CHANGELOG.md,
  so readers could easily find the corresponding docs when reading the changelog.
- Properly remove all the created directories on unsuccessful attempt to create
  snapshot in Storage.CreateSnapshot().

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
2023-02-27 13:07:38 -08:00
Zakhar Bessarab
39cdc546dd lib/storage: enhancements for snapshots process (#3873)
* lib/{fs,mergeset,storage}: skip `.must-remove.` dirs when creating snapshot (#3858)

* lib/{mergeset,storage}: add timeout configuration for snapshots creation, remove incomplete snapshots from storage

* docs: fix formatting

* app/vmstorage: add metrics to track status of snapshots

* app/vmstorage: use `vm_http_requests_total` metric for snapshot endpoints metrics, rename new flag to make name more clear

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: update flag name in docs

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* app/vmstorage: reflect new metrics names change in docs

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 12:12:03 -08:00
Zakhar Bessarab
5fadd58cf6 lib/promscrape: correctly register vm_promscrape_config_* metrics (#3876)
* lib/promscrape: set `vm_promscrape_config_last_reload_successful` to 1 if there was no promscrape config provided

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: register `vm_promscrape_config_*` metrics only in case promscrape config is used

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 11:53:53 -08:00
Zakhar Bessarab
ac6d937372 doc: add changelog reference for vmgateway OpenID discovery (#3877)
* doc: add changelog reference for vmgateway OpenID discovery

* doc: add vmgateway docs

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-27 11:50:22 -08:00
Aliaksandr Valialkin
b3bb18d674 app/vmselect/promql: fix panic when calculating aggr_func(rollup*())
The panic has been introduced in dac21d874b
2023-02-27 11:48:27 -08:00
Zakhar Bessarab
ac4c7adec6 app/vmgateway: add new flag doc
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-27 11:18:34 -08:00
Zakhar Bessarab
5d7da8479f app/vmgateway: fix typo in docs
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-27 11:18:08 -08:00
Zakhar Bessarab
4ee73f54a6 app/vmgateway: add OpenID discovery of JWKS endpoints
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-27 11:17:39 -08:00
Aliaksandr Valialkin
23871fb0bf app/vmagent: add -remoteWrite.vmProtoCompressLevel command-line flag for tuning the compression level for VictoriaMetrics remote write protocol 2023-02-27 11:03:49 -08:00
Aliaksandr Valialkin
1a6f2f07fd lib/httpserver: use github.com/klauspost/compress/gzhttp for compressing http responses
This allows removing gzip-related code from lib/httpserver.
2023-02-27 10:33:43 -08:00
Dmytro Kozlov
27c9446520 app/vmctl: skip series if measurement not found (#3869)
app/vmctl: skip measurements with no fields for influxdb mode
2023-02-27 14:28:47 +01:00
Dmytro Kozlov
bcbf73225e app/vmctl: enable version flag (#3868) 2023-02-27 10:30:53 +01:00
Aliaksandr Valialkin
255a0cf635 all: add makefile rules for GOARCH=s390x for all the VictoriaMetrics components
This is a follow-up for 007530f882
2023-02-26 12:36:51 -08:00
v1gnesh
007530f882 Option to build for s390x (Linux on IBM Z) (#3870)
Added `victoria-metrics-linux-s390x` to allow single node builds for `s390x` platform.

Leaving other packaging options at the moment, as on this platform, they're mostly going to be built from/with container images hosted within the company as a base, and not alpine.
2023-02-26 12:29:50 -08:00
Aliaksandr Valialkin
f7ef80aaad .golangci.yml: properly enable revive linter and fix all the warnings it detects 2023-02-26 12:18:59 -08:00
Aliaksandr Valialkin
ffa327d6d1 app/vmagent: use the provided auth options when checking whether the remote storage supports VictoriaMetrics remote write protocol
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-26 12:07:47 -08:00
Aliaksandr Valialkin
eece61a611 deployment/marketplace: update VictoriaMetrics release from v1.87.1 to v1.88.0 2023-02-24 18:58:39 -08:00
Aliaksandr Valialkin
0c016279d5 deployment/docker: update VictoriaMetrics docker tag from v1.87.1 to v1.88.0 2023-02-24 18:57:31 -08:00
Aliaksandr Valialkin
2d36dbcfa9 docs/CHANGELOG.md: cut v1.88.0 2023-02-24 17:54:21 -08:00
Aliaksandr Valialkin
8cfe4064b5 vendor: make vendor-update 2023-02-24 17:26:51 -08:00
Aliaksandr Valialkin
b4bf8dc2f0 docs: update -help output after e1c3267e34 2023-02-24 17:16:13 -08:00
Roman Khavronenko
e1c3267e34 vmselect/promql: check for deadline in count_values fn (#3806)
* vmselect/promql: check for deadline in `count_values` fn

`count_values` could be very slow during the data processing.
Checking for deadline between iterations supposed to reduce
probability of exceeding `search.maxQueryDuration`.

The change also adds a new trace record, which captures the time
spent in aggregation function. Before that, the trace for aggr funcs
could be confusing since it doesn't account for all the places where
time was spent.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 16:59:26 -08:00
Aliaksandr Valialkin
c33cc4322c docs/CHANGELOG.md: document v1.87.2 release 2023-02-24 16:14:28 -08:00
Aliaksandr Valialkin
6a88d5402d docs/CHANGELOG.md: document v1.79.9 release 2023-02-24 15:10:48 -08:00
Roman Khavronenko
dac21d874b metricsql: support optional 2nd argument for rollup functions (#3841)
* metricsql: support optional 2nd argument for rollup functions

Support optional 2nd argument `min`, `max` or `avg` for rollup functions:
 * rollup
 * rollup_delta
 * rollup_deriv
 * rollup_increase
 * rollup_rate
 * rollup_scrape_interval

 If second argument is passed, then rollup function will return only the selected aggregation type.
 This change can be useful for situations where only one type of rollup calculation is needed.
 For example, `rollup_rate(requests_total[5m], "max")`.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-24 13:47:52 -08:00
Aliaksandr Valialkin
87aeeec3e8 docs/CHANGELOG.md: document d8eaa511b0 2023-02-24 12:42:02 -08:00
Zakhar Bessarab
d8eaa511b0 lib/{fs,mergeset,storage}: skip .must-remove. dirs when creating snapshot (#3858) (#3867) 2023-02-24 12:38:42 -08:00
Aliaksandr Valialkin
a2340e6c95 docs/CHANGELOG.md: typo fix: scrape scrape -> scrape 2023-02-24 12:33:29 -08:00
Aliaksandr Valialkin
c6ad3692ad lib/promscrape: follow-up for 43e104a83f
- Return immediately on context cancel during the backoff sleep.
  This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747

- Add a comment describing why the second attempt to obtain the response from remote side
  is perfromed immediately after the first attempt.

- Remove fasthttp dependency from lib/promscrape/discoveryutils

- Set context deadline before calling doRequestWithPossibleRetry().
  This simplifies the doRequestWithPossibleRetry() a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3293
2023-02-24 12:20:42 -08:00
Zakhar Bessarab
43e104a83f fix: do not use exponential backoff for first retry of scrape request (#3824)
* fix: do not use exponential backoff for first retry of scrape request (#3293)

* lib/promscrape: refactor `doRequestWithPossibleRetry` backoff to simplify logic

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update lib/promscrape/client.go

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>

* lib/promscrape: refactor `doRequestWithPossibleRetry` to make it more straightforward

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-24 11:39:56 -08:00
Aliaksandr Valialkin
c87c7d1e29 app/vmselect/promql: measure the time required for calculating the aggregate function from the prepared source time series 2023-02-23 20:05:14 -08:00
Aliaksandr Valialkin
8b7a828c65 app/vmselect/vmui: make vmui-update after d4fc0ed874 2023-02-23 19:25:52 -08:00
Yury Molodov
d4fc0ed874 vmui: improve mobile ui (#3848)
* feat: improve mobile ui

* feat: improve mobile ui

* fix: change style server url

* fix: improve ExploreMetrics mobile

* fix: display global settings on all pages
2023-02-23 19:18:49 -08:00
Aliaksandr Valialkin
a02cf92fd1 docs: update --help descriptions after recent changes 2023-02-23 19:02:27 -08:00
Aliaksandr Valialkin
c734416f86 lib/protoparser: fix golangci-lint warning after f579cac297 2023-02-23 18:50:34 -08:00
Aliaksandr Valialkin
b285207aa7 app/vmselect: add -search.logQueryMemoryUsage command-line flag for logging queries, which take big amounts of memory
Thanks to @michal-kralik for initial attempts for this feature:

- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3651
- https://github.com/VictoriaMetrics/VictoriaMetrics/pull/3715

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3553
2023-02-23 18:47:08 -08:00
Aliaksandr Valialkin
c080443fef app/vmagent: automatically detect whether the remote storage supports VictoriaMetrics remote write protocol
Substitute -remoteWrite.useVMProto with -remoteWrite.forcePromProto command-line flag,
which can be used for forcing Prometheus remote write protocol in cases when the remote storage
supports VictoriaMetrics remote write protocol.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3847
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-23 17:36:55 -08:00
Aliaksandr Valialkin
e688121de8 lib/promscrape/discovery/kuma: substitute blocking HTTP call with non-blocking HTTP call at discoveryutils.Client 2023-02-23 15:13:08 -08:00
Denys Holius
24915fd4bc Fix some typos and adds improvements for packer builds (#3825)
* update helper scripts to latest versions

* added missed command for initialisation variables from Makefile

* deployment/marketplace/vultr/helper-scripts/vultr-helper.sh: update helper script to latest version

* fixed typo for using VM_VERSION variable

* added an example of specifying the VM_VERSION and tokens for API's

* set packer logging to STDOUT by default
2023-02-23 12:42:55 +04:00
Aliaksandr Valialkin
637043a40e docs/CHANGELOG.md: document 6d019a3c37
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3830
2023-02-22 19:24:00 -08:00
Mattias Ängehov
6d019a3c37 Azure Service Discovery - Fix token fetch for Container Apps/App Services (#3832)
* Modify API version when running in Container App

* Handle expires on from token response

Response from IMDS does not always contain expires in value which is
currently used to get the token expiry time. An example resources that
doesn't provide it are Container Apps and App Service.

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>

* Fix client id parameter for user assigned identity

* Apply suggestions from code review

---------

Signed-off-by: Mattias Ängehov <mattias.angehov@castoredc.com>
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2023-02-22 19:19:53 -08:00
Aliaksandr Valialkin
510f78a96b all: consistently use http.Method{Get,Post,Put} across the codebase
This is a follow-up after 9dec3c8f80
2023-02-22 18:58:46 -08:00
my-git9
9dec3c8f80 chore: Use http constants to replace numbers (#3846)
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-02-22 18:53:05 -08:00
Alexander Marshalov
aaef0bac00 fix interpolate function for filling only intermediate gaps (#3816) (#3857)
* fix interpolate function for filling only intermediate gaps (#3816)

* Update docs/CHANGELOG.md

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-22 18:38:43 -08:00
Yury Molodov
fc720a5a78 fix: change query params update (#3860) 2023-02-22 18:25:32 -08:00
Aliaksandr Valialkin
383bf9689e docs/CHANGELOG.md: document d2b92d3264
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747
2023-02-22 17:51:52 -08:00
Aliaksandr Valialkin
9fbd45a22f lib/promscrape/discovery/kuma: follow-up for 317fef95f9
- Do not generate __meta_server label, since it is unavailable in Prometheus.
- Add a link to https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs to docs/CHANGELOG.md,
  so users could click it and read the docs without the need to search the corresponding docs.
- Remove kumaTarget struct, since it is easier generating labels for discovered targets
  directly from the response returned by Kuma. This simplifies the code.
- Store the generated labels for discovered targets inside atomic.Value. This allows reading them
  from concurrent goroutines without the need to use mutex.
- Use synchronouse requests to Kuma instead of long polling, since there is a little sense
  in the long polling when the Kuma server may return 304 Not Modified response every -promscrape.kumaSDCheckInterval.
- Remove -promscrape.kuma.waitTime command-line flag, since it is no longer needed when long polling isn't used.
- Set default value for -promscrape.kumaSDCheckInterval to 30s in order to be consistent with Prometheus.
- Remove unnecessary indirections for string literals, which are used only once, in order to improve code readability.
- Remove unused fields from discoveryRequest and discoveryResponse.
- Update tests.
- Document why fetch_timeout and refresh_interval options are missing in kuma_sd_config.
- Add docs to discoveryutils.RequestCallback and discoveryutils.ResponseCallback,
  since these are public types.

Side notes: it is weird that Prometheus implementation for kuma_sd_configs sets `instance` label,
since usually this label is set by the Prometheus itself to __address__ after the relabeling phase.
See https://www.robustperception.io/life-of-a-label/

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3389

See https://github.com/prometheus/prometheus/issues/7919
and https://github.com/prometheus/prometheus/pull/8844
as a reference implementation in Prometheus
2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
eb08579452 lib/promscrape/discovery: add a comment explaining why duplicates are removed from the generated target labels 2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
4f649a0573 docs/CHANGELOG.md: document 110c3896e7
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3600
2023-02-22 17:51:51 -08:00
Aliaksandr Valialkin
f8811411fd app/vmagent/remotewrite: removed unneeded code in testPushWriteRequest after 57801660ab 2023-02-22 17:51:50 -08:00
Alexander Marshalov
b6845951a5 fixed typo in dns+srv documentation (#3861) 2023-02-23 02:40:00 +01:00
Zakhar Bessarab
d2b92d3264 lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (#3853)
* lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (see #3747)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: fix order of params for `doRequestWithPossibleRetry` to follow codestyle

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* lib/promscrape: accept deadline explicitly and extend passed context for local use

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-22 17:05:16 +01:00
Artem Navoiev
84c82e988d Add Dig Security case study
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-02-22 16:38:57 +01:00
Alexander Marshalov
317fef95f9 add kuma_sd_config for Kuma Control Plane targets discovery (#3389) (#3840) 2023-02-22 13:59:56 +01:00
Dmytro Kozlov
110c3896e7 app/vmctl: add retry backoff policy (#3844)
app/vmctl: move retries logic into a separate pkg
2023-02-22 13:06:55 +01:00
Alexander Marshalov
57801660ab Fix TestPushWriteRequest for remotewriteprotocol for pure-test (with native zstd) (#3850)
app/vmagent: fix TestPushWriteRequest for pure-test (with native zstd)

Co-authored-by: Roman Khavronenko <roman@victoriametrics.com>
2023-02-22 11:26:19 +01:00
Artem Navoiev
6b4ccc17c2 docs: fix typo OpentTSDB -> OpenTSDB (#3854)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2023-02-22 11:24:26 +01:00
Aliaksandr Valialkin
10cf6c9781 app/vmselect: allow zero value for -search.latencyOffset command-line flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2061#issuecomment-1299109836

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/218
2023-02-21 18:06:55 -08:00
Aliaksandr Valialkin
836d56876a vendor: make vendor-update 2023-02-21 18:06:20 -08:00
Aliaksandr Valialkin
d59dc7616d go.mod: update github.com/VictoriaMetrics/fastcache from v1.12.0 to v1.12.1 2023-02-21 17:51:41 -08:00
Aliaksandr Valialkin
ffebc20f6d vendor: update github.com/VictoriaMetrics/fasthttp from v1.1.0 to v1.2.0
The v1.2.0 adds HostClient.DoCtx() function, which is needed by https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3747
for implementing fast canceling of pending requests to scrape targets on config update
2023-02-21 17:49:30 -08:00
Aliaksandr Valialkin
f04ec714c2 docs/Articles.md: mention rules backfilling via vmalert article
This is a follow-up for 5446ce0018
2023-02-21 17:44:08 -08:00
Roman Khavronenko
5446ce0018 docs: mention rules replay blogpost in vmalert docs (#3851)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-21 22:16:15 +01:00
panguicai
21546e6922 docs: update operator release name to be consistent with the following (#3845)
Signed-off-by: panguicai008 <1121906548@qq.com>
2023-02-21 16:57:54 +01:00
Aliaksandr Valialkin
7be8a8a37b docs/CHANGELOG.md: fix a link to VictoriaMetrics remote write protocol 2023-02-20 19:59:44 -08:00
Aliaksandr Valialkin
6c38702212 docs/Single-server-VictoriaMetrics.md: remove + from start and end query args, since curl substitutes them with whitespace and breaks the query 2023-02-20 19:32:09 -08:00
Aliaksandr Valialkin
ef1ef0598b docs/guides/migrate-from-influx.md: remove misleading doublequotes from the query example 2023-02-20 19:25:51 -08:00
Zakhar Bessarab
9d91d8fc91 vmgateway: add support of JWKS endpoint usage for JWT keys verification (#521) 2023-02-20 19:22:55 -08:00
Aliaksandr Valialkin
f4f1f2f976 docs/vmagent.md: remove the claim that VictoriaMetrics remote write protocol reduces the network bandwidth usage by up to 10x comparing to Prometheus remote write protocol
The 10x savings are reproduced only on artificial data.
The savings on production data are usually in the range 2x-4x.
2023-02-20 19:11:31 -08:00
Aliaksandr Valialkin
a678cbe62e docs/vmagent.md: mention that Mimir doesnt support backfilling 2023-02-20 19:11:30 -08:00
Aliaksandr Valialkin
76f2c70be3 app/vmagent: add support for VictoriaMetrics remote write protocol, which allows saving up to 10x on network bandwidth costs under high load
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1225
2023-02-20 19:11:30 -08:00
Roman Khavronenko
74f122294d docs: update vmalert docs (#3843)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-20 16:39:31 +01:00
Corporte Gadfly
d5171c155c docs: typo fix (#3839) 2023-02-20 10:06:19 +01:00
Aliaksandr Valialkin
2cea923ff7 app/vmselect/promql: add share(q) aggregate function for normalizing results across multiple time series in [0..1] value range per each timestamp and aggregation group 2023-02-18 22:42:01 -08:00
Aliaksandr Valialkin
c86f1f1d1b app/vmselect/promql: add range_zscore(q) and range_trim_zscore(z, q) functions
These functions may be useful for dropping outliers at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3759
2023-02-18 22:41:59 -08:00
Aliaksandr Valialkin
1030be91ae vendor: make vendor-update 2023-02-18 15:36:41 -08:00
Aliaksandr Valialkin
8bd2eee8e0 app/vmalert/README.md: sync with docs/vmalert.md after 6ef6f3a771 2023-02-18 15:21:24 -08:00
Aliaksandr Valialkin
406822a16c app/vmselect/promql: add range_mad(q) and range_trim_outliers(k, q) functions
These functions may help trimming outliers during query time
for the use case described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3759
2023-02-18 15:19:10 -08:00
Aliaksandr Valialkin
c8467f37a9 vendor: update github.com/valyala/gozstd from v1.17.0 to v1.18.0 2023-02-18 15:19:10 -08:00
Haleygo
6ef6f3a771 vmalert: fix maxResolveDuration flag note (#3827)
Signed-off-by: Haleygo <hui.wang@daocloud.io>
2023-02-16 19:26:17 +01:00
Duc Tran
b58107c85a docs: fix links to alerts and alertmanager (#3829) 2023-02-16 19:25:10 +01:00
Aliaksandr Valialkin
87f1ed5d87 app/vmui: tooltip formatting enhancements according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706#issuecomment-1429980038 2023-02-14 23:37:26 -08:00
Aliaksandr Valialkin
11ce30820b all: update Go builder from Go1.20.0 to Go1.20.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.20.1+label%3ACherryPickApproved
2023-02-14 23:05:16 -08:00
Aliaksandr Valialkin
5123a61be9 vendor: make vendor-update 2023-02-13 11:14:17 -08:00
Aliaksandr Valialkin
5c4f5b83fc all: rename ParseStream -> stream.Parse
This is a follow-up for 057698f7fb
2023-02-13 10:52:05 -08:00
Aliaksandr Valialkin
ccdddf7996 lib/protoparser/promremotewrite: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:46:54 -08:00
Aliaksandr Valialkin
9be1398b92 lib/protoparser/native: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:43:05 -08:00
Aliaksandr Valialkin
8830607021 lib/protoparser/graphite: extract stream parsing code into a separate stream package 2023-02-13 10:32:36 -08:00
Aliaksandr Valialkin
a646841c07 lib/protoparser/csvimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:25:46 -08:00
Aliaksandr Valialkin
7568658c19 lib/protoparser/vmimport: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:20:19 -08:00
Aliaksandr Valialkin
af37717108 lib/protoparser/opentsdbhttp: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:16:03 -08:00
Aliaksandr Valialkin
7720d403c0 lib/protoparser/opentsdb: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 10:03:16 -08:00
Aliaksandr Valialkin
fe196e0b7a lib/protoparser/influx: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:58:52 -08:00
Aliaksandr Valialkin
f83d6d69b2 lib/protoparser/datadog: extract stream parsing code into a separate stream package
This is a follow-up for 057698f7fb
2023-02-13 09:51:47 -08:00
Aliaksandr Valialkin
f3be9483f4 docs/CHANGELOG.md: improve the docs for 8ea02eaa8e 2023-02-13 09:41:46 -08:00
Roman Khavronenko
057698f7fb lib/protoparser/prometheus: move streamparser to subpackage (#3814)
`lib/protoparser/prometheus` is used by various applications,
such as `app/vmalert`. The recent change to the
`lib/protoparser/prometheus` package introduced a new dependency
of `lib/writeconcurrencylimiter` which exposes some metrics.
Because of the dependency, now all applications which have this
dependency also expose these metrics.

Creating a new `lib/protoparser/prometheus/stream` package helps
to remove these metrics from apps which use `lib/protoparser/prometheus`
as dependency.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 09:26:07 -08:00
Roman Khavronenko
a645a95bd6 docs: improve troubleshooting docs for vmalert (#3812)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 17:29:30 +01:00
Roman Khavronenko
934646ccf7 follow-up after d1cbc35cf6 (#3813)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-13 16:23:50 +01:00
Droxenator
8ea02eaa8e fixed opentsdbListenAddr timestamp conversion (#3810)
Co-authored-by: Andrei Ivanov <a.ivanov@corp.mail.ru>
2023-02-13 16:07:53 +01:00
Artem Navoiev
2e2b1ba87e change docs for VictoriaMetrics Managed (#3803)
* change docs for VictoriaMetrics Managedd

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* Update docs/managed-victoriametrics/user-managment.md

Co-authored-by: Max Golionko <8kirk8@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
Co-authored-by: Max Golionko <8kirk8@gmail.com>
2023-02-13 13:29:03 +01:00
Oleksandr Redko
9fff48c3e3 app,lib: fix typos in comments (#3804) 2023-02-13 13:27:13 +01:00
Aliaksandr Valialkin
438b2e11bd app/vmauth: allow specifying max_concurrent_requests value on a per-user basis bigger than the -maxConcurrentPerUserRequests value 2023-02-11 20:53:08 -08:00
Aliaksandr Valialkin
e5070c0bcd docs/sd_configs.md: properly escape __address__ string 2023-02-11 14:45:54 -08:00
Aliaksandr Valialkin
c5b9f7f751 docs/sd_configs.md: document how the __address__ label is generated per each discovered target 2023-02-11 14:41:59 -08:00
Aliaksandr Valialkin
f9b3409ee3 lib/promscrape/discovery/openstack: use port 80 for the discovered target by default if it isnt specified in the config 2023-02-11 14:41:58 -08:00
Aliaksandr Valialkin
25a9017a72 app/vmui: show median instead of avg on graph tooltip and line legend
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706
2023-02-11 12:51:12 -08:00
Aliaksandr Valialkin
3ec8a4dc80 lib/{mergeset,storage}: allow at least 3 concurrent flushes during background merges on systems with 1 or 2 CPU cores
This should prevent from data ingestion slowdown and query performance degradation
on systems with small number of CPU cores (1 or 2), when big merge is performed.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
2023-02-11 12:08:52 -08:00
Aliaksandr Valialkin
4156e78e11 docs/Articles.md: add a link to https://www.techetio.com/2022/08/21/evaluating-backend-options-for-prometheus-metrics/ 2023-02-11 12:08:52 -08:00
Roman Khavronenko
b209d4ace0 dashboards: use median instead of avg (#3800)
`avg` can be affected by just one outlier, which may lead
to false conclusions. `median` is supposed to reflect
reality better by leveling outliers out.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-11 10:01:30 -08:00
Aliaksandr Valialkin
b11bdc46be app/vmselect/promql: add mad_over_time(m[d]) function
See https://github.com/prometheus/prometheus/issues/5514
2023-02-11 01:06:20 -08:00
Aliaksandr Valialkin
ed4492ddd5 all: update alpine base docker image from 1.17.1 to 1.17.2
See https://alpinelinux.org/posts/Alpine-3.17.2-released.html
2023-02-11 00:37:17 -08:00
Aliaksandr Valialkin
776391917f app/vmauth: improve load balancing by sending incoming requests to backends with the lowest number of concurrent requests
While at it, stop sending requests to unavailable backend for 3 seconds
before the next attempt. This should reduce the amounts of useless work
and the number of useless network packets when the backend is temporarily unavailable.
2023-02-11 00:30:31 -08:00
Aliaksandr Valialkin
f3625e4f3f app/vmauth: add -maxConcurrentPerUserRequests command-line option for limiting the number of concurrent requests on a per-user basis
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346
2023-02-10 21:58:21 -08:00
Aliaksandr Valialkin
70f8911ca7 app/vmauth: automatically retry failing GET requests on the remaining backends 2023-02-09 21:05:55 -08:00
Dmytro Kozlov
f582f9e8ab app/vmauth: add concurrent requests limit per auth record (#3749)
* app/vmauth: add concurent requests limit per auth record

* app/vmauth: added clarification comment

* app/vmauth: remove unused code

* app/vmauth: move read from limiter

* app/vmauth: fix text

* app/vmauth: fix comments

* - Clarify the docs for the max_concurrent_requests option at docs/vmauth.md
- Clarify the description of the change at docs/CHANGELOG.md
- Make sure that the -maxConcurrentRequests takes precedence over per-user max_concurrent_requests
- Update tests for verifying that the max_concurrent_requests option is parsed properly

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3346

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 20:03:01 -08:00
Aliaksandr Valialkin
73358571ee app/vmalert: follow-up after d3c64aae8768d58781ee7e358bd7f3d8e0eb836d
- Document the change at docs/CHANGELOG.md
- Add `Reading rules from object storage` section to docs/vmalert.md
- Add `s3` prefix to command-line flags related to the configuration of s3 and gcs clients
- Explicitly mention that reading rules from object storage is supported only in enterprise version
2023-02-09 18:52:00 -08:00
Roman Khavronenko
14c20c1843 vmalert: support object storage for rules (#519)
* vmalert: support object storage for rules

Support loading of alerting and recording rules from object
storages `gcs://`, `gs://`, `s3://`.

* review fixes
2023-02-09 18:50:48 -08:00
Aliaksandr Valialkin
c12eb2cc7f deployment/docker: update VictoriaMetrics Docker images from v1.87.0 to v1.87.1 2023-02-09 15:53:08 -08:00
Aliaksandr Valialkin
291c41978e vendor: make vendor-update 2023-02-09 14:48:16 -08:00
Aliaksandr Valialkin
d4b97b69bf docs/CHANGELOG.md: document d621d50d4fb3b43a0bcb4419bee979f0192d38fe 2023-02-09 14:40:07 -08:00
Aliaksandr Valialkin
035a2b5ed5 all: skip issues with low severity at docker scan 2023-02-09 14:25:13 -08:00
Aliaksandr Valialkin
0e0095d350 all: run apk update && apk upgrade in base Alpine Docker image in order to get all the recent security fixes 2023-02-09 14:01:32 -08:00
Aliaksandr Valialkin
a42e3e8dfb docs/CHANGELOG.md: cut v1.87.1 and mark 1.87.x as LTS release 2023-02-09 11:20:57 -08:00
Zakhar Bessarab
f13a255918 lib/promscrape: fix cancelling in-flight scrape requests during configuration reload (#3791)
* lib/promscrape: fix cancelling in-flight scrape requests during configuration reload when using `streamParse` mode (see #3747)

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 11:13:06 -08:00
Aliaksandr Valialkin
513707a8c7 app/vmui: UX enhancements for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3706
- Display `min` value additionally to `avg`, `max` and `last`
- Allow copy-n-pasting metric name with its labels from both legend and tooltup
2023-02-09 11:04:51 -08:00
Aliaksandr Valialkin
f40661e7b7 docs/vmagent.md: clarify that automatically generated metrics contain all the target-specific labels, including instance and job 2023-02-09 11:04:51 -08:00
Air
a1432e6b0a Possibly spelling in the Quick start 2023-02-09 15:50:20 +02:00
Yury Molodov
bff18cb5dd vmui: lazy loading predefined panels (#3795)
* fix: change logic lazy loading predefined panels

* app/vmselect/vmui: `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 00:11:55 -08:00
Yury Molodov
e1063ce3c1 vmui: improve tenant selector (#3794)
* fix: change styles tenant selector (#3792)

* docs/CHANGELOG.md: document the change

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3792

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-09 00:08:59 -08:00
Aliaksandr Valialkin
46a521191f docs/CHANGELOG.md: document changes at v1.79.8 LTS release 2023-02-08 23:38:46 -08:00
Yury Molodov
8afc0aef8d vmui: add last/max/avg values (#3789)
* feat: add last/max/avg values (#3706)

* fix: change filter exclude values

* app/vmui: wip

- improve the visualization for avg/max/last values
- make getAvgFromArray() function resilient against inf/undefined/nil
- export getLastFromArray() function, which is resilient against inf/undefined/nil
- run `make vmui-update`

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-08 22:41:20 -08:00
Aliaksandr Valialkin
114c14febf docs/CHANGELOG.md: document 75bcf86a31
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3740
2023-02-08 11:24:07 -08:00
Yury Molodov
75bcf86a31 fix: turn off the local dashboards(#3740) (#3793) 2023-02-08 11:13:15 -08:00
Aliaksandr Valialkin
a1ee679042 docs/CHANGELOG.md: add more context to the bugfix description in Nomad service discovery
See 146fd2eca3
2023-02-08 09:24:48 -08:00
Aliaksandr Valialkin
a8e88e74cc lib/backup/azremote: fix after upgrading github.com/Azure/azure-sdk-for-go/sdk/storage/azblob from v0.6.1 to v1.0.0 2023-02-08 09:18:23 -08:00
Aliaksandr Valialkin
c9d2934bb4 vendor: make vendor-update 2023-02-08 08:55:14 -08:00
Aliaksandr Valialkin
6c21b6ec09 docs/CHANGELOG.md: document the change at 67b01329a0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761
2023-02-08 08:42:19 -08:00
Roman Khavronenko
e83f14210d Vmalert fixes (#3788)
* vmalert: use group's ID in UI to avoid collisions

Identical group names are allowed. So we should used IDs
for various groupings and aggregations in UI.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: prevent disabling state updates tracking

The minimum number of update states to track is now set to 1.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: properly update `debug` and `update_entries_limit` params on hot-reload

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: display `debug` field for rule in UI

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* vmalert: exclude `updates` field from json marhsaling

This field isn't correctly marshaled right now.
And implementing the correct marshaling for it doesn't
seem right, since json representation is mostly used
by systems like Grafana. And Grafana doesn't expect this
field to be present.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* fix test for disabled state

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 14:34:03 +01:00
Max Golionko
6495b62866 bump go to 1.20 in ci jobs (#3787) 2023-02-08 14:32:42 +01:00
Roman Khavronenko
86e47177dc docs: follow-up after 2e4bfcce63 (#3785)
2e4bfcce63

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-08 09:48:05 +01:00
Karan Sharma
146fd2eca3 sd/nomad: panic in nomad watcher because of nil map (#3784)
properly initialize url.Values
2023-02-08 09:43:29 +01:00
Aliaksandr Valialkin
67b01329a0 lib/writeconcurrencylimiter: initialize concurrencyLimitCh before exporting vm_concurrent_insert_capacity and vm_concurrent_insert_current metrics
This will result in proper calculations for the the alerting rule:

 avg_over_time(vm_concurrent_insert_current[1m]) >= vm_concurrent_insert_capacity

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3761
2023-02-07 11:08:17 -08:00
Aliaksandr Valialkin
f2be447270 Makefile: update golangci-lint from v1.50.1 to v1.51.1 2023-02-07 11:08:11 -08:00
Aliaksandr Valialkin
1901fbf19b docs/CHANGELOG.md: fix formatting for the change from 6fd10e8871 2023-02-07 09:34:57 -08:00
earthgecko
3a51a3bc42 Clarifications between standalone/cluster ingestion endpoints (#3771)
docs: clarifications between standalone/cluster ingestion endpoints

This is an attempt to make it a bit clearer to the user that the cluster version ingestion URLs are different from the standalone ones.  I have also changed the order of the list items to make it a bit clearer and hopefully stop the user simply inferring that `/prometheus/api/v1` is only related to Prometheus data.
2023-02-07 15:17:12 +01:00
Max Golionko
6f24fa2055 CI: speedup build by 2.4x. restore nightly build (#3772)
* setup docker buildx
* add snyk integration
* add go cache for docker build
* cancel redundant job if there is new commit into same PR or branch
2023-02-07 10:12:16 +08:00
Max Golionko
977c642934 docs: update formatiing for k8s monitoring with Managed VictoriaMetrics (#3768)
* jekyll formatting madness
2023-02-06 18:33:40 +08:00
Roman Khavronenko
c32d8ea29e vmalert: update docs (#3770)
vmalert: update flags description

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2023-02-06 09:51:30 +01:00
Denys Holius
8aa7559462 fixed wrong vmstorage port number (#3769) 2023-02-06 09:08:50 +01:00
Roman Khavronenko
6fd10e8871 vmalert: speed up state restore procedure on start (#3758)
* vmalert: speed up state restore procedure on start

Alerts state restore procedure has been changed to become asynchronous.
It doesn't block groups start anymore which significantly improves vmalert's startup time.
Instead, state restore is called by each group in their goroutines after the first rules
evaluation.

While previously state restore attempt was made for all loaded alerting rules,
now it is called only for alerts which became active after the first evaluation.
This reduces the amount of API calls to the configured remote read URL.

This also means that `remoteRead.ignoreRestoreErrors` command-line flag becomes deprecated now
and will have no effect if configured.

See relevant issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2608

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* make lint happy

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Apply suggestions from code review

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2023-02-03 19:46:13 -08:00
Aliaksandr Valialkin
0a824d9490 app/vmselect/vmui: make vmui-update after e4c04b6dbe 2023-02-03 19:34:01 -08:00
Yury Molodov
e4c04b6dbe vmui: set light theme for app mode (#3748)
* fix: set light theme for app mode

* fix: check inputTenantID flag

* fix: rename inputTenantID to useTenantID
2023-02-03 19:31:37 -08:00
Aliaksandr Valialkin
98dc968920 docs/CHANGELOG.md: document f63f487787
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3707
2023-02-03 19:30:12 -08:00
Yury Molodov
f63f487787 vmui: mobile view (#3742)
* feat: add detect the system theme

* fix: change logic fetch tenants

* feat: add docs and info to cardinality page

* feat: add mobile view #3707
2023-02-03 19:27:57 -08:00
Aliaksandr Valialkin
88fed0232c dashboards: typo fix Datapoints scanned per series -> Datapoints scanned per query 2023-02-03 19:12:33 -08:00
Aliaksandr Valialkin
af1a9c5eda deployment/docker: update Go builder from Go1.19.5 to Go1.20.0
See https://go.dev/blog/go1.20
2023-02-03 14:17:33 -08:00
Aliaksandr Valialkin
7440f971ab docs/MetricsQL.md: add links to "rollup results" explanation 2023-02-03 11:09:42 -08:00
Aliaksandr Valialkin
12cf8d9f69 docs/managed-victoriametrics/how-to-monitor-k8s.md: rename image files according to docs/assets/README.md 2023-02-03 10:43:02 -08:00
Max Golionko
e18f8e9413 docs: move managed victoria metics guide into right folder (#3750)
* move guide folder
* image width control
2023-02-03 03:13:36 +08:00
Max Golionko
07b7fe83c4 Update docs/managed_victoriametrics/how-to-monitor-k8s.md
Co-authored-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2023-02-02 15:43:55 +02:00
Max Golionko
a2ba1f09e4 add guide to list of guides 2023-02-02 15:43:55 +02:00
Max Golionko
79527441ec added k8s guide for managed VM 2023-02-02 15:43:55 +02:00
Max Golionko
df1e545c0e disable codeql for docs. merge build and test back to one job (#3746) 2023-02-02 20:59:08 +08:00
Aliaksandr Valialkin
dc142867b8 docs/CHANGELOG.md: typo fixes 2023-02-01 20:40:44 -08:00
Aliaksandr Valialkin
da539bc286 deployment/docker: update VictoriaMetrics docker image tag from v1.86.2 to v1.87.0 2023-02-01 20:03:04 -08:00
1739 changed files with 117246 additions and 40990 deletions

32
.github/ISSUE_TEMPLATE/question.yml vendored Normal file
View File

@@ -0,0 +1,32 @@
name: Question
description: Ask a question regarding VictoriaMetrics or its components
labels: [question]
body:
- type: textarea
id: describe-the-component
attributes:
label: Is your question request related to a specific component?
placeholder: |
VictoriaMetrics, vmagent, vmalert, vmui, etc...
validations:
required: false
- type: textarea
id: describe-the-question
attributes:
label: Describe the question in detail
description: |
A clear and concise description of the issue and the question.
validations:
required: true
- type: checkboxes
id: troubleshooting
attributes:
label: Troubleshooting docs
description: I am familiar with the following troubleshooting docs
options:
- label: General - https://docs.victoriametrics.com/Troubleshooting.html
required: false
- label: vmagent - https://docs.victoriametrics.com/vmagent.html#troubleshooting
required: false
- label: vmalert - https://docs.victoriametrics.com/vmalert.html#troubleshooting
required: false

View File

@@ -17,7 +17,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.5
go-version: 1.20.4
id: go
- name: Code checkout
uses: actions/checkout@master

View File

@@ -13,6 +13,10 @@ on:
schedule:
- cron: "30 18 * * 2"
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze

View File

@@ -15,6 +15,7 @@ on:
push:
branches: [master, cluster]
paths-ignore:
- "docs/**"
- "**.md"
- "**.txt"
- "**.js"
@@ -22,12 +23,17 @@ on:
# The branches below must be a subset of the branches above
branches: [master, cluster]
paths-ignore:
- "docs/**"
- "**.md"
- "**.txt"
- "**.js"
schedule:
- cron: "30 18 * * 2"
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
analyze:
name: Analyze
@@ -49,9 +55,9 @@ jobs:
uses: actions/checkout@v3
- name: Set up Go
uses: actions/setup-go@v3
uses: actions/setup-go@v4
with:
go-version: 1.19.5
go-version: 1.20.4
check-latest: true
cache: true
if: ${{ matrix.language == 'go' }}

View File

@@ -1,66 +0,0 @@
name: main - test
on:
push:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
pull_request:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
permissions:
contents: read
jobs:
lint:
name: lint
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
- name: Dependencies
run: |
make install-golangci-lint
make check-all
git diff --exit-code
test:
needs: lint
strategy:
matrix:
scenario: ["test-full", "test-pure", "test-full-386"]
name: test
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: 1.19.5
check-latest: true
cache: true
- name: run tests
run: |
make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.txt

View File

@@ -1,30 +1,95 @@
name: main
on:
workflow_run:
workflows: ["main - test"]
types:
- completed
push:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
pull_request:
branches:
- master
- cluster
paths-ignore:
- "docs/**"
- "**.md"
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build:
name: Build
lint:
name: lint
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
with:
ref: ${{ github.event.workflow_run.head_branch }}
- name: Setup Go
uses: actions/setup-go@v3
uses: actions/setup-go@v4
with:
go-version: 1.19.5
go-version: 1.20.4
check-latest: true
cache: true
- name: Dependencies
run: |
make install-golangci-lint
make check-all
git diff --exit-code
test:
needs: lint
strategy:
matrix:
scenario: ["test-full", "test-pure", "test-full-386"]
name: test
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
uses: actions/setup-go@v4
with:
go-version: 1.20.4
check-latest: true
cache: true
- name: run tests
run: |
make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.txt
build:
needs: test
name: build
runs-on: ubuntu-latest
steps:
- name: Code checkout
uses: actions/checkout@v3
- name: Setup Go
id: go
uses: actions/setup-go@v4
with:
go-version: 1.20.4
check-latest: true
cache: true
- uses: actions/cache@v3
with:
path: gocache-for-docker
key: gocache-docker-${{ runner.os }}-${{ steps.go.outputs.go-version }}-${{ hashFiles('go.mod') }}
- name: Build
run: |
make victoria-metrics-crossbuild

View File

@@ -12,28 +12,37 @@ jobs:
name: Build
runs-on: ubuntu-latest
steps:
-
name: Login to Docker Hub
- name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Setup Go
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.19.5
go-version: 1.20.4
id: go
-
name: Setup docker scan
- name: Setup docker scan
run: |
mkdir -p ~/.docker/cli-plugins && \
curl https://github.com/docker/scan-cli-plugin/releases/latest/download/docker-scan_linux_amd64 -L -s -S -o ~/.docker/cli-plugins/docker-scan &&\
chmod +x ~/.docker/cli-plugins/docker-scan
-
name: Code checkout
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Code checkout
uses: actions/checkout@master
-
name: Publish
- uses: actions/cache@v3
with:
path: gocache-for-docker
key: gocache-docker-${{ runner.os }}-${{ steps.go.outputs.go-version }}-${{ hashFiles('go.mod') }}
- name: build & publish
run: |
docker scan --severity=medium --login --token "$SNYK_TOKEN" --accept-license
LATEST_TAG=nightly PKG_TAG=nightly make publish
env:
SNYK_TOKEN: ${{ secrets.SNYK_AUTH_TOKEN }}

View File

@@ -1,14 +1,18 @@
run:
timeout: 2m
enable:
linters:
enable:
- revive
issues:
exclude-rules:
- linters:
- staticcheck
text: "SA(4003|1019|5011):"
- linters:
- staticcheck
text: "SA(4003|1019|5011):"
include:
- EXC0012
- EXC0014
linters-settings:
errcheck:

View File

@@ -141,6 +141,8 @@ vmutils-windows-amd64: \
vmagent-windows-amd64 \
vmalert-windows-amd64 \
vmauth-windows-amd64 \
vmbackup-windows-amd64 \
vmrestore-windows-amd64 \
vmctl-windows-amd64
victoria-metrics-crossbuild: \
@@ -186,7 +188,8 @@ release-victoria-metrics: \
release-victoria-metrics-darwin-amd64 \
release-victoria-metrics-darwin-arm64 \
release-victoria-metrics-freebsd-amd64 \
release-victoria-metrics-openbsd-amd64
release-victoria-metrics-openbsd-amd64 \
release-victoria-metrics-windows-amd64
# adds i386 arch
release-victoria-metrics-linux-386:
@@ -213,6 +216,9 @@ release-victoria-metrics-freebsd-amd64:
release-victoria-metrics-openbsd-amd64:
GOOS=openbsd GOARCH=amd64 $(MAKE) release-victoria-metrics-goos-goarch
release-victoria-metrics-windows-amd64:
GOARCH=amd64 $(MAKE) release-victoria-metrics-windows-goarch
release-victoria-metrics-goos-goarch: victoria-metrics-$(GOOS)-$(GOARCH)-prod
cd bin && \
tar --transform="flags=r;s|-$(GOOS)-$(GOARCH)||" -czf victoria-metrics-$(GOOS)-$(GOARCH)-$(PKG_TAG).tar.gz \
@@ -222,6 +228,16 @@ release-victoria-metrics-goos-goarch: victoria-metrics-$(GOOS)-$(GOARCH)-prod
| sed s/-$(GOOS)-$(GOARCH)-prod/-prod/ > victoria-metrics-$(GOOS)-$(GOARCH)-$(PKG_TAG)_checksums.txt
cd bin && rm -rf victoria-metrics-$(GOOS)-$(GOARCH)-prod
release-victoria-metrics-windows-goarch: victoria-metrics-windows-$(GOARCH)-prod
cd bin && \
zip victoria-metrics-windows-$(GOARCH)-$(PKG_TAG).zip \
victoria-metrics-windows-$(GOARCH)-prod.exe \
&& sha256sum victoria-metrics-windows-$(GOARCH)-$(PKG_TAG).zip \
victoria-metrics-windows-$(GOARCH)-prod.exe \
> victoria-metrics-windows-$(GOARCH)-$(PKG_TAG)_checksums.txt
cd bin && rm -rf \
victoria-metrics-windows-$(GOARCH)-prod.exe
release-vmutils: \
release-vmutils-linux-386 \
release-vmutils-linux-amd64 \
@@ -295,26 +311,33 @@ release-vmutils-windows-goarch: \
vmagent-windows-$(GOARCH)-prod \
vmalert-windows-$(GOARCH)-prod \
vmauth-windows-$(GOARCH)-prod \
vmbackup-windows-$(GOARCH)-prod \
vmrestore-windows-$(GOARCH)-prod \
vmctl-windows-$(GOARCH)-prod
cd bin && \
zip vmutils-windows-$(GOARCH)-$(PKG_TAG).zip \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmbackup-windows-$(GOARCH)-prod.exe \
vmrestore-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe \
&& sha256sum vmutils-windows-$(GOARCH)-$(PKG_TAG).zip \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmbackup-windows-$(GOARCH)-prod.exe \
vmrestore-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe \
> vmutils-windows-$(GOARCH)-$(PKG_TAG)_checksums.txt
cd bin && rm -rf \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmbackup-windows-$(GOARCH)-prod.exe \
vmrestore-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe
pprof-cpu:
go tool pprof -trim_path=github.com/VictoriaMetrics/VictoriaMetrics@ $(PPROF_FILE)
@@ -380,7 +403,7 @@ golangci-lint: install-golangci-lint
golangci-lint run
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.50.1
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.51.2
govulncheck: install-govulncheck
govulncheck ./...

251
README.md
View File

@@ -2,6 +2,7 @@
[![Latest Release](https://img.shields.io/github/release/VictoriaMetrics/VictoriaMetrics.svg?style=flat-square)](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
[![Docker Pulls](https://img.shields.io/docker/pulls/victoriametrics/victoria-metrics.svg?maxAge=604800)](https://hub.docker.com/r/victoriametrics/victoria-metrics)
[![victoriametrics](https://snapcraft.io/victoriametrics/badge.svg)](https://snapcraft.io/victoriametrics)
[![Slack](https://img.shields.io/badge/join%20slack-%23victoriametrics-brightgreen.svg)](https://slack.victoriametrics.com/)
[![GitHub license](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics.svg)](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/LICENSE)
[![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics)](https://goreportcard.com/report/github.com/VictoriaMetrics/VictoriaMetrics)
@@ -39,7 +40,8 @@ VictoriaMetrics has the following prominent features:
* It can be used as long-term storage for Prometheus. See [these docs](#prometheus-setup) for details.
* It can be used as a drop-in replacement for Prometheus in Grafana, because it supports [Prometheus querying API](#prometheus-querying-api-usage).
* It can be used as a drop-in replacement for Graphite in Grafana, because it supports [Graphite API](#graphite-api-usage).
* It features easy setup and operation:
VictoriaMetrics allows reducing infrastructure costs by more than 10x comparing to Graphite - see [this case study](https://docs.victoriametrics.com/CaseStudies.html#grammarly).
* It is easy to setup and operate:
* VictoriaMetrics consists of a single [small executable](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d)
without external dependencies.
* All the configuration is done via explicit command-line flags with reasonable defaults.
@@ -105,6 +107,7 @@ Case studies:
* [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch)
* [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern)
* [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl)
* [Dig Security](https://docs.victoriametrics.com/CaseStudies.html#dig-security)
* [Fly.io](https://docs.victoriametrics.com/CaseStudies.html#flyio)
* [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence)
* [Grammarly](https://docs.victoriametrics.com/CaseStudies.html#grammarly)
@@ -125,11 +128,22 @@ See also [articles and slides about VictoriaMetrics from our users](https://docs
## Operation
### How to start VictoriaMetrics
### Install
Just download [VictoriaMetrics executable](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) or [Docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) and start it with the desired command-line flags.
To quickly try VictoriaMetrics, just download [VictoriaMetrics executable](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) or [Docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) and start it with the desired command-line flags.
See also [QuickStart guide](https://docs.victoriametrics.com/Quick-Start.html) for additional information.
VictoriaMetrics can also be installed via these installation methods:
* [Helm charts for single-node and cluster versions of VictoriaMetrics](https://github.com/VictoriaMetrics/helm-charts).
* [Kubernetes operator for VictoriaMetrics](https://github.com/VictoriaMetrics/operator).
* [Ansible role for installing cluster VictoriaMetrics (by VictoriaMetrics)](https://github.com/VictoriaMetrics/ansible-playbooks).
* [Ansible role for installing cluster VictoriaMetrics (by community)](https://github.com/Slapper/ansible-victoriametrics-cluster-role).
* [Ansible role for installing single-node VictoriaMetrics (by community)](https://github.com/dreamteam-gg/ansible-victoriametrics-role).
* [Snap package for VictoriaMetrics](https://snapcraft.io/victoriametrics).
### How to start VictoriaMetrics
The following command-line flags are used the most:
* `-storageDataPath` - VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
@@ -188,7 +202,7 @@ Changing scrape configuration is possible with text editor:
vi $SNAP_DATA/var/snap/victoriametrics/current/etc/victoriametrics-scrape-config.yaml
```
After changes were made, trigger config re-read with the command `curl 127.0.0.1:8248/-/reload`.
After changes were made, trigger config re-read with the command `curl 127.0.0.1:8428/-/reload`.
## Prometheus setup
@@ -268,7 +282,11 @@ http://<victoriametrics-addr>:8428
Substitute `<victoriametrics-addr>` with the hostname or IP address of VictoriaMetrics.
Then build graphs and dashboards for the created datasource using [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) or [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html).
Then build graphs and dashboards for the created datasource using [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/)
or [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html).
Alternatively, use VictoriaMetrics [datasource plugin](https://github.com/VictoriaMetrics/grafana-datasource) with support of extra features.
See more in [description](https://github.com/VictoriaMetrics/grafana-datasource#victoriametrics-data-source-for-grafana).
## How to upgrade VictoriaMetrics
@@ -289,13 +307,21 @@ Prometheus doesn't drop data during VictoriaMetrics restart. See [this article](
## vmui
VictoriaMetrics provides UI for query troubleshooting and exploration. The UI is available at `http://victoriametrics:8428/vmui`.
The UI allows exploring query results via graphs and tables.
It also provides the following features:
The UI allows exploring query results via graphs and tables. It also provides the following features:
- [metrics explorer](#metrics-explorer)
- [cardinality explorer](#cardinality-explorer)
- [query tracer](#query-tracing)
- [top queries explorer](#top-queries)
- Explore:
- [Metrics explorer](#metrics-explorer) - automatically builds graphs for selected metrics;
- [Cardinality explorer](#cardinality-explorer) - stats about existing metrics in TSDB;
- [Top queries](#top-queries) - shows most frequently executed queries;
- Tools:
- [Trace analyzer](#query-tracing) - playground for loading query traces in JSON format;
- [WITH expressions playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/expand-with-exprs) - test how WITH expressions work;
- [Metric relabel debugger](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/relabeling) - playground for [relabeling](#relabeling) configs.
VMUI automatically switches from graph view to heatmap view when the query returns [histogram](https://docs.victoriametrics.com/keyConcepts.html#histogram) buckets
(both [Prometheus histograms](https://prometheus.io/docs/concepts/metric_types/#histogram)
and [VictoriaMetrics histograms](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) are supported).
Try, for example, [this query](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/?g0.expr=sum%28rate%28vm_promscrape_scrape_duration_seconds_bucket%29%29+by+%28vmrange%29&g0.range_input=24h&g0.end_input=2023-04-10T17%3A46%3A12&g0.relative_time=last_24_hours&g0.step_input=31m).
Graphs in `vmui` support scrolling and zooming:
@@ -307,9 +333,12 @@ Query history can be navigated by holding `Ctrl` (or `Cmd` on MacOS) and pressin
Multi-line queries can be entered by pressing `Shift-Enter` in query input field.
When querying the [backfilled data](https://docs.victoriametrics.com/#backfilling) or during [query troubleshooting](https://docs.victoriametrics.com/Troubleshooting.html#unexpected-query-results), it may be useful disabling response cache by clicking `Disable cache` checkbox.
When querying the [backfilled data](https://docs.victoriametrics.com/#backfilling)
or during [query troubleshooting](https://docs.victoriametrics.com/Troubleshooting.html#unexpected-query-results),
it may be useful disabling response cache by clicking `Disable cache` checkbox.
VMUI automatically adjusts the interval between datapoints on the graph depending on the horizontal resolution and on the selected time range. The step value can be customized by changing `Step value` input.
VMUI automatically adjusts the interval between datapoints on the graph depending on the horizontal resolution and on the selected time range.
The step value can be customized by changing `Step value` input.
VMUI allows investigating correlations between multiple queries on the same graph. Just click `Add Query` button,
enter an additional query in the newly appeared input field and press `Enter`.
@@ -353,8 +382,8 @@ VictoriaMetrics provides an ability to explore time series cardinality at `Explo
may show lower than expected number of unique label values for labels with small number of unique values.
This is because of [implementation limits](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5a6e617b5e41c9170e7c562aecd15ee0c901d489/app/vmselect/netstorage/netstorage.go#L1039-L1045).
By default cardinality explorer analyzes time series for the current date. It provides the ability to select different day at the top right corner.
By default all the time series for the selected date are analyzed. It is possible to narrow down the analysis to series
By default, cardinality explorer analyzes time series for the current date. It provides the ability to select different day at the top right corner.
By default, all the time series for the selected date are analyzed. It is possible to narrow down the analysis to series
matching the specified [series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors).
Cardinality explorer is built on top of [/api/v1/status/tsdb](#tsdb-stats).
@@ -551,7 +580,8 @@ The `/api/v1/export` endpoint should return the following response:
```
Note that InfluxDB line protocol expects [timestamps in *nanoseconds* by default](https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_tutorial/#timestamp),
while VictoriaMetrics stores them with *milliseconds* precision.
while VictoriaMetrics stores them with *milliseconds* precision. It is allowed to ingest timestamps with seconds,
microseconds or nanoseconds precision - VictoriaMetrics will automatically convert them to milliseconds.
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
@@ -614,7 +644,6 @@ The `__graphite__` pseudo-label supports e.g. alternate regexp filters such as `
VictoriaMetrics also supports Graphite query language - see [these docs](#graphite-render-api-usage).
## How to send data from OpenTSDB-compatible agents
VictoriaMetrics supports [telnet put protocol](http://opentsdb.net/docs/build/html/api_telnet/put.html)
@@ -729,20 +758,45 @@ All the Prometheus querying API handlers can be prepended with `/prometheus` pre
### Prometheus querying API enhancements
VictoriaMetrics accepts optional `extra_label=<label_name>=<label_value>` query arg, which can be used for enforcing additional label filters for queries. For example,
`/api/v1/query_range?extra_label=user_id=123&extra_label=group_id=456&query=<query>` would automatically add `{user_id="123",group_id="456"}` label filters to the given `<query>`. This functionality can be used for limiting the scope of time series visible to the given tenant. It is expected that the `extra_label` query args are automatically set by auth proxy sitting in front of VictoriaMetrics. See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts optional `extra_label=<label_name>=<label_value>` query arg, which can be used
for enforcing additional label filters for queries. For example, `/api/v1/query_range?extra_label=user_id=123&extra_label=group_id=456&query=<query>`
would automatically add `{user_id="123",group_id="456"}` label filters to the given `<query>`.
This functionality can be used for limiting the scope of time series visible to the given tenant.
It is expected that the `extra_label` query args are automatically set by auth proxy sitting in front of VictoriaMetrics.
See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts optional `extra_filters[]=series_selector` query arg, which can be used for enforcing arbitrary label filters for queries. For example,
`/api/v1/query_range?extra_filters[]={env=~"prod|staging",user="xyz"}&query=<query>` would automatically add `{env=~"prod|staging",user="xyz"}` label filters to the given `<query>`. This functionality can be used for limiting the scope of time series visible to the given tenant. It is expected that the `extra_filters[]` query args are automatically set by auth proxy sitting in front of VictoriaMetrics. See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts optional `extra_filters[]=series_selector` query arg, which can be used for enforcing arbitrary label filters for queries.
For example, `/api/v1/query_range?extra_filters[]={env=~"prod|staging",user="xyz"}&query=<query>` would automatically
add `{env=~"prod|staging",user="xyz"}` label filters to the given `<query>`. This functionality can be used for limiting
the scope of time series visible to the given tenant. It is expected that the `extra_filters[]` query args are automatically
set by auth proxy sitting in front of VictoriaMetrics.
See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts multiple formats for `time`, `start` and `end` query args - see [these docs](#timestamp-formats).
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
VictoriaMetrics accepts `round_digits` query arg for [/api/v1/query](https://docs.victoriametrics.com/keyConcepts.html#instant-query)
and [/api/v1/query_range](https://docs.victoriametrics.com/keyConcepts.html#range-query) handlers. It can be used for rounding response values
to the given number of digits after the decimal point.
For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
VictoriaMetrics accepts `limit` query arg for `/api/v1/labels` and `/api/v1/label/<labelName>/values` handlers for limiting the number of returned entries. For example, the query to `/api/v1/labels?limit=5` returns a sample of up to 5 unique labels, while ignoring the rest of labels. If the provided `limit` value exceeds the corresponding `-search.maxTagKeys` / `-search.maxTagValues` command-line flag values, then limits specified in the command-line flags are used.
VictoriaMetrics accepts `limit` query arg for [/api/v1/labels](https://docs.victoriametrics.com/url-examples.html#apiv1labels)
and [`/api/v1/label/<labelName>/values`](https://docs.victoriametrics.com/url-examples.html#apiv1labelvalues) handlers for limiting the number of returned entries.
For example, the query to `/api/v1/labels?limit=5` returns a sample of up to 5 unique labels, while ignoring the rest of labels.
If the provided `limit` value exceeds the corresponding `-search.maxTagKeys` / `-search.maxTagValues` command-line flag values,
then limits specified in the command-line flags are used.
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, `/api/v1/labels` and `/api/v1/label/<labelName>/values` while the Prometheus API defaults to all time. Explicitly set `start` and `end` to select the desired time range.
VictoriaMetrics accepts `limit` query arg for `/api/v1/series` handlers for limiting the number of returned entries. For example, the query to `/api/v1/series?limit=5` returns a sample of up to 5 series, while ignoring the rest. If the provided `limit` value exceeds the corresponding `-search.maxSeries` command-line flag values, then limits specified in the command-line flags are used.
By default, VictoriaMetrics returns time series for the last day starting at 00:00 UTC
from [/api/v1/series](https://docs.victoriametrics.com/url-examples.html#apiv1series),
[/api/v1/labels](https://docs.victoriametrics.com/url-examples.html#apiv1labels) and
[`/api/v1/label/<labelName>/values`](https://docs.victoriametrics.com/url-examples.html#apiv1labelvalues),
while the Prometheus API defaults to all time. Explicitly set `start` and `end` to select the desired time range.
VictoriaMetrics rounds the specified `start..end` time range to day granularity because of performance optimization concerns.
If you need the exact set of label names and label values on the given time range, then send queries
to [/api/v1/query](https://docs.victoriametrics.com/keyConcepts.html#instant-query) or to [/api/v1/query_range](https://docs.victoriametrics.com/keyConcepts.html#range-query).
VictoriaMetrics accepts `limit` query arg at [/api/v1/series](https://docs.victoriametrics.com/url-examples.html#apiv1series)
for limiting the number of returned entries. For example, the query to `/api/v1/series?limit=5` returns a sample of up to 5 series, while ignoring the rest of series.
If the provided `limit` value exceeds the corresponding `-search.maxSeries` command-line flag values, then limits specified in the command-line flags are used.
Additionally, VictoriaMetrics provides the following handlers:
@@ -767,9 +821,11 @@ in [query APIs](https://docs.victoriametrics.com/#prometheus-querying-api-usage)
in [export APIs](https://docs.victoriametrics.com/#how-to-export-time-series).
- Unix timestamps in seconds with optional milliseconds after the point. For example, `1562529662.678`.
- [RFC3339](https://www.ietf.org/rfc/rfc3339.txt). For example, '2022-03-29T01:02:03Z`.
- Partial RFC3339. Examples: `2022`, `2022-03`, `2022-03-29`, `2022-03-29T01`, `2022-03-29T01:02`.
- Relative duration comparing to the current time. For example, `1h5m` means `one hour and five minutes ago`.
- [RFC3339](https://www.ietf.org/rfc/rfc3339.txt). For example, `2022-03-29T01:02:03Z` or `2022-03-29T01:02:03+02:30`.
- Partial RFC3339. Examples: `2022`, `2022-03`, `2022-03-29`, `2022-03-29T01`, `2022-03-29T01:02`, `2022-03-29T01:02:03`.
The partial RFC3339 time is in UTC timezone by default. It is possible to specify timezone there by adding `+hh:mm` or `-hh:mm` suffix to partial time.
For example, `2022-03-01+06:30` is `2022-03-01` at `06:30` timezone.
- Relative duration comparing to the current time. For example, `1h5m`, `-1h5m` or `now-1h5m` means `one hour and five minutes ago`, while `now` means `now`.
## Graphite API usage
@@ -791,10 +847,10 @@ VictoriaMetrics supports `__graphite__` pseudo-label for filtering time series w
### Graphite Render API usage
[VictoriaMetrics Enterprise](https://docs.victoriametrics.com/enterprise.html) supports [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) subset
VictoriaMetrics supports [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) subset
at `/render` endpoint, which is used by [Graphite datasource in Grafana](https://grafana.com/docs/grafana/latest/datasources/graphite/).
When configuring Graphite datasource in Grafana, the `Storage-Step` http request header must be set to a step between Graphite data points stored in VictoriaMetrics. For example, `Storage-Step: 10s` would mean 10 seconds distance between Graphite datapoints stored in VictoriaMetrics.
Enterprise binaries can be downloaded and evaluated for free from [the releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
When configuring Graphite datasource in Grafana, the `Storage-Step` http request header must be set to a step between Graphite data points
stored in VictoriaMetrics. For example, `Storage-Step: 10s` would mean 10 seconds distance between Graphite datapoints stored in VictoriaMetrics.
### Graphite Metrics API usage
@@ -806,7 +862,7 @@ VictoriaMetrics supports the following handlers from [Graphite Metrics API](http
VictoriaMetrics accepts the following additional query args at `/metrics/find` and `/metrics/expand`:
* `label` - for selecting arbitrary label values. By default `label=__name__`, i.e. metric names are selected.
* `label` - for selecting arbitrary label values. By default, `label=__name__`, i.e. metric names are selected.
* `delimiter` - for using different delimiters in metric name hierarchy. For example, `/metrics/find?delimiter=_&query=node_*` would return all the metric name prefixes
that start with `node_`. By default `delimiter=.`.
@@ -932,7 +988,7 @@ Note that background merges may never occur for data from previous months, so st
In this case [forced merge](#forced-merge) may help freeing up storage space.
It is recommended verifying which metrics will be deleted with the call to `http://<victoria-metrics-addr>:8428/api/v1/series?match[]=<timeseries_selector_for_delete>`
before actually deleting the metrics. By default this query will only scan series in the past 5 minutes, so you may need to
before actually deleting the metrics. By default, this query will only scan series in the past 5 minutes, so you may need to
adjust `start` and `end` to a suitable range to achieve match hits.
The `/api/v1/admin/tsdb/delete_series` handler may be protected with `authKey` if `-deleteAuthKey` command-line flag is set.
@@ -997,7 +1053,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
Optional `max_rows_per_line` arg may be added to the request for limiting the maximum number of rows exported per each JSON line.
@@ -1046,7 +1102,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export/csv -d 'format=<format>' -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
The exported CSV data can be imported to VictoriaMetrics via [/api/v1/import/csv](#how-to-import-csv-data).
@@ -1074,7 +1130,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/api/v1/export/native -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
The exported data can be imported to VictoriaMetrics via [/api/v1/import/native](#how-to-import-data-in-native-format).
@@ -1298,7 +1354,8 @@ Example contents for `-relabelConfig` file:
VictoriaMetrics provides additional relabeling features such as Graphite-style relabeling.
See [these docs](https://docs.victoriametrics.com/vmagent.html#relabeling) for more details.
The relabeling can be debugged at `http://victoriametrics:8428/metric-relabel-debug` page.
The relabeling can be debugged at `http://victoriametrics:8428/metric-relabel-debug` page
or at our [public playground](https://play.victoriametrics.com/select/accounting/1/6a716b0f-38bc-4856-90ce-448fd713e3fe/prometheus/graph/#/relabeling).
See [these docs](https://docs.victoriametrics.com/vmagent.html#relabel-debug) for more details.
@@ -1313,7 +1370,7 @@ See [allowed formats](#timestamp-formats) for these args.
For example:
```console
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=1654543486' -d 'end=1654543486'
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48+00:00' -d 'end=2022-06-06T19:29:07+00:00'
curl http://<victoriametrics-addr>:8428/federate -d 'match[]=<timeseries_selector_for_export>' -d 'start=2022-06-06T19:25:48' -d 'end=2022-06-06T19:29:07'
```
By default, the last point on the interval `[now - max_lookback ... now]` is scraped for each time series. The default value for `max_lookback` is `5m` (5 minutes), but it can be overridden with `max_lookback` query arg.
For instance, `/federate?match[]=up&max_lookback=1h` would return last points on the `[now - 1h ... now]` interval. This may be useful for time series federation
@@ -1345,7 +1402,7 @@ See also [resource usage limits docs](#resource-usage-limits).
## Resource usage limits
By default VictoriaMetrics is tuned for an optimal resource usage under typical workloads. Some workloads may need fine-grained resource usage limits. In these cases the following command-line flags may be useful:
By default, VictoriaMetrics is tuned for an optimal resource usage under typical workloads. Some workloads may need fine-grained resource usage limits. In these cases the following command-line flags may be useful:
- `-memory.allowedPercent` and `-memory.allowedBytes` limit the amounts of memory, which may be used for various internal caches at VictoriaMetrics. Note that VictoriaMetrics may use more memory, since these flags don't limit additional memory, which may be needed on a per-query basis.
- `-search.maxMemoryPerQuery` limits the amounts of memory, which can be used for processing a single query. Queries, which need more memory, are rejected. Heavy queries, which select big number of time series, may exceed the per-query memory limit by a small percent. The total memory limit for concurrently executed queries can be estimated as `-search.maxMemoryPerQuery` multiplied by `-search.maxConcurrentRequests`.
@@ -1356,6 +1413,7 @@ By default VictoriaMetrics is tuned for an optimal resource usage under typical
- `-search.maxSamplesPerQuery` limits the number of raw samples a single query can process. This allows limiting CPU usage for heavy queries.
- `-search.maxPointsPerTimeseries` limits the number of calculated points, which can be returned per each matching time series from [range query](https://docs.victoriametrics.com/keyConcepts.html#range-query).
- `-search.maxPointsSubqueryPerTimeseries` limits the number of calculated points, which can be generated per each matching time series during [subquery](https://docs.victoriametrics.com/MetricsQL.html#subqueries) evaluation.
- `-search.maxSeriesPerAggrFunc` limits the number of time series, which can be generated by [MetricsQL aggregate functions](https://docs.victoriametrics.com/MetricsQL.html#aggregate-functions) in a single query.
- `-search.maxSeries` limits the number of time series, which may be returned from [/api/v1/series](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers). This endpoint is used mostly by Grafana for auto-completion of metric names, label names and label values. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxSeries` to quite low value in order limit CPU and memory usage.
- `-search.maxTagKeys` limits the number of items, which may be returned from [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names). This endpoint is used mostly by Grafana for auto-completion of label names. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxTagKeys` to quite low value in order to limit CPU and memory usage.
- `-search.maxTagValues` limits the number of items, which may be returned from [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values). This endpoint is used mostly by Grafana for auto-completion of label values. Queries to this endpoint may take big amounts of CPU time and memory when the database contains big number of unique time series because of [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). In this case it might be useful to set the `-search.maxTagValues` to quite low value in order to limit CPU and memory usage.
@@ -1433,12 +1491,14 @@ can be configured with the `-inmemoryDataFlushInterval` command-line flag (note
In-memory parts are persisted to disk into `part` directories under the `<-storageDataPath>/data/small/YYYY_MM/` folder,
where `YYYY_MM` is the month partition for the stored data. For example, `2022_11` is the partition for `parts`
with [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) from `November 2022`.
Each partition directory contains `parts.json` file with the actual list of parts in the partition.
The `part` directory has the following name pattern: `rowsCount_blocksCount_minTimestamp_maxTimestamp`, where:
Every `part` directory contains `metadata.json` file with the following fields:
- `rowsCount` - the number of [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) stored in the part
- `blocksCount` - the number of blocks stored in the part (see details about blocks below)
- `minTimestamp` and `maxTimestamp` - minimum and maximum timestamps across raw samples stored in the part
- `RowsCount` - the number of [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples) stored in the part
- `BlocksCount` - the number of blocks stored in the part (see details about blocks below)
- `MinTimestamp` and `MaxTimestamp` - minimum and maximum timestamps across raw samples stored in the part
- `MinDedupInterval` - the [deduplication interval](#deduplication) applied to the given part.
Each `part` consists of `blocks` sorted by internal time series id (aka `TSID`).
Each `block` contains up to 8K [raw samples](https://docs.victoriametrics.com/keyConcepts.html#raw-samples),
@@ -1460,11 +1520,10 @@ for fast block lookups, which belong to the given `TSID` and cover the given tim
and [freeing up disk space for the deleted time series](#how-to-delete-time-series) are performed during the merge
Newly added `parts` either successfully appear in the storage or fail to appear.
The newly added `parts` are being created in a temporary directory under `<-storageDataPath>/data/{small,big}/YYYY_MM/tmp` folder.
When the newly added `part` is fully written and [fsynced](https://man7.org/linux/man-pages/man2/fsync.2.html)
to a temporary directory, then it is atomically moved to the storage directory.
Thanks to this alogrithm, storage never contains partially created parts, even if hardware power off
occurrs in the middle of writing the `part` to disk - such incompletely written `parts`
The newly added `part` is atomically registered in the `parts.json` file under the corresponding partition
after it is fully written and [fsynced](https://man7.org/linux/man-pages/man2/fsync.2.html) to the storage.
Thanks to this algorithm, storage never contains partially created parts, even if hardware power off
occurs in the middle of writing the `part` to disk - such incompletely written `parts`
are automatically deleted on the next VictoriaMetrics start.
The same applies to merge process — `parts` are either fully merged into a new `part` or fail to merge,
@@ -1491,15 +1550,14 @@ Retention is configured with the `-retentionPeriod` command-line flag, which tak
Data is split in per-month partitions inside `<-storageDataPath>/data/{small,big}` folders.
Data partitions outside the configured retention are deleted on the first day of the new month.
Each partition consists of one or more data parts with the following name pattern `rowsCount_blocksCount_minTimestamp_maxTimestamp`.
Data parts outside of the configured retention are eventually deleted during
Each partition consists of one or more data parts. Data parts outside the configured retention are eventually deleted during
[background merge](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
The maximum disk space usage for a given `-retentionPeriod` is going to be (`-retentionPeriod` + 1) months.
For example, if `-retentionPeriod` is set to 1, data for January is deleted on March 1st.
Please note, the time range covered by data part is not limited by retention period unit. Hence, data part may contain data
for multiple days and will be deleted only when fully outside of the configured retention.
for multiple days and will be deleted only when fully outside the configured retention.
It is safe to extend `-retentionPeriod` on existing data. If `-retentionPeriod` is set to a lower
value than before, then data outside the configured period will be eventually deleted.
@@ -1543,7 +1601,7 @@ For example, the following config sets 3 days retention for time series with `te
Important notes:
- The data outside of the configured retention isn't deleted instantly - it is deleted eventually during [background merges](https://docs.victoriametrics.com/#storage).
- The data outside the configured retention isn't deleted instantly - it is deleted eventually during [background merges](https://docs.victoriametrics.com/#storage).
- The `-retentionFilter` doesn't remove old data from `indexdb` (aka inverted index) until the configured [-retentionPeriod](#retention).
So the `indexdb` size can grow big under [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate)
even for small retentions configured via `-retentionFilter`.
@@ -1885,7 +1943,14 @@ are added to all the metrics before sending them to the remote storage:
## Cache removal
VictoriaMetrics uses various internal caches. These caches are stored to `<-storageDataPath>/cache` directory during graceful shutdown (e.g. when VictoriaMetrics is stopped by sending `SIGINT` signal). The caches are read on the next VictoriaMetrics startup. Sometimes it is needed to remove such caches on the next startup. This can be performed by placing `reset_cache_on_startup` file inside the `<-storageDataPath>/cache` directory before the restart of VictoriaMetrics. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447) for details.
VictoriaMetrics uses various internal caches. These caches are stored to `<-storageDataPath>/cache` directory during graceful shutdown
(e.g. when VictoriaMetrics is stopped by sending `SIGINT` signal). The caches are read on the next VictoriaMetrics startup.
Sometimes it is needed to remove such caches on the next startup. This can be done in the following ways:
- By manually removing the `<-storageDataPath>/cache` directory when VictoriaMetrics is stopped.
- By placing `reset_cache_on_startup` file inside the `<-storageDataPath>/cache` directory before the restart of VictoriaMetrics.
In this case VictoriaMetrics will automatically remove all the caches on the next start.
See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447) for details.
## Cache tuning
@@ -1990,9 +2055,9 @@ Enterprise binaries can be downloaded and evaluated for free from [the releases
A single-node VictoriaMetrics is capable of proxying requests to [vmalert](https://docs.victoriametrics.com/vmalert.html)
when `-vmalert.proxyURL` flag is set. Use this feature for the following cases:
* for proxying requests from [Grafana Alerting UI](https://grafana.com/docs/grafana/latest/alerting/);
* for accessing vmalert's UI through single-node VictoriaMetrics Web interface.
* for accessing vmalerts UI through single-node VictoriaMetrics Web interface.
For accessing vmalert's UI through single-node VictoriaMetrics configure `-vmalert.proxyURL` flag and visit
For accessing vmalerts UI through single-node VictoriaMetrics configure `-vmalert.proxyURL` flag and visit
`http://<victoriametrics-addr>:8428/vmalert/` link.
## Benchmarks
@@ -2037,17 +2102,10 @@ It is safe sharing the collected profiles from security point of view, since the
## Integrations
* [Helm charts for single-node and cluster versions of VictoriaMetrics](https://github.com/VictoriaMetrics/helm-charts).
* [Kubernetes operator for VictoriaMetrics](https://github.com/VictoriaMetrics/operator).
* [netdata](https://github.com/netdata/netdata) can push data into VictoriaMetrics via `Prometheus remote_write API`.
See [these docs](https://github.com/netdata/netdata#integrations).
* [go-graphite/carbonapi](https://github.com/go-graphite/carbonapi) can use VictoriaMetrics as time series backend.
See [this example](https://github.com/go-graphite/carbonapi/blob/main/cmd/carbonapi/carbonapi.example.victoriametrics.yaml).
* [Ansible role for installing cluster VictoriaMetrics (by VictoriaMetrics)](https://github.com/VictoriaMetrics/ansible-playbooks).
* [Ansible role for installing cluster VictoriaMetrics (by community)](https://github.com/Slapper/ansible-victoriametrics-cluster-role).
* [Ansible role for installing single-node VictoriaMetrics (by community)](https://github.com/dreamteam-gg/ansible-victoriametrics-role).
* [Snap package for VictoriaMetrics](https://snapcraft.io/victoriametrics).
* [netdata](https://github.com/netdata/netdata) can push data into VictoriaMetrics via `Prometheus remote_write API`.
See [these docs](https://github.com/netdata/netdata#integrations).
* [vmalert-cli](https://github.com/aorfanos/vmalert-cli) - a CLI application for managing [vmalert](https://docs.victoriametrics.com/vmalert.html).
## Third-party contributions
@@ -2130,7 +2188,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
```
-bigMergeConcurrency int
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
Deprecated: this flag does nothing. Please use -smallMergeConcurrency for controlling the concurrency of background merges. See https://docs.victoriametrics.com/#storage
-cacheExpireDuration duration
Items are removed from in-memory caches after they aren't accessed for this duration. Lower values may reduce memory usage at the cost of higher CPU usage. See also -prevCacheRemovalPercent (default 30m0s)
-configAuthKey string
@@ -2147,16 +2205,16 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
-denyQueriesOutsideRetention
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
Whether to deny queries outside the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
-denyQueryTracing
Whether to disable the ability to trace queries. See https://docs.victoriametrics.com/#query-tracing
-downsampling.period array
Comma-separated downsampling periods in the format 'offset:period'. For example, '30d:10m' instructs to leave a single sample per 10 minutes for samples older than 30 days. See https://docs.victoriametrics.com/#downsampling for details. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
Supports an array of values separated by comma or specified via multiple flags.
-dryRun
Whether to check only -promscrape.config and then exit. Unknown config entries aren't allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse=false command-line flag
Whether to check config files without running VictoriaMetrics. The following config files are checked: -promscrape.config, -relabelConfig and -streamAggr.config. Unknown config entries aren't allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse=false command-line flag
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
@@ -2172,7 +2230,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-forceMergeAuthKey string
authKey, which must be passed in query string to /internal/force_merge pages
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
-graphiteListenAddr.useProxyProtocol
@@ -2182,7 +2240,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -2219,15 +2277,19 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if InfluxDB line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for InfluxDB line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-inmemoryDataFlushInterval duration
The interval for guaranteed saving of in-memory data to disk. The saved data survives unclean shutdown such as OOM crash, hardware reset, SIGKILL, etc. Bigger intervals may help increasing lifetime of flash storage with limited write cycles (e.g. Raspberry PI). Smaller intervals increase disk IO load. Minimum supported value is 1s (default 5s)
-insert.maxQueueDuration duration
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringCacheExpireDuration duration
The expiry duration for caches for interned strings. See https://en.wikipedia.org/wiki/String_interning . See also -internStringMaxLen and -internStringDisableCache (default 6m0s)
-internStringDisableCache
Whether to disable caches for interned strings. This may reduce memory usage at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringCacheExpireDuration and -internStringMaxLen
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 300)
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringDisableCache and -internStringCacheExpireDuration (default 500)
-logNewSeries
Whether to log new series. This option is for debug purposes only. It can lead to performance issues when big number of new series are ingested into VictoriaMetrics
-loggerDisableTimestamps
@@ -2263,11 +2325,11 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
-opentsdbListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbTrimTimestamp duration
@@ -2286,9 +2348,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-promscrape.azureSDCheckInterval duration
Interval for checking for changes in Azure. This works only if azure_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#azure_sd_configs for details (default 1m0s)
-promscrape.cluster.memberNum string
The number of number in the cluster of scrapers. It must be an unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster. Can be specified as pod name of Kubernetes StatefulSet - pod-name-Num, where Num is a numeric part of pod name (default "0")
The number of number in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster. Can be specified as pod name of Kubernetes StatefulSet - pod-name-Num, where Num is a numeric part of pod name (default "0")
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default, cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.name string
Optional name of the cluster. If multiple vmagent clusters scrape the same targets, then each cluster must have unique name in order to properly de-duplicate samples received from these clusters. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2679
-promscrape.cluster.replicationFactor int
@@ -2300,7 +2362,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-promscrape.config.strictParse
Whether to deny unsupported fields in -promscrape.config . Set to false in order to silently skip unsupported fields (default true)
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
Interval for checking for changes in '-promscrape.config' file. By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consul.waitTime duration
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
@@ -2308,9 +2370,9 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-promscrape.digitaloceanSDCheckInterval duration
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#digitalocean_sd_configs for details (default 1m0s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine-grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine-grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
@@ -2337,6 +2399,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kubernetes_sd_configs for details (default 30s)
-promscrape.kumaSDCheckInterval duration
Interval for checking for changes in kuma service discovery. This works only if kuma_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxResponseHeadersSize size
@@ -2359,7 +2423,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-promscrape.seriesLimitPerTarget int
Optional limit on the number of unique time series a single scrape target can expose. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter for more info
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine-grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
@@ -2374,7 +2438,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-pushmetrics.interval duration
Interval for pushing metrics to -pushmetrics.url (default 10s)
-pushmetrics.url array
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default metrics exposed at /metrics page aren't pushed to any remote storage
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags.
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. The path can point either to local file or to http url. See https://docs.victoriametrics.com/#relabeling for details. The config is reloaded on SIGHUP signal
@@ -2393,13 +2457,16 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-search.disableCache
Whether to disable response caching. This may be useful during data backfilling
-search.graphiteMaxPointsPerSeries int
The maximum number of points per series Graphite render API can return. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default 1000000)
The maximum number of points per series Graphite render API can return (default 1000000)
-search.graphiteStorageStep duration
The interval between datapoints stored in the database. It is used at Graphite Render API handler for normalizing the interval between datapoints in case it isn't normalized. It can be overridden by sending 'storage_step' query arg to /render API or by sending the desired interval via 'Storage-Step' http header during querying /render API. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default 10s)
The interval between datapoints stored in the database. It is used at Graphite Render API handler for normalizing the interval between datapoints in case it isn't normalized. It can be overridden by sending 'storage_step' query arg to /render API or by sending the desired interval via 'Storage-Step' http header during querying /render API (default 10s)
-search.latencyOffset duration
The time when data points become visible in query results after the collection. It can be overridden on per-query basis via latency_offset arg. Too small value can result in incomplete last points for query results (default 30s)
-search.logQueryMemoryUsage size
Log queries, which require more memory than specified by this flag. This may help detecting and optimizing heavy queries. Query logging is disabled by default. See also -search.logSlowQueryDuration and -search.maxMemoryPerQuery
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
Log queries with execution time exceeding this value. Zero disables slow query logging. See also -search.logQueryMemoryUsage (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores, while many concurrently executed requests may require high amounts of memory. See also -search.maxQueueDuration and -search.maxMemoryPerQuery (default 8)
-search.maxExportDuration duration
@@ -2409,11 +2476,11 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-search.maxFederateSeries int
The maximum number of time series, which can be returned from /federate. This option allows limiting memory usage (default 1000000)
-search.maxGraphiteSeries int
The maximum number of time series, which can be scanned during queries to Graphite Render API. See https://docs.victoriametrics.com/#graphite-render-api-usage . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default 300000)
The maximum number of time series, which can be scanned during queries to Graphite Render API. See https://docs.victoriametrics.com/#graphite-render-api-usage (default 300000)
-search.maxLookback duration
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaining due to historical reasons
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaning due to historical reasons
-search.maxMemoryPerQuery size
The maximum amounts of memory a single query may consume. Queries requiring more memory are rejected. The total memory limit for concurrently executed queries can be estimated as -search.maxMemoryPerQuery multiplied by -search.maxConcurrentRequests
The maximum amounts of memory a single query may consume. Queries requiring more memory are rejected. The total memory limit for concurrently executed queries can be estimated as -search.maxMemoryPerQuery multiplied by -search.maxConcurrentRequests . See also -search.logQueryMemoryUsage
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB (default 0)
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as VMUI or Grafana. There is no sense in setting this limit to values bigger than the horizontal resolution of the graph (default 30000)
@@ -2432,8 +2499,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The maximum number of raw samples a single query can scan per each time series. This option allows limiting memory usage (default 30000000)
-search.maxSeries int
The maximum number of time series, which can be returned from /api/v1/series. This option allows limiting memory usage (default 30000)
-search.maxSeriesPerAggrFunc int
The maximum number of time series an aggregate MetricsQL function can generate (default 1000000)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.setLookbackToStep' flag
The maximum interval for staleness calculations. By default, it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.setLookbackToStep' flag
-search.maxStatusRequestDuration duration
The maximum duration for /api/v1/status/* requests (default 5m0s)
-search.maxStepForPointsAdjustment duration
@@ -2469,9 +2538,11 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-selfScrapeJob string
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
-smallMergeConcurrency int
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
The maximum number of workers for background merges. See https://docs.victoriametrics.com/#storage . It isn't recommended tuning this flag in general case, since this may lead to uncontrolled increase in the number of parts and increased CPU usage during queries
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-snapshotCreateTimeout duration
The timeout for creating new snapshot. If set, make sure that timeout is lower than backup period
-snapshotsMaxAge value
Automatically delete snapshots older than -snapshotsMaxAge if it is set to non-zero duration. Make sure that backup process has enough time to finish the backup before the corresponding snapshot is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 0)
@@ -2503,7 +2574,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-streamAggr.dedupInterval duration
Input samples are de-duplicated with this interval before being aggregated. Only the last sample per each time series per each interval is aggregated if the interval is greater than zero
-streamAggr.keepInput
Whether to keep input samples after the aggregation with -streamAggr.config. By default the input is dropped after the aggregation, so only the aggregate data is stored. See https://docs.victoriametrics.com/stream-aggregation.html
Whether to keep input samples after the aggregation with -streamAggr.config. By default, the input is dropped after the aggregation, so only the aggregate data is stored. See https://docs.victoriametrics.com/stream-aggregation.html
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string

View File

@@ -4,10 +4,10 @@
| Version | Supported |
|---------|--------------------|
| 1.81.x | :white_check_mark: |
| 1.80.x | :x: |
| 1.79.x | :white_check_mark: |
| < 1.78 | :x: |
| [latest release](https://docs.victoriametrics.com/CHANGELOG.html) | :white_check_mark: |
| v1.87.x LTS release | :white_check_mark: |
| v1.79.x LTS release | :white_check_mark: |
| other releases | :x: |
## Reporting a Vulnerability

View File

@@ -39,6 +39,9 @@ victoria-metrics-freebsd-amd64-prod:
victoria-metrics-openbsd-amd64-prod:
APP_NAME=victoria-metrics $(MAKE) app-via-docker-openbsd-amd64
victoria-metrics-windows-amd64-prod:
APP_NAME=victoria-metrics $(MAKE) app-via-docker-windows-amd64
package-victoria-metrics:
APP_NAME=victoria-metrics $(MAKE) package-via-docker
@@ -82,6 +85,9 @@ victoria-metrics-linux-arm64:
victoria-metrics-linux-ppc64le:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
victoria-metrics-linux-s390x:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
victoria-metrics-linux-386:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch
@@ -97,6 +103,9 @@ victoria-metrics-freebsd-amd64:
victoria-metrics-openbsd-amd64:
APP_NAME=victoria-metrics CGO_ENABLED=0 GOOS=openbsd GOARCH=amd64 $(MAKE) app-local-goos-goarch
victoria-metrics-windows-amd64:
GOARCH=amd64 APP_NAME=victoria-metrics $(MAKE) app-local-windows-goarch
victoria-metrics-pure:
APP_NAME=victoria-metrics $(MAKE) app-local-pure

View File

@@ -8,6 +8,8 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert"
vminsertcommon "github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
vminsertrelabel "github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
@@ -26,11 +28,13 @@ import (
var (
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Leave only the last sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication and https://docs.victoriametrics.com/#downsampling")
dryRun = flag.Bool("dryRun", false, "Whether to check only -promscrape.config and then exit. "+
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse=false command-line flag")
dryRun = flag.Bool("dryRun", false, "Whether to check config files without running VictoriaMetrics. The following config files are checked: "+
"-promscrape.config, -relabelConfig and -streamAggr.config. Unknown config entries aren't allowed in -promscrape.config by default. "+
"This can be changed with -promscrape.config.strictParse=false command-line flag")
inmemoryDataFlushInterval = flag.Duration("inmemoryDataFlushInterval", 5*time.Second, "The interval for guaranteed saving of in-memory data to disk. "+
"The saved data survives unclean shutdown such as OOM crash, hardware reset, SIGKILL, etc. "+
"Bigger intervals may help increasing lifetime of flash storage with limited write cycles (e.g. Raspberry PI). "+
@@ -53,7 +57,13 @@ func main() {
if err := promscrape.CheckConfig(); err != nil {
logger.Fatalf("error when checking -promscrape.config: %s", err)
}
logger.Infof("-promscrape.config is ok; exitting with 0 status code")
if err := vminsertrelabel.CheckRelabelConfig(); err != nil {
logger.Fatalf("error when checking -relabelConfig: %s", err)
}
if err := vminsertcommon.CheckStreamAggrConfig(); err != nil {
logger.Fatalf("error when checking -streamAggr.config: %s", err)
}
logger.Infof("-promscrape.config is ok; exiting with 0 status code")
return
}
@@ -92,7 +102,7 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != "GET" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -85,6 +85,9 @@ vmagent-linux-arm64:
vmagent-linux-ppc64le:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmagent-linux-s390x:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmagent-linux-386:
APP_NAME=vmagent CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -13,7 +13,7 @@ See [Quick Start](#quick-start) for details.
While VictoriaMetrics provides an efficient solution to store and observe metrics, our users needed something fast
and RAM friendly to scrape metrics from Prometheus-compatible exporters into VictoriaMetrics.
Also, we found that our user's infrastructure are like snowflakes in that no two are alike. Therefore we decided to add more flexibility
Also, we found that our user's infrastructure are like snowflakes in that no two are alike. Therefore, we decided to add more flexibility
to `vmagent` such as the ability to [accept metrics via popular push protocols](#how-to-push-data-to-vmagent)
additionally to [discovering Prometheus-compatible targets and scraping metrics from them](#how-to-collect-metrics-in-prometheus-format).
@@ -25,7 +25,9 @@ additionally to [discovering Prometheus-compatible targets and scraping metrics
* Can add, remove and modify labels (aka tags) via Prometheus relabeling. Can filter data before sending it to remote storage. See [these docs](#relabeling) for details.
* Can accept data via all the ingestion protocols supported by VictoriaMetrics - see [these docs](#how-to-push-data-to-vmagent).
* Can aggregate incoming samples by time and by labels before sending them to remote storage - see [these docs](https://docs.victoriametrics.com/stream-aggregation.html).
* Can replicate collected metrics simultaneously to multiple remote storage systems - see [these docs](#replication-and-high-availability).
* Can replicate collected metrics simultaneously to multiple Prometheus-compatible remote storage systems - see [these docs](#replication-and-high-availability).
* Can save egress network bandwidth usage costs when [VictoriaMetrics remote write protocol](#victoriametrics-remote-write-protocol)
is used for sending the data to VictoriaMetrics.
* Works smoothly in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
@@ -45,38 +47,42 @@ additionally to [discovering Prometheus-compatible targets and scraping metrics
Please download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) (
`vmagent` is also available in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags)),
unpack it and pass the following flags to the `vmagent` binary in order to start scraping Prometheus-compatible targets:
unpack it and pass the following flags to the `vmagent` binary in order to start scraping Prometheus-compatible targets
and sending the data to the Prometheus-compatible remote storage:
* `-promscrape.config` with the path to Prometheus config file (usually located at `/etc/prometheus/prometheus.yml`).
* `-promscrape.config` with the path to [Prometheus config file](https://docs.victoriametrics.com/sd_configs.html) (usually located at `/etc/prometheus/prometheus.yml`).
The path can point either to local file or to http url. `vmagent` doesn't support some sections of Prometheus config file,
so you may need either to delete these sections or to run `vmagent` with `-promscrape.config.strictParse=false` command-line flag.
In this case `vmagent` ignores unsupported sections. See [the list of unsupported sections](#unsupported-prometheus-config-sections).
* `-remoteWrite.url` with the remote storage endpoint such as VictoriaMetrics, the `-remoteWrite.url` argument can be specified
multiple times to replicate data concurrently to an arbitrary number of remote storage systems. See [various use cases](#use-cases).
* `-remoteWrite.url` with Prometheus-compatible remote storage endpoint such as VictoriaMetrics.
Example command line:
Example command for writing the data received via [supported push-based protocols](#how-to-push-data-to-vmagent)
to [single-node VictoriaMetrics](https://docs.victoriametrics.com/) located at `victoria-metrics-host:8428`:
```console
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format) if you need writing
the data to [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
Example command for scraping Prometheus targets and writing the data to single-node VictoriaMetrics:
```console
/path/to/vmagent -promscrape.config=/path/to/prometheus.yml -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
Example of scrape configuration for `-promscrape.config` argument you can find [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/prometheus.yml)
See [how to scrape Prometheus-compatible targets](#how-to-collect-metrics-in-prometheus-format) for more details.
If you use single-node VictoriaMetrics, then you can discover and scrape Prometheus-compatible targets directly from VictoriaMetrics
without the need to use `vmagent` - see [these docs](https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter).
If you don't need to scrape Prometheus-compatible targets, then the `-promscrape.config` option isn't needed.
For example, the following command is sufficient for accepting data via [supported push-based protocols](#how-to-push-data-to-vmagent)
and sending it to the provided `-remoteWrite.url`:
```console
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
`vmagent` can save network bandwidth usage costs under high load when [VictoriaMetrics remote write protocol is used](#victoriametrics-remote-write-protocol).
See [troubleshooting docs](#troubleshooting) if you encounter common issues with `vmagent`.
See [various use cases](#use-cases) for vmagent.
Pass `-help` to `vmagent` in order to see [the full list of supported command-line flags with their descriptions](#advanced-usage).
## How to push data to vmagent
@@ -98,7 +104,7 @@ additionally to pull-based Prometheus-compatible targets' scraping:
`vmagent` should be restarted in order to update config options set via command-line args.
`vmagent` supports multiple approaches for reloading configs from updated config files such as
`-promscrape.config`, `-remoteWrite.relabelConfig` and `-remoteWrite.urlRelabelConfig`:
`-promscrape.config`, `-remoteWrite.relabelConfig`, `-remoteWrite.urlRelabelConfig` and `-remoteWrite.streamAggr.config`:
* Sending `SIGHUP` signal to `vmagent` process:
@@ -120,7 +126,8 @@ data to the remote storage. It re-tries sending the data to remote storage until
The maximum on-disk size for the buffered metrics can be limited with `-remoteWrite.maxDiskUsagePerURL`.
`vmagent` works on various architectures from the IoT world - 32-bit arm, 64-bit arm, ppc64, 386, amd64.
See [the corresponding Makefile rules](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmagent/Makefile) for details.
The `vmagent` can save network bandwidth usage costs by using [VictoriaMetrics remote write protocol](#victoriametrics-remote-write-protocol).
### Drop-in replacement for Prometheus
@@ -144,9 +151,14 @@ to other remote storage systems, which support Prometheus `remote_write` protoco
`vmagent` replicates the collected metrics among multiple remote storage instances configured via `-remoteWrite.url` args.
If a single remote storage instance temporarily is out of service, then the collected data remains available in another remote storage instance.
`vmagent` buffers the collected data in files at `-remoteWrite.tmpDataPath` until the remote storage becomes available again
`vmagent` buffers the collected data in files at `-remoteWrite.tmpDataPath` until the remote storage becomes available again,
and then it sends the buffered data to the remote storage in order to prevent data gaps.
[VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) already supports replication,
so there is no need in specifying multiple `-remoteWrite.url` flags when writing data to the same cluster.
See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety).
### Relabeling and filtering
`vmagent` can add, remove or update labels on the collected data before sending it to the remote storage. Additionally,
@@ -158,7 +170,7 @@ Please see [these docs](#relabeling) for details.
`vmagent` supports splitting the collected data between multiple destinations with the help of `-remoteWrite.urlRelabelConfig`,
which is applied independently for each configured `-remoteWrite.url` destination. For example, it is possible to replicate or split
data among long-term remote storage, short-term remote storage and a real-time analytical system [built on top of Kafka](https://github.com/Telefonica/prometheus-kafka-adapter).
Note that each destination can receive it's own subset of the collected data due to per-destination relabeling via `-remoteWrite.urlRelabelConfig`.
Note that each destination can receive its own subset of the collected data due to per-destination relabeling via `-remoteWrite.urlRelabelConfig`.
### Prometheus remote_write proxy
@@ -170,11 +182,37 @@ Also, Basic Auth can be enabled for the incoming `remote_write` requests with `-
### remote_write for clustered version
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets,
writes are always performed in Promethes remote_write protocol. Therefore for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html),
writes are always performed in Prometheus remote_write protocol. Therefore, for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html),
the `-remoteWrite.url` command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<accountID>/prometheus/api/v1/write`
according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format).
There is also support for multitenant writes. See [these docs](#multitenancy).
## VictoriaMetrics remote write protocol
`vmagent` supports sending data to the configured `-remoteWrite.url` either via Prometheus remote write protocol
or via VictoriaMetrics remote write protocol.
VictoriaMetrics remote write protocol provides the following benefits comparing to Prometheus remote write protocol:
- Reduced network bandwidth usage by 2x-5x. This allows saving network bandwidth usage costs when `vmagent` and
the configured remote storage systems are located in different datacenters, availability zones or regions.
- Reduced disk read/write IO and disk space usage at `vmagent` when the remote storage is temporarily unavailable.
In this case `vmagent` buffers the incoming data to disk using the VictoriaMetrics remote write format.
This reduces disk read/write IO and disk space usage by 2x-5x comparing to Prometheus remote write format.
`vmagent` automatically switches to VictoriaMetrics remote write protocol when it sends data to VictoriaMetrics components such as other `vmagent` instances,
[single-node VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html)
or `vminsert` at [cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
It is possible to force switch to VictoriaMetrics remote write protocol by specifying `-remoteWrite.forceVMProto`
command-line flag for the corresponding `-remoteWrite.url`.
It is possible to tune the compression level for VictoriaMetrics remote write protocol with `-remoteWrite.vmProtoCompressLevel` command-line flag.
Bigger values reduce network usage at the cost of higher CPU usage. Negative values reduce CPU usage at the cost of higher network usage.
`vmagent` automatically switches to Prometheus remote write protocol when it sends data to old versions of VictoriaMetrics components
or to other Prometheus-compatible remote storage systems. It is possible to force switch to Prometheus remote write protocol
by specifying `-remoteWrite.forcePromProto` command-line flag for the corresponding `-remoteWrite.url`.
## Multitenancy
By default `vmagent` collects the data without tenant identifiers and routes it to the configured `-remoteWrite.url`.
@@ -198,7 +236,7 @@ This allows using a single `vmagent` instance in front of multiple VictoriaMetri
If `-remoteWrite.multitenantURL` command-line flag is set and `vmagent` is configured to scrape Prometheus-compatible targets
(e.g. if `-promscrape.config` command-line flag is set) then `vmagent` reads tenantID from `__tenant_id__` label
for the discovered targets and routes all the metrics from this target to the given `__tenant_id__`,
e.g. to the url `<-remoteWrite.multitnenatURL>/insert/<__tenant_id__>/prometheus/api/v1/write`.
e.g. to the url `<-remoteWrite.multitenantURL>/insert/<__tenant_id__>/prometheus/api/v1/write`.
For example, the following relabeling rule instructs sending metrics to tenantID defined in the `prometheus.io/tenant` annotation of Kubernetes pod deployment:
@@ -244,14 +282,14 @@ scrape_configs:
- "My-Auth: TopSecret"
```
* `disable_compression: true` for disabling response compression on a per-job basis. By default `vmagent` requests compressed responses
* `disable_compression: true` for disabling response compression on a per-job basis. By default, `vmagent` requests compressed responses
from scrape targets for saving network bandwidth.
* `disable_keepalive: true` for disabling [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection)
on a per-job basis. By default `vmagent` uses keep-alive connections to scrape targets for reducing overhead on connection re-establishing.
on a per-job basis. By default, `vmagent` uses keep-alive connections to scrape targets for reducing overhead on connection re-establishing.
* `series_limit: N` for limiting the number of unique time series a single scrape target can expose. See [these docs](#cardinality-limiter).
* `stream_parse: true` for scraping targets in a streaming manner. This may be useful when targets export big number of metrics. See [these docs](#stream-parsing-mode).
* `scrape_align_interval: duration` for aligning scrapes to the given interval instead of using random offset
in the range `[0 ... scrape_interval]` for scraping each target. The random offset helps spreading scrapes evenly in time.
in the range `[0 ... scrape_interval]` for scraping each target. The random offset helps to spread scrapes evenly in time.
* `scrape_offset: duration` for specifying the exact offset for scraping instead of using random offset in the range `[0 ... scrape_interval]`.
See [scrape_configs docs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs) for more details on all the supported options.
@@ -300,7 +338,7 @@ There is no need in specifying top-level `scrape_configs` section in these files
The list of supported service discovery types is available [here](#how-to-collect-metrics-in-prometheus-format).
Additionally `vmagent` doesn't support `refresh_interval` option at service discovery sections.
Additionally, `vmagent` doesn't support `refresh_interval` option at service discovery sections.
This option is substituted with `-promscrape.*CheckInterval` command-line options, which are specific per each service discovery type.
See [the full list of command-line flags for vmagent](#advanced-usage).
@@ -324,7 +362,7 @@ Extra labels can be added to metrics collected by `vmagent` via the following me
## Automatically generated metrics
`vmagent` automatically generates the following metrics per each scrape of every [Prometheus-compatible target](#how-to-collect-metrics-in-prometheus-format)
and attaches target-specific `instance` and `job` labels to these metrics:
and attaches `instance`, `job` and other target-specific labels to these metrics:
* `up` - this metric exposes `1` value on successful scrape and `0` value on unsuccessful scrape. This allows monitoring
failing scrapes with the following [MetricsQL query](https://docs.victoriametrics.com/MetricsQL.html):
@@ -422,7 +460,7 @@ VictoriaMetrics components support [Prometheus-compatible relabeling](https://pr
with [additional enhancements](#relabeling-enhancements). The relabeling can be defined in the following places processed by `vmagent`:
* At the `scrape_config -> relabel_configs` section in `-promscrape.config` file.
This relabeling is used for modifying labels in discovered targets and for dropping unneded targets.
This relabeling is used for modifying labels in discovered targets and for dropping unneeded targets.
See [relabeling cookbook](https://docs.victoriametrics.com/relabeling.html) for details.
This relabeling can be debugged by clicking the `debug` link at the corresponding target on the `http://vmagent:8429/targets` page
@@ -509,7 +547,7 @@ The following articles contain useful information about Prometheus relabeling:
* VictoriaMetrics provides the following additional relabeling actions on top of standard actions
from the [Prometheus relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config):
* `replace_all` replaces all of the occurrences of `regex` in the values of `source_labels` with the `replacement`
* `replace_all` replaces all the occurrences of `regex` in the values of `source_labels` with the `replacement`
and stores the results in the `target_label`. For example, the following relabeling config replaces all the occurrences
of `-` char in metric names with `_` char (e.g. `foo-bar-baz` metric name is transformed into `foo_bar_baz`):
@@ -521,7 +559,7 @@ The following articles contain useful information about Prometheus relabeling:
replacement: "_"
```
* `labelmap_all` replaces all of the occurrences of `regex` in all the label names with the `replacement`.
* `labelmap_all` replaces all the occurrences of `regex` in all the label names with the `replacement`.
For example, the following relabeling config replaces all the occurrences of `-` char in all the label names
with `_` char (e.g. `foo-bar-baz` label name is transformed into `foo_bar_baz`):
@@ -607,18 +645,19 @@ provide the following tools for debugging target-level and metric-level relabeli
- Target-level debugging (e.g. `relabel_configs` section at [scrape_configs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs))
can be performed by navigating to `http://vmagent:8429/targets` page (`http://victoriametrics:8428/targets` page for single-node VictoriaMetrics)
and clicking the `debug target relabeling` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual target relabeling rules applied to the discovered target labels.
The opened page shows step-by-step results for the actual target relabeling rules applied to the discovered target labels.
The page shows also the target URL generated after applying all the relabeling rules.
The `http://vmagent:8429/targets` page shows only active targets. If you need to understand why some target
is dropped during the relabeling, then navigate to `http://vmagent:8428/service-discovery` page
(`http://victoriametrics:8428/service-discovery` for single-node VictoriaMetrics), find the dropped target
and click the `debug` link there. The opened page will show step-by-step results for the actual relabeling rules,
and click the `debug` link there. The opened page shows step-by-step results for the actual relabeling rules,
which result to target drop.
- Metric-level debugging (e.g. `metric_relabel_configs` section at [scrape_configs](https://docs.victoriametrics.com/sd_configs.html#scrape_configs)
can be performed by navigating to `http://vmagent:8429/targets` page (`http://victoriametrics:8428/targets` page for single-node VictoriaMetrics)
and clicking the `debug metrics relabeling` link at the target, which must be debugged.
The opened page will show step-by-step results for the actual metric relabeling rules applied to the given target labels.
The opened page shows step-by-step results for the actual metric relabeling rules applied to the given target labels.
## Prometheus staleness markers
@@ -640,7 +679,7 @@ e.g. it sets `scrape_series_added` metric to zero. See [these docs](#automatical
## Stream parsing mode
By default `vmagent` reads the full response body from scrape target into memory, then parses it, applies [relabeling](#relabeling)
By default, `vmagent` reads the full response body from scrape target into memory, then parses it, applies [relabeling](#relabeling)
and then pushes the resulting metrics to the configured `-remoteWrite.url`. This mode works good for the majority of cases
when the scrape target exposes small number of metrics (e.g. less than 10 thousand). But this mode may take big amounts of memory
when the scrape target exposes big number of metrics. In this case it is recommended enabling stream parsing mode.
@@ -668,8 +707,8 @@ scrape_configs:
stream_parse: true
static_configs:
- targets:
- big-prometeus1
- big-prometeus2
- big-prometheus1
- big-prometheus2
honor_labels: true
metrics_path: /federate
params:
@@ -677,7 +716,7 @@ scrape_configs:
```
Note that `vmagent` in stream parsing mode stores up to `sample_limit` samples to the configured `-remoteStorage.url`
instead of droping all the samples read from the target, because the parsed data is sent to the remote storage
instead of dropping all the samples read from the target, because the parsed data is sent to the remote storage
as soon as it is parsed in stream parsing mode.
## Scraping big number of targets
@@ -697,7 +736,7 @@ spread scrape targets among a cluster of two `vmagent` instances:
The `-promscrape.cluster.memberNum` can be set to a StatefulSet pod name when `vmagent` runs in Kubernetes.
The pod name must end with a number in the range `0 ... promscrape.cluster.memberNum-1`. For example, `-promscrape.cluster.memberNum=vmagent-0`.
By default each scrape target is scraped only by a single `vmagent` instance in the cluster. If there is a need for replicating scrape targets among multiple `vmagent` instances,
By default, each scrape target is scraped only by a single `vmagent` instance in the cluster. If there is a need for replicating scrape targets among multiple `vmagent` instances,
then `-promscrape.cluster.replicationFactor` command-line flag must be set to the desired number of replicas. For example, the following commands
start a cluster of three `vmagent` instances, where each target is scraped by two `vmagent` instances:
@@ -763,7 +802,7 @@ scrape_configs:
## Cardinality limiter
By default `vmagent` doesn't limit the number of time series each scrape target can expose.
By default, `vmagent` doesn't limit the number of time series each scrape target can expose.
The limit can be enforced in the following places:
* Via `-promscrape.seriesLimitPerTarget` command-line option. This limit is applied individually
@@ -791,7 +830,7 @@ These metrics allow building the following alerting rules:
See also `sample_limit` option at [scrape_config section](https://docs.victoriametrics.com/sd_configs.html#scrape_configs).
By default `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`.
By default, `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`.
The limit can be enforced by setting the following command-line flags:
* `-remoteWrite.maxHourlySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last hour.
@@ -834,7 +873,7 @@ If you have suggestions for improvements or have found a bug - please open an is
* `http://vmagent-host:8429/api/v1/targets`. This handler returns JSON response
compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
* `http://vmagent-host:8429/ready`. This handler returns http 200 status code when `vmagent` finishes
it's initialization for all the [service_discovery configs](https://docs.victoriametrics.com/sd_configs.html).
its initialization for all the [service_discovery configs](https://docs.victoriametrics.com/sd_configs.html).
It may be useful to perform `vmagent` rolling update without any scrape loss.
## Troubleshooting
@@ -862,9 +901,9 @@ If you have suggestions for improvements or have found a bug - please open an is
* The `/service-discovery` page could be useful for debugging relabeling process for scrape targets.
This page contains original labels for targets dropped during relabeling.
By default the `-promscrape.maxDroppedTargets` targets are shown here. If your setup drops more targets during relabeling,
By default, the `-promscrape.maxDroppedTargets` targets are shown here. If your setup drops more targets during relabeling,
then increase `-promscrape.maxDroppedTargets` command-line flag value to see all the dropped targets.
Note that tracking each dropped target requires up to 10Kb of RAM. Therefore big values for `-promscrape.maxDroppedTargets`
Note that tracking each dropped target requires up to 10Kb of RAM. Therefore, big values for `-promscrape.maxDroppedTargets`
may result in increased memory usage if a big number of scrape targets are dropped during relabeling.
* We recommend you increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported
@@ -874,13 +913,13 @@ If you have suggestions for improvements or have found a bug - please open an is
* If you see gaps in the data pushed by `vmagent` to remote storage when `-remoteWrite.maxDiskUsagePerURL` is set,
try increasing `-remoteWrite.queues`. Such gaps may appear because `vmagent` cannot keep up with sending the collected data to remote storage.
Therefore it starts dropping the buffered data if the on-disk buffer size exceeds `-remoteWrite.maxDiskUsagePerURL`.
Therefore, it starts dropping the buffered data if the on-disk buffer size exceeds `-remoteWrite.maxDiskUsagePerURL`.
* `vmagent` drops data blocks if remote storage replies with `400 Bad Request` and `409 Conflict` HTTP responses.
The number of dropped blocks can be monitored via `vmagent_remotewrite_packets_dropped_total` metric exported at [/metrics page](#monitoring).
* Use `-remoteWrite.queues=1` when `-remoteWrite.url` points to remote storage, which doesn't accept out-of-order samples (aka data backfilling).
Such storage systems include Prometheus, Cortex and Thanos, which typically emit `out of order sample` errors.
Such storage systems include Prometheus, Mimir, Cortex and Thanos, which typically emit `out of order sample` errors.
The best solution is to use remote storage with [backfilling support](https://docs.victoriametrics.com/#backfilling) such as VictoriaMetrics.
* `vmagent` buffers scraped data at the `-remoteWrite.tmpDataPath` directory until it is sent to `-remoteWrite.url`.
@@ -940,7 +979,7 @@ See also [troubleshooting docs](https://docs.victoriametrics.com/Troubleshooting
* [Writing metrics to Kafka](#writing-metrics-to-kafka)
The enterprise version of vmagent is available for evaluation at [releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) page
in `vmutils-...-enteprise.tar.gz` archives and in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags) with tags containing `enterprise` suffix.
in `vmutils-...-enterprise.tar.gz` archives and in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags) with tags containing `enterprise` suffix.
### Reading metrics from Kafka
@@ -988,7 +1027,7 @@ data_format = "influx"
These command-line flags are available only in [enterprise](https://docs.victoriametrics.com/enterprise.html) version of `vmagent`,
which can be downloaded for evaluation from [releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) page
(see `vmutils-...-enteprise.tar.gz` archives) and from [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags) with tags containing `enterprise` suffix.
(see `vmutils-...-enterprise.tar.gz` archives) and from [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags) with tags containing `enterprise` suffix.
```
-kafka.consumer.topic array
@@ -1124,7 +1163,7 @@ It is safe sharing the collected profiles from security point of view, since the
## Advanced usage
`vmagent` can be fine-tuned with various command-line flags. Run `./vmagent -help` in order to see the full list of these flags with their desciptions and default values:
`vmagent` can be fine-tuned with various command-line flags. Run `./vmagent -help` in order to see the full list of these flags with their descriptions and default values:
```
./vmagent -help
@@ -1147,9 +1186,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-denyQueryTracing
Whether to disable the ability to trace queries. See https://docs.victoriametrics.com/#query-tracing
-dryRun
Whether to check only config files without running vmagent. The following files are checked: -promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig . Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag
Whether to check config files without running vmagent. The following files are checked: -promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig, -remoteWrite.streamAggr.config . Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
@@ -1159,7 +1198,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-flagsAuthKey string
Auth key for /flags endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty. See also -graphiteListenAddr.useProxyProtocol
-graphiteListenAddr.useProxyProtocol
@@ -1169,7 +1208,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -1206,13 +1245,17 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if InfluxDB line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for InfluxDB line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringCacheExpireDuration duration
The expiry duration for caches for interned strings. See https://en.wikipedia.org/wiki/String_interning . See also -internStringMaxLen and -internStringDisableCache (default 6m0s)
-internStringDisableCache
Whether to disable caches for interned strings. This may reduce memory usage at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringCacheExpireDuration and -internStringMaxLen
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 300)
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning . See also -internStringDisableCache and -internStringCacheExpireDuration (default 500)
-kafka.consumer.topic array
Kafka topic names for data consumption. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
Supports an array of values separated by comma or specified via multiple flags.
@@ -1268,11 +1311,11 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-metricsAuthKey string
Auth key for /metrics endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbHTTPListenAddr.useProxyProtocol
-opentsdbHTTPListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
TCP and UDP address to listen for OpenTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol
-opentsdbListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-opentsdbTrimTimestamp duration
@@ -1289,9 +1332,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-promscrape.azureSDCheckInterval duration
Interval for checking for changes in Azure. This works only if azure_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#azure_sd_configs for details (default 1m0s)
-promscrape.cluster.memberNum string
The number of number in the cluster of scrapers. It must be an unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster. Can be specified as pod name of Kubernetes StatefulSet - pod-name-Num, where Num is a numeric part of pod name (default "0")
The number of number in the cluster of scrapers. It must be a unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster. Can be specified as pod name of Kubernetes StatefulSet - pod-name-Num, where Num is a numeric part of pod name (default "0")
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
The number of members in a cluster of scrapers. Each member must have a unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default, cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.name string
Optional name of the cluster. If multiple vmagent clusters scrape the same targets, then each cluster must have unique name in order to properly de-duplicate samples received from these clusters. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2679
-promscrape.cluster.replicationFactor int
@@ -1303,17 +1346,19 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-promscrape.config.strictParse
Whether to deny unsupported fields in -promscrape.config . Set to false in order to silently skip unsupported fields (default true)
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
Interval for checking for changes in '-promscrape.config' file. By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consul.waitTime duration
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#consul_sd_configs for details (default 30s)
-promscrape.consulagentSDCheckInterval duration
Interval for checking for changes in Consul Agent. This works only if consulagent_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#consulagent_sd_configs for details (default 30s)
-promscrape.digitaloceanSDCheckInterval duration
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#digitalocean_sd_configs for details (default 1m0s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine-grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine-grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
@@ -1340,6 +1385,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
How frequently to reload the full state from Kubernetes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kubernetes_sd_configs for details (default 30s)
-promscrape.kumaSDCheckInterval duration
Interval for checking for changes in kuma service discovery. This works only if kuma_sd_configs is configured in '-promscrape.config' file. See https://docs.victoriametrics.com/sd_configs.html#kuma_sd_configs for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxResponseHeadersSize size
@@ -1362,7 +1409,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-promscrape.seriesLimitPerTarget int
Optional limit on the number of unique time series a single scrape target can expose. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter for more info
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is possible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine-grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
@@ -1377,7 +1424,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-pushmetrics.interval duration
Interval for pushing metrics to -pushmetrics.url (default 10s)
-pushmetrics.url array
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default metrics exposed at /metrics page aren't pushed to any remote storage
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.aws.accessKey array
Optional AWS AccessKey to use for the corresponding -remoteWrite.url if -remoteWrite.aws.useSigv4 is set
@@ -1420,6 +1467,12 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.flushInterval duration
Interval for flushing the data to remote storage. This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url (default 1s)
-remoteWrite.forcePromProto array
Whether to force Prometheus remote write protocol for sending data to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.forceVMProto array
Whether to force VictoriaMetrics remote write protocol for sending data to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.headers array
Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'
Supports an array of values separated by comma or specified via multiple flags.
@@ -1432,7 +1485,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.maxDailySeries int
The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter
-remoteWrite.maxDiskUsagePerURL array
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500MB. Disk usage is unlimited if the value is set to 0
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks. It is recommended to set the value for this flag to a multiple of the block size 500MB. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for size values: KB, MB, GB, TB, KiB, MiB, GiB, TiB.
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.maxHourlySeries int
@@ -1463,12 +1516,14 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.queues int
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs (default 8)
-remoteWrite.rateLimit array
Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. By default, the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.relabelConfig string
Optional path to file with relabeling configs, which are applied to all the metrics before sending them to -remoteWrite.url. See also -remoteWrite.urlRelabelConfig. The path can point either to local file or to http url. See https://docs.victoriametrics.com/vmagent.html#relabeling
-remoteWrite.keepDanglingQueues
Keep persistent queues contents at -remoteWrite.tmpDataPath in case there are no matching -remoteWrite.url. Useful when -remoteWrite.url is changed temporarily and persistent queue files will be needed later on.
-remoteWrite.roundDigits array
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default, digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.sendTimeout array
Timeout for sending a single block of data to the corresponding -remoteWrite.url
@@ -1485,10 +1540,10 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Input samples are de-duplicated with this interval before being aggregated. Only the last sample per each time series per each interval is aggregated if the interval is greater than zero
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.streamAggr.keepInput array
Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. See https://docs.victoriametrics.com/stream-aggregation.html
Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. By default, the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. See https://docs.victoriametrics.com/stream-aggregation.html
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. By default system CA is used
Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. By default, system CA is used
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to the corresponding -remoteWrite.url
@@ -1500,16 +1555,18 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Optional path to client-side TLS certificate key to use when connecting to the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsServerName array
Optional TLS server name to use for connections to the corresponding -remoteWrite.url. By default the server name from -remoteWrite.url is used
Optional TLS server name to use for connections to the corresponding -remoteWrite.url. By default, the server name from -remoteWrite.url is used
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tmpDataPath string
Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
-remoteWrite.url array
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Remote storage URL to write data to. It must support either VictoriaMetrics remote write protocol or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelConfig array
Optional path to relabel configs for the corresponding -remoteWrite.url. See also -remoteWrite.relabelConfig. The path can point either to local file or to http url. See https://docs.victoriametrics.com/vmagent.html#relabeling
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.vmProtoCompressLevel int
The compression level for VictoriaMetrics remote write protocol. Higher values reduce network traffic at the cost of higher CPU usage. Negative values reduce CPU usage at the cost of increased network traffic. See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol
-sortLabels
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}Enabled sorting for labels can slow down ingestion performance a bit
-tls

View File

@@ -9,6 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -25,7 +26,7 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return parser.ParseStream(req, func(rows []parser.Row) error {
return stream.Parse(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -9,6 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadog/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -28,7 +29,7 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
return err
}
ce := req.Header.Get("Content-Encoding")
return parser.ParseStream(req.Body, ce, func(series []parser.Series) error {
return stream.Parse(req.Body, ce, func(series []parser.Series) error {
return insertRows(at, series, extraLabels)
})
}

View File

@@ -7,6 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/graphite/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -19,7 +20,7 @@ var (
//
// See https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol
func InsertHandler(r io.Reader) error {
return parser.ParseStream(r, insertRows)
return stream.Parse(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -15,13 +15,14 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
measurementFieldSeparator = flag.String("influxMeasurementFieldSeparator", "_", "Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol")
skipSingleField = flag.Bool("influxSkipSingleField", false, "Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field")
skipSingleField = flag.Bool("influxSkipSingleField", false, "Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metric name if InfluxDB line contains only a single field")
skipMeasurement = flag.Bool("influxSkipMeasurement", false, "Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'")
dbLabel = flag.String("influxDBLabel", "db", "Default label for the DB name sent over '?db={db_name}' query parameter")
)
@@ -36,7 +37,7 @@ var (
//
// See https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener/
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return parser.ParseStream(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return stream.Parse(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return insertRows(nil, db, rows, nil)
})
}
@@ -54,7 +55,7 @@ func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return stream.Parse(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(at, db, rows, extraLabels)
})
}

View File

@@ -46,7 +46,8 @@ var (
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write . "+
"See also -influxListenAddr.useProxyProtocol")
@@ -56,18 +57,18 @@ var (
"See also -graphiteListenAddr.useProxyProtocol")
graphiteUseProxyProtocol = flag.Bool("graphiteListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -graphiteListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpenTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
"Usually :4242 must be set. Doesn't work if empty. See also -opentsdbListenAddr.useProxyProtocol")
opentsdbUseProxyProtocol = flag.Bool("opentsdbListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -opentsdbListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
opentsdbHTTPListenAddr = flag.String("opentsdbHTTPListenAddr", "", "TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. "+
opentsdbHTTPListenAddr = flag.String("opentsdbHTTPListenAddr", "", "TCP address to listen for OpenTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty. "+
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol = flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmagent. The following files are checked: "+
"-promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig . "+
dryRun = flag.Bool("dryRun", false, "Whether to check config files without running vmagent. The following files are checked: "+
"-promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig, -remoteWrite.streamAggr.config . "+
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag")
)
@@ -98,17 +99,20 @@ func main() {
if err := promscrape.CheckConfig(); err != nil {
logger.Fatalf("error when checking -promscrape.config: %s", err)
}
logger.Infof("-promscrape.config is ok; exitting with 0 status code")
logger.Infof("-promscrape.config is ok; exiting with 0 status code")
return
}
if *dryRun {
if err := remotewrite.CheckRelabelConfigs(); err != nil {
logger.Fatalf("error when checking relabel configs: %s", err)
}
if err := promscrape.CheckConfig(); err != nil {
logger.Fatalf("error when checking -promscrape.config: %s", err)
}
logger.Infof("all the configs are ok; exitting with 0 status code")
if err := remotewrite.CheckRelabelConfigs(); err != nil {
logger.Fatalf("error when checking relabel configs: %s", err)
}
if err := remotewrite.CheckStreamAggrConfigs(); err != nil {
logger.Fatalf("error when checking -remoteWrite.streamAggr.config: %s", err)
}
logger.Infof("all the configs are ok; exiting with 0 status code")
return
}
@@ -208,7 +212,7 @@ func getAuthTokenFromPath(path string) (*auth.Token, error) {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
if r.Method != "GET" {
if r.Method != http.MethodGet {
return false
}
w.Header().Add("Content-Type", "text/html; charset=utf-8")
@@ -253,6 +257,9 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
switch path {
case "/prometheus/api/v1/write", "/api/v1/write":
if common.HandleVMProtoServerHandshake(w, r) {
return true
}
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(nil, r); err != nil {
prometheusWriteErrors.Inc()

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -10,7 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -30,12 +30,12 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzip := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzip, func(block *parser.Block) error {
return stream.Parse(req.Body, isGzip, func(block *stream.Block) error {
return insertRows(at, block, extraLabels)
})
}
func insertRows(at *auth.Token, block *parser.Block, extraLabels []prompbmarshal.Label) error {
func insertRows(at *auth.Token, block *stream.Block, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)

View File

@@ -7,6 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdb/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -19,7 +20,7 @@ var (
//
// See http://opentsdb.net/docs/build/html/api_telnet/put.html
func InsertHandler(r io.Reader) error {
return parser.ParseStream(r, insertRows)
return stream.Parse(r, insertRows)
}
func insertRows(rows []parser.Row) error {

View File

@@ -9,6 +9,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp/stream"
"github.com/VictoriaMetrics/metrics"
)
@@ -24,7 +25,7 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return parser.ParseStream(req, func(rows []parser.Row) error {
return stream.Parse(req, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -10,6 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -31,7 +32,7 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return stream.Parse(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
}, func(s string) {
httpserver.LogError(req, s)

View File

@@ -23,7 +23,7 @@ var (
func TestInsertHandler(t *testing.T) {
setUp()
defer tearDown()
req := httptest.NewRequest("POST", "/insert/0/api/v1/import/prometheus", bytes.NewBufferString(`{"foo":"bar"}
req := httptest.NewRequest(http.MethodPost, "/insert/0/api/v1/import/prometheus", bytes.NewBufferString(`{"foo":"bar"}
go_memstats_alloc_bytes_total 1`))
if err := InsertHandler(nil, req); err != nil {
t.Errorf("unxepected error %s", err)

View File

@@ -10,7 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -27,7 +27,8 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
if err != nil {
return err
}
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
isVMRemoteWrite := req.Header.Get("Content-Encoding") == "zstd"
return stream.Parse(req.Body, isVMRemoteWrite, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
}

View File

@@ -15,13 +15,19 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/metrics"
)
var (
forcePromProto = flagutil.NewArrayBool("remoteWrite.forcePromProto", "Whether to force Prometheus remote write protocol for sending data "+
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
forceVMProto = flagutil.NewArrayBool("remoteWrite.forceVMProto", "Whether to force VictoriaMetrics remote write protocol for sending data "+
"to the corresponding -remoteWrite.url . See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
rateLimit = flagutil.NewArrayInt("remoteWrite.rateLimit", "Optional rate limit in bytes per second for data sent to the corresponding -remoteWrite.url. "+
"By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data "+
"By default, the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data "+
"is sent after temporary unavailability of the remote storage")
sendTimeout = flagutil.NewArrayDuration("remoteWrite.sendTimeout", "Timeout for sending a single block of data to the corresponding -remoteWrite.url")
proxyURL = flagutil.NewArrayString("remoteWrite.proxyURL", "Optional proxy URL for writing data to the corresponding -remoteWrite.url. "+
@@ -32,9 +38,9 @@ var (
"to the corresponding -remoteWrite.url")
tlsKeyFile = flagutil.NewArrayString("remoteWrite.tlsKeyFile", "Optional path to client-side TLS certificate key to use when connecting to the corresponding -remoteWrite.url")
tlsCAFile = flagutil.NewArrayString("remoteWrite.tlsCAFile", "Optional path to TLS CA file to use for verifying connections to the corresponding -remoteWrite.url. "+
"By default system CA is used")
"By default, system CA is used")
tlsServerName = flagutil.NewArrayString("remoteWrite.tlsServerName", "Optional TLS server name to use for connections to the corresponding -remoteWrite.url. "+
"By default the server name from -remoteWrite.url is used")
"By default, the server name from -remoteWrite.url is used")
headers = flagutil.NewArrayString("remoteWrite.headers", "Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. "+
"For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. "+
@@ -69,8 +75,12 @@ var (
type client struct {
sanitizedURL string
remoteWriteURL string
fq *persistentqueue.FastQueue
hc *http.Client
// Whether to use VictoriaMetrics remote write protocol for sending the data to remoteWriteURL
useVMProto bool
fq *persistentqueue.FastQueue
hc *http.Client
sendBlock func(block []byte) bool
authCfg *promauth.Config
@@ -122,19 +132,39 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
}
tr.Proxy = http.ProxyURL(pu)
}
hc := &http.Client{
Transport: tr,
Timeout: sendTimeout.GetOptionalArgOrDefault(argIdx, time.Minute),
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authCfg: authCfg,
awsCfg: awsCfg,
fq: fq,
hc: &http.Client{
Transport: tr,
Timeout: sendTimeout.GetOptionalArgOrDefault(argIdx, time.Minute),
},
stopCh: make(chan struct{}),
hc: hc,
stopCh: make(chan struct{}),
}
c.sendBlock = c.sendBlockHTTP
useVMProto := forceVMProto.GetOptionalArg(argIdx)
usePromProto := forcePromProto.GetOptionalArg(argIdx)
if useVMProto && usePromProto {
logger.Fatalf("-remoteWrite.useVMProto and -remoteWrite.usePromProto cannot be set simultaneously for -remoteWrite.url=%s", sanitizedURL)
}
if !useVMProto && !usePromProto {
// Auto-detect whether the remote storage supports VictoriaMetrics remote write protocol.
doRequest := func(url string) (*http.Response, error) {
return c.doRequest(url, nil)
}
useVMProto = common.HandleVMProtoClientHandshake(c.remoteWriteURL, doRequest)
if !useVMProto {
logger.Infof("the remote storage at %q doesn't support VictoriaMetrics remote write protocol. Switching to Prometheus remote write protocol. "+
"See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol", sanitizedURL)
}
}
c.useVMProto = useVMProto
return c
}
@@ -291,38 +321,45 @@ func (c *client) runWorker() {
}
}
// sendBlockHTTP returns false only if c.stopCh is closed.
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.register(len(block), c.stopCh)
retryDuration := time.Second
retriesCount := 0
c.bytesSent.Add(len(block))
c.blocksSent.Inc()
sigv4Hash := ""
if c.awsCfg != nil {
sigv4Hash = awsapi.HashHex(block)
}
again:
req, err := http.NewRequest("POST", c.remoteWriteURL, bytes.NewBuffer(block))
func (c *client) doRequest(url string, body []byte) (*http.Response, error) {
reqBody := bytes.NewBuffer(body)
req, err := http.NewRequest(http.MethodPost, url, reqBody)
if err != nil {
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", c.sanitizedURL, err)
logger.Panicf("BUG: unexpected error from http.NewRequest(%q): %s", url, err)
}
c.authCfg.SetHeaders(req, true)
h := req.Header
h.Set("User-Agent", "vmagent")
h.Set("Content-Type", "application/x-protobuf")
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
if c.useVMProto {
h.Set("Content-Encoding", "zstd")
h.Set("X-VictoriaMetrics-Remote-Write-Version", "1")
} else {
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
}
if c.awsCfg != nil {
sigv4Hash := awsapi.HashHex(body)
if err := c.awsCfg.SignRequest(req, sigv4Hash); err != nil {
// there is no need in retry, request will be rejected by client.Do and retried by code below
logger.Warnf("cannot sign remoteWrite request with AWS sigv4: %s", err)
}
}
return c.hc.Do(req)
}
// sendBlockHTTP sends the given block to c.remoteWriteURL.
//
// The function returns false only if c.stopCh is closed.
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.register(len(block), c.stopCh)
retryDuration := time.Second
retriesCount := 0
again:
startTime := time.Now()
resp, err := c.hc.Do(req)
resp, err := c.doRequest(c.remoteWriteURL, block)
c.requestDuration.UpdateDuration(startTime)
if err != nil {
c.errorsCount.Inc()
@@ -347,6 +384,8 @@ again:
if statusCode/100 == 2 {
_ = resp.Body.Close()
c.requestsOKCount.Inc()
c.bytesSent.Add(len(block))
c.blocksSent.Inc()
return true
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -23,6 +24,9 @@ var (
"This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url")
maxUnpackedBlockSize = flagutil.NewBytes("remoteWrite.maxBlockSize", 8*1024*1024, "The maximum block size to send to remote storage. Bigger blocks may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxRowsPerBlock")
maxRowsPerBlock = flag.Int("remoteWrite.maxRowsPerBlock", 10000, "The maximum number of samples to send in each block to remote storage. Higher number may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxBlockSize")
vmProtoCompressLevel = flag.Int("remoteWrite.vmProtoCompressLevel", 0, "The compression level for VictoriaMetrics remote write protocol. "+
"Higher values reduce network traffic at the cost of higher CPU usage. Negative values reduce CPU usage at the cost of increased network traffic. "+
"See https://docs.victoriametrics.com/vmagent.html#victoriametrics-remote-write-protocol")
)
type pendingSeries struct {
@@ -33,9 +37,10 @@ type pendingSeries struct {
periodicFlusherWG sync.WaitGroup
}
func newPendingSeries(pushBlock func(block []byte), significantFigures, roundDigits int) *pendingSeries {
func newPendingSeries(pushBlock func(block []byte), isVMRemoteWrite bool, significantFigures, roundDigits int) *pendingSeries {
var ps pendingSeries
ps.wr.pushBlock = pushBlock
ps.wr.isVMRemoteWrite = isVMRemoteWrite
ps.wr.significantFigures = significantFigures
ps.wr.roundDigits = roundDigits
ps.stopCh = make(chan struct{})
@@ -88,6 +93,9 @@ type writeRequest struct {
// pushBlock is called when whe write request is ready to be sent.
pushBlock func(block []byte)
// Whether to encode the write request with VictoriaMetrics remote write protocol.
isVMRemoteWrite bool
// How many significant figures must be left before sending the writeRequest to pushBlock.
significantFigures int
@@ -104,7 +112,7 @@ type writeRequest struct {
}
func (wr *writeRequest) reset() {
// Do not reset pushBlock, significantFigures and roundDigits, since they are re-used.
// Do not reset lastFlushTime, pushBlock, isVMRemoteWrite, significantFigures and roundDigits, since they are re-used.
wr.wr.Timeseries = nil
@@ -126,7 +134,7 @@ func (wr *writeRequest) flush() {
wr.wr.Timeseries = wr.tss
wr.adjustSampleValues()
atomic.StoreUint64(&wr.lastFlushTime, fasttime.UnixTimestamp())
pushWriteRequest(&wr.wr, wr.pushBlock)
pushWriteRequest(&wr.wr, wr.pushBlock, wr.isVMRemoteWrite)
wr.reset()
}
@@ -188,7 +196,7 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
wr.buf = buf
}
func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byte)) {
func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byte), isVMRemoteWrite bool) {
if len(wr.Timeseries) == 0 {
// Nothing to push
return
@@ -197,7 +205,11 @@ func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byt
bb.B = prompbmarshal.MarshalWriteRequest(bb.B[:0], wr)
if len(bb.B) <= maxUnpackedBlockSize.IntN() {
zb := snappyBufPool.Get()
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
if isVMRemoteWrite {
zb.B = zstd.CompressLevel(zb.B[:0], bb.B, *vmProtoCompressLevel)
} else {
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
}
writeRequestBufPool.Put(bb)
if len(zb.B) <= persistentqueue.MaxBlockSize {
pushBlock(zb.B)
@@ -221,18 +233,18 @@ func pushWriteRequest(wr *prompbmarshal.WriteRequest, pushBlock func(block []byt
}
n := len(samples) / 2
wr.Timeseries[0].Samples = samples[:n]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries[0].Samples = samples[n:]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries[0].Samples = samples
return
}
timeseries := wr.Timeseries
n := len(timeseries) / 2
wr.Timeseries = timeseries[:n]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries = timeseries[n:]
pushWriteRequest(wr, pushBlock)
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
wr.Timeseries = timeseries
}

View File

@@ -2,40 +2,48 @@ package remotewrite
import (
"fmt"
"math"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/golang/snappy"
)
func TestPushWriteRequest(t *testing.T) {
for _, rowsCount := range []int{1, 10, 100, 1e3, 1e4} {
rowsCounts := []int{1, 10, 100, 1e3, 1e4}
expectedBlockLensProm := []int{216, 1848, 16424, 169882, 1757876}
expectedBlockLensVM := []int{138, 492, 3927, 34995, 288476}
for i, rowsCount := range rowsCounts {
expectedBlockLenProm := expectedBlockLensProm[i]
expectedBlockLenVM := expectedBlockLensVM[i]
t.Run(fmt.Sprintf("%d", rowsCount), func(t *testing.T) {
testPushWriteRequest(t, rowsCount)
testPushWriteRequest(t, rowsCount, expectedBlockLenProm, expectedBlockLenVM)
})
}
}
func testPushWriteRequest(t *testing.T, rowsCount int) {
wr := newTestWriteRequest(rowsCount, 10)
pushBlockLen := 0
pushBlock := func(block []byte) {
if pushBlockLen > 0 {
panic(fmt.Errorf("BUG: pushBlock called multiple times; pushBlockLen=%d at first call, len(block)=%d at second call", pushBlockLen, len(block)))
func testPushWriteRequest(t *testing.T, rowsCount, expectedBlockLenProm, expectedBlockLenVM int) {
f := func(isVMRemoteWrite bool, expectedBlockLen int, tolerancePrc float64) {
t.Helper()
wr := newTestWriteRequest(rowsCount, 20)
pushBlockLen := 0
pushBlock := func(block []byte) {
if pushBlockLen > 0 {
panic(fmt.Errorf("BUG: pushBlock called multiple times; pushBlockLen=%d at first call, len(block)=%d at second call", pushBlockLen, len(block)))
}
pushBlockLen = len(block)
}
pushWriteRequest(wr, pushBlock, isVMRemoteWrite)
if math.Abs(float64(pushBlockLen-expectedBlockLen)/float64(expectedBlockLen)*100) > tolerancePrc {
t.Fatalf("unexpected block len for rowsCount=%d, isVMRemoteWrite=%v; got %d bytes; expecting %d bytes +- %.0f%%",
rowsCount, isVMRemoteWrite, pushBlockLen, expectedBlockLen, tolerancePrc)
}
pushBlockLen = len(block)
}
pushWriteRequest(wr, pushBlock)
b := prompbmarshal.MarshalWriteRequest(nil, wr)
zb := snappy.Encode(nil, b)
maxPushBlockLen := len(zb)
minPushBlockLen := maxPushBlockLen / 2
if pushBlockLen < minPushBlockLen {
t.Fatalf("unexpected block len after pushWriteRequest; got %d bytes; must be at least %d bytes", pushBlockLen, minPushBlockLen)
}
if pushBlockLen > maxPushBlockLen {
t.Fatalf("unexpected block len after pushWriteRequest; got %d bytes; must be smaller or equal to %d bytes", pushBlockLen, maxPushBlockLen)
}
// Check Prometheus remote write
f(false, expectedBlockLenProm, 0)
// Check VictoriaMetrics remote write
f(true, expectedBlockLenVM, 15)
}
func newTestWriteRequest(seriesCount, labelsCount int) *prompbmarshal.WriteRequest {

View File

@@ -4,17 +4,21 @@ import (
"flag"
"fmt"
"net/url"
"path/filepath"
"strconv"
"sync"
"sync/atomic"
"time"
"github.com/cespare/xxhash/v2"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bloomfilter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
@@ -24,32 +28,33 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/streamaggr"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
)
var (
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support Prometheus remote_write API. "+
"It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteURLs = flagutil.NewArrayString("remoteWrite.url", "Remote storage URL to write data to. It must support either VictoriaMetrics remote write protocol "+
"or Prometheus remote_write protocol. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url options in order to replicate the collected data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteMultitenantURLs = flagutil.NewArrayString("remoteWrite.multitenantURL", "Base path for multitenant remote storage URL to write data to. "+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored. "+
"See also -remoteWrite.maxDiskUsagePerURL")
keepDanglingQueues = flag.Bool("remoteWrite.keepDanglingQueues", false, "Keep persistent queues contents at -remoteWrite.tmpDataPath in case there are no matching -remoteWrite.url. "+
"Useful when -remoteWrite.url is changed temporarily and persistent queue files will be needed later on.")
queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
maxPendingBytesPerURL = flagutil.NewArrayBytes("remoteWrite.maxDiskUsagePerURL", "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
"for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. "+
"Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500MB. "+
"Buffered data is stored in ~500MB chunks. It is recommended to set the value for this flag to a multiple of the block size 500MB. "+
"Disk usage is unlimited if the value is set to 0")
significantFigures = flagutil.NewArrayInt("remoteWrite.significantFigures", "The number of significant figures to leave in metric values before writing them "+
"to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. "+
"This option may be used for improving data compression for the stored metrics. See also -remoteWrite.roundDigits")
roundDigits = flagutil.NewArrayInt("remoteWrite.roundDigits", "Round metric values to this number of decimal digits after the point before writing them to remote storage. "+
"Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. "+
"By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. "+
"By default, digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. "+
"This option may be used for improving data compression for the stored metrics")
sortLabels = flag.Bool("sortLabels", false, `Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. `+
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
@@ -64,7 +69,7 @@ var (
"See https://docs.victoriametrics.com/stream-aggregation.html . "+
"See also -remoteWrite.streamAggr.keepInput and -remoteWrite.streamAggr.dedupInterval")
streamAggrKeepInput = flagutil.NewArrayBool("remoteWrite.streamAggr.keepInput", "Whether to keep input samples after the aggregation with -remoteWrite.streamAggr.config. "+
"By default the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. "+
"By default, the input is dropped after the aggregation, so only the aggregate data is sent to the -remoteWrite.url. "+
"See https://docs.victoriametrics.com/stream-aggregation.html")
streamAggrDedupInterval = flagutil.NewArrayDuration("remoteWrite.streamAggr.dedupInterval", "Input samples are de-duplicated with this interval before being aggregated. "+
"Only the last sample per each time series per each interval is aggregated if the interval is greater than zero")
@@ -94,6 +99,8 @@ var allRelabelConfigs atomic.Value
// since it may lead to high memory usage due to big number of buffers.
var maxQueues = cgroup.AvailableCPUs() * 16
const persistentQueueDirname = "persistent-queue"
// InitSecretFlags must be called after flag.Parse and before any logging.
func InitSecretFlags() {
if !*showRemoteWriteURL {
@@ -150,9 +157,8 @@ func Init() {
logger.Fatalf("cannot load relabel configs: %s", err)
}
allRelabelConfigs.Store(rcs)
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
relabelConfigSuccess.Set(1)
relabelConfigTimestamp.Set(fasttime.UnixTimestamp())
if len(*remoteWriteURLs) > 0 {
rwctxsDefault = newRemoteWriteCtxs(nil, *remoteWriteURLs)
@@ -165,34 +171,56 @@ func Init() {
for {
select {
case <-sighupCh:
case <-stopCh:
case <-configReloaderStopCh:
return
}
configReloads.Inc()
logger.Infof("SIGHUP received; reloading relabel configs pointed by -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig")
rcs, err := loadRelabelConfigs()
if err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)
logger.Errorf("cannot reload relabel configs; preserving the previous configs; error: %s", err)
continue
}
allRelabelConfigs.Store(rcs)
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
logger.Infof("Successfully reloaded relabel configs")
reloadRelabelConfigs()
reloadStreamAggrConfigs()
}
}()
}
func reloadRelabelConfigs() {
relabelConfigReloads.Inc()
logger.Infof("reloading relabel configs pointed by -remoteWrite.relabelConfig and -remoteWrite.urlRelabelConfig")
rcs, err := loadRelabelConfigs()
if err != nil {
relabelConfigReloadErrors.Inc()
relabelConfigSuccess.Set(0)
logger.Errorf("cannot reload relabel configs; preserving the previous configs; error: %s", err)
return
}
allRelabelConfigs.Store(rcs)
relabelConfigSuccess.Set(1)
relabelConfigTimestamp.Set(fasttime.UnixTimestamp())
logger.Infof("successfully reloaded relabel configs")
}
var (
configReloads = metrics.NewCounter(`vmagent_relabel_config_reloads_total`)
configReloadErrors = metrics.NewCounter(`vmagent_relabel_config_reloads_errors_total`)
configSuccess = metrics.NewCounter(`vmagent_relabel_config_last_reload_successful`)
configTimestamp = metrics.NewCounter(`vmagent_relabel_config_last_reload_success_timestamp_seconds`)
relabelConfigReloads = metrics.NewCounter(`vmagent_relabel_config_reloads_total`)
relabelConfigReloadErrors = metrics.NewCounter(`vmagent_relabel_config_reloads_errors_total`)
relabelConfigSuccess = metrics.NewCounter(`vmagent_relabel_config_last_reload_successful`)
relabelConfigTimestamp = metrics.NewCounter(`vmagent_relabel_config_last_reload_success_timestamp_seconds`)
)
func reloadStreamAggrConfigs() {
if len(*remoteWriteMultitenantURLs) > 0 {
rwctxsMapLock.Lock()
for _, rwctxs := range rwctxsMap {
reinitStreamAggr(rwctxs)
}
rwctxsMapLock.Unlock()
} else {
reinitStreamAggr(rwctxsDefault)
}
}
func reinitStreamAggr(rwctxs []*remoteWriteCtx) {
for _, rwctx := range rwctxs {
rwctx.reinitStreamAggr()
}
}
func newRemoteWriteCtxs(at *auth.Token, urls []string) []*remoteWriteCtx {
if len(urls) == 0 {
logger.Panicf("BUG: urls must be non-empty")
@@ -225,17 +253,44 @@ func newRemoteWriteCtxs(at *auth.Token, urls []string) []*remoteWriteCtx {
}
rwctxs[i] = newRemoteWriteCtx(i, at, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
}
if !*keepDanglingQueues {
// Remove dangling queues, if any.
// This is required for the case when the number of queues has been changed or URL have been changed.
// See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4014
existingQueues := make(map[string]struct{}, len(rwctxs))
for _, rwctx := range rwctxs {
existingQueues[rwctx.fq.Dirname()] = struct{}{}
}
queuesDir := filepath.Join(*tmpDataPath, persistentQueueDirname)
files := fs.MustReadDir(queuesDir)
removed := 0
for _, f := range files {
dirname := f.Name()
if _, ok := existingQueues[dirname]; !ok {
logger.Infof("removing dangling queue %q", dirname)
fullPath := filepath.Join(queuesDir, dirname)
fs.MustRemoveAll(fullPath)
removed++
}
}
if removed > 0 {
logger.Infof("removed %d dangling queues from %q, active queues: %d", removed, *tmpDataPath, len(rwctxs))
}
}
return rwctxs
}
var stopCh = make(chan struct{})
var configReloaderStopCh = make(chan struct{})
var configReloaderWG sync.WaitGroup
// Stop stops remotewrite.
//
// It is expected that nobody calls Push during and after the call to this func.
func Stop() {
close(stopCh)
close(configReloaderStopCh)
configReloaderWG.Wait()
for _, rwctx := range rwctxsDefault {
@@ -450,7 +505,7 @@ type remoteWriteCtx struct {
fq *persistentqueue.FastQueue
c *client
sas *streamaggr.Aggregators
sas atomic.Pointer[streamaggr.Aggregators]
streamAggrKeepInput bool
pss []*pendingSeries
@@ -466,8 +521,13 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
pqURL.RawQuery = ""
pqURL.Fragment = ""
h := xxhash.Sum64([]byte(pqURL.String()))
queuePath := fmt.Sprintf("%s/persistent-queue/%d_%016X", *tmpDataPath, argIdx+1, h)
queuePath := filepath.Join(*tmpDataPath, persistentQueueDirname, fmt.Sprintf("%d_%016X", argIdx+1, h))
maxPendingBytes := maxPendingBytesPerURL.GetOptionalArgOrDefault(argIdx, 0)
if maxPendingBytes != 0 && maxPendingBytes < persistentqueue.DefaultChunkFileSize {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4195
logger.Warnf("rounding the -remoteWrite.maxDiskUsagePerURL=%d to the minimum supported value: %d", maxPendingBytes, persistentqueue.DefaultChunkFileSize)
maxPendingBytes = persistentqueue.DefaultChunkFileSize
}
fq := persistentqueue.MustOpenFastQueue(queuePath, sanitizedURL, maxInmemoryBlocks, maxPendingBytes)
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_pending_data_bytes{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetPendingBytes())
@@ -475,6 +535,7 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmagent_remotewrite_pending_inmemory_blocks{path=%q, url=%q}`, queuePath, sanitizedURL), func() float64 {
return float64(fq.GetInmemoryQueueLen())
})
var c *client
switch remoteWriteURL.Scheme {
case "http", "https":
@@ -495,7 +556,7 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
}
pss := make([]*pendingSeries, pssLen)
for i := range pss {
pss[i] = newPendingSeries(fq.MustWriteBlock, sf, rd)
pss[i] = newPendingSeries(fq.MustWriteBlock, c.useVMProto, sf, rd)
}
rwctx := &remoteWriteCtx{
@@ -514,10 +575,12 @@ func newRemoteWriteCtx(argIdx int, at *auth.Token, remoteWriteURL *url.URL, maxI
dedupInterval := streamAggrDedupInterval.GetOptionalArgOrDefault(argIdx, 0)
sas, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternal, dedupInterval)
if err != nil {
logger.Fatalf("cannot initialize stream aggregators from -remoteWrite.streamAggrFile=%q: %s", sasFile, err)
logger.Fatalf("cannot initialize stream aggregators from -remoteWrite.streamAggr.config=%q: %s", sasFile, err)
}
rwctx.sas = sas
rwctx.sas.Store(sas)
rwctx.streamAggrKeepInput = streamAggrKeepInput.GetOptionalArg(argIdx)
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_successful{path=%q}`, sasFile)).Set(1)
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_success_timestamp_seconds{path=%q}`, sasFile)).Set(fasttime.UnixTimestamp())
}
return rwctx
@@ -532,8 +595,10 @@ func (rwctx *remoteWriteCtx) MustStop() {
rwctx.fq.UnblockAllReaders()
rwctx.c.MustStop()
rwctx.c = nil
rwctx.sas.MustStop()
rwctx.sas = nil
sas := rwctx.sas.Swap(nil)
sas.MustStop()
rwctx.fq.MustClose()
rwctx.fq = nil
@@ -564,8 +629,9 @@ func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
rwctx.rowsPushedAfterRelabel.Add(rowsCount)
// Apply stream aggregation if any
rwctx.sas.Push(tss)
if rwctx.sas == nil || rwctx.streamAggrKeepInput {
sas := rwctx.sas.Load()
sas.Push(tss)
if sas == nil || rwctx.streamAggrKeepInput {
// Push samples to the remote storage
rwctx.pushInternal(tss)
}
@@ -584,6 +650,36 @@ func (rwctx *remoteWriteCtx) pushInternal(tss []prompbmarshal.TimeSeries) {
pss[idx].Push(tss)
}
func (rwctx *remoteWriteCtx) reinitStreamAggr() {
sas := rwctx.sas.Load()
if sas == nil {
// There is no stream aggregation for rwctx
return
}
sasFile := streamAggrConfig.GetOptionalArg(rwctx.idx)
logger.Infof("reloading stream aggregation configs pointed by -remoteWrite.streamAggr.config=%q", sasFile)
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reloads_total{path=%q}`, sasFile)).Inc()
dedupInterval := streamAggrDedupInterval.GetOptionalArgOrDefault(rwctx.idx, 0)
sasNew, err := streamaggr.LoadFromFile(sasFile, rwctx.pushInternal, dedupInterval)
if err != nil {
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reloads_errors_total{path=%q}`, sasFile)).Inc()
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_successful{path=%q}`, sasFile)).Set(0)
logger.Errorf("cannot reload stream aggregation config from -remoteWrite.streamAggr.config=%q; continue using the previously loaded config; error: %s", sasFile, err)
return
}
if !sasNew.Equal(sas) {
sasOld := rwctx.sas.Swap(sasNew)
sasOld.MustStop()
logger.Infof("successfully reloaded stream aggregation configs at -remoteWrite.streamAggr.config=%q", sasFile)
} else {
sasNew.MustStop()
logger.Infof("the config at -remoteWrite.streamAggr.config=%q wasn't changed", sasFile)
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_successful{path=%q}`, sasFile)).Set(1)
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_streamaggr_config_reload_success_timestamp_seconds{path=%q}`, sasFile)).Set(fasttime.UnixTimestamp())
}
var tssRelabelPool = &sync.Pool{
New: func() interface{} {
a := []prompbmarshal.TimeSeries{}
@@ -598,3 +694,20 @@ func getRowsCount(tss []prompbmarshal.TimeSeries) int {
}
return rowsCount
}
// CheckStreamAggrConfigs checks configs pointed by -remoteWrite.streamAggr.config
func CheckStreamAggrConfigs() error {
pushNoop := func(tss []prompbmarshal.TimeSeries) {}
for idx, sasFile := range *streamAggrConfig {
if sasFile == "" {
continue
}
dedupInterval := streamAggrDedupInterval.GetOptionalArgOrDefault(idx, 0)
sas, err := streamaggr.LoadFromFile(sasFile, pushNoop, dedupInterval)
if err != nil {
return fmt.Errorf("cannot load -remoteWrite.streamAggr.config=%q: %w", sasFile, err)
}
sas.MustStop()
}
return nil
}

View File

@@ -11,6 +11,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
@@ -30,7 +31,7 @@ func InsertHandler(at *auth.Token, req *http.Request) error {
return err
}
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return stream.Parse(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
}

View File

@@ -114,6 +114,9 @@ vmalert-linux-arm64:
vmalert-linux-ppc64le:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmalert-linux-s390x:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmalert-linux-386:
APP_NAME=vmalert CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -14,7 +14,7 @@ or [cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.ht
of VictoriaMetrics are capable of proxying requests to vmalert via `-vmalert.proxyURL` command-line flag.
Use this feature for the following cases:
* for proxying requests from [Grafana Alerting UI](https://grafana.com/docs/grafana/latest/alerting/);
* for accessing vmalert's UI through VictoriaMetrics Web interface.
* for accessing vmalerts UI through VictoriaMetrics Web interface.
## Features
@@ -23,12 +23,14 @@ Use this feature for the following cases:
support and expressions validation;
* Prometheus [alerting rules definition format](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#defining-alerting-rules)
support;
* Integration with [Alertmanager](https://github.com/prometheus/alertmanager) starting from [Alertmanager v0.16.0-aplha](https://github.com/prometheus/alertmanager/releases/tag/v0.16.0-alpha.0);
* Integration with [Alertmanager](https://github.com/prometheus/alertmanager) starting from [Alertmanager v0.16.0-alpha](https://github.com/prometheus/alertmanager/releases/tag/v0.16.0-alpha.0);
* Keeps the alerts [state on restarts](#alerts-state-on-restarts);
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite);
* Recording and Alerting rules backfilling (aka `replay`). See [these docs](#rules-backfilling);
* Lightweight and without extra dependencies.
* Supports [reusable templates](#reusable-templates) for annotations.
* Supports [reusable templates](#reusable-templates) for annotations;
* Load of recording and alerting rules from local filesystem, URL, GCS and S3;
* Detect alerting rules which [don't match any series](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039).
## Limitations
@@ -143,6 +145,15 @@ params:
headers:
[ <string>, ...]
# Optional list of HTTP headers in form `header-name: value`
# applied for all alert notifications sent to notifiers
# generated by rules of this group.
# For example:
# notifier_headers:
# - "TenantID: foo"
notifier_headers:
[ <string>, ...]
# Optional list of labels added to every rule within a group.
# It has priority over the external labels.
# Labels are commonly used for adding environment
@@ -260,7 +271,7 @@ Additionally, `vmalert` provides some extra templating functions listed [here](#
query at `-datasource.url` and returns the first result.
- `queryEscape` - escapes the input string, so it can be safely put inside [query arg](https://en.wikipedia.org/wiki/Percent-encoding) part of URL.
- `quotesEscape` - escapes the input string, so it can be safely embedded into JSON string.
- `reReplaceAll regex repl` - replaces all the occurences of the `regex` in input string with the `repl`.
- `reReplaceAll regex repl` - replaces all the occurrences of the `regex` in input string with the `repl`.
- `safeHtml` - marks the input string as safe to use in HTML context without the need to html-escape it.
- `sortByLabel name` - sorts the input query results by the label with the given `name`.
- `stripDomain` - leaves the first part of the domain. For example, `foo.bar.baz` is converted to `foo`.
@@ -404,6 +415,25 @@ The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz`
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
### Reading rules from object storage
[Enterprise version](https://docs.victoriametrics.com/enterprise.html) of `vmalert` may read alerting and recording rules
from object storage:
- `./bin/vmalert -rule=s3://bucket/dir/alert.rules` would read rules from the given path at S3 bucket
- `./bin/vmalert -rule=gs://bucket/bir/alert.rules` would read rules from the given path at GCS bucket
S3 and GCS paths support only matching by prefix, e.g. `s3://bucket/dir/rule_` matches
all files with prefix `rule_` in the folder `dir`.
The following [command-line flags](#flags) can be used for fine-tuning access to S3 and GCS:
- `-s3.credsFilePath` - path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
- `-s3.configFilePath` - path to file with S3 configs. Configs are loaded from default location if not set.
- `-s3.configProfile` - profile name for S3 configs. If no set, the value of the environment variable will be loaded (`AWS_PROFILE` or `AWS_DEFAULT_PROFILE`).
- `-s3.customEndpoint` - custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set.
- `-s3.forcePathStyle` - prefixing endpoint with bucket name when set false, true by default.
### Topology examples
The following sections are showing how `vmalert` may be used and configured
@@ -453,7 +483,7 @@ Cluster mode could have multiple `vminsert` and `vmselect` components.
<img alt="vmalert cluster" src="vmalert_cluster.png">
In case when you want to spread the load on these components - add balancers before them and configure
`vmalert` with balancer's addresses. Please, see more about VM's cluster architecture
`vmalert` with balancer addresses. Please, see more about VM's cluster architecture
[here](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#architecture-overview).
#### HA vmalert
@@ -478,7 +508,7 @@ Alertmanagers.
To avoid recording rules results and alerts state duplication in VictoriaMetrics server
don't forget to configure [deduplication](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication).
The recommended value for `-dedup.minScrapeInterval` must be greater or equal to vmalert's `evaluation_interval`.
The recommended value for `-dedup.minScrapeInterval` must be greater or equal to vmalert `evaluation_interval`.
If you observe inconsistent or "jumping" values in series produced by vmalert, try disabling `-datasource.queryTimeAlignment`
command line flag. Because of alignment, two or more vmalert HA pairs will produce results with the same timestamps.
But due of backfilling (data delivered to the datasource with some delay) values of such results may differ,
@@ -488,7 +518,7 @@ Alertmanager will automatically deduplicate alerts with identical labels, so ens
all `vmalert`s are having the same config.
Don't forget to configure [cluster mode](https://prometheus.io/docs/alerting/latest/alertmanager/)
for Alertmanagers for better reliability. List all Alertmanager URLs in vmalert's `-notifier.url`
for Alertmanagers for better reliability. List all Alertmanager URLs in vmalert `-notifier.url`
to ensure [high availability](https://github.com/prometheus/alertmanager#high-availability).
This example uses single-node VM server for the sake of simplicity.
@@ -580,7 +610,7 @@ or time series modification via [relabeling](https://docs.victoriametrics.com/vm
`vmalert` web UI can be accessed from [single-node version of VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html)
and from [cluster version of VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
This may be used for better integraion with Grafana unified alerting system. See the following docs for details:
This may be used for better integration with Grafana unified alerting system. See the following docs for details:
* [How to query vmalert from single-node VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#vmalert)
* [How to query vmalert from VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#vmalert)
@@ -601,6 +631,12 @@ can read the same rules configuration as normal, evaluate them on the given time
results via remote write to the configured storage. vmalert supports any PromQL/MetricsQL compatible
data source for backfilling.
Please note, that response caching may lead to unexpected results during and after backfilling process.
In order to avoid this you need to reset cache contents or disable caching when using backfilling
as described in [backfilling docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#backfilling).
See a blogpost about [Rules backfilling via vmalert](https://victoriametrics.com/blog/rules-replay/).
### How it works
In `replay` mode vmalert works as a cli-tool and exits immediately after work is done.
@@ -705,31 +741,42 @@ a review to the dashboard.
## Troubleshooting
vmalert executes configured rules within certain intervals. It is expected that at the moment when rule is executed,
the data is already present in configured `-datasource.url`:
### Data delay
Data delay is one of the most common issues with rules execution.
vmalert executes configured rules within certain intervals at specifics timestamps.
It expects that the data is already present in configured `-datasource.url` at the moment of time when rule is executed:
<img alt="vmalert expected evaluation" src="vmalert_ts_normal.gif">
Usually, troubles start to appear when data in `-datasource.url` is delayed or absent. In such cases, evaluations
may get empty response from datasource and produce empty recording rules or reset alerts state:
may get empty response from the datasource and produce empty recording rules or reset alerts state:
<img alt="vmalert evaluation when data is delayed" src="vmalert_ts_data_delay.gif">
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s.
This behavior is controlled by `-search.latencyOffset` command-line flag and the `latency_offset` query ag at `vmselect`.
Usually, this results into a 30s shift for recording rules results.
Note that too small value passed to `-search.latencyOffset` or to `latency_offest` query arg may lead to incomplete query results.
Try the following recommendations to reduce the chance of hitting the data delay issue:
Try the following recommendations in such cases:
* Always configure group's `evaluationInterval` to be bigger or at least equal to
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution);
* Ensure that `[duration]` value is at least twice bigger than
[time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution). For example,
if expression is `rate(my_metric[2m]) > 0` then ensure that `my_metric` resolution is at least `1m` or better `30s`.
If you use VictoriaMetrics as datasource, `[duration]` can be omitted and VictoriaMetrics will adjust it automatically.
* If you know in advance, that data in datasource is delayed - try changing vmalerts `-datasource.lookback`
command-line flag to add a time shift for evaluations. Or extend `[duration]` to tolerate the delay.
For example, `max_over_time(errors_total[10m]) > 0` will be active even if there is no data in datasource for last `9m`.
* If [time series resolution](https://docs.victoriametrics.com/keyConcepts.html#time-series-resolution)
in datasource is inconsistent or `>=5min` - try changing vmalerts `-datasource.queryStep` command-line flag to specify
how far search query can lookback for the recent datapoint. The recommendation is to have the step
at least two times bigger than the resolution.
* Always configure group's `evaluationInterval` to be bigger or equal to `scrape_interval` at which metrics
are delivered to the datasource;
* If you know in advance, that data in datasource is delayed - try changing vmalert's `-datasource.lookback`
command-line flag to add a time shift for evaluations;
* If time intervals between datapoints in datasource are irregular or `>=5min` - try changing vmalert's
`-datasource.queryStep` command-line flag to specify how far search query can lookback for the recent datapoint.
The recommendation is to have the step at least two times bigger than `scrape_interval`, since
there are no guarantees that scrape will not fail.
> Please note, data delay is inevitable in distributed systems. And it is better to account for it instead of ignoring.
By default, recently written samples to VictoriaMetrics aren't visible for queries for up to 30s
(see `-search.latencyOffset` command-line flag at vmselect). Such delay is needed to eliminate risk of incomplete
data on the moment of querying, since metrics collectors won't be able to deliver the data in time.
### Alerts state
Sometimes, it is not clear why some specific alert fired or didn't fire. It is very important to remember, that
alerts with `for: 0` fire immediately when their expression becomes true. And alerts with `for > 0` will fire only
@@ -752,6 +799,8 @@ HTTP request sent by vmalert to the `-datasource.url` during evaluation. If spec
no samples returned and curl command returns data - then it is very likely there was no data in datasource on the
moment when rule was evaluated.
### Debug mode
vmalert allows configuring more detailed logging for specific alerting rule. Just set `debug: true` in rule's configuration
and vmalert will start printing additional log messages:
```terminal
@@ -764,6 +813,22 @@ and vmalert will start printing additional log messages:
2022-09-15T13:36:56.153Z DEBUG rule "TestGroup":"Conns" (2601299393013563564) at 2022-09-15T15:36:56+02:00: alert 10705778000901301787 {alertgroup="TestGroup",alertname="Conns",cluster="east-1",instance="localhost:8429",replica="a"} PENDING => FIRING: 1m0s since becoming active at 2022-09-15 15:35:56.126006 +0200 CEST m=+39.384575417
```
### Never-firing alerts
vmalert can detect if alert's expression doesn't match any time series in runtime. This problem usually happens
when alerting expression selects time series which aren't present in the datasource (i.e. wrong `job` label)
or there is a typo in the series selector (i.e. `env=rpod`). Such alerting rules will be marked with special icon in
vmalerts UI and exposed via `vmalert_alerting_rules_last_evaluation_series_fetched` metric. The metric value will
show how many time series were matched before the filtering by rule's expression. If metric value is `-1`, then
this feature is not supported by the datasource (old versions of VictoriaMetrics). The following expression can be
used to detect rules matching no series:
```
max(vmalert_alerting_rules_last_evaluation_series_fetched) by(group, alertname) == 0
```
See more details [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039).
This feature is available only if vmalert is using VictoriaMetrics v1.90 or higher as a datasource.
## Profiling
@@ -807,7 +872,7 @@ The shortlist of configuration flags is the following:
-clusterMode
If clusterMode is enabled, then vmalert automatically adds the tenant specified in config groups to -datasource.url, -remoteWrite.url and -remoteRead.url. See https://docs.victoriametrics.com/vmalert.html#multitenancy . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-configCheckInterval duration
Interval for checking for changes in '-rule' or '-notifier.config' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes.
Interval for checking for changes in '-rule' or '-notifier.config' files. By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes.
-datasource.appendTypePrefix
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
-datasource.basicAuth.password string
@@ -841,7 +906,7 @@ The shortlist of configuration flags is the following:
-datasource.queryStep duration
How far a value can fallback to when evaluating queries. For example, if -datasource.queryStep=15s then param "step" with value "15s" will be added to every query. If set to 0, rule's evaluation interval will be used instead. (default 5m0s)
-datasource.queryTimeAlignment
Whether to align "time" parameter with evaluation interval.Alignment supposed to produce deterministic results despite of number of vmalert replicas or time they were started. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1257 (default true)
Whether to align "time" parameter with evaluation interval.Alignment supposed to produce deterministic results despite number of vmalert replicas or time they were started. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1257 (default true)
-datasource.roundDigits int
Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
-datasource.showURL
@@ -867,7 +932,7 @@ The shortlist of configuration flags is the following:
-dryRun
Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
@@ -877,7 +942,7 @@ The shortlist of configuration flags is the following:
-evaluationInterval duration
How often to evaluate the rules (default 1m0s)
-external.alert.source string
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service. Supports templating - see https://docs.victoriametrics.com/vmalert.html#templating . For example, link to Grafana: -external.alert.source='explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":{{$expr|jsonEscape|queryEscape}} },{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' . If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service. Supports templating - see https://docs.victoriametrics.com/vmalert.html#templating . For example, link to Grafana: -external.alert.source='explore?orgId=1&left={"datasource":"VictoriaMetrics","queries":[{"expr":{{$expr|jsonEscape|queryEscape}},"refId":"A"}],"range":{"from":"now-1h","to":"now"}}'. Link to VMUI: -external.alert.source='vmui/#/?g0.expr={{.Expr|queryEscape}}'. If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.
-external.label array
Optional label in the form 'Name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports an array of values separated by comma or specified via multiple flags.
@@ -886,11 +951,11 @@ The shortlist of configuration flags is the following:
-flagsAuthKey string
Auth key for /flags endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -910,7 +975,7 @@ The shortlist of configuration flags is the following:
-insert.maxQueueDuration duration
The maximum duration to wait in the queue when -maxConcurrentInserts concurrent insert requests are executed (default 1m0s)
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 300)
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
@@ -971,7 +1036,7 @@ The shortlist of configuration flags is the following:
-notifier.suppressDuplicateTargetErrors
Whether to suppress 'duplicate target' errors during discovery
-notifier.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default, system CA is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
@@ -983,7 +1048,7 @@ The shortlist of configuration flags is the following:
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsServerName array
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
Optional TLS server name to use for connections to -notifier.url. By default, the server name from -notifier.url is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
Prometheus Alertmanager URL, e.g. http://127.0.0.1:9093. List all Alertmanager URLs if it runs in the cluster mode to ensure high availability.
@@ -1006,7 +1071,7 @@ The shortlist of configuration flags is the following:
-pushmetrics.interval duration
Interval for pushing metrics to -pushmetrics.url (default 10s)
-pushmetrics.url array
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default metrics exposed at /metrics page aren't pushed to any remote storage
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags.
-remoteRead.basicAuth.password string
Optional basic auth password for -remoteRead.url
@@ -1023,7 +1088,7 @@ The shortlist of configuration flags is the following:
-remoteRead.headers string
Optional HTTP headers to send with each request to the corresponding -remoteRead.url. For example, -remoteRead.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteRead.url. Multiple headers must be delimited by '^^': -remoteRead.headers='header1:value1^^header2:value2'
-remoteRead.ignoreRestoreErrors
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
Whether to ignore errors from remote storage when restoring alerts state on startup. DEPRECATED - this flag has no effect and will be removed in the next releases. (default true)
-remoteRead.lookback duration
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.oauth2.clientID string
@@ -1039,7 +1104,7 @@ The shortlist of configuration flags is the following:
-remoteRead.showURL
Whether to show -remoteRead.url in the exported metrics. It is hidden by default, since it can contain sensitive info such as auth key
-remoteRead.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default system CA is used
Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default, system CA is used
-remoteRead.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url
-remoteRead.tlsInsecureSkipVerify
@@ -1047,7 +1112,7 @@ The shortlist of configuration flags is the following:
-remoteRead.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url
-remoteRead.tlsServerName string
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
Optional TLS server name to use for connections to -remoteRead.url. By default, the server name from -remoteRead.url is used
-remoteRead.url vmalert
Optional URL to datasource compatible with Prometheus HTTP API. It can be single node VictoriaMetrics or vmselect.Remote read is used to restore alerts state.This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428. See also '-remoteRead.disablePathAppend', '-remoteRead.showURL'.
-remoteWrite.basicAuth.password string
@@ -1069,7 +1134,7 @@ The shortlist of configuration flags is the following:
-remoteWrite.headers string
Optional HTTP headers to send with each request to the corresponding -remoteWrite.url. For example, -remoteWrite.headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding -remoteWrite.url. Multiple headers must be delimited by '^^': -remoteWrite.headers='header1:value1^^header2:value2'
-remoteWrite.maxBatchSize int
Defines defines max number of timeseries to be flushed at once (default 1000)
Defines max number of timeseries to be flushed at once (default 1000)
-remoteWrite.maxQueueSize int
Defines the max number of pending datapoints to remote write endpoint (default 100000)
-remoteWrite.oauth2.clientID string
@@ -1087,7 +1152,7 @@ The shortlist of configuration flags is the following:
-remoteWrite.showURL
Whether to show -remoteWrite.url in the exported metrics. It is hidden by default, since it can contain sensitive info such as auth key
-remoteWrite.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default, system CA is used
-remoteWrite.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url
-remoteWrite.tlsInsecureSkipVerify
@@ -1095,34 +1160,42 @@ The shortlist of configuration flags is the following:
-remoteWrite.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url
-remoteWrite.tlsServerName string
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
Optional TLS server name to use for connections to -remoteWrite.url. By default, the server name from -remoteWrite.url is used
-remoteWrite.url string
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. For example, if -remoteWrite.url=http://127.0.0.1:8428 is specified, then the alerts state will be written to http://127.0.0.1:8428/api/v1/write . See also -remoteWrite.disablePathAppend, '-remoteWrite.showURL'.
-replay.disableProgressBar
Whether to disable rendering progress bars during the replay. Progress bar rendering might be verbose or break the logs parsing, so it is recommended to be disabled when not used in interactive mode.
-replay.maxDatapointsPerQuery int
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
-replay.maxDatapointsPerQuery /query_range
Max number of data points expected in one request. It affects the max time range for every /query_range request during the replay. The higher the value, the less requests will be made during replay. (default 1000)
-replay.ruleRetryAttempts int
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
-replay.rulesDelay duration
Delay between rules evaluation within the group. Could be important if there are chained rules inside of the groupand processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule.Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
Delay between rules evaluation within the group. Could be important if there are chained rules inside the group and processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule.Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
-replay.timeFrom string
The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
-replay.timeTo string
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
-rule array
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Path to the files with alerting and/or recording rules.
Supports hierarchical patterns and regexpes.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
-rule="/path/to/file". Path to a single file with alerting rules.
-rule="http://<some-server-addr>/path/to/rules". HTTP URL to a page with alerting rules.
-rule="dir/*.yaml" -rule="/*.yaml" -rule="gcs://vmalert-rules/tenant_%{TENANT_ID}/prod".
-rule="dir/**/*.yaml". Includes all the .yaml files in "dir" subfolders recursively.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Enterprise version of vmalert supports S3 and GCS paths to rules.
For example: gs://bucket/path/to/rules, s3://bucket/path/to/rules
S3 and GCS paths support only matching by prefix, e.g. s3://bucket/dir/rule_ matches
all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
Supports an array of values separated by comma or specified via multiple flags.
-rule.configCheckInterval duration
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead
Interval for checking for changes in '-rule' files. By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead
-rule.maxResolveDuration duration
Limits the maximum duration for automatic alert expiration, which is by default equal to 3 evaluation intervals of the parent group.
Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group.
-rule.resendDelay duration
Minimum amount of time to wait before resending an alert to notifier
-rule.templates array
@@ -1132,13 +1205,26 @@ The shortlist of configuration flags is the following:
-rule.templates="/path/to/file". Path to a single file with go templates
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
absolute path to all .tpl files in root.
-rule.templates="dir/**/*.tpl". Includes all the .tpl files in "dir" subfolders recursively.
Supports an array of values separated by comma or specified via multiple flags.
-rule.updateEntriesLimit int
Defines the max number of rule's state updates stored in-memory. Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overriden per rule via update_entries_limit param. (default 20)
Defines the max number of rule's state updates stored in-memory. Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param. (default 20)
-rule.validateExpressions
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
Whether to validate annotation and label templates (default true)
-s3.configFilePath string
Path to file with S3 configs. Configs are loaded from default location if not set.
See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.configProfile string
Profile name for S3 configs. If no set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both not set, DefaultSharedConfigProfile is used. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.credsFilePath string
Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html . This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.customEndpoint string
Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html
-s3.forcePathStyle
Prefixing endpoint with bucket name when set false, true by default. This flag is available only in VictoriaMetrics enterprise. See https://docs.victoriametrics.com/enterprise.html (default true)
-tls
Whether to enable TLS for incoming HTTP requests at -httpListenAddr (aka https). -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string

View File

@@ -47,10 +47,11 @@ type AlertingRule struct {
}
type alertingRuleMetrics struct {
errors *utils.Gauge
pending *utils.Gauge
active *utils.Gauge
samples *utils.Gauge
errors *utils.Gauge
pending *utils.Gauge
active *utils.Gauge
samples *utils.Gauge
seriesFetched *utils.Gauge
}
func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule) *AlertingRule {
@@ -121,6 +122,21 @@ func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule
e := ar.state.getLast()
return float64(e.samples)
})
ar.metrics.seriesFetched = utils.GetOrCreateGauge(fmt.Sprintf(`vmalert_alerting_rules_last_evaluation_series_fetched{%s}`, labels),
func() float64 {
e := ar.state.getLast()
if e.seriesFetched == nil {
// means seriesFetched is unsupported
return -1
}
seriesFetched := float64(*e.seriesFetched)
if seriesFetched == 0 && e.samples > 0 {
// `alert: 0.95` will fetch no series
// but will get one time series in response.
seriesFetched = float64(e.samples)
}
return seriesFetched
})
return ar
}
@@ -130,6 +146,7 @@ func (ar *AlertingRule) Close() {
ar.metrics.pending.Unregister()
ar.metrics.errors.Unregister()
ar.metrics.samples.Unregister()
ar.metrics.seriesFetched.Unregister()
}
// String implements Stringer interface
@@ -234,7 +251,7 @@ func (ar *AlertingRule) toLabels(m datasource.Metric, qFn templates.QueryFn) (*l
// to get time series for backfilling.
// It returns ALERT and ALERT_FOR_STATE time series as result.
func (ar *AlertingRule) ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error) {
series, err := ar.q.QueryRange(ctx, ar.Expr, start, end)
res, err := ar.q.QueryRange(ctx, ar.Expr, start, end)
if err != nil {
return nil, err
}
@@ -242,7 +259,7 @@ func (ar *AlertingRule) ExecRange(ctx context.Context, start, end time.Time) ([]
qFn := func(query string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported in replay mode")
}
for _, s := range series {
for _, s := range res.Data {
a, err := ar.newAlert(s, nil, time.Time{}, qFn) // initial alert
if err != nil {
return nil, fmt.Errorf("failed to create alert: %s", err)
@@ -282,14 +299,15 @@ const resolvedRetention = 15 * time.Minute
// Based on the Querier results AlertingRule maintains notifier.Alerts
func (ar *AlertingRule) Exec(ctx context.Context, ts time.Time, limit int) ([]prompbmarshal.TimeSeries, error) {
start := time.Now()
qMetrics, req, err := ar.q.Query(ctx, ar.Expr, ts)
res, req, err := ar.q.Query(ctx, ar.Expr, ts)
curState := ruleStateEntry{
time: start,
at: ts,
duration: time.Since(start),
samples: len(qMetrics),
err: err,
curl: requestToCurl(req),
time: start,
at: ts,
duration: time.Since(start),
samples: len(res.Data),
seriesFetched: res.SeriesFetched,
err: err,
curl: requestToCurl(req),
}
defer func() {
@@ -315,11 +333,11 @@ func (ar *AlertingRule) Exec(ctx context.Context, ts time.Time, limit int) ([]pr
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res, err
return res.Data, err
}
updated := make(map[uint64]struct{})
// update list of active alerts
for _, m := range qMetrics {
for _, m := range res.Data {
ls, err := ar.toLabels(m, qFn)
if err != nil {
curState.err = fmt.Errorf("failed to expand labels: %s", err)
@@ -421,7 +439,9 @@ func (ar *AlertingRule) UpdateWith(r Rule) error {
ar.Labels = nr.Labels
ar.Annotations = nr.Annotations
ar.EvalInterval = nr.EvalInterval
ar.Debug = nr.Debug
ar.q = nr.q
ar.state = nr.state
return nil
}
@@ -483,21 +503,23 @@ func (ar *AlertingRule) AlertAPI(id uint64) *APIAlert {
func (ar *AlertingRule) ToAPI() APIRule {
lastState := ar.state.getLast()
r := APIRule{
Type: "alerting",
DatasourceType: ar.Type.String(),
Name: ar.Name,
Query: ar.Expr,
Duration: ar.For.Seconds(),
Labels: ar.Labels,
Annotations: ar.Annotations,
LastEvaluation: lastState.time,
EvaluationTime: lastState.duration.Seconds(),
Health: "ok",
State: "inactive",
Alerts: ar.AlertsToAPI(),
LastSamples: lastState.samples,
MaxUpdates: ar.state.size(),
Updates: ar.state.getAll(),
Type: "alerting",
DatasourceType: ar.Type.String(),
Name: ar.Name,
Query: ar.Expr,
Duration: ar.For.Seconds(),
Labels: ar.Labels,
Annotations: ar.Annotations,
LastEvaluation: lastState.time,
EvaluationTime: lastState.duration.Seconds(),
Health: "ok",
State: "inactive",
Alerts: ar.AlertsToAPI(),
LastSamples: lastState.samples,
LastSeriesFetched: lastState.seriesFetched,
MaxUpdates: ar.state.size(),
Updates: ar.state.getAll(),
Debug: ar.Debug,
// encode as strings to avoid rounding in JSON
ID: fmt.Sprintf("%d", ar.ID()),
@@ -604,54 +626,60 @@ func alertForToTimeSeries(a *notifier.Alert, timestamp int64) prompbmarshal.Time
return newTimeSeries([]float64{float64(a.ActiveAt.Unix())}, []int64{timestamp}, labels)
}
// Restore restores the state of active alerts basing on previously written time series.
// Restore restores only ActiveAt field. Field State will be always Pending and supposed
// to be updated on next Exec, as well as Value field.
// Only rules with For > 0 will be restored.
func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookback time.Duration, labels map[string]string) error {
if q == nil {
return fmt.Errorf("querier is nil")
// Restore restores the value of ActiveAt field for active alerts,
// based on previously written time series `alertForStateMetricName`.
// Only rules with For > 0 can be restored.
func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, ts time.Time, lookback time.Duration) error {
if ar.For < 1 {
return nil
}
ts := time.Now()
qFn := func(query string) ([]datasource.Metric, error) {
res, _, err := ar.q.Query(ctx, query, ts)
return res, err
ar.alertsMu.Lock()
defer ar.alertsMu.Unlock()
if len(ar.alerts) < 1 {
return nil
}
// account for external labels in filter
var labelsFilter string
for k, v := range labels {
labelsFilter += fmt.Sprintf(",%s=%q", k, v)
}
expr := fmt.Sprintf("last_over_time(%s{alertname=%q%s}[%ds])",
alertForStateMetricName, ar.Name, labelsFilter, int(lookback.Seconds()))
qMetrics, _, err := q.Query(ctx, expr, ts)
if err != nil {
return err
}
for _, m := range qMetrics {
ls := &labelSet{
origin: make(map[string]string, len(m.Labels)),
processed: make(map[string]string, len(m.Labels)),
for _, a := range ar.alerts {
if a.Restored || a.State != notifier.StatePending {
continue
}
for _, l := range m.Labels {
if l.Name == "__name__" {
continue
}
ls.origin[l.Name] = l.Value
ls.processed[l.Name] = l.Value
var labelsFilter []string
for k, v := range a.Labels {
labelsFilter = append(labelsFilter, fmt.Sprintf("%s=%q", k, v))
}
a, err := ar.newAlert(m, ls, time.Unix(int64(m.Values[0]), 0), qFn)
sort.Strings(labelsFilter)
expr := fmt.Sprintf("last_over_time(%s{%s}[%ds])",
alertForStateMetricName, strings.Join(labelsFilter, ","), int(lookback.Seconds()))
ar.logDebugf(ts, nil, "restoring alert state via query %q", expr)
res, _, err := q.Query(ctx, expr, ts)
if err != nil {
return fmt.Errorf("failed to create alert: %w", err)
return err
}
a.ID = hash(ls.processed)
a.State = notifier.StatePending
qMetrics := res.Data
if len(qMetrics) < 1 {
ar.logDebugf(ts, nil, "no response was received from restore query")
continue
}
// only one series expected in response
m := qMetrics[0]
// __name__ supposed to be alertForStateMetricName
m.DelLabel("__name__")
// we assume that restore query contains all label matchers,
// so all received labels will match anyway if their number is equal.
if len(m.Labels) != len(a.Labels) {
ar.logDebugf(ts, nil, "state restore query returned not expected label-set %v", m.Labels)
continue
}
a.ActiveAt = time.Unix(int64(m.Values[0]), 0)
a.Restored = true
ar.alerts[a.ID] = a
logger.Infof("alert %q (%d) restored to state at %v", a.Name, a.ID, a.ActiveAt)
}
return nil

View File

@@ -6,12 +6,15 @@ import (
"reflect"
"sort"
"strings"
"sync"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
func TestAlertingRule_ToTimeSeries(t *testing.T) {
@@ -502,118 +505,156 @@ func TestAlertingRule_ExecRange(t *testing.T) {
}
}
func TestAlertingRule_Restore(t *testing.T) {
testCases := []struct {
rule *AlertingRule
metrics []datasource.Metric
expAlerts map[uint64]*notifier.Alert
}{
{
newTestRuleWithLabels("no extra labels"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
),
},
map[uint64]*notifier.Alert{
hash(nil): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("metric labels"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
alertNameLabel, "metric labels",
alertGroupNameLabel, "groupID",
"foo", "bar",
"namespace", "baz",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{
alertNameLabel: "metric labels",
alertGroupNameLabel: "groupID",
"foo": "bar",
"namespace": "baz",
}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("rule labels", "source", "vm"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
"foo", "bar",
"namespace", "baz",
// extra labels set by rule
"source", "vm",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{
"foo": "bar",
"namespace": "baz",
"source": "vm",
}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
},
},
{
newTestRuleWithLabels("multiple alerts"),
[]datasource.Metric{
metricWithValueAndLabels(t, float64(time.Now().Truncate(time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-1",
),
metricWithValueAndLabels(t, float64(time.Now().Truncate(2*time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-2",
),
metricWithValueAndLabels(t, float64(time.Now().Truncate(3*time.Hour).Unix()),
"__name__", alertForStateMetricName,
"host", "localhost-3",
),
},
map[uint64]*notifier.Alert{
hash(map[string]string{"host": "localhost-1"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(time.Hour)},
hash(map[string]string{"host": "localhost-2"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(2 * time.Hour)},
hash(map[string]string{"host": "localhost-3"}): {State: notifier.StatePending,
ActiveAt: time.Now().Truncate(3 * time.Hour)},
},
},
func TestGroup_Restore(t *testing.T) {
defaultTS := time.Now()
fqr := &fakeQuerierWithRegistry{}
fn := func(rules []config.Rule, expAlerts map[uint64]*notifier.Alert) {
t.Helper()
defer fqr.reset()
for _, r := range rules {
fqr.set(r.Expr, metricWithValueAndLabels(t, 0, "__name__", r.Alert))
}
fg := newGroup(config.Group{Name: "TestRestore", Rules: rules}, fqr, time.Second, nil)
wg := sync.WaitGroup{}
wg.Add(1)
go func() {
nts := func() []notifier.Notifier { return []notifier.Notifier{&fakeNotifier{}} }
fg.start(context.Background(), nts, nil, fqr)
wg.Done()
}()
fg.close()
wg.Wait()
gotAlerts := make(map[uint64]*notifier.Alert)
for _, rs := range fg.Rules {
alerts := rs.(*AlertingRule).alerts
for k, v := range alerts {
if !v.Restored {
// set not restored alerts to predictable timestamp
v.ActiveAt = defaultTS
}
gotAlerts[k] = v
}
}
if len(gotAlerts) != len(expAlerts) {
t.Fatalf("expected %d alerts; got %d", len(expAlerts), len(gotAlerts))
}
for key, exp := range expAlerts {
got, ok := gotAlerts[key]
if !ok {
t.Fatalf("expected to have key %d", key)
}
if got.State != notifier.StatePending {
t.Fatalf("expected state %d; got %d", notifier.StatePending, got.State)
}
if got.ActiveAt != exp.ActiveAt {
t.Fatalf("expected ActiveAt %v; got %v", exp.ActiveAt, got.ActiveAt)
}
}
}
fakeGroup := Group{Name: "TestRule_Exec"}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if err := tc.rule.Restore(context.TODO(), fq, time.Hour, nil); err != nil {
t.Fatalf("unexpected err: %s", err)
}
if len(tc.rule.alerts) != len(tc.expAlerts) {
t.Fatalf("expected %d alerts; got %d", len(tc.expAlerts), len(tc.rule.alerts))
}
for key, exp := range tc.expAlerts {
got, ok := tc.rule.alerts[key]
if !ok {
t.Fatalf("expected to have key %d", key)
}
if got.State != exp.State {
t.Fatalf("expected state %d; got %d", exp.State, got.State)
}
if got.ActiveAt != exp.ActiveAt {
t.Fatalf("expected ActiveAt %v; got %v", exp.ActiveAt, got.ActiveAt)
}
}
stateMetric := func(name string, value time.Time, labels ...string) datasource.Metric {
labels = append(labels, "__name__", alertForStateMetricName)
labels = append(labels, alertNameLabel, name)
labels = append(labels, alertGroupNameLabel, "TestRestore")
return metricWithValueAndLabels(t, float64(value.Unix()), labels...)
}
// one active alert, no previous state
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
})
fqr.reset()
// one active alert with state restore
ts := time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// two rules, two active alerts, one with state restored
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("foo", ts))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)},
{Alert: "bar", Expr: "bar", For: promutils.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// two rules, two active alerts, two with state restored
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo"}[3600s])`,
stateMetric("foo", ts))
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="bar"}[3600s])`,
stateMetric("bar", ts))
fn(
[]config.Rule{
{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)},
{Alert: "bar", Expr: "bar", For: promutils.NewDuration(time.Second)},
},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts,
},
hash(map[string]string{alertNameLabel: "bar", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: ts},
})
// one active alert but wrong state restore
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertname="bar",alertgroup="TestRestore"}[3600s])`,
stateMetric("wrong alert", ts))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore"}): {
ActiveAt: defaultTS,
},
})
// one active alert with labels
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
ActiveAt: ts,
},
})
// one active alert with restore labels missmatch
ts = time.Now().Truncate(time.Hour)
fqr.set(`last_over_time(ALERTS_FOR_STATE{alertgroup="TestRestore",alertname="foo",env="dev"}[3600s])`,
stateMetric("foo", ts, "env", "dev", "team", "foo"))
fn(
[]config.Rule{{Alert: "foo", Expr: "foo", Labels: map[string]string{"env": "dev"}, For: promutils.NewDuration(time.Second)}},
map[uint64]*notifier.Alert{
hash(map[string]string{alertNameLabel: "foo", alertGroupNameLabel: "TestRestore", "env": "dev"}): {
ActiveAt: defaultTS,
},
})
}
}
func TestAlertingRule_Exec_Negative(t *testing.T) {

View File

@@ -5,16 +5,14 @@ import (
"fmt"
"hash/fnv"
"net/url"
"os"
"path/filepath"
"sort"
"strings"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config/log"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
)
@@ -38,7 +36,8 @@ type Group struct {
Params url.Values `yaml:"params"`
// Headers contains optional HTTP headers added to each rule request
Headers []Header `yaml:"headers,omitempty"`
// NotifierHeaders contains optional HTTP headers sent to notifiers for generated notifications
NotifierHeaders []Header `yaml:"notifier_headers,omitempty"`
// Catches all undefined fields and must be empty after parsing.
XXX map[string]interface{} `yaml:",inline"`
}
@@ -201,21 +200,45 @@ func (r *Rule) Validate() error {
// ValidateTplFn must validate the given annotations
type ValidateTplFn func(annotations map[string]string) error
// cLogger is a logger with support of logs suppressing.
// it is used when logs emitted by config package needs
// to be suppressed.
var cLogger = &log.Logger{}
// ParseSilent parses rule configs from given file patterns without emitting logs
func ParseSilent(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
cLogger.Suppress(true)
defer cLogger.Suppress(false)
files, err := readFromFS(pathPatterns)
if err != nil {
return nil, fmt.Errorf("failed to read from the config: %s", err)
}
return parse(files, validateTplFn, validateExpressions)
}
// Parse parses rule configs from given file patterns
func Parse(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
var fp []string
for _, pattern := range pathPatterns {
matches, err := filepath.Glob(pattern)
if err != nil {
return nil, fmt.Errorf("error reading file pattern %s: %w", pattern, err)
}
fp = append(fp, matches...)
files, err := readFromFS(pathPatterns)
if err != nil {
return nil, fmt.Errorf("failed to read from the config: %s", err)
}
groups, err := parse(files, validateTplFn, validateExpressions)
if err != nil {
return nil, fmt.Errorf("failed to parse %s: %s", pathPatterns, err)
}
if len(groups) < 1 {
cLogger.Warnf("no groups found in %s", strings.Join(pathPatterns, ";"))
}
return groups, nil
}
func parse(files map[string][]byte, validateTplFn ValidateTplFn, validateExpressions bool) ([]Group, error) {
errGroup := new(utils.ErrGroup)
var groups []Group
for _, file := range fp {
for file, data := range files {
uniqueGroups := map[string]struct{}{}
gr, err := parseFile(file)
gr, err := parseConfig(data)
if err != nil {
errGroup.Add(fmt.Errorf("failed to parse file %q: %w", file, err))
continue
@@ -237,20 +260,19 @@ func Parse(pathPatterns []string, validateTplFn ValidateTplFn, validateExpressio
if err := errGroup.Err(); err != nil {
return nil, err
}
if len(groups) < 1 {
logger.Warnf("no groups found in %s", strings.Join(pathPatterns, ";"))
}
sort.SliceStable(groups, func(i, j int) bool {
if groups[i].File != groups[j].File {
return groups[i].File < groups[j].File
}
return groups[i].Name < groups[j].Name
})
return groups, nil
}
func parseFile(path string) ([]Group, error) {
data, err := os.ReadFile(path)
func parseConfig(data []byte) ([]Group, error) {
data, err := envtemplate.ReplaceBytes(data)
if err != nil {
return nil, fmt.Errorf("error reading alert rule file %q: %w", path, err)
}
data, err = envtemplate.ReplaceBytes(data)
if err != nil {
return nil, fmt.Errorf("cannot expand environment vars in %q: %w", path, err)
return nil, fmt.Errorf("cannot expand environment vars: %w", err)
}
g := struct {
Groups []Group `yaml:"groups"`

View File

@@ -1,6 +1,8 @@
package config
import (
"net/http"
"net/http/httptest"
"net/url"
"os"
"strings"
@@ -27,6 +29,40 @@ func TestParseGood(t *testing.T) {
}
}
func TestParseFromURL(t *testing.T) {
mux := http.NewServeMux()
mux.HandleFunc("/bad", func(w http.ResponseWriter, _ *http.Request) {
w.Write([]byte("foo bar"))
})
mux.HandleFunc("/good-alert", func(w http.ResponseWriter, _ *http.Request) {
w.Write([]byte(`
groups:
- name: TestGroup
rules:
- alert: Conns
expr: vm_tcplistener_conns > 0`))
})
mux.HandleFunc("/good-rr", func(w http.ResponseWriter, _ *http.Request) {
w.Write([]byte(`
groups:
- name: TestGroup
rules:
- record: conns
expr: max(vm_tcplistener_conns)`))
})
srv := httptest.NewServer(mux)
defer srv.Close()
if _, err := Parse([]string{srv.URL + "/good-alert", srv.URL + "/good-rr"}, notifier.ValidateTemplates, true); err != nil {
t.Errorf("error parsing URLs %s", err)
}
if _, err := Parse([]string{srv.URL + "/bad"}, notifier.ValidateTemplates, true); err == nil {
t.Errorf("expected parsing error: %s", err)
}
}
func TestParseBad(t *testing.T) {
testCases := []struct {
path []string
@@ -64,6 +100,10 @@ func TestParseBad(t *testing.T) {
[]string{"testdata/dir/rules6-bad.rules"},
"missing ':' in header",
},
{
[]string{"http://unreachable-url"},
"failed to read",
},
}
for _, tc := range testCases {
_, err := Parse(tc.path, notifier.ValidateTemplates, true)
@@ -102,7 +142,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "group name must be set",
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{
Record: "record",
@@ -113,7 +154,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "",
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{
Record: "record",
@@ -125,7 +167,8 @@ func TestGroup_Validate(t *testing.T) {
validateExpressions: true,
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{
Alert: "alert",
@@ -139,7 +182,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "",
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{
Alert: "alert",
@@ -156,7 +200,8 @@ func TestGroup_Validate(t *testing.T) {
validateAnnotations: true,
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{
Alert: "alert",
@@ -171,7 +216,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "duplicate",
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
@@ -184,7 +230,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "duplicate",
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{Record: "record", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
@@ -197,7 +244,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "duplicate",
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
@@ -210,7 +258,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "",
},
{
group: &Group{Name: "test",
group: &Group{
Name: "test",
Rules: []Rule{
{Record: "alert", Expr: "up == 1", Labels: map[string]string{
"summary": "{{ value|query }}",
@@ -223,7 +272,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "",
},
{
group: &Group{Name: "test thanos",
group: &Group{
Name: "test thanos",
Type: NewRawType("thanos"),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
@@ -235,7 +285,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "unknown datasource type",
},
{
group: &Group{Name: "test graphite",
group: &Group{
Name: "test graphite",
Type: NewGraphiteType(),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
@@ -247,7 +298,8 @@ func TestGroup_Validate(t *testing.T) {
expErr: "",
},
{
group: &Group{Name: "test prometheus",
group: &Group{
Name: "test prometheus",
Type: NewPrometheusType(),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
@@ -538,6 +590,24 @@ rules:
`)
})
t.Run("`notifier_headers` change", func(t *testing.T) {
f(t, `
name: TestGroup
notifier_headers:
- "TenantID: foo"
rules:
- alert: foo
expr: sum by(job) (up == 1)
`, `
name: TestGroup
notifier_headers:
- "TenantID: bar"
rules:
- alert: foo
expr: sum by(job) (up == 1)
`)
})
t.Run("`debug` change", func(t *testing.T) {
f(t, `
name: TestGroup

111
app/vmalert/config/fs.go Normal file
View File

@@ -0,0 +1,111 @@
package config
import (
"fmt"
"strings"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config/fslocal"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config/fsurl"
)
// FS represent a file system abstract for reading files.
type FS interface {
// Init initializes FS.
Init() error
// String must return human-readable representation of FS.
String() string
// List returns the list of file names which will be read via Read fn
List() ([]string, error)
// Read returns a list of read files in form of a map
// where key is a file name and value is a content of read file.
// Read must be called only after the successful Init call.
Read(files []string) (map[string][]byte, error)
}
var (
fsRegistryMu sync.Mutex
fsRegistry = make(map[string]FS)
)
// readFromFS parses the given path list and inits FS for each item.
// Once initialed, readFromFS will try to read and return files from each FS.
// readFromFS returns an error if at least one FS failed to init.
// The function can be called multiple times but each unique path
// will be initialed only once.
//
// It is allowed to mix different FS types in path list.
func readFromFS(paths []string) (map[string][]byte, error) {
var err error
result := make(map[string][]byte)
for _, path := range paths {
fsRegistryMu.Lock()
fs, ok := fsRegistry[path]
if !ok {
fs, err = newFS(path)
if err != nil {
fsRegistryMu.Unlock()
return nil, fmt.Errorf("error while parsing path %q: %w", path, err)
}
if err := fs.Init(); err != nil {
fsRegistryMu.Unlock()
return nil, fmt.Errorf("error while initializing path %q: %w", path, err)
}
fsRegistry[path] = fs
}
fsRegistryMu.Unlock()
list, err := fs.List()
if err != nil {
return nil, fmt.Errorf("failed to list files from %q", fs)
}
cLogger.Infof("found %d files to read from %q", len(list), fs)
if len(list) < 1 {
continue
}
ts := time.Now()
files, err := fs.Read(list)
if err != nil {
return nil, fmt.Errorf("error while reading files from %q: %w", fs, err)
}
cLogger.Infof("finished reading %d files in %v from %q", len(list), time.Since(ts), fs)
for k, v := range files {
if _, ok := result[k]; ok {
return nil, fmt.Errorf("duplicate found for file name %q: file names must be unique", k)
}
result[k] = v
}
}
return result, nil
}
// newFS creates FS based on the give path.
// Supported file systems are: fs
func newFS(originPath string) (FS, error) {
scheme := "fs"
path := originPath
n := strings.Index(path, "://")
if n >= 0 {
scheme = path[:n]
path = path[n+len("://"):]
}
if len(path) == 0 {
return nil, fmt.Errorf("path cannot be empty")
}
switch scheme {
case "fs":
return &fslocal.FS{Pattern: path}, nil
case "http", "https":
return &fsurl.FS{Path: originPath}, nil
default:
return nil, fmt.Errorf("unsupported scheme %q", scheme)
}
}

View File

@@ -0,0 +1,39 @@
package config
import (
"strings"
"testing"
)
func TestNewFS(t *testing.T) {
f := func(path, expStr string) {
t.Helper()
fs, err := newFS(path)
if err != nil {
t.Fatalf("unexpected err: %s", err)
}
if fs.String() != expStr {
t.Fatalf("expected FS %q; got %q", expStr, fs.String())
}
}
f("/foo/bar", "Local FS{MatchPattern: \"/foo/bar\"}")
f("fs:///foo/bar", "Local FS{MatchPattern: \"/foo/bar\"}")
}
func TestNewFSNegative(t *testing.T) {
f := func(path, expErr string) {
t.Helper()
_, err := newFS(path)
if err == nil {
t.Fatalf("expected to have err: %s", expErr)
}
if !strings.Contains(err.Error(), expErr) {
t.Fatalf("expected to have err %q; got %q instead", expErr, err)
}
}
f("", "path cannot be empty")
f("fs://", "path cannot be empty")
f("foobar://baz", `unsupported scheme "foobar"`)
}

View File

@@ -0,0 +1,50 @@
package fslocal
import (
"fmt"
"os"
"github.com/bmatcuk/doublestar/v4"
)
// FS represents a local file system
type FS struct {
// Pattern is used for matching one or multiple files.
// The pattern may describe hierarchical names such as
// /usr/*/bin/ed (assuming the Separator is '/').
Pattern string
}
// Init verifies that configured Pattern is correct
func (fs *FS) Init() error {
_, err := doublestar.FilepathGlob(fs.Pattern)
return err
}
// String implements Stringer interface
func (fs *FS) String() string {
return fmt.Sprintf("Local FS{MatchPattern: %q}", fs.Pattern)
}
// List returns the list of file names which will be read via Read fn
func (fs *FS) List() ([]string, error) {
matches, err := doublestar.FilepathGlob(fs.Pattern)
if err != nil {
return nil, fmt.Errorf("error while matching files via pattern %s: %w", fs.Pattern, err)
}
return matches, nil
}
// Read returns a map of read files where
// key is the file name and value is file's content.
func (fs *FS) Read(files []string) (map[string][]byte, error) {
result := make(map[string][]byte)
for _, path := range files {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("error while reading file %q: %w", path, err)
}
result[path] = data
}
return result, nil
}

View File

@@ -0,0 +1,57 @@
package fsurl
import (
"fmt"
"io"
"net/http"
"net/url"
)
// FS represents a struct which can read content from URL Path
type FS struct {
// Path defines the URL to read the data from
Path string
}
// Init verifies that configured Path is correct
func (fs *FS) Init() error {
_, err := url.Parse(fs.Path)
return err
}
// String implements Stringer interface
func (fs *FS) String() string {
return fmt.Sprintf("URL {Path: %q}", fs.Path)
}
// List returns the list of file names which will be read via Read fn
// List isn't supported by FS and reads from Path only
func (fs *FS) List() ([]string, error) {
return []string{fs.Path}, nil
}
// Read returns a map of read files where
// key is the file name and value is file's content.
func (fs *FS) Read(files []string) (map[string][]byte, error) {
result := make(map[string][]byte)
for _, path := range files {
resp, err := http.Get(path)
if err != nil {
return nil, fmt.Errorf("failed to read from %q: %w", path, err)
}
data, err := io.ReadAll(resp.Body)
_ = resp.Body.Close()
if resp.StatusCode != http.StatusOK {
if len(data) > 4*1024 {
data = data[:4*1024]
}
return nil, fmt.Errorf("unexpected status code when fetching %q: %d, expecting %d; response: %q",
path, resp.StatusCode, http.StatusOK, data)
}
if err != nil {
return nil, fmt.Errorf("cannot read %q: %s", path, err)
}
result[path] = data
}
return result, nil
}

View File

@@ -0,0 +1,59 @@
package log
import (
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
// Logger is using lib/logger for logging
// but can be suppressed via Suppress method
type Logger struct {
mu sync.RWMutex
disabled bool
}
// Suppress whether to ignore message logging.
// Once suppressed, logging continues to be ignored
// until logger is un-suppressed.
func (l *Logger) Suppress(v bool) {
l.mu.Lock()
l.disabled = v
l.mu.Unlock()
}
func (l *Logger) isDisabled() bool {
l.mu.RLock()
defer l.mu.RUnlock()
return l.disabled
}
// Errorf logs error message.
func (l *Logger) Errorf(format string, args ...interface{}) {
if l.isDisabled() {
return
}
logger.Errorf(format, args...)
}
// Warnf logs warning message.
func (l *Logger) Warnf(format string, args ...interface{}) {
if l.isDisabled() {
return
}
logger.Warnf(format, args...)
}
// Infof logs info message.
func (l *Logger) Infof(format string, args ...interface{}) {
if l.isDisabled() {
return
}
logger.Infof(format, args...)
}
// Panicf logs panic message and panics.
// Panicf can't be suppressed
func (l *Logger) Panicf(format string, args ...interface{}) {
logger.Panicf(format, args...)
}

View File

@@ -0,0 +1,54 @@
package log
import (
"bytes"
"fmt"
"strings"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
func TestOutput(t *testing.T) {
testOutput := &bytes.Buffer{}
logger.SetOutputForTests(testOutput)
defer logger.ResetOutputForTest()
log := &Logger{}
mustMatch := func(exp string) {
t.Helper()
if exp == "" {
if testOutput.String() != "" {
t.Errorf("expected output to be empty; got %q", testOutput.String())
return
}
}
if !strings.Contains(testOutput.String(), exp) {
t.Errorf("output %q should contain %q", testOutput.String(), exp)
}
fmt.Println(testOutput.String())
testOutput.Reset()
}
log.Warnf("foo")
mustMatch("foo")
log.Infof("info %d", 2)
mustMatch("info 2")
log.Errorf("error %s %d", "baz", 5)
mustMatch("error baz 5")
log.Suppress(true)
log.Warnf("foo")
mustMatch("")
log.Infof("info %d", 2)
mustMatch("")
log.Errorf("error %q %d", "baz", 5)
mustMatch("")
}

View File

@@ -5,6 +5,8 @@ groups:
limit: 1000
headers:
- "MyHeader: foo"
notifier_headers:
- "MyHeader: foo"
params:
denyPartialResponse: ["true"]
rules:

View File

@@ -13,11 +13,22 @@ type Querier interface {
// It returns list of Metric in response, the http.Request used for sending query
// and error if any. Returned http.Request can't be reused and its body is already read.
// Query should stop once ctx is cancelled.
Query(ctx context.Context, query string, ts time.Time) ([]Metric, *http.Request, error)
Query(ctx context.Context, query string, ts time.Time) (Result, *http.Request, error)
// QueryRange executes range request with the given query on the given time range.
// It returns list of Metric in response and error if any.
// QueryRange should stop once ctx is cancelled.
QueryRange(ctx context.Context, query string, from, to time.Time) ([]Metric, error)
QueryRange(ctx context.Context, query string, from, to time.Time) (Result, error)
}
// Result represents expected response from the datasource
type Result struct {
// Data contains list of received Metric
Data []Metric
// SeriesFetched contains amount of time series processed by datasource
// during query evaluation.
// If nil, then this feature is not supported by the datasource.
// SeriesFetched is supported by VictoriaMetrics since v1.90.
SeriesFetched *int
}
// QuerierBuilder builds Querier with given params.
@@ -72,6 +83,15 @@ func (m *Metric) AddLabel(key, value string) {
m.Labels = append(m.Labels, Label{Name: key, Value: value})
}
// DelLabel deletes the given label from the label set
func (m *Metric) DelLabel(key string) {
for i, l := range m.Labels {
if l.Name == key {
m.Labels = append(m.Labels[:i], m.Labels[i+1:]...)
}
}
}
// Label returns the given label value.
// If label is missing empty string will be returned
func (m *Metric) Label(key string) string {

View File

@@ -47,7 +47,7 @@ var (
"For example, if -datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query. "+
"If set to 0, rule's evaluation interval will be used instead.")
queryTimeAlignment = flag.Bool("datasource.queryTimeAlignment", true, `Whether to align "time" parameter with evaluation interval.`+
"Alignment supposed to produce deterministic results despite of number of vmalert replicas or time they were started. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1257")
"Alignment supposed to produce deterministic results despite number of vmalert replicas or time they were started. See more details here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1257")
maxIdleConnections = flag.Int("datasource.maxIdleConnections", 100, `Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state.`)
disableKeepAlive = flag.Bool("datasource.disableKeepAlive", false, `Whether to disable long-lived connections to the datasource. `+
`If true, disables HTTP keep-alives and will only use the connection to the server for a single HTTP request.`)

View File

@@ -2,6 +2,7 @@ package datasource
import (
"context"
"errors"
"fmt"
"io"
"net/http"
@@ -98,10 +99,10 @@ func NewVMStorage(baseURL string, authCfg *promauth.Config, lookBack time.Durati
}
// Query executes the given query and returns parsed response
func (s *VMStorage) Query(ctx context.Context, query string, ts time.Time) ([]Metric, *http.Request, error) {
func (s *VMStorage) Query(ctx context.Context, query string, ts time.Time) (Result, *http.Request, error) {
req, err := s.newRequestPOST()
if err != nil {
return nil, nil, err
return Result{}, nil, err
}
switch s.dataSourceType {
@@ -110,12 +111,12 @@ func (s *VMStorage) Query(ctx context.Context, query string, ts time.Time) ([]Me
case datasourceGraphite:
s.setGraphiteReqParams(req, query, ts)
default:
return nil, nil, fmt.Errorf("engine not found: %q", s.dataSourceType)
return Result{}, nil, fmt.Errorf("engine not found: %q", s.dataSourceType)
}
resp, err := s.do(ctx, req)
if err != nil {
return nil, req, err
return Result{}, req, err
}
defer func() {
_ = resp.Body.Close()
@@ -132,24 +133,24 @@ func (s *VMStorage) Query(ctx context.Context, query string, ts time.Time) ([]Me
// QueryRange executes the given query on the given time range.
// For Prometheus type see https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries
// Graphite type isn't supported.
func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end time.Time) ([]Metric, error) {
func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end time.Time) (res Result, err error) {
if s.dataSourceType != datasourcePrometheus {
return nil, fmt.Errorf("%q is not supported for QueryRange", s.dataSourceType)
return res, fmt.Errorf("%q is not supported for QueryRange", s.dataSourceType)
}
req, err := s.newRequestPOST()
if err != nil {
return nil, err
return res, err
}
if start.IsZero() {
return nil, fmt.Errorf("start param is missing")
return res, fmt.Errorf("start param is missing")
}
if end.IsZero() {
return nil, fmt.Errorf("end param is missing")
return res, fmt.Errorf("end param is missing")
}
s.setPrometheusRangeReqParams(req, query, start, end)
resp, err := s.do(ctx, req)
if err != nil {
return nil, err
return res, err
}
defer func() {
_ = resp.Body.Close()
@@ -162,6 +163,11 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
logger.Infof("DEBUG datasource request: executing %s request with params %q", req.Method, req.URL.RawQuery)
}
resp, err := s.c.Do(req.WithContext(ctx))
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
// something in the middle between client and datasource might be closing
// the connection. So we do a one more attempt in hope request will succeed.
resp, err = s.c.Do(req.WithContext(ctx))
}
if err != nil {
return nil, fmt.Errorf("error getting response from %s: %w", req.URL.Redacted(), err)
}
@@ -174,7 +180,7 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
}
func (s *VMStorage) newRequestPOST() (*http.Request, error) {
req, err := http.NewRequest("POST", s.datasourceURL, nil)
req, err := http.NewRequest(http.MethodPost, s.datasourceURL, nil)
if err != nil {
return nil, err
}

View File

@@ -35,12 +35,12 @@ func (r graphiteResponse) metrics() []Metric {
return ms
}
func parseGraphiteResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
func parseGraphiteResponse(req *http.Request, resp *http.Response) (Result, error) {
r := &graphiteResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing graphite metrics for %s: %w", req.URL.Redacted(), err)
return Result{}, fmt.Errorf("error parsing graphite metrics for %s: %w", req.URL.Redacted(), err)
}
return r.metrics(), nil
return Result{Data: r.metrics()}, nil
}
const (

View File

@@ -22,6 +22,10 @@ type promResponse struct {
ResultType string `json:"resultType"`
Result json.RawMessage `json:"result"`
} `json:"data"`
// Stats supported by VictoriaMetrics since v1.90
Stats struct {
SeriesFetched *string `json:"seriesFetched,omitempty"`
} `json:"stats,omitempty"`
}
type promInstant struct {
@@ -96,39 +100,54 @@ const (
rtVector, rtMatrix, rScalar = "vector", "matrix", "scalar"
)
func parsePrometheusResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
func parsePrometheusResponse(req *http.Request, resp *http.Response) (res Result, err error) {
r := &promResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing prometheus metrics for %s: %w", req.URL.Redacted(), err)
if err = json.NewDecoder(resp.Body).Decode(r); err != nil {
return res, fmt.Errorf("error parsing prometheus metrics for %s: %w", req.URL.Redacted(), err)
}
if r.Status == statusError {
return nil, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL.Redacted(), r.ErrorType, r.Error)
return res, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL.Redacted(), r.ErrorType, r.Error)
}
if r.Status != statusSuccess {
return nil, fmt.Errorf("unknown status: %s, Expected success or error ", r.Status)
return res, fmt.Errorf("unknown status: %s, Expected success or error ", r.Status)
}
var parseFn func() ([]Metric, error)
switch r.Data.ResultType {
case rtVector:
var pi promInstant
if err := json.Unmarshal(r.Data.Result, &pi.Result); err != nil {
return nil, fmt.Errorf("umarshal err %s; \n %#v", err, string(r.Data.Result))
return res, fmt.Errorf("umarshal err %s; \n %#v", err, string(r.Data.Result))
}
return pi.metrics()
parseFn = pi.metrics
case rtMatrix:
var pr promRange
if err := json.Unmarshal(r.Data.Result, &pr.Result); err != nil {
return nil, err
return res, err
}
return pr.metrics()
parseFn = pr.metrics
case rScalar:
var ps promScalar
if err := json.Unmarshal(r.Data.Result, &ps); err != nil {
return nil, err
return res, err
}
return ps.metrics()
parseFn = ps.metrics
default:
return nil, fmt.Errorf("unknown result type %q", r.Data.ResultType)
return res, fmt.Errorf("unknown result type %q", r.Data.ResultType)
}
ms, err := parseFn()
if err != nil {
return res, err
}
res = Result{Data: ms}
if r.Stats.SeriesFetched != nil {
intV, err := strconv.Atoi(*r.Stats.SeriesFetched)
if err != nil {
return res, fmt.Errorf("failed to convert stats.seriesFetched to int: %w", err)
}
res.SeriesFetched = &intV
}
return res, nil
}
func (s *VMStorage) setPrometheusInstantReqParams(r *http.Request, query string, timestamp time.Time) {

View File

@@ -35,13 +35,6 @@ func TestVMInstantQuery(t *testing.T) {
t.Errorf("should not be called")
})
c := -1
mux.HandleFunc("/render", func(w http.ResponseWriter, request *http.Request) {
c++
switch c {
case 8:
w.Write([]byte(`[{"target":"constantLine(10)","tags":{"name":"constantLine(10)"},"datapoints":[[10,1611758343],[10,1611758373],[10,1611758403]]}]`))
}
})
mux.HandleFunc("/api/v1/query", func(w http.ResponseWriter, r *http.Request) {
c++
if r.Method != http.MethodPost {
@@ -62,22 +55,28 @@ func TestVMInstantQuery(t *testing.T) {
}
switch c {
case 0:
conn, _, _ := w.(http.Hijacker).Hijack()
_ = conn.Close()
case 1:
w.WriteHeader(500)
case 2:
case 1:
w.Write([]byte("[]"))
case 3:
case 2:
w.Write([]byte(`{"status":"error", "errorType":"type:", "error":"some error msg"}`))
case 4:
case 3:
w.Write([]byte(`{"status":"unknown"}`))
case 5:
case 4:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix"}}`))
case 6:
case 5:
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"vm_rows","foo":"bar"},"value":[1583786142,"13763"]},{"metric":{"__name__":"vm_requests","foo":"baz"},"value":[1583786140,"2000"]}]}}`))
case 7:
case 6:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]}}`))
case 7:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]},"stats":{"seriesFetched": "42"}}`))
}
})
mux.HandleFunc("/render", func(w http.ResponseWriter, request *http.Request) {
c++
switch c {
case 8:
w.Write([]byte(`[{"target":"constantLine(10)","tags":{"name":"constantLine(10)"},"datapoints":[[10,1611758343],[10,1611758373],[10,1611758403]]}]`))
}
})
@@ -95,24 +94,27 @@ func TestVMInstantQuery(t *testing.T) {
ts := time.Now()
expErr := func(err string) {
if _, _, err := pq.Query(ctx, query, ts); err == nil {
_, _, gotErr := pq.Query(ctx, query, ts)
if gotErr == nil {
t.Fatalf("expected %q got nil", err)
}
if !strings.Contains(gotErr.Error(), err) {
t.Fatalf("expected err %q; got %q", err, gotErr)
}
}
expErr("connection error") // 0
expErr("invalid response status error") // 1
expErr("response body error") // 2
expErr("error status") // 3
expErr("unknown status") // 4
expErr("non-vector resultType error") // 5
expErr("500") // 0
expErr("error parsing prometheus metrics") // 1
expErr("response error") // 2
expErr("unknown status") // 3
expErr("unexpected end of JSON input") // 4
m, _, err := pq.Query(ctx, query, ts) // 6 - vector
res, _, err := pq.Query(ctx, query, ts) // 5 - vector
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(m) != 2 {
t.Fatalf("expected 2 metrics got %d in %+v", len(m), m)
if len(res.Data) != 2 {
t.Fatalf("expected 2 metrics got %d in %+v", len(res.Data), res.Data)
}
expected := []Metric{
{
@@ -126,17 +128,17 @@ func TestVMInstantQuery(t *testing.T) {
Values: []float64{2000},
},
}
metricsEqual(t, m, expected)
metricsEqual(t, res.Data, expected)
m, req, err := pq.Query(ctx, query, ts) // 7 - scalar
res, req, err := pq.Query(ctx, query, ts) // 6 - scalar
if err != nil {
t.Fatalf("unexpected %s", err)
}
if req == nil {
t.Fatalf("expected request to be non-nil")
}
if len(m) != 1 {
t.Fatalf("expected 1 metrics got %d in %+v", len(m), m)
if len(res.Data) != 1 {
t.Fatalf("expected 1 metrics got %d in %+v", len(res.Data), res.Data)
}
expected = []Metric{
{
@@ -144,18 +146,44 @@ func TestVMInstantQuery(t *testing.T) {
Values: []float64{1},
},
}
if !reflect.DeepEqual(m, expected) {
t.Fatalf("unexpected metric %+v want %+v", m, expected)
if !reflect.DeepEqual(res.Data, expected) {
t.Fatalf("unexpected metric %+v want %+v", res.Data, expected)
}
if res.SeriesFetched != nil {
t.Fatalf("expected `seriesFetched` field to be nil when it is missing in datasource response; got %v instead",
res.SeriesFetched)
}
res, _, err = pq.Query(ctx, query, ts) // 7 - scalar with stats
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(res.Data) != 1 {
t.Fatalf("expected 1 metrics got %d in %+v", len(res.Data), res)
}
expected = []Metric{
{
Timestamps: []int64{1583786142},
Values: []float64{1},
},
}
if !reflect.DeepEqual(res.Data, expected) {
t.Fatalf("unexpected metric %+v want %+v", res.Data, expected)
}
if *res.SeriesFetched != 42 {
t.Fatalf("expected `seriesFetched` field to be 42; got %d instead",
*res.SeriesFetched)
}
gq := s.BuildWithParams(QuerierParams{DataSourceType: string(datasourceGraphite)})
m, _, err = gq.Query(ctx, queryRender, ts) // 8 - graphite
res, _, err = gq.Query(ctx, queryRender, ts) // 8 - graphite
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
if len(res.Data) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(res.Data), res.Data)
}
exp := []Metric{
{
@@ -164,7 +192,77 @@ func TestVMInstantQuery(t *testing.T) {
Values: []float64{10},
},
}
metricsEqual(t, m, exp)
metricsEqual(t, res.Data, exp)
}
func TestVMInstantQueryWithRetry(t *testing.T) {
mux := http.NewServeMux()
mux.HandleFunc("/", func(_ http.ResponseWriter, _ *http.Request) {
t.Errorf("should not be called")
})
c := -1
mux.HandleFunc("/api/v1/query", func(w http.ResponseWriter, r *http.Request) {
c++
if r.URL.Query().Get("query") != query {
t.Errorf("expected %s in query param, got %s", query, r.URL.Query().Get("query"))
}
switch c {
case 0:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "1"]}}`))
case 1:
conn, _, _ := w.(http.Hijacker).Hijack()
_ = conn.Close()
case 2:
w.Write([]byte(`{"status":"success","data":{"resultType":"scalar","result":[1583786142, "2"]}}`))
case 3:
conn, _, _ := w.(http.Hijacker).Hijack()
_ = conn.Close()
case 4:
conn, _, _ := w.(http.Hijacker).Hijack()
_ = conn.Close()
}
})
srv := httptest.NewServer(mux)
defer srv.Close()
s := NewVMStorage(srv.URL, nil, time.Minute, 0, false, srv.Client())
pq := s.BuildWithParams(QuerierParams{DataSourceType: string(datasourcePrometheus)})
expErr := func(err string) {
_, _, gotErr := pq.Query(ctx, query, time.Now())
if gotErr == nil {
t.Fatalf("expected %q got nil", err)
}
if !strings.Contains(gotErr.Error(), err) {
t.Fatalf("expected err %q; got %q", err, gotErr)
}
}
expValue := func(v float64) {
res, _, err := pq.Query(ctx, query, time.Now())
if err != nil {
t.Fatalf("unexpected %s", err)
}
m := res.Data
if len(m) != 1 {
t.Fatalf("expected 1 metrics got %d in %+v", len(m), m)
}
expected := []Metric{
{
Timestamps: []int64{1583786142},
Values: []float64{v},
},
}
if !reflect.DeepEqual(m, expected) {
t.Fatalf("unexpected metric %+v want %+v", m, expected)
}
}
expValue(1) // 0
expValue(2) // 1 - fail, 2 - retry
expErr("EOF") // 3, 4 - retries
}
func metricsEqual(t *testing.T, gotM, expectedM []Metric) {
@@ -250,10 +348,11 @@ func TestVMRangeQuery(t *testing.T) {
start, end := time.Now().Add(-time.Minute), time.Now()
m, err := pq.QueryRange(ctx, query, start, end)
res, err := pq.QueryRange(ctx, query, start, end)
if err != nil {
t.Fatalf("unexpected %s", err)
}
m := res.Data
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
}

View File

@@ -2,6 +2,7 @@ package main
import (
"context"
"errors"
"fmt"
"hash/fnv"
"net/url"
@@ -35,15 +36,19 @@ type Group struct {
Checksum string
LastEvaluation time.Time
Labels map[string]string
Params url.Values
Headers map[string]string
Labels map[string]string
Params url.Values
Headers map[string]string
NotifierHeaders map[string]string
doneCh chan struct{}
finishedCh chan struct{}
// channel accepts new Group obj
// which supposed to update current group
updateCh chan *Group
// evalCancel stores the cancel fn for interrupting
// rules evaluation. Used on groups update() and close().
evalCancel context.CancelFunc
metrics *groupMetrics
}
@@ -89,16 +94,17 @@ func mergeLabels(groupName, ruleName string, set1, set2 map[string]string) map[s
func newGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval time.Duration, labels map[string]string) *Group {
g := &Group{
Type: cfg.Type,
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval.Duration(),
Limit: cfg.Limit,
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
Params: cfg.Params,
Headers: make(map[string]string),
Labels: cfg.Labels,
Type: cfg.Type,
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval.Duration(),
Limit: cfg.Limit,
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
Params: cfg.Params,
Headers: make(map[string]string),
NotifierHeaders: make(map[string]string),
Labels: cfg.Labels,
doneCh: make(chan struct{}),
finishedCh: make(chan struct{}),
@@ -113,6 +119,9 @@ func newGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval ti
for _, h := range cfg.Headers {
g.Headers[h.Key] = h.Value
}
for _, h := range cfg.NotifierHeaders {
g.NotifierHeaders[h.Key] = h.Value
}
g.metrics = newGroupMetrics(g)
rules := make([]Rule, len(cfg.Rules))
for i, r := range cfg.Rules {
@@ -158,23 +167,23 @@ func (g *Group) ID() uint64 {
}
// Restore restores alerts state for group rules
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, lookback time.Duration, labels map[string]string) error {
labels = mergeLabels(g.Name, "", labels, g.Labels)
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, ts time.Time, lookback time.Duration) error {
for _, rule := range g.Rules {
rr, ok := rule.(*AlertingRule)
ar, ok := rule.(*AlertingRule)
if !ok {
continue
}
if rr.For < 1 {
if ar.For < 1 {
continue
}
// ignore QueryParams on purpose, because they could contain
// query filters. This may affect the restore procedure.
q := qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: g.Type.String(),
Headers: g.Headers,
DataSourceType: g.Type.String(),
EvaluationInterval: g.Interval,
QueryParams: g.Params,
Headers: g.Headers,
Debug: ar.Debug,
})
if err := rr.Restore(ctx, q, lookback, labels); err != nil {
if err := ar.Restore(ctx, q, ts, lookback); err != nil {
return fmt.Errorf("error while restoring rule %q: %w", rule, err)
}
}
@@ -226,6 +235,7 @@ func (g *Group) updateWith(newGroup *Group) error {
g.Concurrency = newGroup.Concurrency
g.Params = newGroup.Params
g.Headers = newGroup.Headers
g.NotifierHeaders = newGroup.NotifierHeaders
g.Labels = newGroup.Labels
g.Limit = newGroup.Limit
g.Checksum = newGroup.Checksum
@@ -233,11 +243,24 @@ func (g *Group) updateWith(newGroup *Group) error {
return nil
}
// interruptEval interrupts in-flight rules evaluations
// within the group. It is expected that g.evalCancel
// will be repopulated after the call.
func (g *Group) interruptEval() {
g.mu.RLock()
defer g.mu.RUnlock()
if g.evalCancel != nil {
g.evalCancel()
}
}
func (g *Group) close() {
if g.doneCh == nil {
return
}
close(g.doneCh)
g.interruptEval()
<-g.finishedCh
g.metrics.iterationDuration.Unregister()
@@ -251,14 +274,9 @@ func (g *Group) close() {
var skipRandSleepOnGroupStart bool
func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *remotewrite.Client) {
func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *remotewrite.Client, rr datasource.QuerierBuilder) {
defer func() { close(g.finishedCh) }()
e := &executor{
rw: rw,
notifiers: nts,
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label)}
// Spread group rules evaluation over time in order to reduce load on VictoriaMetrics.
if !skipRandSleepOnGroupStart {
randSleep := uint64(float64(g.Interval) * (float64(g.ID()) / (1 << 64)))
@@ -279,11 +297,18 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
}
}
e := &executor{
rw: rw,
notifiers: nts,
notifierHeaders: g.NotifierHeaders,
previouslySentSeriesToRW: make(map[uint64]map[string][]prompbmarshal.Label),
}
evalTS := time.Now()
logger.Infof("group %q started; interval=%v; concurrency=%d", g.Name, g.Interval, g.Concurrency)
eval := func(ts time.Time) {
eval := func(ctx context.Context, ts time.Time) {
g.metrics.iterationTotal.Inc()
start := time.Now()
@@ -305,10 +330,26 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
g.LastEvaluation = start
}
eval(evalTS)
evalCtx, cancel := context.WithCancel(ctx)
g.mu.Lock()
g.evalCancel = cancel
g.mu.Unlock()
defer g.evalCancel()
eval(evalCtx, evalTS)
t := time.NewTicker(g.Interval)
defer t.Stop()
// restore the rules state after the first evaluation
// so only active alerts can be restored.
if rr != nil {
err := g.Restore(ctx, rr, evalTS, *remoteReadLookBack)
if err != nil {
logger.Errorf("error while restoring ruleState for group %q: %s", g.Name, err)
}
}
for {
select {
case <-ctx.Done():
@@ -319,6 +360,14 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
return
case ng := <-g.updateCh:
g.mu.Lock()
// it is expected that g.evalCancel will be evoked
// somewhere else to unblock group from the rules evaluation.
// we recreate the evalCtx and g.evalCancel, so it can
// be called again.
evalCtx, cancel = context.WithCancel(ctx)
g.evalCancel = cancel
err := g.updateWith(ng)
if err != nil {
logger.Errorf("group %q: failed to update: %s", g.Name, err)
@@ -329,6 +378,8 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
// ensure that staleness is tracked or existing rules only
e.purgeStaleSeries(g.Rules)
e.notifierHeaders = g.NotifierHeaders
if g.Interval != ng.Interval {
g.Interval = ng.Interval
t.Stop()
@@ -343,7 +394,7 @@ func (g *Group) start(ctx context.Context, nts func() []notifier.Notifier, rw *r
}
evalTS = evalTS.Add((missed + 1) * g.Interval)
eval(evalTS)
eval(evalCtx, evalTS)
}
}
}
@@ -362,8 +413,10 @@ func getResolveDuration(groupInterval, delta, maxDuration time.Duration) time.Du
}
type executor struct {
notifiers func() []notifier.Notifier
rw *remotewrite.Client
notifiers func() []notifier.Notifier
notifierHeaders map[string]string
rw *remotewrite.Client
previouslySentSeriesToRWMu sync.Mutex
// previouslySentSeriesToRW stores series sent to RW on previous iteration
@@ -417,6 +470,11 @@ func (e *executor) exec(ctx context.Context, rule Rule, ts time.Time, resolveDur
tss, err := rule.Exec(ctx, ts, limit)
if err != nil {
if errors.Is(err, context.Canceled) {
// the context can be cancelled on graceful shutdown
// or on group update. So no need to handle the error as usual.
return nil
}
execErrors.Inc()
return fmt.Errorf("rule %q: failed to execute: %w", rule, err)
}
@@ -458,7 +516,7 @@ func (e *executor) exec(ctx context.Context, rule Rule, ts time.Time, resolveDur
for _, nt := range e.notifiers() {
wg.Add(1)
go func(nt notifier.Notifier) {
if err := nt.Send(ctx, alerts); err != nil {
if err := nt.Send(ctx, alerts, e.notifierHeaders); err != nil {
errGr.Add(fmt.Errorf("rule %q: failed to send alerts to addr %q: %w", rule, nt.Addr(), err))
}
wg.Done()

View File

@@ -209,7 +209,7 @@ func TestGroupStart(t *testing.T) {
fs.add(m1)
fs.add(m2)
go func() {
g.start(context.Background(), func() []notifier.Notifier { return []notifier.Notifier{fn} }, nil)
g.start(context.Background(), func() []notifier.Notifier { return []notifier.Notifier{fn} }, nil, fs)
close(finished)
}()
@@ -474,3 +474,31 @@ func TestFaultyRW(t *testing.T) {
t.Fatalf("expected to get an error from faulty RW client, got nil instead")
}
}
func TestCloseWithEvalInterruption(t *testing.T) {
groups, err := config.Parse([]string{"config/testdata/rules/rules1-good.rules"}, notifier.ValidateTemplates, true)
if err != nil {
t.Fatalf("failed to parse rules: %s", err)
}
const delay = time.Second * 2
fq := &fakeQuerierWithDelay{delay: delay}
const evalInterval = time.Millisecond
g := newGroup(groups[0], fq, evalInterval, nil)
go g.start(context.Background(), nil, nil, nil)
time.Sleep(evalInterval * 20)
go func() {
g.close()
}()
deadline := time.Tick(delay / 2)
select {
case <-deadline:
t.Fatalf("deadline for close exceeded")
case <-g.finishedCh:
}
}

View File

@@ -44,21 +44,82 @@ func (fq *fakeQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Qu
return fq
}
func (fq *fakeQuerier) QueryRange(ctx context.Context, q string, _, _ time.Time) ([]datasource.Metric, error) {
func (fq *fakeQuerier) QueryRange(ctx context.Context, q string, _, _ time.Time) (datasource.Result, error) {
req, _, err := fq.Query(ctx, q, time.Now())
return req, err
}
func (fq *fakeQuerier) Query(_ context.Context, _ string, _ time.Time) ([]datasource.Metric, *http.Request, error) {
func (fq *fakeQuerier) Query(_ context.Context, _ string, _ time.Time) (datasource.Result, *http.Request, error) {
fq.Lock()
defer fq.Unlock()
if fq.err != nil {
return nil, nil, fq.err
return datasource.Result{}, nil, fq.err
}
cp := make([]datasource.Metric, len(fq.metrics))
copy(cp, fq.metrics)
req, _ := http.NewRequest(http.MethodPost, "foo.com", nil)
return cp, req, nil
return datasource.Result{Data: cp}, req, nil
}
type fakeQuerierWithRegistry struct {
sync.Mutex
registry map[string][]datasource.Metric
}
func (fqr *fakeQuerierWithRegistry) set(key string, metrics ...datasource.Metric) {
fqr.Lock()
if fqr.registry == nil {
fqr.registry = make(map[string][]datasource.Metric)
}
fqr.registry[key] = metrics
fqr.Unlock()
}
func (fqr *fakeQuerierWithRegistry) reset() {
fqr.Lock()
fqr.registry = nil
fqr.Unlock()
}
func (fqr *fakeQuerierWithRegistry) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fqr
}
func (fqr *fakeQuerierWithRegistry) QueryRange(ctx context.Context, q string, _, _ time.Time) (datasource.Result, error) {
req, _, err := fqr.Query(ctx, q, time.Now())
return req, err
}
func (fqr *fakeQuerierWithRegistry) Query(_ context.Context, expr string, _ time.Time) (datasource.Result, *http.Request, error) {
fqr.Lock()
defer fqr.Unlock()
req, _ := http.NewRequest(http.MethodPost, "foo.com", nil)
metrics, ok := fqr.registry[expr]
if !ok {
return datasource.Result{}, req, nil
}
cp := make([]datasource.Metric, len(metrics))
copy(cp, metrics)
return datasource.Result{Data: cp}, req, nil
}
type fakeQuerierWithDelay struct {
fakeQuerier
delay time.Duration
}
func (fqd *fakeQuerierWithDelay) Query(ctx context.Context, expr string, ts time.Time) (datasource.Result, *http.Request, error) {
timer := time.NewTimer(fqd.delay)
select {
case <-ctx.Done():
case <-timer.C:
}
return fqd.fakeQuerier.Query(ctx, expr, ts)
}
func (fqd *fakeQuerierWithDelay) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fqd
}
type fakeNotifier struct {
@@ -70,7 +131,7 @@ type fakeNotifier struct {
func (*fakeNotifier) Close() {}
func (*fakeNotifier) Addr() string { return "" }
func (fn *fakeNotifier) Send(_ context.Context, alerts []notifier.Alert) error {
func (fn *fakeNotifier) Send(_ context.Context, alerts []notifier.Alert, _ map[string]string) error {
fn.Lock()
defer fn.Unlock()
fn.counter += len(alerts)
@@ -94,7 +155,7 @@ type faultyNotifier struct {
fakeNotifier
}
func (fn *faultyNotifier) Send(ctx context.Context, _ []notifier.Alert) error {
func (fn *faultyNotifier) Send(ctx context.Context, _ []notifier.Alert, _ map[string]string) error {
d, ok := ctx.Deadline()
if ok {
time.Sleep(time.Until(d))

View File

@@ -10,6 +10,8 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
@@ -24,56 +26,67 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
rulePath = flagutil.NewArrayString("rule", `Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
rulePath = flagutil.NewArrayString("rule", `Path to the files or http url with alerting and/or recording rules.
Supports hierarchical patterns and regexpes.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
-rule="/path/to/file". Path to a single file with alerting rules.
-rule="http://<some-server-addr>/path/to/rules". HTTP URL to a page with alerting rules.
-rule="dir/*.yaml" -rule="/*.yaml" -rule="gcs://vmalert-rules/tenant_%{TENANT_ID}/prod".
-rule="dir/**/*.yaml". Includes all the .yaml files in "dir" subfolders recursively.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Enterprise version of vmalert supports S3 and GCS paths to rules.
For example: gs://bucket/path/to/rules, s3://bucket/path/to/rules
S3 and GCS paths support only matching by prefix, e.g. s3://bucket/dir/rule_ matches
all files with prefix rule_ in folder dir.
See https://docs.victoriametrics.com/vmalert.html#reading-rules-from-object-storage
`)
ruleTemplatesPath = flagutil.NewArrayString("rule.templates", `Path or glob pattern to location with go template definitions
for rules annotations templating. Flag can be specified multiple times.
Examples:
-rule.templates="/path/to/file". Path to a single file with go templates
-rule.templates="dir/*.tpl" -rule.templates="/*.tpl". Relative path to all .tpl files in "dir" folder,
absolute path to all .tpl files in root.`)
absolute path to all .tpl files in root.
-rule.templates="dir/**/*.tpl". Includes all the .tpl files in "dir" subfolders recursively.
`)
rulesCheckInterval = flag.Duration("rule.configCheckInterval", 0, "Interval for checking for changes in '-rule' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead")
"By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes. DEPRECATED - see '-configCheckInterval' instead")
configCheckInterval = flag.Duration("configCheckInterval", 0, "Interval for checking for changes in '-rule' or '-notifier.config' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
"By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
evaluationInterval = flag.Duration("evaluationInterval", time.Minute, "How often to evaluate the rules")
validateTemplates = flag.Bool("rule.validateTemplates", true, "Whether to validate annotation and label templates")
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
"which is by default equal to 3 evaluation intervals of the parent group.")
"which by default is 4 times evaluationInterval of the parent group.")
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier")
ruleUpdateEntriesLimit = flag.Int("rule.updateEntriesLimit", 20, "Defines the max number of rule's state updates stored in-memory. "+
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overriden per rule via update_entries_limit param.")
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param.")
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager `+
`for cases where you want to build a custom link to Grafana, Prometheus or any other service. `+
`Supports templating - see https://docs.victoriametrics.com/vmalert.html#templating . `+
`For example, link to Grafana: -external.alert.source='explore?orgId=1&left=["now-1h","now","VictoriaMetrics",{"expr":{{$expr|jsonEscape|queryEscape}} },{"mode":"Metrics"},{"ui":[true,true,true,"none"]}]' . `+
`For example, link to Grafana: -external.alert.source='explore?orgId=1&left={"datasource":"VictoriaMetrics","queries":[{"expr":{{$expr|jsonEscape|queryEscape}},"refId":"A"}],"range":{"from":"now-1h","to":"now"}}'. `+
`Link to VMUI: -external.alert.source='vmui/#/?g0.expr={{.Expr|queryEscape}}'. `+
`If empty 'vmalert/alert?group_id={{.GroupID}}&alert_id={{.AlertID}}' is used.`)
externalLabels = flagutil.NewArrayString("external.label", "Optional label in the form 'Name=value' to add to all generated recording rules and alerts. "+
"Pass multiple -label flags in order to add multiple label sets.")
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup. DEPRECATED - this flag has no effect and will be removed in the next releases.")
disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's Name as label to generated alerts and time series.")
@@ -94,6 +107,10 @@ func main() {
logger.Init()
pushmetrics.Init()
if !*remoteReadIgnoreRestoreErrors {
logger.Warnf("flag `remoteRead.ignoreRestoreErrors` is deprecated and will be removed in next releases.")
}
err := templates.Load(*ruleTemplatesPath, true)
if err != nil {
logger.Fatalf("failed to parse %q: %s", *ruleTemplatesPath, err)
@@ -270,7 +287,10 @@ func getAlertURLGenerator(externalURL *url.URL, externalAlertSource string, vali
"tpl": externalAlertSource,
}
return func(alert notifier.Alert) string {
templated, err := alert.ExecTemplate(nil, alert.Labels, m)
qFn := func(query string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported for alert source template")
}
templated, err := alert.ExecTemplate(qFn, alert.Labels, m)
if err != nil {
logger.Errorf("can not exec source template %s", err)
}
@@ -308,6 +328,7 @@ func configReload(ctx context.Context, m *manager, groupsCfg []config.Group, sig
// init reload metrics with positive values to improve alerting conditions
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
parseFn := config.Parse
for {
select {
case <-ctx.Done():
@@ -319,7 +340,11 @@ func configReload(ctx context.Context, m *manager, groupsCfg []config.Group, sig
}
logger.Infof("SIGHUP received. Going to reload rules %q %s...", *rulePath, tmplMsg)
configReloads.Inc()
// allow logs emitting during manual config reload
parseFn = config.Parse
case <-configCheckCh:
// disable logs emitting during per-interval config reload
parseFn = config.ParseSilent
}
if err := notifier.Reload(); err != nil {
configReloadErrors.Inc()
@@ -334,7 +359,7 @@ func configReload(ctx context.Context, m *manager, groupsCfg []config.Group, sig
logger.Errorf("failed to load new templates: %s", err)
continue
}
newGroupsCfg, err := config.Parse(*rulePath, validateTplFn, *validateExpressions)
newGroupsCfg, err := parseFn(*rulePath, validateTplFn, *validateExpressions)
if err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)

View File

@@ -82,24 +82,18 @@ func (m *manager) close() {
m.wg.Wait()
}
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) error {
if restore && m.rr != nil {
err := group.Restore(ctx, m.rr, *remoteReadLookBack, m.labels)
if err != nil {
if !*remoteReadIgnoreRestoreErrors {
return fmt.Errorf("failed to restore ruleState for group %q: %w", group.Name, err)
}
logger.Errorf("error while restoring ruleState for group %q: %s", group.Name, err)
}
}
func (m *manager) startGroup(ctx context.Context, g *Group, restore bool) error {
m.wg.Add(1)
id := group.ID()
id := g.ID()
go func() {
group.start(ctx, m.notifiers, m.rw)
m.wg.Done()
defer m.wg.Done()
if restore {
g.start(ctx, m.notifiers, m.rw, m.rr)
} else {
g.start(ctx, m.notifiers, m.rw, nil)
}
}()
m.groups[id] = group
m.groups[id] = g
return nil
}
@@ -153,6 +147,7 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
}
for _, ng := range groupsRegistry {
if err := m.startGroup(ctx, ng, restore); err != nil {
m.groupsMu.Unlock()
return err
}
}
@@ -166,6 +161,7 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
old.updateCh <- new
wg.Done()
}(item.old, item.new)
item.old.interruptEval()
}
wg.Wait()
}
@@ -180,15 +176,17 @@ func (g *Group) toAPI() APIGroup {
// encode as string to avoid rounding
ID: fmt.Sprintf("%d", g.ID()),
Name: g.Name,
Type: g.Type.String(),
File: g.File,
Interval: g.Interval.Seconds(),
LastEvaluation: g.LastEvaluation,
Concurrency: g.Concurrency,
Params: urlValuesToStrings(g.Params),
Headers: headersToStrings(g.Headers),
Labels: g.Labels,
Name: g.Name,
Type: g.Type.String(),
File: g.File,
Interval: g.Interval.Seconds(),
LastEvaluation: g.LastEvaluation,
Concurrency: g.Concurrency,
Params: urlValuesToStrings(g.Params),
Headers: headersToStrings(g.Headers),
NotifierHeaders: headersToStrings(g.NotifierHeaders),
Labels: g.Labels,
}
for _, r := range g.Rules {
ag.Rules = append(ag.Rules, r.ToAPI())

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -41,7 +41,7 @@ type Alert struct {
LastSent time.Time
// Value stores the value returned from evaluating expression from Expr field
Value float64
// ID is the unique identifer for the Alert
// ID is the unique identifier for the Alert
ID uint64
// Restored is true if Alert was restored after restart
Restored bool
@@ -111,11 +111,7 @@ func (a *Alert) ExecTemplate(q templates.QueryFn, labels, annotations map[string
ActiveAt: a.ActiveAt,
For: a.For,
}
tmpl, err := templates.GetWithFuncs(templates.FuncsWithQuery(q))
if err != nil {
return nil, fmt.Errorf("error getting a template: %w", err)
}
return templateAnnotations(annotations, tplData, tmpl, true)
return ExecTemplate(q, annotations, tplData)
}
// ExecTemplate executes the given template for given annotations map.
@@ -174,6 +170,8 @@ func templateAnnotation(dst io.Writer, text string, data tplData, tmpl *textTpl.
if err != nil {
return fmt.Errorf("error cloning template before parse annotation: %w", err)
}
// Clone() doesn't copy tpl Options, so we set them manually
tpl = tpl.Option("missingkey=zero")
tpl, err = tpl.Parse(text)
if err != nil {
return fmt.Errorf("error parsing annotation template: %w", err)

View File

@@ -200,6 +200,9 @@ func TestAlert_ExecTemplate(t *testing.T) {
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
if err := ValidateTemplates(tc.annotations); err != nil {
t.Fatal(err)
}
tpl, err := tc.alert.ExecTemplate(qFn, tc.alert.Labels, tc.annotations)
if err != nil {
t.Fatal(err)

View File

@@ -51,24 +51,27 @@ func (am *AlertManager) Close() {
func (am AlertManager) Addr() string { return am.addr }
// Send an alert or resolve message
func (am *AlertManager) Send(ctx context.Context, alerts []Alert) error {
func (am *AlertManager) Send(ctx context.Context, alerts []Alert, headers map[string]string) error {
am.metrics.alertsSent.Add(len(alerts))
err := am.send(ctx, alerts)
err := am.send(ctx, alerts, headers)
if err != nil {
am.metrics.alertsSendErrors.Add(len(alerts))
}
return err
}
func (am *AlertManager) send(ctx context.Context, alerts []Alert) error {
func (am *AlertManager) send(ctx context.Context, alerts []Alert, headers map[string]string) error {
b := &bytes.Buffer{}
writeamRequest(b, alerts, am.argFunc, am.relabelConfigs)
req, err := http.NewRequest("POST", am.addr, b)
req, err := http.NewRequest(http.MethodPost, am.addr, b)
if err != nil {
return err
}
req.Header.Set("Content-Type", "application/json")
for key, value := range headers {
req.Header.Set(key, value)
}
if am.timeout > 0 {
var cancel context.CancelFunc
@@ -105,7 +108,8 @@ const alertManagerPath = "/api/v2/alerts"
// NewAlertManager is a constructor for AlertManager
func NewAlertManager(alertManagerURL string, fn AlertURLGenerator, authCfg promauth.HTTPClientConfig,
relabelCfg *promrelabel.ParsedConfigs, timeout time.Duration) (*AlertManager, error) {
relabelCfg *promrelabel.ParsedConfigs, timeout time.Duration,
) (*AlertManager, error) {
tls := &promauth.TLSConfig{}
if authCfg.TLSConfig != nil {
tls = authCfg.TLSConfig

View File

@@ -25,6 +25,7 @@ func TestAlertManager_Addr(t *testing.T) {
func TestAlertManager_Send(t *testing.T) {
const baUser, baPass = "foo", "bar"
const headerKey, headerValue = "TenantID", "foo"
mux := http.NewServeMux()
mux.HandleFunc("/", func(_ http.ResponseWriter, _ *http.Request) {
t.Errorf("should not be called")
@@ -73,6 +74,11 @@ func TestAlertManager_Send(t *testing.T) {
if a[0].EndAt.IsZero() {
t.Errorf("expected non-zero end time")
}
case 3:
if r.Header.Get(headerKey) != headerValue {
t.Errorf("expected header %q to be set to %q; got %q instead",
headerKey, headerValue, r.Header.Get(headerKey))
}
}
})
srv := httptest.NewServer(mux)
@@ -90,10 +96,10 @@ func TestAlertManager_Send(t *testing.T) {
if err != nil {
t.Errorf("unexpected error: %s", err)
}
if err := am.Send(context.Background(), []Alert{{}, {}}); err == nil {
if err := am.Send(context.Background(), []Alert{{}, {}}, nil); err == nil {
t.Error("expected connection error got nil")
}
if err := am.Send(context.Background(), []Alert{}); err == nil {
if err := am.Send(context.Background(), []Alert{}, nil); err == nil {
t.Error("expected wrong http code error got nil")
}
if err := am.Send(context.Background(), []Alert{{
@@ -102,10 +108,13 @@ func TestAlertManager_Send(t *testing.T) {
Start: time.Now().UTC(),
End: time.Now().UTC(),
Annotations: map[string]string{"a": "b", "c": "d", "e": "f"},
}}); err != nil {
}}, nil); err != nil {
t.Errorf("unexpected error %s", err)
}
if c != 2 {
t.Errorf("expected 2 calls(count from zero) to server got %d", c)
}
if err := am.Send(context.Background(), nil, map[string]string{headerKey: headerValue}); err != nil {
t.Errorf("unexpected error %s", err)
}
}

View File

@@ -29,7 +29,7 @@ type Config struct {
// ConsulSDConfigs contains list of settings for service discovery via Consul
// see https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config
ConsulSDConfigs []consul.SDConfig `yaml:"consul_sd_configs,omitempty"`
// DNSSDConfigs ontains list of settings for service discovery via DNS.
// DNSSDConfigs contains list of settings for service discovery via DNS.
// See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
DNSSDConfigs []dns.SDConfig `yaml:"dns_sd_configs,omitempty"`

View File

@@ -31,9 +31,9 @@ var (
tlsCertFile = flagutil.NewArrayString("notifier.tlsCertFile", "Optional path to client-side TLS certificate file to use when connecting to -notifier.url")
tlsKeyFile = flagutil.NewArrayString("notifier.tlsKeyFile", "Optional path to client-side TLS certificate key to use when connecting to -notifier.url")
tlsCAFile = flagutil.NewArrayString("notifier.tlsCAFile", "Optional path to TLS CA file to use for verifying connections to -notifier.url. "+
"By default system CA is used")
"By default, system CA is used")
tlsServerName = flagutil.NewArrayString("notifier.tlsServerName", "Optional TLS server name to use for connections to -notifier.url. "+
"By default the server name from -notifier.url is used")
"By default, the server name from -notifier.url is used")
oauth2ClientID = flagutil.NewArrayString("notifier.oauth2.clientID", "Optional OAuth2 clientID to use for -notifier.url. "+
"If multiple args are set, then they are applied independently for the corresponding -notifier.url")
@@ -81,10 +81,6 @@ var (
//
// Init returns an error if both mods are used.
func Init(gen AlertURLGenerator, extLabels map[string]string, extURL string) (func() []Notifier, error) {
if externalLabels != nil || externalURL != "" {
return nil, fmt.Errorf("BUG: notifier.Init was called multiple times")
}
externalURL = extURL
externalLabels = extLabels
eu, err := url.Parse(externalURL)

View File

@@ -0,0 +1,37 @@
package notifier
import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"testing"
)
func TestInit(t *testing.T) {
oldAddrs := *addrs
defer func() { *addrs = oldAddrs }()
*addrs = flagutil.ArrayString{"127.0.0.1", "127.0.0.2"}
fn, err := Init(nil, nil, "")
if err != nil {
t.Fatalf("%s", err)
}
nfs := fn()
if len(nfs) != 2 {
t.Fatalf("expected to get 2 notifiers; got %d", len(nfs))
}
targets := GetTargets()
if targets == nil || targets[TargetStatic] == nil {
t.Fatalf("expected to get static targets in response")
}
nf1 := targets[TargetStatic][0]
if nf1.Addr() != "127.0.0.1/api/v2/alerts" {
t.Fatalf("expected to get \"127.0.0.1/api/v2/alerts\"; got %q instead", nf1.Addr())
}
nf2 := targets[TargetStatic][1]
if nf2.Addr() != "127.0.0.2/api/v2/alerts" {
t.Fatalf("expected to get \"127.0.0.2/api/v2/alerts\"; got %q instead", nf2.Addr())
}
}

View File

@@ -7,7 +7,7 @@ type Notifier interface {
// Send sends the given list of alerts.
// Returns an error if fails to send the alerts.
// Must unblock if the given ctx is cancelled.
Send(ctx context.Context, alerts []Alert) error
Send(ctx context.Context, alerts []Alert, notifierHeaders map[string]string) error
// Addr returns address where alerts are sent.
Addr() string
// Close is a destructor for the Notifier

View File

@@ -99,13 +99,13 @@ func (rr *RecordingRule) Close() {
// It doesn't update internal states of the Rule and meant to be used just
// to get time series for backfilling.
func (rr *RecordingRule) ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error) {
series, err := rr.q.QueryRange(ctx, rr.Expr, start, end)
res, err := rr.q.QueryRange(ctx, rr.Expr, start, end)
if err != nil {
return nil, err
}
duplicates := make(map[string]struct{}, len(series))
duplicates := make(map[string]struct{}, len(res.Data))
var tss []prompbmarshal.TimeSeries
for _, s := range series {
for _, s := range res.Data {
ts := rr.toTimeSeries(s)
key := stringifyLabels(ts)
if _, ok := duplicates[key]; ok {
@@ -120,13 +120,14 @@ func (rr *RecordingRule) ExecRange(ctx context.Context, start, end time.Time) ([
// Exec executes RecordingRule expression via the given Querier.
func (rr *RecordingRule) Exec(ctx context.Context, ts time.Time, limit int) ([]prompbmarshal.TimeSeries, error) {
start := time.Now()
qMetrics, req, err := rr.q.Query(ctx, rr.Expr, ts)
res, req, err := rr.q.Query(ctx, rr.Expr, ts)
curState := ruleStateEntry{
time: start,
at: ts,
duration: time.Since(start),
samples: len(qMetrics),
curl: requestToCurl(req),
time: start,
at: ts,
duration: time.Since(start),
samples: len(res.Data),
seriesFetched: res.SeriesFetched,
curl: requestToCurl(req),
}
defer func() {
@@ -138,6 +139,7 @@ func (rr *RecordingRule) Exec(ctx context.Context, ts time.Time, limit int) ([]p
return nil, curState.err
}
qMetrics := res.Data
numSeries := len(qMetrics)
if limit > 0 && numSeries > limit {
curState.err = fmt.Errorf("exec exceeded limit of %d with %d series", limit, numSeries)
@@ -208,17 +210,18 @@ func (rr *RecordingRule) UpdateWith(r Rule) error {
func (rr *RecordingRule) ToAPI() APIRule {
lastState := rr.state.getLast()
r := APIRule{
Type: "recording",
DatasourceType: rr.Type.String(),
Name: rr.Name,
Query: rr.Expr,
Labels: rr.Labels,
LastEvaluation: lastState.time,
EvaluationTime: lastState.duration.Seconds(),
Health: "ok",
LastSamples: lastState.samples,
MaxUpdates: rr.state.size(),
Updates: rr.state.getAll(),
Type: "recording",
DatasourceType: rr.Type.String(),
Name: rr.Name,
Query: rr.Expr,
Labels: rr.Labels,
LastEvaluation: lastState.time,
EvaluationTime: lastState.duration.Seconds(),
Health: "ok",
LastSamples: lastState.samples,
LastSeriesFetched: lastState.seriesFetched,
MaxUpdates: rr.state.size(),
Updates: rr.state.getAll(),
// encode as strings to avoid rounding
ID: fmt.Sprintf("%d", rr.ID()),

View File

@@ -34,9 +34,9 @@ var (
tlsCertFile = flag.String("remoteRead.tlsCertFile", "", "Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url")
tlsKeyFile = flag.String("remoteRead.tlsKeyFile", "", "Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url")
tlsCAFile = flag.String("remoteRead.tlsCAFile", "", "Optional path to TLS CA file to use for verifying connections to -remoteRead.url. "+
"By default system CA is used")
"By default, system CA is used")
tlsServerName = flag.String("remoteRead.tlsServerName", "", "Optional TLS server name to use for connections to -remoteRead.url. "+
"By default the server name from -remoteRead.url is used")
"By default, the server name from -remoteRead.url is used")
oauth2ClientID = flag.String("remoteRead.oauth2.clientID", "", "Optional OAuth2 clientID to use for -remoteRead.url.")
oauth2ClientSecret = flag.String("remoteRead.oauth2.clientSecret", "", "Optional OAuth2 clientSecret to use for -remoteRead.url.")

View File

@@ -29,7 +29,7 @@ var (
bearerTokenFile = flag.String("remoteWrite.bearerTokenFile", "", "Optional path to bearer token file to use for -remoteWrite.url.")
maxQueueSize = flag.Int("remoteWrite.maxQueueSize", 1e5, "Defines the max number of pending datapoints to remote write endpoint")
maxBatchSize = flag.Int("remoteWrite.maxBatchSize", 1e3, "Defines defines max number of timeseries to be flushed at once")
maxBatchSize = flag.Int("remoteWrite.maxBatchSize", 1e3, "Defines max number of timeseries to be flushed at once")
concurrency = flag.Int("remoteWrite.concurrency", 1, "Defines number of writers for concurrent writing into remote querier")
flushInterval = flag.Duration("remoteWrite.flushInterval", 5*time.Second, "Defines interval of flushes to remote write endpoint")
@@ -37,9 +37,9 @@ var (
tlsCertFile = flag.String("remoteWrite.tlsCertFile", "", "Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url")
tlsKeyFile = flag.String("remoteWrite.tlsKeyFile", "", "Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url")
tlsCAFile = flag.String("remoteWrite.tlsCAFile", "", "Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. "+
"By default system CA is used")
"By default, system CA is used")
tlsServerName = flag.String("remoteWrite.tlsServerName", "", "Optional TLS server name to use for connections to -remoteWrite.url. "+
"By default the server name from -remoteWrite.url is used")
"By default, the server name from -remoteWrite.url is used")
oauth2ClientID = flag.String("remoteWrite.oauth2.clientID", "", "Optional OAuth2 clientID to use for -remoteWrite.url.")
oauth2ClientSecret = flag.String("remoteWrite.oauth2.clientSecret", "", "Optional OAuth2 clientSecret to use for -remoteWrite.url.")

View File

@@ -0,0 +1,20 @@
package remotewrite
import (
"context"
"testing"
)
func TestInit(t *testing.T) {
oldAddr := *addr
defer func() { *addr = oldAddr }()
*addr = "http://localhost:8428"
cl, err := Init(context.Background())
if err != nil {
t.Fatal(err)
}
if err := cl.Close(); err != nil {
t.Fatal(err)
}
}

View File

@@ -186,7 +186,7 @@ var (
)
// flush is a blocking function that marshals WriteRequest and sends
// it to remote write endpoint. Flush performs limited amount of retries
// it to remote-write endpoint. Flush performs limited amount of retries
// if request fails.
func (c *Client) flush(ctx context.Context, wr *prompbmarshal.WriteRequest) {
if len(wr.Timeseries) < 1 {
@@ -201,9 +201,14 @@ func (c *Client) flush(ctx context.Context, wr *prompbmarshal.WriteRequest) {
return
}
const attempts = 5
b := snappy.Encode(nil, data)
for i := 0; i < attempts; i++ {
const (
retryCount = 5
retryBackoff = time.Second
)
for attempts := 0; attempts < retryCount; attempts++ {
err := c.send(ctx, b)
if err == nil {
sentRows.Add(len(wr.Timeseries))
@@ -211,21 +216,34 @@ func (c *Client) flush(ctx context.Context, wr *prompbmarshal.WriteRequest) {
return
}
logger.Warnf("attempt %d to send request failed: %s", i+1, err)
_, isRetriable := err.(*retriableError)
logger.Warnf("attempt %d to send request failed: %s (retriable: %v)", attempts+1, err, isRetriable)
if !isRetriable {
// exit fast if error isn't retriable
break
}
// check if request has been cancelled before backoff
select {
case <-ctx.Done():
break
default:
}
// sleeping to avoid remote db hammering
time.Sleep(time.Second)
continue
time.Sleep(retryBackoff)
}
droppedRows.Add(len(wr.Timeseries))
droppedBytes.Add(len(b))
logger.Errorf("all %d attempts to send request failed - dropping %d time series",
attempts, len(wr.Timeseries))
logger.Errorf("attempts to send remote-write request failed - dropping %d time series",
len(wr.Timeseries))
}
func (c *Client) send(ctx context.Context, data []byte) error {
r := bytes.NewReader(data)
req, err := http.NewRequest("POST", c.addr, r)
req, err := http.NewRequest(http.MethodPost, c.addr, r)
if err != nil {
return fmt.Errorf("failed to create new HTTP request: %w", err)
}
@@ -249,10 +267,31 @@ func (c *Client) send(ctx context.Context, data []byte) error {
req.URL.Redacted(), err, len(data), r.Size())
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusNoContent && resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
body, _ := io.ReadAll(resp.Body)
// according to https://prometheus.io/docs/concepts/remote_write_spec/
// Prometheus remote Write compatible receivers MUST
switch resp.StatusCode / 100 {
case 2:
// respond with a HTTP 2xx status code when the write is successful.
return nil
case 5:
// respond with HTTP status code 5xx when the write fails and SHOULD be retried.
return &retriableError{fmt.Errorf("unexpected response code %d for %s. Response body %q",
resp.StatusCode, req.URL.Redacted(), body)}
default:
// respond with HTTP status code 4xx when the request is invalid, will never be able to succeed
// and should not be retried.
return fmt.Errorf("unexpected response code %d for %s. Response body %q",
resp.StatusCode, req.URL.Redacted(), body)
}
return nil
}
type retriableError struct {
err error
}
func (e *retriableError) Error() string {
return e.err.Error()
}

View File

@@ -22,11 +22,11 @@ var (
replayTo = flag.String("replay.timeTo", "",
"The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'")
replayRulesDelay = flag.Duration("replay.rulesDelay", time.Second,
"Delay between rules evaluation within the group. Could be important if there are chained rules inside of the group"+
"Delay between rules evaluation within the group. Could be important if there are chained rules inside the group "+
"and processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule."+
"Keep it equal or bigger than -remoteWrite.flushInterval.")
replayMaxDatapoints = flag.Int("replay.maxDatapointsPerQuery", 1e3,
"Max number of data points expected in one request. The higher the value, the less requests will be made during replay.")
"Max number of data points expected in one request. It affects the max time range for every `/query_range` request during the replay. The higher the value, the less requests will be made during replay.")
replayRuleRetryAttempts = flag.Int("replay.ruleRetryAttempts", 5,
"Defines how many retries to make before giving up on rule if request for it returns an error.")
disableProgressBar = flag.Bool("replay.disableProgressBar", false, "Whether to disable rendering progress bars during the replay. "+

View File

@@ -20,21 +20,21 @@ func (fr *fakeReplayQuerier) BuildWithParams(_ datasource.QuerierParams) datasou
return fr
}
func (fr *fakeReplayQuerier) QueryRange(_ context.Context, q string, from, to time.Time) ([]datasource.Metric, error) {
func (fr *fakeReplayQuerier) QueryRange(_ context.Context, q string, from, to time.Time) (res datasource.Result, err error) {
key := fmt.Sprintf("%s+%s", from.Format("15:04:05"), to.Format("15:04:05"))
dps, ok := fr.registry[q]
if !ok {
return nil, fmt.Errorf("unexpected query received: %q", q)
return res, fmt.Errorf("unexpected query received: %q", q)
}
_, ok = dps[key]
if !ok {
return nil, fmt.Errorf("unexpected time range received: %q", key)
return res, fmt.Errorf("unexpected time range received: %q", key)
}
delete(dps, key)
if len(fr.registry[q]) < 1 {
delete(fr.registry, q)
}
return nil, nil
return res, nil
}
func TestReplay(t *testing.T) {

View File

@@ -37,8 +37,6 @@ type ruleState struct {
sync.RWMutex
entries []ruleStateEntry
cur int
// disabled defines whether ruleState tracks ruleStateEntry
disabled bool
}
type ruleStateEntry struct {
@@ -55,13 +53,19 @@ type ruleStateEntry struct {
// stores the number of samples returned during
// the last evaluation
samples int
// stores the number of time series fetched during
// the last evaluation.
// Is supported by VictoriaMetrics only, starting from v1.90.0
// If seriesFetched == nil, then this attribute was missing in
// datasource response (unsupported).
seriesFetched *int
// stores the curl command reflecting the HTTP request used during rule.Exec
curl string
}
func newRuleState(size int) *ruleState {
if size < 1 {
return &ruleState{disabled: true}
size = 1
}
return &ruleState{
entries: make([]ruleStateEntry, size),
@@ -69,10 +73,6 @@ func newRuleState(size int) *ruleState {
}
func (s *ruleState) getLast() ruleStateEntry {
if s.disabled {
return ruleStateEntry{}
}
s.RLock()
defer s.RUnlock()
return s.entries[s.cur]
@@ -85,10 +85,6 @@ func (s *ruleState) size() int {
}
func (s *ruleState) getAll() []ruleStateEntry {
if s.disabled {
return nil
}
entries := make([]ruleStateEntry, 0)
s.RLock()
@@ -111,10 +107,6 @@ func (s *ruleState) getAll() []ruleStateEntry {
}
func (s *ruleState) add(e ruleStateEntry) {
if s.disabled {
return
}
s.Lock()
defer s.Unlock()

View File

@@ -14,13 +14,13 @@ func TestRule_stateDisabled(t *testing.T) {
}
state.add(ruleStateEntry{at: time.Now()})
if !e.at.IsZero() {
t.Fatalf("expected entry to be zero")
}
state.add(ruleStateEntry{at: time.Now()})
state.add(ruleStateEntry{at: time.Now()})
if len(state.getAll()) != 0 {
if len(state.getAll()) != 1 {
// state should store at least one update at any circumstances
t.Fatalf("expected for state to have %d entries; got %d",
0, len(state.getAll()),
1, len(state.getAll()),
)
}
}

View File

@@ -0,0 +1,39 @@
function expandAll() {
$('.collapse').addClass('show');
}
function collapseAll() {
$('.collapse').removeClass('show');
}
function toggleByID(id) {
let el = $("#" + id);
if (el.length > 0) {
el.click();
}
}
$(document).ready(function () {
$(".group-heading a").click(function (e) {
e.stopPropagation(); // prevent collapse logic on link click
let target = $(this).attr('href');
if (target.length > 0) {
toggleByID(target.substr(1));
}
});
$(".group-heading").click(function (e) {
let target = $(this).attr('data-bs-target');
let el = $("#" + target);
new bootstrap.Collapse(el, {
toggle: true
});
});
let hash = window.location.hash.substr(1);
toggleByID(hash);
});
$(document).ready(function () {
$('[data-bs-toggle="tooltip"]').tooltip();
});

View File

@@ -21,7 +21,6 @@ import (
"math"
"net"
"net/url"
"path/filepath"
"regexp"
"sort"
"strconv"
@@ -30,6 +29,8 @@ import (
textTpl "text/template"
"time"
"github.com/bmatcuk/doublestar/v4"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/formatutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promutils"
@@ -59,12 +60,12 @@ func Load(pathPatterns []string, overwrite bool) error {
var err error
tmpl := newTemplate()
for _, tp := range pathPatterns {
p, err := filepath.Glob(tp)
p, err := doublestar.FilepathGlob(tp)
if err != nil {
return fmt.Errorf("failed to retrieve a template glob %q: %w", tp, err)
}
if len(p) > 0 {
tmpl, err = tmpl.ParseGlob(tp)
tmpl, err = tmpl.ParseFiles(p...)
if err != nil {
return fmt.Errorf("failed to parse template glob %q: %w", tp, err)
}
@@ -182,6 +183,10 @@ func Get() (*textTpl.Template, error) {
func FuncsWithQuery(query QueryFn) textTpl.FuncMap {
return textTpl.FuncMap{
"query": func(q string) ([]metric, error) {
if query == nil {
return nil, fmt.Errorf("cannot execute query %q: query is not available in this context", q)
}
result, err := query(q)
if err != nil {
return nil, err
@@ -225,7 +230,7 @@ func templateFuncs() textTpl.FuncMap {
"toLower": strings.ToLower,
// crlfEscape replaces '\n' and '\r' chars with `\\n` and `\\r`.
// This funcion is deprectated.
// This function is deprecated.
//
// It is better to use quotesEscape, jsonEscape, queryEscape or pathEscape instead -
// these functions properly escape `\n` and `\r` chars according to their purpose.

View File

@@ -76,6 +76,20 @@ func TestTemplateFuncs(t *testing.T) {
formatting("humanize1024", float64(146521335255970361638912), "124.1Zi")
formatting("humanize1024", float64(150037847302113650318245888), "124.1Yi")
formatting("humanize1024", float64(153638755637364377925883789312), "1.271e+05Yi")
formatting("humanize", float64(127087), "127.1k")
formatting("humanize", float64(136458627186688), "136.5T")
formatting("humanizeDuration", 1, "1s")
formatting("humanizeDuration", 0.2, "200ms")
formatting("humanizeDuration", 42000, "11h 40m 0s")
formatting("humanizeDuration", 16790555, "194d 8h 2m 35s")
formatting("humanizePercentage", 1, "100%")
formatting("humanizePercentage", 0.8, "80%")
formatting("humanizePercentage", 0.015, "1.5%")
formatting("humanizeTimestamp", 1679055557, "2023-03-17 12:19:17 +0000 UTC")
}
func mkTemplate(current, replacement interface{}) textTemplate {

View File

@@ -10,35 +10,7 @@
</main>
<script src="{%s prefix %}static/js/jquery-3.6.0.min.js" type="text/javascript"></script>
<script src="{%s prefix %}static/js/bootstrap.bundle.min.js" type="text/javascript"></script>
<script type="text/javascript">
function expandAll() {
$('.collapse').addClass('show');
}
function collapseAll() {
$('.collapse').removeClass('show');
}
$(document).ready(function() {
// prevent collapse logic on link click
$(".group-heading a").click(function(e) {
e.stopPropagation();
});
$(".group-heading").click(function(e) {
let target = $(this).attr('data-bs-target');
let el = $("#"+target);
new bootstrap.Collapse(el, {
toggle: true
});
});
var hash = window.location.hash.substr(1);
let group = $("#"+hash);
if (group.length > 0) {
group.click();
}
});
</script>
<script src="{%s prefix %}static/js/custom.js" type="text/javascript"></script>
</body>
</html>
{% endfunc %}

View File

@@ -45,63 +45,39 @@ func StreamFooter(qw422016 *qt422016.Writer, r *http.Request) {
qw422016.E().S(prefix)
//line app/vmalert/tpl/footer.qtpl:12
qw422016.N().S(`static/js/bootstrap.bundle.min.js" type="text/javascript"></script>
<script type="text/javascript">
function expandAll() {
$('.collapse').addClass('show');
}
function collapseAll() {
$('.collapse').removeClass('show');
}
$(document).ready(function() {
// prevent collapse logic on link click
$(".group-heading a").click(function(e) {
e.stopPropagation();
});
$(".group-heading").click(function(e) {
let target = $(this).attr('data-bs-target');
let el = $("#"+target);
new bootstrap.Collapse(el, {
toggle: true
});
});
var hash = window.location.hash.substr(1);
let group = $("#"+hash);
if (group.length > 0) {
group.click();
}
});
</script>
<script src="`)
//line app/vmalert/tpl/footer.qtpl:13
qw422016.E().S(prefix)
//line app/vmalert/tpl/footer.qtpl:13
qw422016.N().S(`static/js/custom.js" type="text/javascript"></script>
</body>
</html>
`)
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
}
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
func WriteFooter(qq422016 qtio422016.Writer, r *http.Request) {
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
StreamFooter(qw422016, r)
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
}
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
func Footer(r *http.Request) string {
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
WriteFooter(qb422016, r)
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
qs422016 := string(qb422016.B)
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
return qs422016
//line app/vmalert/tpl/footer.qtpl:44
//line app/vmalert/tpl/footer.qtpl:16
}

View File

@@ -57,7 +57,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/", "/vmalert", "/vmalert/":
if r.Method != "GET" {
if r.Method != http.MethodGet {
httpserver.Errorf(w, r, "path %q supports only GET method", r.URL.Path)
return false
}
@@ -211,7 +211,7 @@ func (rh *requestHandler) groups() []APIGroup {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
var groups []APIGroup
groups := make([]APIGroup, 0)
for _, g := range rh.m.groups {
groups = append(groups, g.toAPI())
}
@@ -276,6 +276,7 @@ func (rh *requestHandler) listAlerts() ([]byte, error) {
defer rh.m.groupsMu.RUnlock()
lr := listAlertsResponse{Status: "success"}
lr.Data.Alerts = make([]*APIAlert, 0)
for _, g := range rh.m.groups {
for _, r := range g.Rules {
a, ok := r.(*AlertingRule)

View File

@@ -30,102 +30,132 @@
{%= tpl.Footer(r) %}
{% endfunc %}
{% func ListGroups(r *http.Request, groups []APIGroup) %}
{% func buttonActive(filter, expValue string) %}
{% if filter != expValue %}
btn-secondary
{% else %}
btn-primary
{% endif %}
{% endfunc %}
{% func ListGroups(r *http.Request, originGroups []APIGroup) %}
{%code prefix := utils.Prefix(r.URL.Path) %}
{%= tpl.Header(r, navItems, "Groups") %}
{% if len(groups) > 0 %}
{%code
filter := r.URL.Query().Get("filter")
rOk := make(map[string]int)
rNotOk := make(map[string]int)
for _, g := range groups {
rNoMatch := make(map[string]int)
var groups []APIGroup
for _, g := range originGroups {
var rules []APIRule
for _, r := range g.Rules {
if r.LastError != "" {
rNotOk[g.Name]++
rNotOk[g.ID]++
} else {
rOk[g.Name]++
rOk[g.ID]++
}
if isNoMatch(r) {
rNoMatch[g.ID]++
}
if (filter == "unhealthy" && r.LastError == "") ||
(filter == "noMatch" && !isNoMatch(r)) {
continue
}
rules = append(rules, r)
}
if len(rules) > 0 {
g.Rules = rules
groups = append(groups, g)
}
}
%}
<a class="btn {%= buttonActive(filter, "") %}" role="button" onclick="window.location = window.location.pathname">All</a>
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
{% for _, g := range groups %}
<div class="group-heading{% if rNotOk[g.Name] > 0 %} alert-danger{% endif %}" data-bs-target="rules-{%s g.ID %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s)</a>
{% if rNotOk[g.Name] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.Name] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules withs status Ok">{%d rOk[g.Name] %}</span>
<p class="fs-6 fw-lighter">{%s g.File %}</p>
{% if len(g.Params) > 0 %}
<div class="fs-6 fw-lighter">Extra params
{% for _, param := range g.Params %}
<span class="float-left badge bg-primary">{%s param %}</span>
{% endfor %}
</div>
{% endif %}
{% if len(g.Headers) > 0 %}
<div class="fs-6 fw-lighter">Extra headers
{% for _, header := range g.Headers %}
<span class="float-left badge bg-primary">{%s header %}</span>
{% endfor %}
</div>
{% endif %}
</div>
<div class="collapse" id="rules-{%s g.ID %}">
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" style="width: 60%">Rule</th>
<th scope="col" style="width: 20%" class="text-center" title="How many samples were produced by the rule">Samples</th>
<th scope="col" style="width: 20%" class="text-center" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
{% for _, r := range g.Rules %}
<tr{% if r.LastError != "" %} class="alert-danger"{% endif %}>
<td>
<div class="row">
<div class="col-12 mb-2">
{% if r.Type == "alerting" %}
<b>alert:</b> {%s r.Name %} (for: {%v r.Duration %} seconds)
{% else %}
<b>record:</b> {%s r.Name %}
{% endif %}
| <span><a target="_blank" href="{%s prefix+r.WebLink() %}">Details</a></span>
</div>
<div class="col-12">
<code><pre>{%s r.Query %}</pre></code>
</div>
<div class="col-12 mb-2">
{% if len(r.Labels) > 0 %} <b>Labels:</b>{% endif %}
{% for k, v := range r.Labels %}
<span class="ms-1 badge bg-primary">{%s k %}={%s v %}</span>
{% endfor %}
</div>
{% if r.LastError != "" %}
<div class="col-12">
<b>Error:</b>
<div class="error-cell">
{%s r.LastError %}
<a class="btn {%= buttonActive(filter, "unhealthy") %}" role="button" onclick="location.href='?filter=unhealthy'" title="Show only rules with errors">Unhealthy</a>
<a class="btn {%= buttonActive(filter, "noMatch") %}" role="button" onclick="location.href='?filter=noMatch'" title="Show only rules matching no time series during last evaluation">NoMatch</a>
{% if len(groups) > 0 %}
{% for _, g := range groups %}
<div
class="group-heading{% if rNotOk[g.ID] > 0 %} alert-danger{%endif%}" data-bs-target="rules-{%s g.ID %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s) #</a>
{% if rNotOk[g.ID] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.ID] %}</span> {% endif %}
{% if rNoMatch[g.ID] > 0 %}<span class="badge bg-warning" title="Number of rules with status NoMatch">{%d rNoMatch[g.ID] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules withs status Ok">{%d rOk[g.ID] %}</span>
<p class="fs-6 fw-lighter">{%s g.File %}</p>
{% if len(g.Params) > 0 %}
<div class="fs-6 fw-lighter">Extra params
{% for _, param := range g.Params %}
<span class="float-left badge bg-primary">{%s param %}</span>
{% endfor %}
</div>
{% endif %}
{% if len(g.Headers) > 0 %}
<div class="fs-6 fw-lighter">Extra headers
{% for _, header := range g.Headers %}
<span class="float-left badge bg-primary">{%s header %}</span>
{% endfor %}
</div>
{% endif %}
</div>
<div class="collapse" id="rules-{%s g.ID %}">
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" style="width: 60%">Rule</th>
<th scope="col" style="width: 20%" class="text-center" title="How many samples were produced by the rule">Samples</th>
<th scope="col" style="width: 20%" class="text-center" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
{% for _, r := range g.Rules %}
<tr{% if r.LastError != "" %} class="alert-danger"{% endif %}>
<td>
<div class="row">
<div class="col-12 mb-2">
{% if r.Type == "alerting" %}
<b>alert:</b> {%s r.Name %} (for: {%v r.Duration %} seconds)
{% else %}
<b>record:</b> {%s r.Name %}
{% endif %}
|
{%= seriesFetchedWarn(r) %}
<span><a target="_blank" href="{%s prefix+r.WebLink() %}">Details</a></span>
</div>
<div class="col-12">
<code><pre>{%s r.Query %}</pre></code>
</div>
<div class="col-12 mb-2">
{% if len(r.Labels) > 0 %} <b>Labels:</b>{% endif %}
{% for k, v := range r.Labels %}
<span class="ms-1 badge bg-primary">{%s k %}={%s v %}</span>
{% endfor %}
</div>
{% if r.LastError != "" %}
<div class="col-12">
<b>Error:</b>
<div class="error-cell">
{%s r.LastError %}
</div>
</div>
{% endif %}
</div>
{% endif %}
</div>
</td>
<td class="text-center">{%d r.LastSamples %}</td>
<td class="text-center">{%f.3 time.Since(r.LastEvaluation).Seconds() %}s ago</td>
</tr>
{% endfor %}
</tbody>
</table>
</td>
<td class="text-center">{%d r.LastSamples %}</td>
<td class="text-center">{%f.3 time.Since(r.LastEvaluation).Seconds() %}s ago</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% endfor %}
{% else %}
<div>
<p>No groups...</p>
</div>
{% endfor %}
{% else %}
<div>
<p>No groups...</p>
</div>
{% endif %}
{% endif %}
{%= tpl.Footer(r) %}
@@ -237,9 +267,9 @@
{%code typeK, ns := keys[i], targets[notifier.TargetType(keys[i])]
count := len(ns)
%}
<div class="group-heading data-bs-target="rules-{%s typeK %}">
<span class="anchor" id="notifiers-{%s typeK %}"></span>
<a href="#notifiers-{%s typeK %}">{%s typeK %} ({%d count %})</a>
<div class="group-heading" data-bs-target="notifiers-{%s typeK %}">
<span class="anchor" id="group-{%s typeK %}"></span>
<a href="#group-{%s typeK %}">{%s typeK %} ({%d count %})</a>
</div>
<div class="collapse show" id="notifiers-{%s typeK %}">
<table class="table table-striped table-hover table-sm">
@@ -377,8 +407,20 @@
annotationKeys = append(annotationKeys, k)
}
sort.Strings(annotationKeys)
var seriesFetchedEnabled bool
var seriesFetchedWarning bool
for _, u := range rule.Updates {
if u.seriesFetched != nil {
seriesFetchedEnabled = true
if *u.seriesFetched == 0 && u.samples == 0{
seriesFetchedWarning = true
}
}
}
%}
<div class="display-6 pb-3 mb-3">Rule: {%s rule.Name %}<span class="ms-2 badge {% if rule.Health!="ok" %}bg-danger{% else %} bg-warning text-dark{% endif %}">{%s rule.Health %}</span></div>
<div class="display-6 pb-3 mb-3">Rule: {%s rule.Name %}<span class="ms-2 badge {% if rule.Health!="ok" %}bg-danger{% else %} bg-success text-dark{% endif %}">{%s rule.Health %}</span></div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
@@ -427,6 +469,16 @@
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Debug
</div>
<div class="col">
{%v rule.Debug %}
</div>
</div>
</div>
{% endif %}
<div class="container border-bottom p-2">
<div class="row">
@@ -440,12 +492,26 @@
</div>
<br>
{% if seriesFetchedWarning %}
<div class="alert alert-warning" role="alert">
<strong>Warning:</strong> some of updates have "Series fetched" equal to 0.<br>
It might be that either this data is missing in the datasource or there is a typo in rule's expression.
For example, <strong>foo{label="bar"} > 0</strong> could never trigger because <strong>foo{label="bar"}</strong>
metric doesn't exist.
<br>
Rule's expressions without time series selector, like <strong>expr: 42</strong> or <strong>expr: time()</strong>
aren't fetching time series from datasource, so they could have "Series fetched" equal to 0 and this won't be a problem.
<br>
See more details about this detection <a target="_blank" href="https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4039">here</a>.
</div>
{% endif %}
<div class="display-6 pb-3">Last {%d len(rule.Updates) %}/{%d rule.MaxUpdates %} updates</span>:</div>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col" title="The time when event was created">Updated at</th>
<th scope="col" style="width: 10%" class="text-center" title="How many samples were returned">Samples</th>
{% if seriesFetchedEnabled %}<th scope="col" style="width: 10%" class="text-center" title="How many series were scanned by datasource during the evaluation">Series fetched</th>{% endif %}
<th scope="col" style="width: 10%" class="text-center" title="How many seconds request took">Duration</th>
<th scope="col" class="text-center" title="Time used for rule execution">Executed at</th>
<th scope="col" class="text-center" title="cURL command with request example">cURL</th>
@@ -458,7 +524,8 @@
<td>
<span class="badge bg-primary rounded-pill me-3" title="Updated at">{%s u.time.Format(time.RFC3339) %}</span>
</td>
<td class="text-center" wi>{%d u.samples %}</td>
<td class="text-center">{%d u.samples %}</td>
{% if seriesFetchedEnabled %}<td class="text-center">{% if u.seriesFetched != nil %}{%d *u.seriesFetched %}{% endif %}</td>{% endif %}
<td class="text-center">{%f.3 u.duration.Seconds() %}s</td>
<td class="text-center">{%s u.at.Format(time.RFC3339) %}</td>
<td>
@@ -468,7 +535,7 @@
</li>
{% if u.err != nil %}
<tr{% if u.err != nil %} class="alert-danger"{% endif %}>
<td colspan="5">
<td colspan="{% if seriesFetchedEnabled %}6{%else%}5{%endif%}">
<span class="alert-danger">{%v u.err %}</span>
</td>
</tr>
@@ -493,3 +560,22 @@
{% func badgeRestored() %}
<span class="badge bg-warning text-dark" title="Alert state was restored after the service restart from remote storage">restored</span>
{% endfunc %}
{% func seriesFetchedWarn(r APIRule) %}
{% if isNoMatch(r) %}
<svg xmlns="http://www.w3.org/2000/svg"
data-bs-toggle="tooltip"
title="No match! This rule last evaluation hasn't selected any time series from the datasource.
It might be that either this data is missing in the datasource or there is a typo in rule's expression.
See more in Details."
width="18" height="18" fill="currentColor" class="bi bi-exclamation-triangle-fill flex-shrink-0 me-2" viewBox="0 0 16 16" role="img" aria-label="Warning:">
<path d="M8 16A8 8 0 1 0 8 0a8 8 0 0 0 0 16zm.93-9.412-1 4.705c-.07.34.029.533.304.533.194 0 .487-.07.686-.246l-.088.416c-.287.346-.92.598-1.465.598-.703 0-1.002-.422-.808-1.319l.738-3.468c.064-.293.006-.399-.287-.47l-.451-.081.082-.381 2.29-.287zM8 5.5a1 1 0 1 1 0-2 1 1 0 0 1 0 2z"/>
</svg>
{% endif %}
{% endfunc %}
{%code
func isNoMatch (r APIRule) bool {
return r.LastSamples == 0 && r.LastSeriesFetched != nil && *r.LastSeriesFetched == 0
}
%}

File diff suppressed because it is too large Load Diff

View File

@@ -7,6 +7,7 @@ import (
"net/http/httptest"
"reflect"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
)
@@ -19,9 +20,18 @@ func TestHandler(t *testing.T) {
},
state: newRuleState(10),
}
ar.state.add(ruleStateEntry{
time: time.Now(),
at: time.Now(),
samples: 10,
})
rr := &RecordingRule{
Name: "record",
state: newRuleState(10),
}
g := &Group{
Name: "group",
Rules: []Rule{ar},
Rules: []Rule{ar, rr},
}
m := &manager{groups: make(map[uint64]*Group)}
m.groups[0] = g
@@ -62,6 +72,14 @@ func TestHandler(t *testing.T) {
t.Run("/vmalert/rule", func(t *testing.T) {
a := ar.ToAPI()
getResp(ts.URL+"/vmalert/"+a.WebLink(), nil, 200)
r := rr.ToAPI()
getResp(ts.URL+"/vmalert/"+r.WebLink(), nil, 200)
})
t.Run("/vmalert/alert", func(t *testing.T) {
alerts := ar.AlertsToAPI()
for _, a := range alerts {
getResp(ts.URL+"/vmalert/"+a.WebLink(), nil, 200)
}
})
t.Run("/vmalert/rule?badParam", func(t *testing.T) {
params := fmt.Sprintf("?%s=0&%s=1", paramGroupID, paramRuleID)
@@ -144,5 +162,59 @@ func TestHandler(t *testing.T) {
t.Run("/api/v1/1/0/status", func(t *testing.T) {
getResp(ts.URL+"/api/v1/1/0/status", nil, 404)
})
}
func TestEmptyResponse(t *testing.T) {
rh := &requestHandler{m: &manager{groups: make(map[uint64]*Group)}}
ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { rh.handler(w, r) }))
defer ts.Close()
getResp := func(url string, to interface{}, code int) {
t.Helper()
resp, err := http.Get(url)
if err != nil {
t.Fatalf("unexpected err %s", err)
}
if code != resp.StatusCode {
t.Errorf("unexpected status code %d want %d", resp.StatusCode, code)
}
defer func() {
if err := resp.Body.Close(); err != nil {
t.Errorf("err closing body %s", err)
}
}()
if to != nil {
if err = json.NewDecoder(resp.Body).Decode(to); err != nil {
t.Errorf("unexpected err %s", err)
}
}
}
t.Run("/api/v1/alerts", func(t *testing.T) {
lr := listAlertsResponse{}
getResp(ts.URL+"/api/v1/alerts", &lr, 200)
if lr.Data.Alerts == nil {
t.Errorf("expected /api/v1/alerts response to have non-nil data")
}
lr = listAlertsResponse{}
getResp(ts.URL+"/vmalert/api/v1/alerts", &lr, 200)
if lr.Data.Alerts == nil {
t.Errorf("expected /api/v1/alerts response to have non-nil data")
}
})
t.Run("/api/v1/rules", func(t *testing.T) {
lr := listGroupsResponse{}
getResp(ts.URL+"/api/v1/rules", &lr, 200)
if lr.Data.Groups == nil {
t.Errorf("expected /api/v1/rules response to have non-nil data")
}
lr = listGroupsResponse{}
getResp(ts.URL+"/vmalert/api/v1/rules", &lr, 200)
if lr.Data.Groups == nil {
t.Errorf("expected /api/v1/rules response to have non-nil data")
}
})
}

View File

@@ -72,6 +72,8 @@ type APIGroup struct {
Params []string `json:"params,omitempty"`
// Headers contains HTTP headers added to each Rule's request
Headers []string `json:"headers,omitempty"`
// NotifierHeaders contains HTTP headers added to each alert request which will send to notifier
NotifierHeaders []string `json:"notifier_headers,omitempty"`
// Labels is a set of label value pairs, that will be added to every rule.
Labels map[string]string `json:"labels,omitempty"`
}
@@ -113,18 +115,25 @@ type APIRule struct {
// Additional fields
// Type of the rule: recording or alerting
// DatasourceType of the rule: prometheus or graphite
DatasourceType string `json:"datasourceType"`
LastSamples int `json:"lastSamples"`
// LastSamples stores the amount of data samples received on last evaluation
LastSamples int `json:"lastSamples"`
// LastSeriesFetched stores the amount of time series fetched by datasource
// during the last evaluation
LastSeriesFetched *int `json:"lastSeriesFetched,omitempty"`
// ID is a unique Alert's ID within a group
ID string `json:"id"`
// GroupID is an unique Group's ID
GroupID string `json:"group_id"`
// Debug shows whether debug mode is enabled
Debug bool `json:"debug"`
// MaxUpdates is the max number of recorded ruleStateEntry objects
MaxUpdates int `json:"max_updates_entries"`
// Updates contains the ordered list of recorded ruleStateEntry objects
Updates []ruleStateEntry `json:"updates"`
Updates []ruleStateEntry `json:"-"`
}
// WebLink returns a link to the alert which can be used in UI.

View File

@@ -84,6 +84,9 @@ vmauth-linux-arm64:
vmauth-linux-ppc64le:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmauth-linux-s390x:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmauth-linux-386:
APP_NAME=vmauth CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

View File

@@ -17,7 +17,12 @@ and pass the following flag to `vmauth` binary in order to start authorizing and
After that `vmauth` starts accepting HTTP requests on port `8427` and routing them according to the provided [-auth.config](#auth-config).
The port can be modified via `-httpListenAddr` command-line flag.
The auth config can be reloaded either by passing `SIGHUP` signal to `vmauth` or by querying `/-/reload` http endpoint.
The auth config can be reloaded via the following ways:
- By passing `SIGHUP` signal to `vmauth`.
- By querying `/-/reload` http endpoint. This endpoint can be protected with `-reloadAuthKey` command-line flag. See [security docs](#security) for more details.
- By specifying `-configCheckInterval` command-line flag to the interval between config re-reads. For example, `-configCheckInterval=5s` will re-read the config
and apply new changes every 5 seconds.
Docker images for `vmauth` are available [here](https://hub.docker.com/r/victoriametrics/vmauth/tags).
@@ -28,7 +33,68 @@ accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.co
## Load balancing
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls.
In the latter case `vmauth` balances load among the configured urls in least-loaded round-robin manner.
`vmauth` retries failing `GET` requests across the configured list of urls.
This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes
in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
## Concurrency limiting
`vmauth` limits the number of concurrent requests it can proxy according to the following command-line flags:
- `-maxConcurrentRequests` limits the global number of concurrent requests `vmauth` can serve across all the configured users.
- `-maxConcurrentPerUserRequests` limits the number of concurrent requests `vmauth` can serve per each configured user.
It is also possible to set individual limits on the number of concurrent requests per each user
with the `max_concurrent_requests` option - see [auth config example](#auth-config).
`vmauth` responds with `429 Too Many Requests` HTTP error when the number of concurrent requests exceeds the configured limits.
The following [metrics](#monitoring) related to concurrency limits are exposed by `vmauth`:
- `vmauth_concurrent_requests_capacity` - the global limit on the number of concurrent requests `vmauth` can serve.
It is set via `-maxConcurrentRequests` command-line flag.
- `vmauth_concurrent_requests_current` - the current number of concurrent requests `vmauth` processes.
- `vmauth_concurrent_requests_limit_reached_total` - the number of requests rejected with `429 Too Many Requests` error
because of the global concurrency limit has been reached.
- `vmauth_user_concurrent_requests_capacity{username="..."}` - the limit on the number of concurrent requests for the given `username`.
- `vmauth_user_concurrent_requests_current{username="..."}` - the current number of concurrent requests for the given `username`.
- `vmauth_user_concurrent_requests_limit_reached_total{username="foo"}` - the number of requests rejected with `429 Too Many Requests` error
because of the concurrency limit has been reached for the given `username`.
- `vmauth_unauthorized_user_concurrent_requests_capacity` - the limit on the number of concurrent requests for unauthorized users (if `unauthorized_user` section is used).
- `vmauth_unauthorized_user_concurrent_requests_current` - the current number of concurrent requests for unauthorized users (if `unauthorized_user` section is used).
- `vmauth_unauthorized_user_concurrent_requests_limit_reached_total` - the number of requests rejected with `429 Too Many Requests` error
because of the concurrency limit has been reached for unauthorized users (if `unauthorized_user` section is used).
## IP filters
[Enterprise version](https://docs.victoriametrics.com/enterprise.html) of `vmauth` can be configured to allow / deny incoming requests via global and per-user IP filters.
For example, the following config allows requests to `vmauth` from `10.0.0.0/24` network and from `1.2.3.4` IP address, while denying requests from `10.0.0.42` IP address:
```yml
users:
# User configs here
ip_filters:
allow_list:
- 10.0.0.0/24
- 1.2.3.4
deny_list: [10.0.0.42]
```
The following config allows requests for the user 'foobar' only from the ip `127.0.0.1`:
```yml
users:
- username: "foobar"
password: "***"
url_prefix: "http://localhost:8428"
ip_filters:
allow_list: [127.0.0.1]
```
## Auth config
@@ -55,26 +121,27 @@ users:
headers:
- "X-Scope-OrgID: foobar"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://localhost:8428 .
# are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
#
# The given user can send maximum 10 concurrent requests according to the provided max_concurrent_requests.
# Excess concurrent requests are rejected with 429 HTTP status code.
# See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
max_concurrent_requests: 10
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# are proxied to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
- username: "local-single-node2"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -84,10 +151,8 @@ users:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
@@ -98,14 +163,22 @@ users:
- "http://vminsert2:8480/insert/42/prometheus"
# A single user for querying and inserting data:
#
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/42/prometheus
# - http://vmselect2:8481/select/42/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect1:8480/select/42/prometheus/api/v1/query
# or to http://vmselect2:8480/select/42/prometheus/api/v1/query .
#
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write .
# The "X-Scope-OrgID: abc" http header is added to these requests.
#
# Request which do not match `src_paths` from the `url_map` are proxied to the urls from `default_url`
# in a round-robin manner. The original request path is passed in `request_path` query arg.
# For example, request to http://vmauth:8427/non/existing/path are proxied:
# - to http://default1:8888/unsupported_url_handler?request_path=/non/existing/path
# - or http://default2:8888/unsupported_url_handler?request_path=/non/existing/path
- username: "foobar"
url_map:
- src_paths:
@@ -119,6 +192,28 @@ users:
url_prefix: "http://vminsert:8480/insert/42/prometheus"
headers:
- "X-Scope-OrgID: abc"
ip_filters:
deny_list: [127.0.0.1]
default_url:
- "http://default1:8888/unsupported_url_handler"
- "http://default2:8888/unsupported_url_handler"
# Requests without Authorization header are routed according to `unauthorized_user` section.
unauthorized_user:
url_map:
- src_paths:
- /api/v1/query
- /api/v1/query_range
url_prefix:
- http://vmselect1:8481/select/0/prometheus
- http://vmselect2:8481/select/0/prometheus
ip_filters:
allow_list: [8.8.8.8]
ip_filters:
allow_list: ["1.2.3.0/24", "127.0.0.1"]
deny_list:
- 10.1.0.1
```
The config may contain `%{ENV_VAR}` placeholders, which are substituted by the corresponding `ENV_VAR` environment variable values.
@@ -141,12 +236,14 @@ Do not transfer Basic Auth headers in plaintext over untrusted networks. Enable
Alternatively, [https termination proxy](https://en.wikipedia.org/wiki/TLS_termination_proxy) may be put in front of `vmauth`.
It is recommended protecting following endpoints with authKeys:
It is recommended protecting the following endpoints with authKeys:
* `/-/reload` with `-reloadAuthKey` command-line flag, so external users couldn't trigger config reload.
* `/flags` with `-flagsAuthkey` command-line flag, so unauthorized users couldn't get application command-line flags.
* `/metrics` with `metricsAuthkey` command-line flag, so unauthorized users couldn't get access to [vmauth metrics](#monitoring).
* `/debug/pprof` with `pprofAuthKey` command-line flag, so unauthorized users couldn't get access to [profiling information](#profiling).
`vmauth` also supports the ability to restict access by IP - see [these docs](#ip-filters). See also [concurrency limiting docs](#concurrency-limiting).
## Monitoring
`vmauth` exports various metrics in Prometheus exposition format at `http://vmauth-host:8427/metrics` page. It is recommended setting up regular scraping of this page
@@ -161,6 +258,8 @@ users:
# other config options here
```
For unauthorized users `vmauth` exports `vmauth_unauthorized_user_requests_total` metric without label (if `unauthorized_user` section of config is used).
## How to build from sources
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmauth` is located in `vmutils-*` archives there.
@@ -232,8 +331,10 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
-auth.config string
Path to auth config. It can point either to local file or to http url. See https://docs.victoriametrics.com/vmauth.html for details on the format of this auth config
-configCheckInterval duration
Interval for config file re-read. Zero value disables config re-reading. By default, refreshing is disabled, send SIGHUP for config refresh.
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
@@ -243,11 +344,11 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
-flagsAuthKey string
Auth key for /flags endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -265,7 +366,7 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
-httpListenAddr.useProxyProtocol
Whether to use proxy protocol for connections accepted at -httpListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt
-internStringMaxLen int
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 300)
The maximum length for strings to intern. Lower limit may save memory at the cost of higher CPU usage. See https://en.wikipedia.org/wiki/String_interning (default 500)
-logInvalidAuthTokens
Whether to log requests with invalid auth tokens. Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page
-loggerDisableTimestamps
@@ -284,8 +385,10 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentPerUserRequests int
The maximum number of concurrent requests vmauth can process per each configured user. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option in per-user config (default 300)
-maxConcurrentRequests int
The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxIdleConnsPerBackend (default 1000)
The maximum number of concurrent requests vmauth can process. Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options (default 1000)
-maxIdleConnsPerBackend int
The maximum number of idle connections vmauth can open per each backend host. See also -maxConcurrentRequests (default 100)
-memory.allowedBytes size

View File

@@ -11,8 +11,10 @@ import (
"strings"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
@@ -23,26 +25,56 @@ import (
var (
authConfigPath = flag.String("auth.config", "", "Path to auth config. It can point either to local file or to http url. "+
"See https://docs.victoriametrics.com/vmauth.html for details on the format of this auth config")
configCheckInterval = flag.Duration("configCheckInterval", 0, "interval for config file re-read. "+
"Zero value disables config re-reading. By default, refreshing is disabled, send SIGHUP for config refresh.")
)
// AuthConfig represents auth config.
type AuthConfig struct {
Users []UserInfo `yaml:"users,omitempty"`
Users []UserInfo `yaml:"users,omitempty"`
UnauthorizedUser *UserInfo `yaml:"unauthorized_user,omitempty"`
}
// UserInfo is user information read from authConfigPath
type UserInfo struct {
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMaps []URLMap `yaml:"url_map,omitempty"`
Headers []Header `yaml:"headers,omitempty"`
Name string `yaml:"name,omitempty"`
BearerToken string `yaml:"bearer_token,omitempty"`
Username string `yaml:"username,omitempty"`
Password string `yaml:"password,omitempty"`
URLPrefix *URLPrefix `yaml:"url_prefix,omitempty"`
URLMaps []URLMap `yaml:"url_map,omitempty"`
Headers []Header `yaml:"headers,omitempty"`
MaxConcurrentRequests int `yaml:"max_concurrent_requests,omitempty"`
DefaultURL *URLPrefix `yaml:"default_url,omitempty"`
concurrencyLimitCh chan struct{}
concurrencyLimitReached *metrics.Counter
requests *metrics.Counter
}
func (ui *UserInfo) beginConcurrencyLimit() error {
select {
case ui.concurrencyLimitCh <- struct{}{}:
return nil
default:
ui.concurrencyLimitReached.Inc()
return fmt.Errorf("cannot handle more than %d concurrent requests from user %s", ui.getMaxConcurrentRequests(), ui.name())
}
}
func (ui *UserInfo) endConcurrencyLimit() {
<-ui.concurrencyLimitCh
}
func (ui *UserInfo) getMaxConcurrentRequests() int {
mcr := ui.MaxConcurrentRequests
if mcr <= 0 {
mcr = *maxConcurrentPerUserRequests
}
return mcr
}
// Header is `Name: Value` http header, which must be added to the proxied request.
type Header struct {
Name string
@@ -83,16 +115,77 @@ type SrcPath struct {
re *regexp.Regexp
}
// URLPrefix represents pased `url_prefix`
// URLPrefix represents passed `url_prefix`
type URLPrefix struct {
n uint32
urls []*url.URL
n uint32
bus []*backendURL
}
func (up *URLPrefix) getNextURL() *url.URL {
type backendURL struct {
brokenDeadline uint64
concurrentRequests int32
url *url.URL
}
func (bu *backendURL) isBroken() bool {
ct := fasttime.UnixTimestamp()
return ct < atomic.LoadUint64(&bu.brokenDeadline)
}
func (bu *backendURL) setBroken() {
deadline := fasttime.UnixTimestamp() + 3
atomic.StoreUint64(&bu.brokenDeadline, deadline)
}
func (bu *backendURL) put() {
atomic.AddInt32(&bu.concurrentRequests, -1)
}
func (up *URLPrefix) getBackendsCount() int {
return len(up.bus)
}
// getLeastLoadedBackendURL returns the backendURL with the minimum number of concurrent requests.
//
// backendURL.put() must be called on the returned backendURL after the request is complete.
func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
bus := up.bus
if len(bus) == 1 {
// Fast path - return the only backend url.
bu := bus[0]
atomic.AddInt32(&bu.concurrentRequests, 1)
return bu
}
// Slow path - select other backend urls.
n := atomic.AddUint32(&up.n, 1)
idx := n % uint32(len(up.urls))
return up.urls[idx]
for i := uint32(0); i < uint32(len(bus)); i++ {
idx := (n + i) % uint32(len(bus))
bu := bus[idx]
if bu.isBroken() {
continue
}
if atomic.CompareAndSwapInt32(&bu.concurrentRequests, 0, 1) {
// Fast path - return the backend with zero concurrently executed requests.
return bu
}
}
// Slow path - return the backend with the minimum number of concurrently executed requests.
buMin := bus[n%uint32(len(bus))]
minRequests := atomic.LoadInt32(&buMin.concurrentRequests)
for _, bu := range bus {
if bu.isBroken() {
continue
}
if n := atomic.LoadInt32(&bu.concurrentRequests); n < minRequests {
buMin = bu
minRequests = n
}
}
atomic.AddInt32(&buMin.concurrentRequests, 1)
return buMin
}
// UnmarshalYAML unmarshals up from yaml.
@@ -121,31 +214,33 @@ func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
default:
return fmt.Errorf("unexpected type for `url_prefix`: %T; want string or []string", v)
}
pus := make([]*url.URL, len(urls))
bus := make([]*backendURL, len(urls))
for i, u := range urls {
pu, err := url.Parse(u)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", u, err)
}
pus[i] = pu
bus[i] = &backendURL{
url: pu,
}
}
up.urls = pus
up.bus = bus
return nil
}
// MarshalYAML marshals up to yaml.
func (up *URLPrefix) MarshalYAML() (interface{}, error) {
var b []byte
if len(up.urls) == 1 {
u := up.urls[0].String()
if len(up.bus) == 1 {
u := up.bus[0].url.String()
b = strconv.AppendQuote(b, u)
return string(b), nil
}
b = append(b, '[')
for i, pu := range up.urls {
u := pu.String()
for i, bu := range up.bus {
u := bu.url.String()
b = strconv.AppendQuote(b, u)
if i+1 < len(up.urls) {
if i+1 < len(up.bus) {
b = append(b, ',')
}
}
@@ -196,11 +291,11 @@ func initAuthConfig() {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
m, err := readAuthConfig(*authConfigPath)
err := loadAuthConfig()
if err != nil {
logger.Fatalf("cannot load auth config from `-auth.config=%s`: %s", *authConfigPath, err)
logger.Fatalf("cannot load auth config: %s", err)
}
authConfig.Store(m)
stopCh = make(chan struct{})
authConfigWG.Add(1)
go func() {
@@ -215,53 +310,90 @@ func stopAuthConfig() {
}
func authConfigReloader(sighupCh <-chan os.Signal) {
var refreshCh <-chan time.Time
// initialize auth refresh interval
if *configCheckInterval > 0 {
ticker := time.NewTicker(*configCheckInterval)
defer ticker.Stop()
refreshCh = ticker.C
}
for {
select {
case <-stopCh:
return
case <-refreshCh:
procutil.SelfSIGHUP()
case <-sighupCh:
logger.Infof("SIGHUP received; loading -auth.config=%q", *authConfigPath)
m, err := readAuthConfig(*authConfigPath)
err := loadAuthConfig()
if err != nil {
logger.Errorf("failed to load -auth.config=%q; using the last successfully loaded config; error: %s", *authConfigPath, err)
logger.Errorf("failed to load auth config; using the last successfully loaded config; error: %s", err)
continue
}
authConfig.Store(m)
logger.Infof("Successfully reloaded -auth.config=%q", *authConfigPath)
}
}
}
var authConfig atomic.Value
var authConfig atomic.Pointer[AuthConfig]
var authUsers atomic.Pointer[map[string]*UserInfo]
var authConfigWG sync.WaitGroup
var stopCh chan struct{}
func readAuthConfig(path string) (map[string]*UserInfo, error) {
func loadAuthConfig() error {
ac, err := readAuthConfig(*authConfigPath)
if err != nil {
return fmt.Errorf("failed to load -auth.config=%q: %s", *authConfigPath, err)
}
m, err := parseAuthConfigUsers(ac)
if err != nil {
return fmt.Errorf("failed to parse users from -auth.config=%q: %s", *authConfigPath, err)
}
logger.Infof("loaded information about %d users from -auth.config=%q", len(m), *authConfigPath)
authConfig.Store(ac)
authUsers.Store(&m)
return nil
}
func readAuthConfig(path string) (*AuthConfig, error) {
data, err := fs.ReadFileOrHTTP(path)
if err != nil {
return nil, err
}
m, err := parseAuthConfig(data)
if err != nil {
return nil, fmt.Errorf("cannot parse %q: %w", path, err)
}
logger.Infof("Loaded information about %d users from %q", len(m), path)
return m, nil
return parseAuthConfig(data)
}
func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
var err error
data, err = envtemplate.ReplaceBytes(data)
func parseAuthConfig(data []byte) (*AuthConfig, error) {
data, err := envtemplate.ReplaceBytes(data)
if err != nil {
return nil, fmt.Errorf("cannot expand environment vars: %w", err)
}
var ac AuthConfig
if err := yaml.UnmarshalStrict(data, &ac); err != nil {
if err = yaml.UnmarshalStrict(data, &ac); err != nil {
return nil, fmt.Errorf("cannot unmarshal AuthConfig data: %w", err)
}
ui := ac.UnauthorizedUser
if ui != nil {
ui.requests = metrics.GetOrCreateCounter(`vmauth_unauthorized_user_requests_total`)
ui.concurrencyLimitCh = make(chan struct{}, ui.getMaxConcurrentRequests())
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(`vmauth_unauthorized_user_concurrent_requests_limit_reached_total`)
_ = metrics.GetOrCreateGauge(`vmauth_unauthorized_user_concurrent_requests_capacity`, func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(`vmauth_unauthorized_user_concurrent_requests_current`, func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
}
return &ac, nil
}
func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
uis := ac.Users
if len(uis) == 0 {
return nil, fmt.Errorf("`users` section cannot be empty in AuthConfig")
if len(uis) == 0 && ac.UnauthorizedUser == nil {
return nil, fmt.Errorf("Missing `users` or `unauthorized_user` sections")
}
byAuthToken := make(map[string]*UserInfo, len(uis))
for i := range uis {
@@ -284,6 +416,11 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
return nil, err
}
}
if ui.DefaultURL != nil {
if err := ui.DefaultURL.sanitize(); err != nil {
return nil, err
}
}
for _, e := range ui.URLMaps {
if len(e.SrcPaths) == 0 {
return nil, fmt.Errorf("missing `src_paths` in `url_map`")
@@ -298,29 +435,44 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
if len(ui.URLMaps) == 0 && ui.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix`")
}
name := ui.name()
if ui.BearerToken != "" {
name := "bearer_token"
if ui.Name != "" {
name = ui.Name
}
if ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
}
if ui.Username != "" {
name := ui.Username
if ui.Name != "" {
name = ui.Name
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
}
mcr := ui.getMaxConcurrentRequests()
ui.concurrencyLimitCh = make(chan struct{}, mcr)
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_concurrent_requests_limit_reached_total{username=%q}`, name))
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_capacity{username=%q}`, name), func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_current{username=%q}`, name), func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
byAuthToken[at1] = ui
byAuthToken[at2] = ui
}
return byAuthToken, nil
}
func (ui *UserInfo) name() string {
if ui.Name != "" {
return ui.Name
}
if ui.Username != "" {
return ui.Username
}
if ui.BearerToken != "" {
return "bearer_token"
}
return ""
}
func getAuthTokens(bearerToken, username, password string) (string, string) {
if bearerToken != "" {
// Accept the bearerToken as Basic Auth username with empty password
@@ -342,12 +494,12 @@ func getAuthToken(bearerToken, username, password string) string {
}
func (up *URLPrefix) sanitize() error {
for i, pu := range up.urls {
puNew, err := sanitizeURLPrefix(pu)
for _, bu := range up.bus {
puNew, err := sanitizeURLPrefix(bu.url)
if err != nil {
return err
}
up.urls[i] = puNew
bu.url = puNew
}
return nil
}

View File

@@ -13,7 +13,11 @@ import (
func TestParseAuthConfigFailure(t *testing.T) {
f := func(s string) {
t.Helper()
_, err := parseAuthConfig([]byte(s))
ac, err := parseAuthConfig([]byte(s))
if err != nil {
return
}
_, err = parseAuthConfigUsers(ac)
if err == nil {
t.Fatalf("expecting non-nil error")
}
@@ -202,7 +206,11 @@ users:
func TestParseAuthConfigSuccess(t *testing.T) {
f := func(s string, expectedAuthConfig map[string]*UserInfo) {
t.Helper()
m, err := parseAuthConfig([]byte(s))
ac, err := parseAuthConfig([]byte(s))
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
m, err := parseAuthConfigUsers(ac)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
@@ -218,11 +226,13 @@ users:
- username: foo
password: bar
url_prefix: http://aaa:343/bbb
max_concurrent_requests: 5
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
MaxConcurrentRequests: 5,
},
})
@@ -350,6 +360,83 @@ users:
URLPrefix: mustParseURL("https://bar/x"),
},
})
// with default url
f(`
users:
- bearer_token: foo
url_map:
- src_paths: ["/api/v1/query","/api/v1/query_range","/api/v1/label/[^./]+/.+"]
url_prefix: http://vmselect/select/0/prometheus
- src_paths: ["/api/v1/write"]
url_prefix: ["http://vminsert1/insert/0/prometheus","http://vminsert2/insert/0/prometheus"]
headers:
- "foo: bar"
- "xxx: y"
default_url:
- http://default1/select/0/prometheus
- http://default2/select/0/prometheus
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
BearerToken: "foo",
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURLs([]string{
"http://vminsert1/insert/0/prometheus",
"http://vminsert2/insert/0/prometheus",
}),
Headers: []Header{
{
Name: "foo",
Value: "bar",
},
{
Name: "xxx",
Value: "y",
},
},
},
},
DefaultURL: mustParseURLs([]string{
"http://default1/select/0/prometheus",
"http://default2/select/0/prometheus",
}),
},
getAuthToken("", "foo", ""): {
BearerToken: "foo",
URLMaps: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURLs([]string{
"http://vminsert1/insert/0/prometheus",
"http://vminsert2/insert/0/prometheus",
}),
Headers: []Header{
{
Name: "foo",
Value: "bar",
},
{
Name: "xxx",
Value: "y",
},
},
},
},
DefaultURL: mustParseURLs([]string{
"http://default1/select/0/prometheus",
"http://default2/select/0/prometheus",
}),
},
})
}
@@ -390,15 +477,17 @@ func mustParseURL(u string) *URLPrefix {
}
func mustParseURLs(us []string) *URLPrefix {
pus := make([]*url.URL, len(us))
bus := make([]*backendURL, len(us))
for i, u := range us {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
}
pus[i] = pu
bus[i] = &backendURL{
url: pu,
}
}
return &URLPrefix{
urls: pus,
bus: bus,
}
}

View File

@@ -18,26 +18,27 @@ users:
headers:
- "X-Scope-OrgID: foobar"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://localhost:8428 .
# are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
#
# The given user can send maximum 10 concurrent requests according to the provided max_concurrent_requests.
# Excess concurrent requests are rejected with 429 HTTP status code.
# See also -maxConcurrentPerUserRequests and -maxConcurrentRequests command-line flags.
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
max_concurrent_requests: 10
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# are proxied to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
- username: "local-single-node2"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# are load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
@@ -47,10 +48,8 @@ users:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# are load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
@@ -61,14 +60,22 @@ users:
- "http://vminsert2:8480/insert/42/prometheus"
# A single user for querying and inserting data:
#
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/42/prometheus
# - http://vmselect2:8481/select/42/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect1:8480/select/42/prometheus/api/v1/query
# or to http://vmselect2:8480/select/42/prometheus/api/v1/query .
#
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write .
# The "X-Scope-OrgID: abc" http header is added to these requests.
#
# Request which do not match `src_paths` from the `url_map` are proxied to the urls from `default_url`
# in a round-robin manner. The original request path is passed in `request_path` query arg.
# For example, request to http://vmauth:8427/non/existing/path are proxied:
# - to http://default1:8888/unsupported_url_handler?request_path=/non/existing/path
# - or http://default2:8888/unsupported_url_handler?request_path=/non/existing/path
- username: "foobar"
url_map:
- src_paths:
@@ -82,3 +89,25 @@ users:
url_prefix: "http://vminsert:8480/insert/42/prometheus"
headers:
- "X-Scope-OrgID: abc"
ip_filters:
deny_list: [127.0.0.1]
default_url:
- "http://default1:8888/unsupported_url_handler"
- "http://default2:8888/unsupported_url_handler"
# Requests without Authorization header are routed according to `unauthorized_user` section.
unauthorized_user:
url_map:
- src_paths:
- /api/v1/query
- /api/v1/query_range
url_prefix:
- http://vmselect1:8481/select/0/prometheus
- http://vmselect2:8481/select/0/prometheus
ip_filters:
allow_list: [8.8.8.8]
ip_filters:
allow_list: ["1.2.3.0/24", "127.0.0.1"]
deny_list:
- 10.1.0.1

View File

@@ -28,12 +28,16 @@ import (
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host. "+
"See also -maxConcurrentRequests")
responseTimeout = flag.Duration("responseTimeout", 5*time.Minute, "The timeout for receiving a response from backend")
maxConcurrentRequests = flag.Int("maxConcurrentRequests", 1000, "The maximum number of concurrent requests vmauth can process. Other requests are rejected with "+
"'429 Too Many Requests' http status code. See also -maxIdleConnsPerBackend")
"'429 Too Many Requests' http status code. See also -maxConcurrentPerUserRequests and -maxIdleConnsPerBackend command-line options")
maxConcurrentPerUserRequests = flag.Int("maxConcurrentPerUserRequests", 300, "The maximum number of concurrent requests vmauth can process per each configured user. "+
"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option "+
"in per-user config")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
`Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
@@ -80,6 +84,13 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
authToken := r.Header.Get("Authorization")
if authToken == "" {
// Process requests for unauthorized users
ui := authConfig.Load().UnauthorizedUser
if ui != nil {
processUserRequest(w, r, ui)
return true
}
w.Header().Set("WWW-Authenticate", `Basic realm="Restricted"`)
http.Error(w, "missing `Authorization` request header", http.StatusUnauthorized)
return true
@@ -90,49 +101,94 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
authToken = strings.Replace(authToken, "Token", "Bearer", 1)
}
ac := authConfig.Load().(map[string]*UserInfo)
ac := *authUsers.Load()
ui := ac[authToken]
if ui == nil {
invalidAuthTokenRequests.Inc()
err := fmt.Errorf("cannot find the provided auth token %q in config", authToken)
if *logInvalidAuthTokens {
err := fmt.Errorf("cannot find the provided auth token %q in config", authToken)
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusUnauthorized,
}
httpserver.Errorf(w, r, "%s", err)
} else {
http.Error(w, err.Error(), http.StatusUnauthorized)
http.Error(w, "Unauthorized", http.StatusUnauthorized)
}
return true
}
processUserRequest(w, r, ui)
return true
}
func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
ui.requests.Inc()
targetURL, headers, err := createTargetURL(ui, r.URL)
if err != nil {
httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
return true
}
// Limit the concurrency of requests to backends
concurrencyLimitOnce.Do(concurrencyLimitInit)
select {
case concurrencyLimitCh <- struct{}{}:
default:
concurrentRequestsLimitReachedTotal.Inc()
w.Header().Add("Retry-After", "10")
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh)),
StatusCode: http.StatusTooManyRequests,
if err := ui.beginConcurrencyLimit(); err != nil {
handleConcurrencyLimitError(w, r, err)
<-concurrencyLimitCh
return
}
httpserver.Errorf(w, r, "%s", err)
return true
default:
concurrentRequestsLimitReached.Inc()
err := fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh))
handleConcurrencyLimitError(w, r, err)
return
}
processRequest(w, r, targetURL, headers)
processRequest(w, r, ui)
ui.endConcurrencyLimit()
<-concurrencyLimitCh
return true
}
func processRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, headers []Header) {
func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
u := normalizeURL(r.URL)
up, headers := ui.getURLPrefixAndHeaders(u)
isDefault := false
if up == nil {
missingRouteRequests.Inc()
if ui.DefaultURL == nil {
httpserver.Errorf(w, r, "missing route for %q", u.String())
return
}
up, headers = ui.DefaultURL, ui.Headers
isDefault = true
}
r.Body = &readTrackingBody{
r: r.Body,
}
maxAttempts := up.getBackendsCount()
for i := 0; i < maxAttempts; i++ {
bu := up.getLeastLoadedBackendURL()
targetURL := bu.url
// Don't change path and add request_path query param for default route.
if isDefault {
query := targetURL.Query()
query.Set("request_path", u.Path)
targetURL.RawQuery = query.Encode()
} else { // Update path for regular routes.
targetURL = mergeURLs(targetURL, u)
}
ok := tryProcessingRequest(w, r, targetURL, headers)
bu.put()
if ok {
return
}
bu.setBroken()
}
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("all the backends for the user %q are unavailable", ui.name()),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
}
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, headers []Header) bool {
// This code has been copied from net/http/httputil/reverseproxy.go
req := sanitizeRequestHeaders(r)
req.URL = targetURL
@@ -142,12 +198,23 @@ func processRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL,
transportOnce.Do(transportInit)
res, err := transport.RoundTrip(req)
if err != nil {
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("error when proxying the request to %q: %s", targetURL, err),
StatusCode: http.StatusBadGateway,
rtb := req.Body.(*readTrackingBody)
if rtb.readStarted {
// Request body has been already read, so it is impossible to retry the request.
// Return the error to the client then.
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot proxy the request to %q: %w", targetURL, err),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
return true
}
httpserver.Errorf(w, r, "%s", err)
return
// Retry the request if its body wasn't read yet. This usually means that the backend isn't reachable.
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
// NOTE: do not use httpserver.GetRequestURI
// it explicitly reads request body and fails retries.
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying the request to %q: %s", remoteAddr, req.URL, targetURL, err)
return false
}
removeHopHeaders(res.Header)
copyHeader(w.Header(), res.Header)
@@ -157,13 +224,13 @@ func processRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL,
copyBuf.B = bytesutil.ResizeNoCopyNoOverallocate(copyBuf.B, 16*1024)
_, err = io.CopyBuffer(w, res.Body, copyBuf.B)
copyBufPool.Put(copyBuf)
_ = res.Body.Close()
if err != nil && !netutil.IsTrivialNetworkError(err) {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
return
return true
}
return true
}
var copyBufPool bytesutil.ByteBufferPool
@@ -269,7 +336,7 @@ func concurrencyLimitInit() {
})
}
var concurrentRequestsLimitReachedTotal = metrics.NewCounter("vmauth_concurrent_requests_limit_reached_total")
var concurrentRequestsLimitReached = metrics.NewCounter("vmauth_concurrent_requests_limit_reached_total")
func usage() {
const s = `
@@ -279,3 +346,37 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
`
flagutil.Usage(s)
}
func handleConcurrencyLimitError(w http.ResponseWriter, r *http.Request, err error) {
w.Header().Add("Retry-After", "10")
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusTooManyRequests,
}
httpserver.Errorf(w, r, "%s", err)
}
type readTrackingBody struct {
r io.ReadCloser
readStarted bool
}
// Read implements io.Reader interface
// tracks body reading requests
func (rtb *readTrackingBody) Read(p []byte) (int, error) {
if len(p) > 0 {
rtb.readStarted = true
}
return rtb.r.Read(p)
}
// Close implements io.Closer interface.
func (rtb *readTrackingBody) Close() error {
// Close rtb.r only if at least a single Read call was performed.
// http.Roundtrip performs body.Close call even without any Read calls
// so this hack allows us to reuse request body
if rtb.readStarted {
return rtb.r.Close()
}
return nil
}

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -1,17 +1,11 @@
package main
import (
"fmt"
"net/url"
"path"
"strings"
)
func (up *URLPrefix) mergeURLs(requestURI *url.URL) *url.URL {
pu := up.getNextURL()
return mergeURLs(pu, requestURI)
}
func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
targetURL := *uiURL
targetURL.Path += requestURI.Path
@@ -35,12 +29,26 @@ func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
return &targetURL
}
func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
func (ui *UserInfo) getURLPrefixAndHeaders(u *url.URL) (*URLPrefix, []Header) {
for _, e := range ui.URLMaps {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix, e.Headers
}
}
}
if ui.URLPrefix != nil {
return ui.URLPrefix, ui.Headers
}
return nil, nil
}
func normalizeURL(uOrig *url.URL) *url.URL {
u := *uOrig
// Prevent from attacks with using `..` in r.URL.Path
u.Path = path.Clean(u.Path)
if !strings.HasSuffix(u.Path, "/") && strings.HasSuffix(uOrig.Path, "/") {
// The path.Clean() removes traling slash.
// The path.Clean() removes trailing slash.
// Return it back if needed.
// This should fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1752
u.Path += "/"
@@ -52,16 +60,5 @@ func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, []Header, error) {
// See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1554
u.Path = ""
}
for _, e := range ui.URLMaps {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix.mergeURLs(&u), e.Headers, nil
}
}
}
if ui.URLPrefix != nil {
return ui.URLPrefix.mergeURLs(&u), ui.Headers, nil
}
missingRouteRequests.Inc()
return nil, nil, fmt.Errorf("missing route for %q", u.String())
return &u
}

View File

@@ -13,10 +13,14 @@ func TestCreateTargetURLSuccess(t *testing.T) {
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target, headers, err := createTargetURL(ui, u)
if err != nil {
t.Fatalf("unexpected error: %s", err)
u = normalizeURL(u)
up, headers := ui.getURLPrefixAndHeaders(u)
if up == nil {
t.Fatalf("cannot determie backend: %s", err)
}
bu := up.getLeastLoadedBackendURL()
target := mergeURLs(bu.url, u)
bu.put()
if target.String() != expectedTarget {
t.Fatalf("unexpected target; got %q; want %q", target, expectedTarget)
}
@@ -119,15 +123,13 @@ func TestCreateTargetURLFailure(t *testing.T) {
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target, headers, err := createTargetURL(ui, u)
if err == nil {
t.Fatalf("expecting non-nil error")
}
if target != nil {
t.Fatalf("unexpected target=%q; want empty string", target)
u = normalizeURL(u)
up, headers := ui.getURLPrefixAndHeaders(u)
if up != nil {
t.Fatalf("unexpected non-empty up=%#v", up)
}
if headers != nil {
t.Fatalf("unexpected headers=%q; want empty string", headers)
t.Fatalf("unexpected non-empty headers=%q", headers)
}
}
f(&UserInfo{}, "/foo/bar")

View File

@@ -39,6 +39,9 @@ vmbackup-freebsd-amd64-prod:
vmbackup-openbsd-amd64-prod:
APP_NAME=vmbackup $(MAKE) app-via-docker-openbsd-amd64
vmbackup-windows-amd64-prod:
APP_NAME=vmbackup $(MAKE) app-via-docker-windows-amd64
package-vmbackup:
APP_NAME=vmbackup $(MAKE) package-via-docker
@@ -75,6 +78,9 @@ vmbackup-linux-arm64:
vmbackup-linux-ppc64le:
APP_NAME=vmbackup CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmbackup-linux-s390x:
APP_NAME=vmbackup CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmbackup-linux-386:
APP_NAME=vmbackup CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch
@@ -90,5 +96,8 @@ vmbackup-freebsd-amd64:
vmbackup-openbsd-amd64:
APP_NAME=vmbackup CGO_ENABLED=0 GOOS=openbsd GOARCH=amd64 $(MAKE) app-local-goos-goarch
vmbackup-windows-amd64:
GOARCH=amd64 APP_NAME=vmbackup $(MAKE) app-local-windows-goarch
vmbackup-pure:
APP_NAME=vmbackup $(MAKE) app-local-pure

View File

@@ -82,7 +82,7 @@ The command will upload only changed data to `gs://<bucket>/latest`.
Where `<daily-snapshot>` is the snapshot for the last day `<YYYYMMDD>`.
This apporach saves network bandwidth costs on hourly backups (since they are incremental) and allows recovering data from either the last hour (`latest` backup)
This approach saves network bandwidth costs on hourly backups (since they are incremental) and allows recovering data from either the last hour (`latest` backup)
or from any day (`YYYYMMDD` backups). Note that hourly backup shouldn't run when creating daily backup.
Do not forget to remove old backups when they are no longer needed in order to save storage costs.
@@ -103,7 +103,7 @@ The backup algorithm is the following:
7. Delete the created snapshot.
The algorithm splits source files into 1 GiB chunks in the backup. Each chunk is stored as a separate file in the backup.
Such splitting balances between the number of files in the backup and the amounts of data that needs to be re-transfered after temporary errors.
Such splitting balances between the number of files in the backup and the amounts of data that needs to be re-transferred after temporary errors.
`vmbackup` relies on [instant snapshot](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) properties:
@@ -190,7 +190,7 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Where to put the backup on the remote storage. Example: gs://bucket/path/to/backup, s3://bucket/path/to/backup, azblob://container/path/to/backup or fs:///path/to/local/backup/dir
-dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
@@ -200,11 +200,11 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
-flagsAuthKey string
Auth key for /flags endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -255,7 +255,7 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
-pushmetrics.interval duration
Interval for pushing metrics to -pushmetrics.url (default 10s)
-pushmetrics.url array
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default metrics exposed at /metrics page aren't pushed to any remote storage
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags.
-s3ForcePathStyle
Prefixing endpoint with bucket name when set false, true by default. (default true)

View File

@@ -49,29 +49,35 @@ func main() {
logger.Init()
pushmetrics.Init()
// Storing snapshot delete function to be able to call it in case
// of error since logger.Fatal will exit the program without
// calling deferred functions.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2055
deleteSnapshot := func() {}
if len(*snapshotCreateURL) > 0 {
// create net/url object
createUrl, err := url.Parse(*snapshotCreateURL)
createURL, err := url.Parse(*snapshotCreateURL)
if err != nil {
logger.Fatalf("cannot parse snapshotCreateURL: %s", err)
}
if len(*snapshotName) > 0 {
logger.Fatalf("-snapshotName shouldn't be set if -snapshot.createURL is set, since snapshots are created automatically in this case")
}
logger.Infof("Snapshot create url %s", createUrl.Redacted())
logger.Infof("Snapshot create url %s", createURL.Redacted())
if len(*snapshotDeleteURL) <= 0 {
err := flag.Set("snapshot.deleteURL", strings.Replace(*snapshotCreateURL, "/create", "/delete", 1))
if err != nil {
logger.Fatalf("Failed to set snapshot.deleteURL flag: %v", err)
}
}
deleteUrl, err := url.Parse(*snapshotDeleteURL)
deleteURL, err := url.Parse(*snapshotDeleteURL)
if err != nil {
logger.Fatalf("cannot parse snapshotDeleteURL: %s", err)
}
logger.Infof("Snapshot delete url %s", deleteUrl.Redacted())
logger.Infof("Snapshot delete url %s", deleteURL.Redacted())
name, err := snapshot.Create(createUrl.String())
name, err := snapshot.Create(createURL.String())
if err != nil {
logger.Fatalf("cannot create snapshot: %s", err)
}
@@ -80,32 +86,48 @@ func main() {
logger.Fatalf("cannot set snapshotName flag: %v", err)
}
defer func() {
err := snapshot.Delete(deleteUrl.String(), name)
deleteSnapshot = func() {
err := snapshot.Delete(deleteURL.String(), name)
if err != nil {
logger.Fatalf("cannot delete snapshot: %s", err)
}
}()
}
} else if len(*snapshotName) == 0 {
logger.Fatalf("`-snapshotName` or `-snapshot.createURL` must be provided")
}
if err := snapshot.Validate(*snapshotName); err != nil {
logger.Fatalf("invalid -snapshotName=%q: %s", *snapshotName, err)
}
go httpserver.Serve(*httpListenAddr, false, nil)
err := makeBackup()
deleteSnapshot()
if err != nil {
logger.Fatalf("cannot create backup: %s", err)
}
startTime := time.Now()
logger.Infof("gracefully shutting down http server for metrics at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Fatalf("cannot stop http server for metrics: %s", err)
}
logger.Infof("successfully shut down http server for metrics in %.3f seconds", time.Since(startTime).Seconds())
}
func makeBackup() error {
if err := snapshot.Validate(*snapshotName); err != nil {
return fmt.Errorf("invalid -snapshotName=%q: %s", *snapshotName, err)
}
srcFS, err := newSrcFS()
if err != nil {
logger.Fatalf("%s", err)
return err
}
dstFS, err := newDstFS()
if err != nil {
logger.Fatalf("%s", err)
return err
}
originFS, err := newOriginFS()
if err != nil {
logger.Fatalf("%s", err)
return err
}
a := &actions.Backup{
Concurrency: *concurrency,
@@ -114,18 +136,12 @@ func main() {
Origin: originFS,
}
if err := a.Run(); err != nil {
logger.Fatalf("cannot create backup: %s", err)
return err
}
srcFS.MustStop()
dstFS.MustStop()
originFS.MustStop()
startTime := time.Now()
logger.Infof("gracefully shutting down http server for metrics at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Fatalf("cannot stop http server for metrics: %s", err)
}
logger.Infof("successfully shut down http server for metrics in %.3f seconds", time.Since(startTime).Seconds())
return nil
}
func usage() {
@@ -139,7 +155,7 @@ See the docs at https://docs.victoriametrics.com/vmbackup.html .
}
func newSrcFS() (*fslocal.FS, error) {
snapshotPath := *storageDataPath + "/snapshots/" + *snapshotName
snapshotPath := filepath.Join(*storageDataPath, "snapshots", *snapshotName)
// Verify the snapshot exists.
f, err := os.Open(snapshotPath)

View File

@@ -2,7 +2,7 @@
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
RUN apk update && apk upgrade && apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

View File

@@ -48,7 +48,7 @@ Backup manager uploads only the data that has been changed or created since the
This reduces the consumed network traffic and the time needed for performing the backup.
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for details.
*Please take into account that the first backup upload could take a significant amount of time as it needs to upload all of the data.*
*Please take into account that the first backup upload could take a significant amount of time as it needs to upload all the data.*
There are two flags which could help with performance tuning:
@@ -150,6 +150,30 @@ The result on the GCS bucket. We see only 3 daily backups:
![daily](vmbackupmanager_rp_daily_2.png)
### Protection backups against deletion by retention policy
You can protect any backup against deletion by retention policy with the `vmbackupmanager backups lock` command.
For instance:
```console
./vmbackupmanager backup lock daily/2021-02-13 -dst=<DST_PATH> -storageDataPath=/vmstorage-data -eula
```
After that the backup won't be deleted by retention policy.
You can view the `locked` attribute in backup list:
```console
./vmbackupmanager backup list -dst=<DST_PATH> -storageDataPath=/vmstorage-data -eula
```
To remove protection, you can use the command `vmbackupmanager backups unlock`.
For example:
```console
./vmbackupmanager backup unlock daily/2021-02-13 -dst=<DST_PATH> -storageDataPath=/vmstorage-data -eula
```
## API methods
@@ -158,7 +182,24 @@ The result on the GCS bucket. We see only 3 daily backups:
* GET `/api/v1/backups` - returns list of backups in remote storage.
Example output:
```json
[{"name":"daily/2022-11-30","size_bytes":26664689,"size":"25.429Mi"},{"name":"daily/2022-12-01","size_bytes":40160965,"size":"38.300Mi"},{"name":"hourly/2022-11-30:12","size_bytes":5846529,"size":"5.576Mi"},{"name":"hourly/2022-11-30:13","size_bytes":17651847,"size":"16.834Mi"},{"name":"hourly/2022-11-30:13:22","size_bytes":8797831,"size":"8.390Mi"},{"name":"hourly/2022-11-30:14","size_bytes":10680454,"size":"10.186Mi"}]
[{"name":"daily/2023-04-07","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:07+00:00"},{"name":"hourly/2023-04-07:11","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:06+00:00"},{"name":"latest","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:04+00:00"},{"name":"monthly/2023-04","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:10+00:00"},{"name":"weekly/2023-14","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:09+00:00"}]
```
> Note: `created_at` field is in RFC3339 format.
* GET `/api/v1/backups/<BACKUP_NAME>` - returns backup info by name.
Example output:
```json
{"name":"daily/2023-04-07","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:07+00:00","locked":true}
```
* PUT `/api/v1/backups/<BACKUP_NAME>` - update "locked" attribute for backup by name.
Example request body:
```json
{"locked":true}
```
Example response:
```json
{"name":"daily/2023-04-07","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:07+00:00", "locked": true}
```
* POST `/api/v1/restore` - saves backup name to restore when [performing restore](#restore-commands).
@@ -186,7 +227,13 @@ vmbackupmanager backup
vmbackupmanager backup list
List backups in remote storage
vmbackupmanager restore
vmbackupmanager backup lock
Locks backup in remote storage against deletion
vmbackupmanager backup unlock
Unlocks backup in remote storage for deletion
vmbackupmanager restore
Restore backup specified by restore mark if it exists
vmbackupmanager restore get
@@ -211,7 +258,7 @@ It can be changed by using flag:
`vmbackupmanager backup list` lists backups in remote storage:
```console
$ ./vmbackupmanager backup list
[{"name":"daily/2022-11-30","size_bytes":26664689,"size":"25.429Mi"},{"name":"daily/2022-12-01","size_bytes":40160965,"size":"38.300Mi"},{"name":"hourly/2022-11-30:12","size_bytes":5846529,"size":"5.576Mi"},{"name":"hourly/2022-11-30:13","size_bytes":17651847,"size":"16.834Mi"},{"name":"hourly/2022-11-30:13:22","size_bytes":8797831,"size":"8.390Mi"},{"name":"hourly/2022-11-30:14","size_bytes":10680454,"size":"10.186Mi"}]
[{"name":"daily/2023-04-07","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:07+00:00"},{"name":"hourly/2023-04-07:11","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:06+00:00"},{"name":"latest","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:04+00:00"},{"name":"monthly/2023-04","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:10+00:00"},{"name":"weekly/2023-14","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:09+00:00"}]
```
### Restore commands
@@ -249,16 +296,16 @@ If restore mark doesn't exist at `storageDataPath`(restore wasn't requested) `vm
1. Run `vmbackupmanager backup list` to get list of available backups:
```console
$ /vmbackupmanager-prod backup list
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
[{"name":"daily/2023-04-07","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:07+00:00"},{"name":"hourly/2023-04-07:11","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:06+00:00"},{"name":"latest","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:04+00:00"},{"name":"monthly/2023-04","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:10+00:00"},{"name":"weekly/2023-14","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:09+00:00"}]
```
2. Run `vmbackupmanager restore create` to create restore mark:
- Use relative path to backup to restore from currently used remote storage:
```console
$ /vmbackupmanager-prod restore create daily/2022-10-06
$ /vmbackupmanager-prod restore create daily/2023-04-07
```
- Use full path to backup to restore from any remote storage:
```console
$ /vmbackupmanager-prod restore create azblob://test1/vmbackupmanager/daily/2022-10-06
$ /vmbackupmanager-prod restore create azblob://test1/vmbackupmanager/daily/2023-04-07
```
3. Stop `vmstorage` or `vmsingle` node
4. Run `vmbackupmanager restore` to restore backup:
@@ -275,23 +322,24 @@ If restore mark doesn't exist at `storageDataPath`(restore wasn't requested) `vm
```yaml
vmbackup:
restore:
onStart: "true"
onStart:
enabled: "true"
```
See operator `VMStorage` schema [here](https://docs.victoriametrics.com/operator/api.html#vmstorage) and `VMSingle` [here](https://docs.victoriametrics.com/operator/api.html#vmsinglespec).
2. Enter container running `vmbackupmanager`
2. Use `vmbackupmanager backup list` to get list of available backups:
```console
$ /vmbackupmanager-prod backup list
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
[{"name":"daily/2023-04-07","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:07+00:00"},{"name":"hourly/2023-04-07:11","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:06+00:00"},{"name":"latest","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:04+00:00"},{"name":"monthly/2023-04","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:10+00:00"},{"name":"weekly/2023-14","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:09+00:00"}]
```
3. Use `vmbackupmanager restore create` to create restore mark:
- Use relative path to backup to restore from currently used remote storage:
```console
$ /vmbackupmanager-prod restore create daily/2022-10-06
$ /vmbackupmanager-prod restore create daily/2023-04-07
```
- Use full path to backup to restore from any remote storage:
```console
$ /vmbackupmanager-prod restore create azblob://test1/vmbackupmanager/daily/2022-10-06
$ /vmbackupmanager-prod restore create azblob://test1/vmbackupmanager/daily/2023-04-07
```
4. Restart pod
@@ -305,7 +353,8 @@ Clusters here are referred to as `source` and `destination`.
```yaml
vmbackup:
restore:
onStart: "true"
onStart:
enabled: "true"
```
Note: it is safe to leave this section in the cluster configuration, since it will be ignored if restore mark doesn't exist.
> Important! Use different `-dst` for *destination* cluster to avoid overwriting backup data of the *source* cluster.
@@ -313,13 +362,13 @@ Clusters here are referred to as `source` and `destination`.
2. Use `vmbackupmanager backup list` to get list of available backups:
```console
$ /vmbackupmanager-prod backup list
["daily/2022-10-06","daily/2022-10-10","hourly/2022-10-04:13","hourly/2022-10-06:12","hourly/2022-10-06:13","hourly/2022-10-10:14","hourly/2022-10-10:16","monthly/2022-10","weekly/2022-40","weekly/2022-41"]
[{"name":"daily/2023-04-07","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:07+00:00"},{"name":"hourly/2023-04-07:11","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:06+00:00"},{"name":"latest","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:04+00:00"},{"name":"monthly/2023-04","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:10+00:00"},{"name":"weekly/2023-14","size_bytes":318837,"size":"311.4ki","created_at":"2023-04-07T16:15:09+00:00"}]
```
3. Use `vmbackupmanager restore create` to create restore mark at each pod of the *destination* cluster.
Each pod in *destination* cluster should be restored from backup of respective pod in *source* cluster.
For example: `vmstorage-source-0` in *source* cluster should be restored from `vmstorage-destination-0` in *destination* cluster.
```console
$ /vmbackupmanager-prod restore create s3://source_cluster/vmstorage-source-0/daily/2022-10-06
$ /vmbackupmanager-prod restore create s3://source_cluster/vmstorage-source-0/daily/2023-04-07
```
## Monitoring
@@ -374,7 +423,7 @@ command-line flags:
-dst string
The root folder of Victoria Metrics backups. Example: gs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
Whether to enable IPv6 for listening and dialing. By default, only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
@@ -384,11 +433,11 @@ command-line flags:
-flagsAuthKey string
Auth key for /flags endpoint. It must be passed via authKey query arg. It overrides httpAuth.* settings
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default, compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
@@ -444,16 +493,16 @@ command-line flags:
-pushmetrics.interval duration
Interval for pushing metrics to -pushmetrics.url (default 10s)
-pushmetrics.url array
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default metrics exposed at /metrics page aren't pushed to any remote storage
Optional URL to push metrics exposed at /metrics page. See https://docs.victoriametrics.com/#push-metrics . By default, metrics exposed at /metrics page aren't pushed to any remote storage
Supports an array of values separated by comma or specified via multiple flags.
-runOnStart
Upload backups immediately after start of the service. Otherwise the backup starts on new hour
Upload backups immediately after start of the service. Otherwise, the backup starts on new hour
-s3ForcePathStyle
Prefixing endpoint with bucket name when set false, true by default. (default true)
-snapshot.createURL string
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup.Example: http://victoriametrics:8428/snapshot/create
-snapshot.deleteURL string
VictoriaMetrics delete snapshot url. Optional. Will be generated from snapshot.createURL if not provided. All created snaphosts will be automatically deleted.Example: http://victoriametrics:8428/snapshot/delete
VictoriaMetrics delete snapshot url. Optional. Will be generated from snapshot.createURL if not provided. All created snapshots will be automatically deleted.Example: http://victoriametrics:8428/snapshot/delete
-storageDataPath string
Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage (default "victoria-metrics-data")
-tls

View File

@@ -78,6 +78,9 @@ vmctl-linux-arm64:
vmctl-linux-ppc64le:
APP_NAME=vmctl CGO_ENABLED=0 GOOS=linux GOARCH=ppc64le $(MAKE) app-local-goos-goarch
vmctl-linux-s390x:
APP_NAME=vmctl CGO_ENABLED=0 GOOS=linux GOARCH=s390x $(MAKE) app-local-goos-goarch
vmctl-linux-386:
APP_NAME=vmctl CGO_ENABLED=0 GOOS=linux GOARCH=386 $(MAKE) app-local-goos-goarch

Some files were not shown because too many files have changed in this diff Show More