Compare commits

...

712 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
08234aa7a0 docs/CHANGELOG.md: cut v1.60.0 2021-05-24 15:55:08 +03:00
Aliaksandr Valialkin
a47d4927d2 docs/Single-server-VictoriaMetrics.md: clarify that the storage size depends on the number of samples per series 2021-05-24 15:47:44 +03:00
Aliaksandr Valialkin
b1e8d92577 docs/vmalert.md: sync with app/vmalert/README.md via make docs-sync 2021-05-24 15:47:20 +03:00
Aliaksandr Valialkin
8e7d1f8824 app/vmagent/remotewrite: use WARN level instead of ERROR level for couldnt send a block with size ... bytes to ... log message
This is really warning, since vmagent re-tries sending the data block until success.
2021-05-24 15:43:59 +03:00
Aliaksandr Valialkin
39ef1e7a51 lib/storage: do not stop data ingestion on the first error in Storage.AddRows
Continue data ingestion for the rest of blocks.
2021-05-24 15:32:47 +03:00
Aliaksandr Valialkin
4b01c9fb2e lib/storage: limit the number of rows per each block in Storage.AddRows()
This should reduce memory usage when ingesting big blocks or rows.
2021-05-24 15:24:07 +03:00
Aliaksandr Valialkin
a4ff4b8e65 lib/storage: allow filling all the rows up to their capacity in rawRowsShard.addRows
This should reduce memory usage a bit on data ingestion path
2021-05-24 15:22:59 +03:00
Aliaksandr Valialkin
a46551245c lib/bloomfilter: fix TestLimiterConcurrent 2021-05-24 05:17:36 +03:00
Aliaksandr Valialkin
eafed5335d vendor: update github.com/valyala/gozstd from v1.10.0 to v1.11.0 2021-05-24 04:59:55 +03:00
Aliaksandr Valialkin
93d81b486d lib/fs: do not pass done callback to tryRemoveAll() func
This improves code readability a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-24 04:51:57 +03:00
Aliaksandr Valialkin
f54133b200 lib/storage: do not populate MetricID->MetricName cache during data ingestion
This cache isn't needed during data ingestion, so there is no need in spending RAM on it.

This reduces RAM usage on data ingestion path by 30%
2021-05-24 03:02:46 +03:00
Aliaksandr Valialkin
24858820b5 docs/CHANGELOG.md: small typo fix 2021-05-23 14:15:01 +03:00
Aliaksandr Valialkin
04eb37a590 docs/CHANGELOG.md: document the addition of extra_filter_labels at 84cc0513e1 2021-05-23 14:10:33 +03:00
Aliaksandr Valialkin
ec79abc382 lib/{mergeset,storage}: reduce the number of IFNO log messages like merged ... items across ... blocks in ... seconds
Log these messages if the merge takes more than 30 seconds instead of 10 seconds.
2021-05-23 14:03:21 +03:00
Roman Khavronenko
84cc0513e1 vmalert: support extra_filter_labels setting per-group (#1319)
The new setting `extra_filter_labels` may be assigned to group.
If it is, then all rules within a group will automatically filter
for configured labels. The feature is well-described here
https://docs.victoriametrics.com#prometheus-querying-api-enhancements

New setting is compatible only with VM datasource.
2021-05-23 00:26:01 +03:00
Aliaksandr Valialkin
78dddfb98f lib/promauth: follow-up after 5b8176c68e 2021-05-22 18:01:11 +03:00
Nikolay
5b8176c68e basic OAuth2 support for remoteWrite and scrape targets (#1316)
* adds OAuth2 support for remoteWrite and scrapping

* adds tests
changes init
2021-05-22 16:20:18 +03:00
Aliaksandr Valialkin
e05dd475f0 lib/fs: concurrently remove up to 1024 blocked NFS directories
Previously the blocked directories were removed sequentially by a single goroutine.
This can be not enough for highly loaded VictoriaMetrics that accepts millions of sample per second,
when big number of LSM parts are created and removed at high rate.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:57:46 +03:00
Aliaksandr Valialkin
8e2985b53d lib/fs: wait for a while before giving up on NFS file removal if the removal queue is full
This should reduce the probability of the panic on a highly loaded VictoriaMetrics
accepting millions of samples per second.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:21:00 +03:00
Aliaksandr Valialkin
d173a9348c docs/MetricsQL.md: add a link to a list of supported timezones that can be passed to timezone_offset() function
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1306
2021-05-21 16:55:24 +03:00
Aliaksandr Valialkin
14ec2b9f26 docs/CHANGELOG.md: mention the bugfix from d626c5c2a9
Updates https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 16:37:14 +03:00
Aliaksandr Valialkin
c54bb73867 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:34:06 +03:00
Nikolay
d626c5c2a9 changes vmalert query function (#1307)
* changes vmalert query function
for prometheus rules compatibility its better to use labels as map.
it simplifies template evaluation and allow to ignore can't evaluate field error
because map will return default value.
fixes https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 13:55:43 +03:00
Aliaksandr Valialkin
060664e221 docs/FAQ.md: re-order questions to be more attractive to visitors 2021-05-20 19:49:34 +03:00
Aliaksandr Valialkin
49ecbc765d app/vmauth: add ability to protect /-/reload endpoint with authKey 2021-05-20 18:47:01 +03:00
Aliaksandr Valialkin
362a49bdd1 vendor: make vendor-update 2021-05-20 18:46:26 +03:00
Aliaksandr Valialkin
4c7bb75fa2 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:27:10 +03:00
Aliaksandr Valialkin
0d3e78b9ee docs/CHANGELOG.md: move tip to proper place 2021-05-20 17:56:03 +03:00
Aliaksandr Valialkin
480087944a docs/FAQ.md: add a question on how to run VictoriaMetrics on FreeBSD
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:16:35 +03:00
Aliaksandr Valialkin
49c39ab388 docs/FAQ.md: add can I use VictoriaMetrics instead of Prometheus?
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:10:36 +03:00
Aliaksandr Valialkin
6cc6a032cd docs/FAQ.md: add a question about memory limits for VictoriaMetrics components
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:09:41 +03:00
Aliaksandr Valialkin
d06aae9454 docs/FAQ.md: add a question about multi-tenancy
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 15:52:40 +03:00
Aliaksandr Valialkin
e394ff6466 app/vmagent/remotewrite: expose metrics with the current number of active series per day and per hour
These numbers are exposed via the following metrics:

- vmagent_hourly_series_limit_current_series
- vmagent_daily_series_limit_current_series

Expose also the limits via the following metrics:

- vmagent_hourly_series_limit_max_series
- vmagent_daily_series_limit_max_series
2021-05-20 15:28:09 +03:00
Aliaksandr Valialkin
ad73f226ff app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin
7e526effaa app/vmagent: add ability to limit series cardinality on a per-hour and per-day basis 2021-05-20 13:13:40 +03:00
Aliaksandr Valialkin
3cd8606abd docs/CHANGELOG.md: document the bugfix in vmctl import for InfluxDB lines with identical names for field and tag
See dcf8803bbd

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299
2021-05-20 12:06:16 +03:00
Aliaksandr Valialkin
98e425ee09 docs/CHANGELOG.md: refer to the issue related to timezone_offset() function 2021-05-20 12:03:39 +03:00
Roman Khavronenko
dcf8803bbd vmctl: explicitly set ::tag type for labels selector in influx mode (#1310)
The `::tag` type is needed in cases when field and tag names are equal, which
results into unexpected results in InfluxQL. Setting the type explicitly helps
InfluxDB to understand which exact column we apply filter to.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299
2021-05-20 12:03:16 +03:00
Aliaksandr Valialkin
22585531ad lib/promscrape/discovery/kubernetes: make golangci-lint happy by removing empty branches 2021-05-20 12:00:29 +03:00
Aliaksandr Valialkin
0842bb9294 app/vmselect/promql: add timezone_offset(tz) function
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1306
2021-05-20 11:53:09 +03:00
Aliaksandr Valialkin
009e136d88 lib/storage: remove possible data race when logging dropped labels 2021-05-20 02:47:22 +03:00
Aliaksandr Valialkin
f7e73fbe8b app/vmagent/remotewrite: sort labels before sending the series to per-remoteWrite.url queues 2021-05-20 02:12:36 +03:00
Aliaksandr Valialkin
eb8093ca6b lib/promscrape/discovery/kubernetes: reload objects on object parse error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 23:25:48 +03:00
Neo He
e8a6c6927d app/{vmbackup,vmrestore},docs/vmrestore.md: typo fix: vbackup -> vmbackup (#1305) 2021-05-18 16:37:28 +03:00
Aliaksandr Valialkin
f4719889da lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:26:16 +03:00
Aliaksandr Valialkin
b30925738b docs/vmalert.md: document multitenant support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740
2021-05-18 16:26:14 +03:00
Aliaksandr Valialkin
0cf1fe19e6 lib/promscrape/discovery/kubernetes: simplify the reload logic for urlWatcher.objectsByKey 2021-05-18 15:40:46 +03:00
Aliaksandr Valialkin
bfb61de606 lib/promscrape/discovery/kubernetes: properly update vm_promscrape_discovery_kubernetes_scrape_works metric
Previously it wasn't descreased during config update.
2021-05-18 15:33:45 +03:00
Aliaksandr Valialkin
3166994244 lib/promscrape/discovery/kubernetes: log errors and stop service discovery when unexpected updates are received from Kubernetes API server
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 15:10:48 +03:00
Aliaksandr Valialkin
66aba00549 app/vmauth: reload -auth.config on the request to /-/reload
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1194
2021-05-18 02:23:55 +03:00
Aliaksandr Valialkin
3339ea41e7 docs/vmbackup.md: typo fix: snaphosts -> snapshots
Thanks to @jelmd - see 1ab27582a3 (r50884395)
2021-05-18 01:12:36 +03:00
Aliaksandr Valialkin
6c944b86d8 docs: dealay -> delay
Thanks to @jelmd . See 0b7e3510c8 (r50884991)
2021-05-18 01:07:52 +03:00
Aliaksandr Valialkin
ec6a284978 lib/promrelabel: add tests for conditional removal of label on another label match
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1294
2021-05-18 00:23:03 +03:00
Aliaksandr Valialkin
2eed410466 lib/promscrape/discovery/kubernetes: key ScrapeWork objects by urlWatcher instead of namespace
This makes the code less fragile if urlWatcher would depend on additional to namepsace properties.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-17 23:47:08 +03:00
Aliaksandr Valialkin
ede2ba5a45 docs/CHANGELOG.md: document b38edec7ee
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-17 01:57:34 +03:00
Aliaksandr Valialkin
55ed77760b vendor: make vendor-update 2021-05-17 01:47:33 +03:00
Aliaksandr Valialkin
2ec6f81b10 vendor: update github.com/valyala/gozstd from v1.9.0 to v1.10.0 2021-05-17 01:47:33 +03:00
Roman Khavronenko
b96b19f040 Docs update from victoriaMetrics.github.io (#1302)
* port change from 11ca65677b

* port change from afb41dfa43

* port change from f82e3733c9

* port change from d499ab0502
2021-05-16 18:43:09 +01:00
Roman Khavronenko
b38edec7ee vmalert: use stringified label keys for duplicates map in recroding rules (#1301)
duplicates map helps to determine wheter extra labels has overriden
labels which make time series unique. It was using a sorted hashed
labels sequence as a key. But hashing algorithm could have collisions,
so it is more convenient to not use hashing at all.

Log message for recording rules duplicates was improved as well.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-15 13:25:57 +03:00
Aliaksandr Valialkin
733706e6c6 lib/promscrape: reload auth tokens from files every second
Previously auth tokens were loaded at startup and couldn't be updated without vmagent restart.
Now there is no need in vmagent restart.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1297
2021-05-14 20:00:08 +03:00
Aliaksandr Valialkin
a15145e597 docs/Articles.md: add a link to https://fly.io/blog/measuring-fly/ 2021-05-14 19:47:43 +03:00
Aliaksandr Valialkin
24cbf8b66c docs/vmauth.md: sync with app/vmauth/README.md after 10a47af631 2021-05-14 18:13:10 +03:00
Aliaksandr Valialkin
10a47af631 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:12:24 +03:00
Aliaksandr Valialkin
e6ec442e96 docs/Single-server-VictoriaMetrics.md: document how to reduce memory usage when importing too long JSON lines into VictoriaMetrics
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1295
2021-05-14 17:22:08 +03:00
Denys Holius
70165a6758 Fix Cortex typo 2021-05-13 16:26:05 +03:00
Aliaksandr Valialkin
a422165dc6 app/vmagent/remotewrite: clarify the comment explaining why vmagent drops blocks if remote storage returns 400 or 409 status code 2021-05-13 16:16:16 +03:00
Aliaksandr Valialkin
fc3519fa26 lib/promscrape: limit scrape_timeout by scrape_interval like Prometheus does 2021-05-13 16:09:45 +03:00
Aliaksandr Valialkin
1f75ae6006 docs/CHANGELOG.md: document the bugfix from b4f5be8bd8 2021-05-13 11:18:39 +03:00
匠心零度
b4f5be8bd8 fix vagent imbalance problem (#1292)
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2021-05-13 11:14:51 +03:00
Aliaksandr Valialkin
e6fda03e8f vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.14 to v1.0.15
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:44:14 +03:00
Aliaksandr Valialkin
f6a641de62 lib/promscrape: exponentially increase retry interval on unsuccesful requests to scrape targets or to service discovery services
This should reduce CPU load at vmagent and at remote side when the remote side doesn't accept HTTP requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:38:50 +03:00
Aliaksandr Valialkin
c0ec541559 lib/cgroup: document the ability to detect cgroup v2 memory and cpu limits. This is follow-up for b50024812e 2021-05-13 09:26:20 +03:00
Aliaksandr Valialkin
d7be2753c0 lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss 2021-05-13 09:02:33 +03:00
Nikolay
b50024812e adds cgroupsv2 support (#1283)
* adds cgroupv2 limits support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1269

* small fix

* changes Atoi to ParseUint
2021-05-13 09:02:13 +03:00
Aliaksandr Valialkin
a22a17dc66 lib/storage: merge getTSDBStatusForDate with getTSDBStatusWithFiltersForDate
These functions are non-trivial, while their code has minimal differences.
It is better from maintainability PoV to merge these functions into a single function.
2021-05-12 17:56:53 +03:00
Aliaksandr Valialkin
beddc0c0d5 docs/Single-server-VictoriaMetrics.md: typo fix: retuns->returns 2021-05-12 17:22:34 +03:00
Aliaksandr Valialkin
832651c6c2 app/vmselect: follow up after 8a0678678b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168
2021-05-12 17:18:30 +03:00
Nikolay
8a0678678b Adds tsdb match filters (#1282)
* init work on filters

* init propose for status filters

* fixes tsdb status
adds test

* fix bug

* removes checks from test
2021-05-12 15:18:45 +03:00
Aliaksandr Valialkin
cfd6aa28e1 lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices scrape targets every 5 seconds, since they may depend on changed service and pod objects
This should make endpoints and endpointslices scrape targets eventually consistent with the maximum delay of 5 seconds after the related service or pod object changes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-12 14:10:34 +03:00
Aliaksandr Valialkin
afec68ad13 lib/httpserver: add new X-Server-Hostname header instead of overwriting already exsiting header
This makes possible tracking origins of chained requests over multiple hops.
2021-05-11 23:48:59 +03:00
Aliaksandr Valialkin
f2d5c4e2d0 lib/httpserver: return X-Server-Hostname http header in all the responses for better debuggability 2021-05-11 22:03:48 +03:00
Aliaksandr Valialkin
33f7bacb01 lib/storage: properly apply time range when matching an empty filter
It must match all the time series on the given time range.
Previously it was matched to all the time series without the restriction on the given time range.
2021-05-11 01:12:05 +03:00
Aliaksandr Valialkin
3fb3ce2a6d Revert ".github/dependabot.yml: remove automated dependency version checks"
This reverts commit 5b986c95dd.

This check verifies only dependencies needed for github-actions. This is OK.
2021-05-10 12:05:09 +03:00
Aliaksandr Valialkin
8b65920a8b Add make check-licenses rule for the ability to manually check licenses in vendored dependencies
This is a follow-up for c687536956
2021-05-10 11:54:43 +03:00
Aliaksandr Valialkin
5b986c95dd .github/dependabot.yml: remove automated dependency version checks
Dependency updates must be under manual control, since the resulting code diffs must be reviewed manually for the sake of security.
It is done with `make vendor-update` now.
2021-05-10 11:41:23 +03:00
Artem Navoiev
c687536956 Add vendor license checker, update codecov action, add dependbot for … (#1280)
* Add vendor license checker, update codecov action, add dependbot for github actions

* update gitingore, temprorary turn on check

* fix action name

* change action rules to trigger only when vendor changes

* remove obsolete line from main action
2021-05-10 11:38:56 +03:00
Aliaksandr Valialkin
229d9d6dd7 docs/CHANGELOG.md: document -datasource.roundDigits added at 5c448126dc 2021-05-10 11:18:26 +03:00
Roman Khavronenko
5c448126dc vmalert: add support for round_digits param in datasource package (#1278)
Starting from v1.56.0 VM supports `round_digits` which allows to limit
the number of digits after the decimal point in response value. The feature
can be used to reduce entropy of produced by recording rules values
and significantly improve the compression. See more details in link below.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525
2021-05-10 11:11:45 +03:00
Aliaksandr Valialkin
3b0966c00c docs/CHANGELOG.md: document vmalert fix for state restoration on startup 2021-05-10 11:10:04 +03:00
Roman Khavronenko
4247168a2d vmalert: fix error when rule didn't start if restore failed (#1279)
Previously, `startGroup` could exit on restore errors despite the
`remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the
flag value before deciding whether to return error or just log it.
2021-05-10 11:06:31 +03:00
Aliaksandr Valialkin
6ff19096be docs/vmagent.md: add stream parsing mode chapter 2021-05-08 23:14:07 +03:00
Aliaksandr Valialkin
cbd0569ce2 docs/CHANGELOG.md: mention the comment, which gives an example of multi-level vminsert setup 2021-05-08 22:57:04 +03:00
Aliaksandr Valialkin
120927ab65 vendor: make vendor-update 2021-05-08 20:21:57 +03:00
Aliaksandr Valialkin
75c2c813fc lib/storage: remove dead code after the commit 3ccf7ea20c 2021-05-08 20:14:11 +03:00
Aliaksandr Valialkin
f8d50e9641 deployment/dm: update Go builder from v1.16.3 to v1.16.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.4+label%3ACherryPickApproved for details
2021-05-08 20:04:05 +03:00
Aliaksandr Valialkin
7aea5f58c4 lib/ingestserver: properly close incoming connections during graceful shutdown 2021-05-08 19:52:58 +03:00
Aliaksandr Valialkin
12d733dd5d app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:52:57 +03:00
Aliaksandr Valialkin
237e9f9fd7 app/vmalert: add missing comment for ErrStateRestore 2021-05-08 15:59:35 +03:00
Aliaksandr Valialkin
d6439490f5 docs/Single-server-VictoriaMetrics.md: add links to vmauth and vmgateway as auth proxy examples 2021-05-07 10:46:05 +03:00
Aliaksandr Valialkin
79ec35cef9 app/vmbackup: make sure that -snapshotName isnt set if -snapshot.createURL is set 2021-05-07 08:44:25 +03:00
Aliaksandr Valialkin
d9e3872b1c docs/CHANGELOG.md: document 904bbffc7f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 20:33:05 +03:00
Aliaksandr Valialkin
4bea1afc6d docs/CHANGELOG.md: document 9cdd4696fe
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 20:28:58 +03:00
Aliaksandr Valialkin
904bbffc7f lib/promscrape/discovery/kubernetes: start watchers for pods and services before starting watchers for endpoints
This should eliminate possible race when an update on endpoints depends on pods and/or services, which are missing in the cache yet.
This could result in missing targets based on endpoints or endpointslices.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 12:23:50 +03:00
Roman Khavronenko
9cdd4696fe vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 10:07:19 +03:00
Denis Fondras
cdd1b06473 Update to OpenBSD 6.9 and VictoriaMetrics 1.59 (#1263)
* Update to OpenBSD 6.9 and VictoriaMetrics 1.58

While at it, also build vmagent

* Bump to v1.59 and remove zstd dependency
2021-05-03 14:14:13 +03:00
Aliaksandr Valialkin
76027093ca lib/storage: use WARNING instead of INFO level for logging dropped labels 2021-05-03 13:56:43 +03:00
Aliaksandr Valialkin
ce9e163e94 lib/httpserver: stop the process on panics in request handlers
Panics may leave the process in inconsistent state. That's why it is better to stop the process after the panic
instead of recovering from the panic. Unfortunately, the standard net/http.Server recovers panics in request handlers.
See https://github.com/golang/go/issues/16542 . That's lib/httpserver must stop the process on itself after the panic.
2021-05-03 11:59:40 +03:00
Aliaksandr Valialkin
988c3b386f docs/CHANGELOG.md: document the bugfix for proper removal of stale parts (477369b62f) 2021-05-03 11:38:15 +03:00
Nikolay
477369b62f adds stalePartsRemover (#1261)
for new created partitions
2021-05-03 11:34:00 +03:00
Aliaksandr Valialkin
9a930720b3 lib/promrelabel: add tests for removing the specified {label="value"} pair 2021-05-03 11:26:53 +03:00
Aliaksandr Valialkin
0969b446b3 deployment/docker: update base docker image from alpine:3.13.2 to alpine:3.13.5 2021-05-01 10:50:09 +03:00
Aliaksandr Valialkin
fb097ff774 docs/CHANGELOG.md: cut v1.59.0 2021-05-01 09:39:48 +03:00
Aliaksandr Valialkin
4e2ea844ca vendor: make vendor-update 2021-05-01 09:39:26 +03:00
Aliaksandr Valialkin
58d1e6eeea lib/storage: log dropped labels if the number of labels in a metric exceeds -maxLabelsPerTimeseries command-line flag value
This should improve debuggability for this case.
2021-05-01 09:27:55 +03:00
Aliaksandr Valialkin
508f500477 docs/vmalert.md: update docs after afca7b430c 2021-04-30 11:49:01 +03:00
Aliaksandr Valialkin
86bdb5ea95 docs/Cluster-VictoriaMetrics.md: document api/v1/series/count endpoint 2021-04-30 11:46:14 +03:00
Roman Khavronenko
0f988e5a31 vmalert: fix the typo in ApplyParams func (#1259) 2021-04-30 10:31:45 +03:00
Roman Khavronenko
afca7b430c vmalert: use rule's evaluationInterval as step param by default (#1258)
User still can override param by specifying `datasource.queryStep` flag.
2021-04-30 10:01:05 +03:00
Aliaksandr Valialkin
4394dc6cbb docs/CHANGELOG.md: document the change from f3a048288e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
2021-04-30 09:54:41 +03:00
Roman Khavronenko
f3a048288e Vmalert: adjust time param for datasource queries according to evaluationInterval (#1257)
* Simplify arguments list for fn `queryDataSource` to improve readbility

* vmalert: adjust `time` param according to rule evaluation interval

With this change, vmalert will start to use rule's evaluation interval
for truncating the `time` param. This is mostly needed to produce consistent
time series with timestamps unaffected by vmalert start time. Now, timestamp
becomes predictable.
Additionally, adjustment is similar to what Grafana does for plotting range graphs.
Hence, recording rule series and recording rule expression plotted in grafana
suppose to become similar in most of cases.
2021-04-30 09:46:03 +03:00
Aliaksandr Valialkin
6fa5981e68 app/vmagent: list user-visible endpoints at http://vmagent:8429/
While at it, use common WriteAPIHelp function for the listing in vmagent, vmalert and victoria-metrics
2021-04-30 09:36:43 +03:00
Aliaksandr Valialkin
2ab1266593 lib/promscrape/discovery/kubernetes: remove a mutex at urlWatcher - use groupWatcher mutex for accessing all the urlWatcher children
This simplifies the code a bit and reduces the probability of improper mutex handling and deadlocks.
2021-04-29 10:14:26 +03:00
Aliaksandr Valialkin
25b8d71df5 vendor: update github.com/klauspost/compress from v1.12.1 to v1.12.2 2021-04-29 10:12:50 +03:00
Nikolay
4e5a88114a vmagent kubernetes_sd tests (#1253)
* first part of tests for kubernetes sd

* makes linter happy

* added more test cases

* adds pub/sub for tests
2021-04-29 10:10:24 +03:00
Nikolay
15609ee447 changes vmalert Querier with per rule querier (#1249)
* changes vmalert Querier with per rule querier
it allows to changes some parametrs based on rule setting
for instance - alert type, tenant for cluster version or event endpoint url.
2021-04-28 21:41:15 +01:00
Aliaksandr Valialkin
87179c6839 lib/{storage,mergeset}: fix unaligned 64-bit atomic operation panic for 32-bit architectures
The panic has been introduced in 56b6b893ce
2021-04-27 16:41:32 +03:00
Aliaksandr Valialkin
56b6b893ce lib/mergeset: split rows ingestion among multiple shards
This improves rows ingestion on systems with many CPU cores by reducing lock contention.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244

Thanks to @waldoweng for the original idea and draft implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1243
2021-04-27 15:36:34 +03:00
Aliaksandr Valialkin
2ec7d8b384 lib/promscrape/discovery/kubernetes: fix a deadlock introduced in eddba29664
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240

Thanks to @f41gh7 for providing the initial idea for deadlock fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1248
2021-04-27 14:57:51 +03:00
Aliaksandr Valialkin
f89c1f7f49 lib/storage: typo fix in info message when deleting the part outside the configured retention
Previously the message was displaying incorrect retention time
2021-04-27 13:32:46 +03:00
Aliaksandr Valialkin
60947fb2d5 lib/persistentqueue: eliminate possible data race when obtaining vm_persistentqueue_bytes_pending metric value 2021-04-27 00:25:52 +03:00
Roman Khavronenko
87018650dd vmalert: keep the returned timestamp when persisting recording rule (#1245)
Previously, vmalert used `lastExecTime` timestamp when writing recording rules
to the remote storage. This may be incorrect, if vmalert uses `datasource.lookback` flag,
which means rule's expression will be executed at some moment in the past.
To avoid such situations, vmalert now will use returned timestamp instead of `lastExecTime`.
2021-04-26 01:03:22 +03:00
Roman Khavronenko
d6f44977a7 docs: update per tenant stats page (#1246) 2021-04-25 09:34:28 +01:00
Aliaksandr Valialkin
00deb69e28 docs: ordering fix 2021-04-24 02:28:53 +03:00
Aliaksandr Valialkin
bf2439f962 vendor: make vendor-update 2021-04-24 01:34:07 +03:00
Aliaksandr Valialkin
20b77abc17 docs: update docs order 2021-04-24 01:27:13 +03:00
Aliaksandr Valialkin
e9898e1772 docs/Single-server-VictoriaMetrics.md: mention that the native export format can change between releases
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1203
2021-04-24 01:22:54 +03:00
Aliaksandr Valialkin
52b4b1605e app/vmagent/remotewrite: increase the maximum possible number of inmemory blocks for systems with high amounts of RAM
This should reduce the probability of using much slower file-based persistent queue
when vmagent processes metrics at high rate (millions of metrics per second).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1235
2021-04-23 22:03:51 +03:00
Aliaksandr Valialkin
fb37b853e9 app/vmagent/remotewrite: count maxLabelsPerBlock as 10x of maxRowsPerBlock
This should increase block sizes and subsequently increase the maximum possible bandwidth per each connection to remote storage.
This, in turn, should reduce the probability of storing the data in local buffers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1235
2021-04-23 21:55:47 +03:00
Aliaksandr Valialkin
908e35affd lib/promscrape: apply scrape_timeout on receiving the first response byte for stream_parse: true scrape targets
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017#issuecomment-767235047
2021-04-23 21:53:35 +03:00
Aliaksandr Valialkin
eddba29664 lib/promscrape/discovery/kubernetes: refresh role: endpoints targets on service object removal as Prometheus does
This is a follow-up for ae37cfd528

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:26:57 +03:00
Aliaksandr Valialkin
ae37cfd528 lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices targets on service object update like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:11:40 +03:00
Aliaksandr Valialkin
f7d7236b44 Makefile: remove trailing whitespace 2021-04-23 11:05:57 +03:00
Aliaksandr Valialkin
478c56f281 docs/Cluster-VictoriaMetrics.md: sync with cluster branch 2021-04-22 14:49:16 +03:00
Aliaksandr Valialkin
758de04440 docs: add missing images for vmbackupmanager.md 2021-04-22 14:49:14 +03:00
Aliaksandr Valialkin
4c4c383f30 docs/vmbackupmanager.md: sync with app/vmbackupmanager/README.md 2021-04-22 14:44:07 +03:00
Aliaksandr Valialkin
bbebdf9ba1 lib/{storage,mergeset}: remove empty directories on startup. Such directories can be left after unclean shutdown on NFS storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1142
2021-04-22 13:02:44 +03:00
Aliaksandr Valialkin
1ab27582a3 docs/vmbackup.md: sync with app/vmbackup/README.md 2021-04-22 11:18:51 +03:00
Aliaksandr Valialkin
1fbc04facd app/vmbackup: typo fix: snaphsot -> snapshot
Follow-up for 9de0fa3649
2021-04-22 11:16:56 +03:00
Denys Holius
9de0fa3649 fixed typo; added example of usage vmbackup in docker-compose (#1237) 2021-04-22 11:14:13 +03:00
Aliaksandr Valialkin
7c4e460513 app/vmauth: parse url_prefix only once during config load 2021-04-21 10:55:29 +03:00
Aliaksandr Valialkin
6bc52fe41a all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:16:17 +03:00
Aliaksandr Valialkin
5720b283fa all: consistency renaming Victoria Metrics -> VictoriaMetrics
VMInsert -> vminsert
VMSelect -> vmselect
VMStorage -> vmstorage
2021-04-20 11:45:48 +03:00
Denys Holius
9f9c99c8c0 removed DS_Store files from VM_logo.zip (#1233) 2021-04-20 11:41:07 +03:00
Aliaksandr Valialkin
187e3ec909 app/vmauth: follow-up for 6a81a89b3d 2021-04-20 10:58:29 +03:00
Nikolay
6a81a89b3d adds query params support for vmauth urlPrefix (#1226)
* adds query params support for vmauth urlPrefix

* Update app/vmauth/example_config.yml

* Update app/vmauth/example_config.yml

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-20 10:51:03 +03:00
Roman Khavronenko
b955fe0038 dashboard: use unit short for Labels limit exceeded panel (#1227) 2021-04-19 13:33:21 +03:00
Aliaksandr Valialkin
cc94afeacc docs/Cluster-VictoriaMetrics.md: sync with cluster README.md 2021-04-19 13:31:55 +03:00
Roman Khavronenko
f80156d9df dashboard: fix avg GC duration expression (#1228)
Previous expression was not correct.
2021-04-19 13:28:41 +03:00
Aliaksandr Valialkin
90bba22c25 .github/workflows: remove CODECOV_TOKEN 2021-04-19 11:11:14 +03:00
Aliaksandr Valialkin
b3d88610fd app/vmctl: update README.md according to bfecd0fd55 2021-04-16 12:09:45 +03:00
Denys Holius
650c143f6d removed MACOSX dirrectory from VM_logo.zip (#1224) 2021-04-16 07:30:15 +01:00
Denys Holius
bfecd0fd55 Added some explanation to vmctl docs (#1217)
Added explanation that vmctl do not make snapshot from Prometheus when run migration data
2021-04-15 18:40:48 +01:00
dereksfoster99
cf726448f2 Update PerTenantStatistic.md 2021-04-14 20:40:53 +03:00
Aliaksandr Valialkin
8d244c5f7f vendor: update github.com/klauspost/compress from v1.11.13 to v1.12.1 2021-04-14 14:20:56 +03:00
Aliaksandr Valialkin
574b80f274 docs/Cluster-VictoriaMetrics.md: mention that sometimes -replicationFactor shouldnt be set at vmselect nodes
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
2021-04-14 13:07:59 +03:00
ArtemVoitsekhovskyi
403a7b4a1f Grammar-correction (#1210) 2021-04-14 12:45:22 +03:00
ArtemVoitsekhovskyi
294c8b747b Grammar_correction (#1211) 2021-04-14 12:44:26 +03:00
Aliaksandr Valialkin
9b1999a5ff lib/uint64set: remove memory allocation in getOrCreateSmallPool()
This partially reverts fb82c4b9fa

It has been appeared that the additional memory allocation may result in higher GC pauses.
It is better to spend CPU time on copying bigger bucket16 structs instead of increasing query latencies due to higher GC pauses
2021-04-14 12:30:19 +03:00
f41gh7
e5db518c0f updates per-tenant stats docs
changes docs order
changes per-tenant-stats pic
2021-04-14 12:14:51 +03:00
Artem Navoiev
e9ee2122df [draft] per tenant statistic (#121)
* [draft] per tenant statistic

* updates metric name
update graph
adds link and example config

* quick fix

* adds grafana dashboard
adds example alert

Co-authored-by: f41gh7 <nik@victoriametrics.com>
2021-04-14 11:23:07 +03:00
Aliaksandr Valialkin
251747e253 lib/storage: code clarification: remove caching the found metricName in searchMetricName 2021-04-13 10:22:21 +03:00
Aliaksandr Valialkin
663a91bb82 docs: update -help output after the commit 77be3e3a82 2021-04-12 12:34:59 +03:00
Artem Navoiev
77be3e3a82 improve docs for cli flags (#1202)
* improve docs for cli flags

* improve docs for cli flags.2
2021-04-12 12:28:04 +03:00
Aliaksandr Valialkin
0b7e3510c8 docs: make docs-sync 2021-04-10 19:54:18 +03:00
Artem Navoiev
3ed113b390 update vmgateway.md (#1201) 2021-04-10 19:52:33 +03:00
Lapo Luchini
3d20fd20d0 Allow specifying build date from a variable (#1200)
Just like the existing infrastructure for `BUILDINFO_TAG`, this can ease the production of [reproducible builds](https://wiki.freebsd.org/ReproducibleBuilds).
(e.g. in FreeBSD the date the port was committed is used at build time, not the actual build time, so that an identical port produced at different times produces an identical executable)
2021-04-10 15:46:53 +03:00
Roman Khavronenko
7644efae01 Docs update (#1199)
* docs: drop table of contents for `vmctl`

We already have it autogenerated on .github.io, so no need to keep it.

* docs: mention OpenTSDB migration feature for vmctl

* docs: sync docs for `vmalert`
2021-04-10 15:46:08 +03:00
John Seekins
a9d76b06a7 Improve documentation on OpenTSDB migration tool and fix a bug with hard offsets (#1198)
* add more documentation on OpenTSDB migration explaining what chunking means
* more clarification of OpenTSDB aggregations
* break out what a retention string becomes
* add more docs around retention strings
* add example of running program and fix mistake in how hard offsets are handled
* fix formatting
2021-04-10 13:24:18 +01:00
John Seekins
cd9786c2a7 OpenTSDB migration to VictoriaMetrics (#1089) 2021-04-08 20:58:06 +01:00
Roman Khavronenko
162681e60d add new alerts (#1195)
* alerts: backport `DiskRunsOutOfSpace` alert and some other tweaks from cluster branch

* alerts: add `ServiceDown` alert to detect "dead" services
2021-04-08 18:24:25 +03:00
Roman Khavronenko
4ed8de62ac vmalert: document template functions and mention them in README (#1197) 2021-04-08 18:19:08 +03:00
Aliaksandr Valialkin
06e962f141 docs/Cluster-VictoriaMetrics.md: recommend setting up official alerts for VictoriaMetrics components 2021-04-08 12:15:24 +03:00
Aliaksandr Valialkin
fac1e3810a docs/Single-server-VictoriaMetrics.md: recommend using the official alerts for VictoriaMetrics in Monitoring section 2021-04-08 12:12:45 +03:00
Aliaksandr Valialkin
edd1590ac7 dashboards/victoriametrics.json: typo fix: chur rate -> churn rate 2021-04-08 09:35:50 +03:00
Aliaksandr Valialkin
3f0bcbe067 lib/promscrape: create a single swosFunc per scrape_config 2021-04-08 09:31:48 +03:00
Aliaksandr Valialkin
3eadee6cb7 vendor: make vendor-update 2021-04-08 00:58:22 +03:00
Aliaksandr Valialkin
f1a22b097a docs/CHANGELOG.md: cut v1.58.0 2021-04-08 00:48:16 +03:00
Aliaksandr Valialkin
544821b719 app/vmselect/promql: fix tests after d3fa0ccabd 2021-04-08 00:18:01 +03:00
Aliaksandr Valialkin
d3fa0ccabd app/vmselect/promql: properly detect aggregate topk* and bottomk* aggregate functions in order to disable duplicate sorting
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1189
2021-04-08 00:09:40 +03:00
Aliaksandr Valialkin
cb12a8f0a8 app/vmselect: return data:null instead of data:[] from /api/v1/query_exemplars, since Grafana throws an error otherwise
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1186
2021-04-07 23:34:06 +03:00
Aliaksandr Valialkin
5a0938d807 lib/promscrape: do not spend CPU time on constructing scrapeWork key if clustering is disabled 2021-04-07 21:54:22 +03:00
Aliaksandr Valialkin
1177dca3da app/vmselect: do not sort series returned from topk* and bottomk* functions, since these series are already sorted in user-expected order
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1189
2021-04-07 14:16:08 +03:00
Aliaksandr Valialkin
75f309fbbf docs/Cluster-VictoriaMetrics.md: follow-up after c6a8ebb11f 2021-04-07 13:47:46 +03:00
Roman Khavronenko
c6a8ebb11f docs: update docs ordering and formatting (#1192)
The major change is adding `sort` directive to docs. For those docs which are copied
from internal packages `sort` is added via makefile command. For the rest it is added
manually since they're updated manually as well.

The rest of changes is connected with markdown formatting. For example, changing headers
in some files (`##` => `#`) makes navigation on .github.io to look better. This especially
useful for `changelog` docs.

Table of contents for `vmctl` is dropped, since we already have it autogenerated on .github.io.

No link changes expected. The corresponding PR to `cluster` branch will be made in follow-up PR.
2021-04-07 13:39:16 +03:00
Aliaksandr Valialkin
df32d2836c lib/storage: properly handle big time ranges passed to /api/v1/labels and /api/v1/label/<labelName>/values
It should be faster querying all the labels and/or all the values instead of querying per-day labels/values on time ranges exceeding maxDaysForPerDaySearch
2021-04-07 13:33:46 +03:00
Aliaksandr Valialkin
59f9960992 lib/promscrape/discovery: remove superflouos check in registerPendingAPIWatchers
The check `_, ok := uw.aws[aw]; !ok` isn't needed, since aw cannot exist in uw.aws
because of the check inside subscribeAPIWatcher
2021-04-07 13:07:39 +03:00
Aliaksandr Valialkin
3ec6639bbb lib/promscrape/discovery/kubernetes: register pending apiWatchers in uw.aws 2021-04-06 11:12:13 +03:00
Aliaksandr Valialkin
7d23598b33 app/vmselect: return dumb response on /api/v1/query_exemplars request
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1186
2021-04-05 23:25:08 +03:00
Aliaksandr Valialkin
78d35d4f46 Makefile: prepare arm64 and amd64 release archives for cluster version on make release command 2021-04-05 23:03:19 +03:00
Aliaksandr Valialkin
5f593b0ed3 lib/promscrape/discovery/kubernetes: remove superflouos mustStart and mustStop functions 2021-04-05 22:44:12 +03:00
Aliaksandr Valialkin
4a0d06d1db deployment/docker/docker-compose.yml: update Grafana from v7.5.1 to v7.5.2 2021-04-05 22:30:58 +03:00
Lu Jiajing
b59164cf33 fix access to nil *url.URL (#1180)
* fix access to nil *url.URL

Signed-off-by: Megrez Lu <lujiajing1126@gmail.com>

* Update lib/promscrape/discovery/kubernetes/api_watcher.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-05 22:25:31 +03:00
Aliaksandr Valialkin
b46194472f lib/promscrape/discovery/kubernetes: reduce CPU time spent on registering big number of Kubernetes objects shared among big number of scrape jobs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 22:04:30 +03:00
Aliaksandr Valialkin
a51d0ec6ec lib/promscrape/discovery/kubernetes: load objects missing in local cache from api seriver in getObjectByRole()
This should fix possible race for `role: endpoints` and `role: endpointslices` service discovery,
when the referred `pod` and `service` objects aren't propagated to urlWatcher cache yet.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182#issuecomment-813353359 for details.
2021-04-05 20:31:17 +03:00
Aliaksandr Valialkin
95dbebf512 lib/persistentqueue: delete corrupted persistent queue instead of throwing a fatal error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1030
2021-04-05 19:26:11 +03:00
Aliaksandr Valialkin
f010d773d6 lib/promscrape/discovery/kubernetes: synchronously load Kubernetes objects on first access
Remove async registration of apiWatchers, since it breaks discovering `role: endpoints` and `role: endpointslices` targets,
which depend on pod and service objects.

There is no need in reloading `endpoints` and `endpointslices` targets if the referenced `pod` or `service` objects change,
since in this case the corresponding `endpoints` and `endpointslices` objects should also change because they contain
ResourceVersion of the referenced `pod` or `service` objects, which is modified on object update.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 14:20:12 +03:00
Aliaksandr Valialkin
2c25b0322b lib/proxy: typo fix after a5c5b54c22 2021-04-05 13:45:24 +03:00
Aliaksandr Valialkin
a5c5b54c22 lib/proxy: add support for socks5 over tls proxy 2021-04-05 13:00:05 +03:00
Aliaksandr Valialkin
6742839fd6 lib/promscrape: pass X-Prometheus-Scrape-Timeout-Seconds header to scrape targets as Prometheus does 2021-04-05 12:15:24 +03:00
Aliaksandr Valialkin
9ff3ecb991 docs/CHANGELOG.md: explain why -sortLabels is set to false by default 2021-04-04 01:59:25 +03:00
Aliaksandr Valialkin
9a97941e2a docs/vmagent.md: mention that vmagent supports scraping via socks5 proxy 2021-04-04 01:45:52 +03:00
Aliaksandr Valialkin
fc2240fb22 docs/CHANGELOG.md: document the ability to use socks5 proxy
Follow-up for a4c6a3b3e1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1177
2021-04-04 01:42:00 +03:00
Nikolay
a4c6a3b3e1 adds socks5 support for fasthttp client (#1178)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1177

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-04 01:29:54 +03:00
Aliaksandr Valialkin
500e625e8c lib/promscrape: properly send full url in GET request via simple HTTP proxy
This is a follow-up for a0ae0f86666a75ec57b45eab2429da7ab4a7b250

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 01:20:06 +03:00
Aliaksandr Valialkin
5153410ced lib/promscrape: support for simple HTTP proxies without CONNECT method support such as https://github.com/prometheus-community/PushProx
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 00:40:40 +03:00
Aliaksandr Valialkin
4c56b1a6dd lib/promscrape: add tests for authorization config, which has been added in df148f48b7 2021-04-03 22:13:22 +03:00
Aliaksandr Valialkin
7a0b964e8d app/vmselect/promql: do not delete dst_label if src_label is empty in label_copy(q, src_label, dst_label) and label_move(q, src_label, dst_label) 2021-04-03 22:05:06 +03:00
Aliaksandr Valialkin
6f3080f9fb docs/CHANGELOG.md: document the change from 3055ab0115 (add ability to pass "label=value" to the third argument to topk_* and bottomk_* functions 2021-04-03 21:42:08 +03:00
Aliaksandr Valialkin
e1d1708fa2 docs/Single-server-VictoriaMetrics.md: add missing link in heading to Graphite Render API usage chapter 2021-04-03 21:34:13 +03:00
Aliaksandr Valialkin
43f9842b6f lib/proxy: log response body on non-200 response code
This should improve debuggability for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-03 03:03:55 +03:00
Aliaksandr Valialkin
2256b79a89 docs/vmgateway.md: update docs 2021-04-03 00:29:19 +03:00
Aliaksandr Valialkin
b8d9d6c326 docs/CHANGELOG.md: yet another typo fix 2021-04-03 00:25:39 +03:00
Aliaksandr Valialkin
22949911e9 docs/CHANGELOG.md: typo fix 2021-04-03 00:23:50 +03:00
Aliaksandr Valialkin
3055ab0115 app/vmselect/promql: add ability to set label value additionally to label name for the remaining sum of time series returned from topk_* and bottomk_* functions in the form: topk_min(N, m, "label=value") 2021-04-02 23:55:54 +03:00
Aliaksandr Valialkin
25e19c75c7 docs/{vmauth,vmgateway}.md: small fixes 2021-04-02 23:15:00 +03:00
Aliaksandr Valialkin
3c1b39c978 app/vmgateway: publish docs 2021-04-02 23:08:41 +03:00
Aliaksandr Valialkin
0db901617d app: do not process non-GET requests on at / handler 2021-04-02 22:54:06 +03:00
Aliaksandr Valialkin
569e58dcdf vendor: make vendor-update 2021-04-02 22:20:17 +03:00
Aliaksandr Valialkin
b1d0028e79 app/vmauth: add support for authorization via Authorization: Bearer <token> 2021-04-02 22:14:53 +03:00
Aliaksandr Valialkin
b88feb631e docs/vmagent.md: mention about proxy_authorization section 2021-04-02 21:24:11 +03:00
Aliaksandr Valialkin
df148f48b7 lib/promscrape: add support for authorization config in -promscrape.config as Prometheus 2.26 does
See https://github.com/prometheus/prometheus/pull/8512
2021-04-02 21:17:45 +03:00
Aliaksandr Valialkin
7f9c68cdcb lib/promscrape: add follow_redirect option to scrape_configs section like Prometheus does
See https://github.com/prometheus/prometheus/pull/8546
2021-04-02 19:56:40 +03:00
Aliaksandr Valialkin
d1dcbfd0f9 deployment/docker: upgrade Go builder from v1.16.2 to v1.16.3
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.3+label%3ACherryPickApproved
2021-04-02 19:22:32 +03:00
Aliaksandr Valialkin
c79e4a2f90 app/vmselect/promql: remove the limit on the number of time series that can be sorted, since it may confuse users
Always sort time series returned from `/api/v1/query` and `/api/v1/query_range` unless `sort_*` function is used at top level of the query.
2021-04-02 15:02:08 +03:00
Aliaksandr Valialkin
5b08e6fb16 lib/promscrape/discovery/kubernetes: properly track objects with the same names in multiple namespaces
This is a follow-up for 12e4785fe8

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:45:32 +03:00
Aliaksandr Valialkin
12e4785fe8 lib/promscrape/discovery/kubernetes: properly discover targets in multiple namespaces
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:28:30 +03:00
Aliaksandr Valialkin
3967bd705a docs/CHANGELOG.md: mention about AWS IAM roles for tasks support for EC2 service discovery
Follow-up for fdb8995642
2021-04-02 13:10:03 +03:00
Nikolay
fdb8995642 Adds aws ECS credentials support (#1175) 2021-04-02 11:56:40 +03:00
Aliaksandr Valialkin
9d237408c6 Makefile: add missing -prod suffix to binary names in *_checksums.txt file
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1171
2021-04-02 11:55:31 +03:00
Aliaksandr Valialkin
759c938870 Makefile: properly generate checksums for *.tar.gz files
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1171
2021-04-02 11:50:04 +03:00
Aliaksandr Valialkin
dc9eafcd02 app/{vminsert,vmagent}: add -sortLabels command-line option for sorting time series labels before ingesting them in the storage
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
2021-03-31 23:27:58 +03:00
Aliaksandr Valialkin
e1f699bb6c lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels 2021-03-31 21:24:46 +03:00
Lapo Luchini
db963205cc Add vmutils-pure target (#1163) 2021-03-31 17:42:08 +03:00
Aliaksandr Valialkin
48275d8c12 app/vmagent/remotewrite: reduce memory usage when -remoteWrite.queues is set to a big value
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1167
2021-03-31 16:16:33 +03:00
Aliaksandr Valialkin
f888d194da docs/Articles.md: add a link to https://blog.kintone.io/entry/2021/03/31/175256 2021-03-31 15:14:35 +03:00
Aliaksandr Valialkin
33622c409c app/vmagent/remotewrite: reduce memory usage when samples with big number of labels are sent to remote storage 2021-03-31 00:44:54 +03:00
Aliaksandr Valialkin
3be0e6b087 docs/Single-server-VictoriaMetrics.md: update victoria-metrics -help output after e7fdea5953 2021-03-30 21:44:54 +03:00
Aliaksandr Valialkin
e7fdea5953 app/vmselect: add -search.maxStatusRequestDuration command-line flag for limiting the duration of requests to /api/v1/status/* and /api/v1/series/count 2021-03-30 21:41:35 +03:00
Denys Holius
2af39a96c5 deployment: Grafana version updated to 7.5.1 (#1161) 2021-03-30 20:43:54 +03:00
Aliaksandr Valialkin
2d3082bb55 docs/CHANGELOG.md: cut v1.57.1 2021-03-30 15:40:06 +03:00
Aliaksandr Valialkin
0fe8f11090 docs/CHANGELOG.md: mention about returned back type label for vm_tenant_inserted_rows_total metric
See 9b4e608199
2021-03-30 15:16:57 +03:00
Aliaksandr Valialkin
d58d5562f1 app/vmselect: remove -search.storageTimeout command-line flag, since it has the same meaning as -search.maxQueryDuration
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
2021-03-30 15:04:54 +03:00
Aliaksandr Valialkin
7962cf1af8 app/vmselect: prevent from possible incomplete query results after timed out query
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
2021-03-30 13:35:45 +03:00
Aliaksandr Valialkin
65c60cf413 Makefile: build arm binaries on make release-victoria-metrics
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:44:39 +03:00
Aliaksandr Valialkin
75991277fa Makefile: build vmutils for arm on make release-vmutils
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:15:29 +03:00
Aliaksandr Valialkin
1db1a29ffa all: increase minimum supported Go version for building VictoriaMetrics components from v1.14 to v1.15
This is needed after the commit c0ac740f93, which uses URL.Redacted() method,
which has been added in v1.15.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:04:53 +03:00
Aliaksandr Valialkin
947b37ba8e docs/CHANGELOG.md: cut v1.57.0 2021-03-29 19:14:49 +03:00
Aliaksandr Valialkin
e6fd1f7875 docs/CHANGELOG.md: typo fixes 2021-03-29 15:46:35 +03:00
Aliaksandr Valialkin
5c7c0b2bda docs/CHANGELOG.md: mention about logging of metrics with too old timestamps in a single-node VictoriaMetrics
This is a follow up for aa81039b42
2021-03-29 15:42:46 +03:00
Aliaksandr Valialkin
5f4a9f782f docs/CHANGELOG.md: mention Graphite Render API fixes 2021-03-29 14:27:42 +03:00
Aliaksandr Valialkin
602a3d99ae docs/CHANGELOG.md: mention optimized query performance on systems with many CPU cores 2021-03-29 13:55:10 +03:00
Roman Khavronenko
cfdb6762e6 deployment: add new alert TooHighChurnRate24h (#1154)
Alert `TooHighChurnRate24h` suppose to cover cases when churn rate
is low but results in multiple times higher number than total
number of active series.
2021-03-29 12:38:03 +03:00
Roman Khavronenko
b1e49bab52 Dashboards update (#1153)
* dashboard: update single node dashboard

* add number of new series created over last 24h;
* bump version requirements.

* dashboard: update vmagent dashboard

* add panel for open file descriptors;
* add panel for disk I/O;
* add panel for `vmagent_remotewrite_packets_dropped_total` metric;
* bump version requirements.
2021-03-29 12:37:17 +03:00
Aliaksandr Valialkin
b75c2ce659 lib/uint64set: improve Set.Has() performance scalability on multi-CPU system
Do not update bucket32.hint on Set.Has() call, since it leads to memory ping-pong between CPU cores multi-CPU system
2021-03-29 12:33:47 +03:00
Aliaksandr Valialkin
2601cc0fb0 lib/storage: do not update b.nextIdx if no samples are removed because of retention 2021-03-29 12:00:21 +03:00
hagen1778
0c403bfd29 deployment: fix typo in vmalert docker-compose definition 2021-03-29 11:06:32 +03:00
Aliaksandr Valialkin
78188decf9 docs: document that vmagent drops data blocks when remote storage replies with 400 and 409 http status codes
This is a follow up for 1b7dc1e5a5.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
2021-03-26 14:44:06 +02:00
Aliaksandr Valialkin
c54cb3e63c app/vmagent/remotewrite: remove superflouos code after 1b7dc1e5a5 2021-03-26 13:59:46 +02:00
Aliaksandr Valialkin
8fc8ef1aba vendor: update github.com/klauspost/compress from v1.11.12 to v1.11.13 2021-03-26 13:57:01 +02:00
Nikolay
1b7dc1e5a5 Adds blocks drop (#1151)
* adds blocks drop at 400 BadRequest status code
recieved from remote storage,
not expected that remote storage will be able to handle it on retry

* removes error logging for dropped blocks,
its expected error
2021-03-26 14:17:59 +03:00
Aliaksandr Valialkin
f39c84b21f lib/promscrape/discovery/kubernetes: typo fix in error message 2021-03-26 12:46:14 +02:00
Aliaksandr Valialkin
9761ffd161 lib/promscrape/discovery/kubernetes: properly handle too old resource version error message from Kubernetes watch API 2021-03-26 12:28:10 +02:00
Aliaksandr Valialkin
aa81039b42 app/vmselect: log the metric which trigger rollup result cache reset
This should help finding the source of stale metrics
2021-03-25 21:31:39 +02:00
Aliaksandr Valialkin
50f790b5d7 docs/Cluster-VictoriaMetrics.md: mention that vmselect doesnt serve partial responses from export API
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1148
2021-03-25 21:04:13 +02:00
Aliaksandr Valialkin
136fc6217c vendor: make vendor-update 2021-03-25 17:56:10 +02:00
Aliaksandr Valialkin
5ec9e49103 docs/vmagent.md: add an example for -remoteWrite.label 2021-03-25 17:54:55 +02:00
Aliaksandr Valialkin
88f6286df7 docs/Cluster-VictoriaMetrics.md: sync with upstream 2021-03-25 17:18:18 +02:00
Aliaksandr Valialkin
f6529f932a docs: add a link to the repository from build instruction for all the VictoriaMetrics components 2021-03-25 17:14:42 +02:00
Aliaksandr Valialkin
2065d11300 docs/vmagent.md: cosmetic fixes 2021-03-25 17:11:19 +02:00
Aliaksandr Valialkin
35fb9bdee1 docs/vmagent.md: cosmetic fixes 2021-03-25 16:54:03 +02:00
Aliaksandr Valialkin
a647144616 docs/vmagent.md: typo fix: tupically -> typically 2021-03-25 16:48:45 +02:00
Aliaksandr Valialkin
c3c3e51f17 docs/vmalert.md: remove misleading -evaluationInterval=3s from example config args
3s evaluation interval is too small for practical setups. It can result in increased load on datasource.
So it is better to remove it from example config args, which are usually copy-pasted by novice users.
2021-03-25 15:29:06 +02:00
Aliaksandr Valialkin
0b2a66db30 app/vmselect/promql: do not merge time series during requests to /api/v1/query
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1141
2021-03-25 13:56:07 +02:00
Aliaksandr Valialkin
6e855d4b82 lib/storage: tune loopsCountPerMetricNameMatch according to production workload 2021-03-25 13:27:47 +02:00
Aliaksandr Valialkin
d4aadba9fa app/vmagent: add -promscrape.consul.waitTime command-line flag for configuring Consul service discovery wait time
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1144
2021-03-23 19:33:25 +02:00
Aliaksandr Valialkin
edc0a94a3c docs/CHANGELOG.md: mention the feature from 44a6cc5eca 2021-03-23 19:00:18 +02:00
Aliaksandr Valialkin
3a3d2165f9 lib/storage: do not reload metricName for the same metricID in Search.NextMetricBlock
This should speed up Search.NextMetricBlock a bit
2021-03-23 17:56:49 +02:00
Aliaksandr Valialkin
9c566f7db9 app/vmagent: mention -remoteWrite.maxDiskUsagePerURL in the descriptio of -remoteWrite.tmpDataPath flag 2021-03-23 16:38:48 +02:00
Nikolay
29f9ef9b7f changes consul_service label value (#1143)
according to prometheus discovery.
 It should mitigate issue with case sensetive services
https://github.com/hashicorp/consul/issues/5707
2021-03-23 15:35:01 +02:00
Aliaksandr Valialkin
331a6a2015 app/vmselect/graphite: accept and enforce extra_label in all the Graphite APIs 2021-03-23 15:29:16 +02:00
Aliaksandr Valialkin
b521d1d4f2 app/vmselect: move getEnforcedTagFiltersFromRequest to searchtuils, since it will be used in Graphite functions soon 2021-03-23 14:16:29 +02:00
Aliaksandr Valialkin
3cfb3a3683 lib/storage: respect the deadline passed to Storage.SearchMetricNames 2021-03-22 23:03:17 +02:00
Aliaksandr Valialkin
8e2afdf568 lib/storage: improve Search.NextMetricBlock performance by using MetricID->MetricName cache 2021-03-22 22:49:18 +02:00
Aliaksandr Valialkin
e17eb35147 docs/Single-server-VictoriaMetrics.md: sync with README.md 2021-03-22 17:51:43 +02:00
Aliaksandr Valialkin
65a61ff118 docs/Articles.md: add https://blog.cybozu.io/entry/2021/03/18/115743 2021-03-22 17:50:52 +02:00
Aliaksandr Valialkin
71b72304ae app/vmselect: improve description for -search.maxPointsPerTimeseries command-line flag 2021-03-22 16:45:34 +02:00
Aliaksandr Valialkin
44a6cc5eca app/{vminsert,vmagent}: use Influx field as metric name if measurement is empty and -influxSkipSingleField command-line is set
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1139
2021-03-22 13:53:53 +02:00
Aliaksandr Valialkin
910092ca4d lib/storage: tune loopsCountPerMetricNameMatch 2021-03-22 12:53:17 +02:00
Aliaksandr Valialkin
cef010d5f7 app/vmselect/promql: increment key prefix for faster reset for rollup result cache 2021-03-22 11:59:07 +02:00
Aliaksandr Valialkin
648d11b8e0 vendor: update github.com/VictoriaMetrics/metrics from v1.17.0 to v1.17.1 2021-03-18 18:53:07 +02:00
Aliaksandr Valialkin
fb83e97170 app/victoria-metrics: use flag.Parse instead of envflag.Parse for avoiding possible side effects of envflag 2021-03-18 18:20:22 +02:00
Aliaksandr Valialkin
b0c956a178 app/vmselect/graphite: follow-up after 529d7be26b 2021-03-18 16:30:20 +02:00
Nikolay
529d7be26b changes metricsFind api (#1137)
it should be able mitigate crash if label value contains *,[ or { symbols
2021-03-18 16:12:02 +02:00
Aliaksandr Valialkin
726f6ad804 lib/storage: small code simplification after 6cee5338b2 2021-03-18 15:21:13 +02:00
Aliaksandr Valialkin
6cee5338b2 lib/storage: prevent from infinite loop if {__graphite__="..."} filter matches a metric name with *, [ or { chars
The idea has been borrowed from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1137
2021-03-18 14:53:47 +02:00
Aliaksandr Valialkin
e061a4fa19 docs/Single-server-VictoriaMetrics.md: remove outdated message about experimental mode for -pure builds 2021-03-18 13:49:17 +02:00
Aliaksandr Valialkin
4fb049bcba lib/fs: reduce the frequency of failed to remove directory ... due to NFS lock log warnings
Log `failed to remove directory ... due to NFS lock` warning only if the directory cannot be removed in one second.
2021-03-18 13:24:46 +02:00
Aliaksandr Valialkin
17d4a6e900 vendor: update github.com/VictoriaMetrics/metrics from v1.16.0 to v1.17.0 2021-03-17 23:23:04 +02:00
Aliaksandr Valialkin
904dababcc vendor: update github.com/VictoriaMetrics/metrics from v1.15.3 to v1.16.0
This adds the following new metrics for each VictoriaMetrics app:

* process_resident_memory_anonymous_bytes - the RSS share for memory allocated by the process itself.
  This share cannot be freed by the OS, so it must be taken into account by OOM killer.

* process_resident_memory_pagecache_bytes - the RSS share for page cache memory (aka memory-mapped files).
  This share can be freed by the OS at any time, so it must be ignored by OOM killer.
2021-03-17 17:59:40 +02:00
Aliaksandr Valialkin
45dabfac1b lib/storage: faster move heavy filters to the end of list 2021-03-17 15:12:13 +02:00
Aliaksandr Valialkin
b1713e3fcd app/vmselect/promql: typo fix after 9666834045 2021-03-17 15:12:11 +02:00
Aliaksandr Valialkin
9666834045 app/vmselect/promql: merge adjancent buckets with the smallest summary number of hits in buckets_limit() function
This should improve accuracy for the returned buckets
2021-03-17 14:31:41 +02:00
Aliaksandr Valialkin
20ac89c4e0 docs/CHANGELOG.md: cut v1.56.0 2021-03-17 02:04:55 +02:00
Aliaksandr Valialkin
bd0c6a095e docs/CHANGELOG.md: do not mention reduction in query duration
There was a signficant refactoring in the code responsible for time series search,
so it can result in both speed ups and slow downs depending on used queries.
2021-03-17 01:56:58 +02:00
Aliaksandr Valialkin
7bc728bf53 app/vmselect: add vm_index_search_duration_seconds histogram for monitoring the performance of index search 2021-03-17 01:17:41 +02:00
Aliaksandr Valialkin
828669e4e1 all: make golint happy 2021-03-17 00:49:28 +02:00
Aliaksandr Valialkin
ccfb0ae2d3 lib/storage: limit loops count in order to reduce max CPU usage during filter search 2021-03-17 00:49:26 +02:00
Aliaksandr Valialkin
576a80b3d9 lib/storage: do not modify filterLoopsCount stats with loopsCount stats
Such a modification can result in incorrect filter sorting later
2021-03-17 00:49:26 +02:00
Aliaksandr Valialkin
f104f3eb2a all: make golangci-lint happy after the commit 6378205415 2021-03-17 00:24:40 +02:00
Aliaksandr Valialkin
6378205415 lib/netutil: enable IPv6 UDP listening if -enableTCP6 command-line flag is passed to VictoriaMetrics
This is a follow-up for 18cfc4be7b

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1131
2021-03-17 00:16:17 +02:00
Nikolay
18cfc4be7b Adds udp6 support for ingest servers (#1134)
with flag -enableUDP6  https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1131
2021-03-17 00:03:06 +02:00
Aliaksandr Valialkin
0ce557951f app/vmselect/netstorage: reduce mutex contention when unpacking data on a system with high number of CPU cores 2021-03-16 21:51:31 +02:00
Aliaksandr Valialkin
35bb44d317 lib/uint64set: optimize bucket16.addMulti a bit 2021-03-16 21:09:07 +02:00
Aliaksandr Valialkin
aba955fa16 Makefile: prepare vmutils-windows-*.zip archive on make release-vmutils command
The archive contains the following executables for Windows:

* vmagent
* vmalert
* vmauth
* vmctl

Other components - vmbackup, vmrestore, victoria-metrics - aren't supported for Windows yet
2021-03-16 20:52:41 +02:00
Aliaksandr Valialkin
fd86a7dc1d lib/storage: time series search optimization according to production workload profiling
Do not pass filter metric ids to getMetricIDsForTagFilter, since it has been appeared that this slows down
the function by multiple times when it finds big number of metricIDs (tens of millions).
2021-03-16 20:01:43 +02:00
Aliaksandr Valialkin
264d3432ac vendor: make vendor-update 2021-03-16 19:08:36 +02:00
Aliaksandr Valialkin
e36fbfae5b lib/storage: further tuning for time series search 2021-03-16 18:46:22 +02:00
Aliaksandr Valialkin
f0a4157f89 app/vmselect/promql: do not crash if histogram_over_time() function name contains uppercase letters such as Histogram_over_time() 2021-03-16 12:24:21 +02:00
Aliaksandr Valialkin
645cf6746c vendor: update github.com/VictoriaMetrics/metrics from v1.15.2 to v1.15.3 2021-03-16 12:15:50 +02:00
Aliaksandr Valialkin
031d256810 docs/Articles.md: added a link to https://www.youtube.com/watch?v=QgLMztnj7-8 2021-03-15 23:01:49 +02:00
Aliaksandr Valialkin
dd7e82c34f app/vmstorage: add -logNewSeries command-line flag for determining the source of series churn rate 2021-03-15 22:38:50 +02:00
Aliaksandr Valialkin
37323c57c9 lib/influxutils: return response compatible with InfluxDB 1.8.4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
2021-03-15 22:19:59 +02:00
f41gh7
acf2a82d3c typo fix 2021-03-15 23:03:52 +03:00
Aliaksandr Valialkin
85a95bf60c all: various fixes in command-line flag descriptions 2021-03-15 21:59:25 +02:00
Aliaksandr Valialkin
7c37e9aea9 app/{vminsert,vmagent}: a follow-up for b1aa8c3d8f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
2021-03-15 21:37:55 +02:00
Nikolay
b1aa8c3d8f adds fake response for telegraph queries (#1130)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124
2021-03-15 21:10:47 +02:00
Aliaksandr Valialkin
9c77e34ef9 lib/storage: further tuning for time series selector code 2021-03-15 20:31:34 +02:00
Aliaksandr Valialkin
923cdb0552 app/vmselect/promql: reduce overhead on scrape interval estimation
It should be enough to use the first 20 datapoints instead of 100 datapoints for scrape interval estimation.
2021-03-15 20:31:33 +02:00
Aliaksandr Valialkin
82aab87446 app/vmselect/promql: fix tests after 2dae0a2c47 2021-03-15 20:18:59 +02:00
Aliaksandr Valialkin
fb82c4b9fa lib/uint64set: reduce the size of bucket16 by storing smallPool by pointer.
This reduces CPU time spent on bucket16 copying inside bucket13.addBucketAtPos.
2021-03-15 17:23:31 +02:00
Aliaksandr Valialkin
eb103e1527 lib/uint64set: optimize Set.AddMulti for large sorted sets 2021-03-15 17:10:40 +02:00
Aliaksandr Valialkin
43504ebd14 lib/uint64set: optimize bucket16.add and bucket16.addMulti a bit 2021-03-15 16:58:12 +02:00
Aliaksandr Valialkin
fb935e6e2c docs/CHANGELOG.md: typo fix: FATURE -> FEATURE 2021-03-15 14:58:19 +02:00
Aliaksandr Valialkin
3ccf7ea20c lib/storage: tune per-day index search 2021-03-15 13:31:55 +02:00
Aliaksandr Valialkin
2dae0a2c47 app/vmselect: add round_digits query arg to /api/v1/query and /api/v1/query_range handlers for limiting the number of decimal digits after the point 2021-03-15 12:36:33 +02:00
Roman Khavronenko
b457739f87 Single dashboard (#1126)
* dashboard: update single node dashboard

* add panel `Open FDs` for file descriptors metrics;
* add panel `Disk writes/reads` to show the real read/write
load on storage layer;
* add `process_resident_memory_bytes` metric to memory usage panel;
* add stats panel to show available CPUs, memory and disk space;
* rm flags panel since it didn't prove its usefulness.

* alerts: add alert for reaching FDs limit
2021-03-15 12:04:24 +02:00
Aliaksandr Valialkin
6d91842c83 docs/Cluster-VictoriaMetrics.md: sync with cluster README.md 2021-03-15 11:51:40 +02:00
Aliaksandr Valialkin
c14dafce43 lib/promscrape: an attempt to reduce memory usage when vmagent scrapes targets with varying number of metrics
Do not cache too big byte buffers and too big writeRequestCtx objects,
since it is cheaper to re-create them instead of wasting RAM for their caching.

This reverts 7f6f350ee1
2021-03-15 11:45:39 +02:00
Aliaksandr Valialkin
7f6f350ee1 lib/promscrape: return back the logic for flushing big buffers to storage from the commit 3fd8653b40
This should reduce memory usage when vmagent scrapes targets with big number of metrics and `-promscrape.streamParse` isn't enabled
2021-03-14 22:26:00 +02:00
Aliaksandr Valialkin
b88806ecbf lib/promscrape/discovery/kubernetes: do not start object watcher until initial objects are loaded 2021-03-14 21:55:00 +02:00
Aliaksandr Valialkin
83edbb7cab lib/promscrape: retry service discovery in a few seconds if it starts returning 0 targets
This should reduce recovery time from temporary issues during service discovery
2021-03-14 21:53:23 +02:00
Aliaksandr Valialkin
bf15d6a6a2 lib/promscrape: remove duplicate target word in error message 2021-03-14 21:52:02 +02:00
Aliaksandr Valialkin
d409898515 lib/promscrape/discovery/kubernetes: further optimize kubernetes service discovery for the case with many scrape jobs
Do not re-calculate labels per each scrape job - reuse them instead for scrape jobs with identical Kubernetes role

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-14 21:14:53 +02:00
Aliaksandr Valialkin
7a16e8e3a2 lib/promscrape/discovery: fixes after 133b288681
- Removed a deadlock in addAPIWatcher
- Do not create unused ScrapeWork objects
- Do not spend CPU resources on creating objectByKey map in addAPIWatcher

This work is based on https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1125
2021-03-13 15:18:51 +02:00
Aliaksandr Valialkin
2096c6e464 app/vmselect/prometheus: typo fix after 7c002023d7 2021-03-12 12:19:36 +02:00
Aliaksandr Valialkin
7c002023d7 app/vmselect/prometheus: do not include datapoints with timestamps matching t-d when returning results from /api/v1/query?query=m[d]&time=t as Prometheus does 2021-03-12 12:16:50 +02:00
Aliaksandr Valialkin
43552fa8d3 docs/CaseStudies.md: fix incorrect number of active time series for Zhihu 2021-03-12 11:45:38 +02:00
Aliaksandr Valialkin
def014eb75 lib/promscrape/discovery/kubernetes: remove debug lines left after the commit 133b288681 2021-03-12 11:22:33 +02:00
Aliaksandr Valialkin
ca4d5ce037 lib/proxy: there is no need in cloning tlsCfg, which has been created two lines above 2021-03-12 10:47:02 +02:00
Aliaksandr Valialkin
895d5d1355 lib/proxy: set proxy address in tls.Config.ServerName instead of the target address
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-12 10:41:33 +02:00
Aliaksandr Valialkin
a6a71ef861 lib/promscrape: add ability to configure proxy options via proxy_tls_config, proxy_basic_auth, proxy_bearer_token and proxy_bearer_token_file
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-12 03:36:19 +02:00
Aliaksandr Valialkin
fa448806a5 deployment/docker: update Go builder from 1.16.1 to 1.16.2
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.2+label%3ACherryPickApproved
2021-03-12 01:53:48 +02:00
Aliaksandr Valialkin
f3d712724c docs/Articles.md: add https://www.sensedia.com/post/monitoring-with-prometheus-alertmanager 2021-03-12 01:22:51 +02:00
Aliaksandr Valialkin
f669531506 lib/storage: further tune filters sorting logic 2021-03-12 00:53:04 +02:00
Aliaksandr Valialkin
133b288681 lib/promscrape/discovery/kubernetes: use a single watcher per apiURL
Previously multiple scrape jobs could create multiple watchers for the same apiURL. Now only a single watcher is used.
This should reduce load on Kubernetes API server when many scrape job configs use Kubernetes service discovery.
2021-03-11 16:43:04 +02:00
Aliaksandr Valialkin
dc0cb54d41 deployment/docker: update Go builder from 1.16.0 to 1.16.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.1+label%3ACherryPickApproved
2021-03-11 16:43:03 +02:00
Aliaksandr Valialkin
c0ac740f93 lib/proxy: do not show inline basic auth passwords when logging errors related to proxy_url 2021-03-11 13:43:36 +02:00
Aliaksandr Valialkin
bebcb8130c lib/promscrape/discovery/kubernetes: localize Bookmark parsing code
This is a follow-up for e772d1c920
2021-03-11 13:08:08 +02:00
Aliaksandr Valialkin
971e3d83f7 docs/ExttendedPromQL.md: remove outdated doc 2021-03-11 12:41:20 +02:00
Brensted
238fe7d4e8 Update BestPractices.md (#1123)
update lists, hyperlinks fixed.
2021-03-11 11:39:14 +02:00
Aliaksandr Valialkin
e772d1c920 lib/promscrape/discovery/kubernetes: reduce load on Kubernetes API server by using watch bookmarks
This allows continuing object watch from the last bookbark instead of reloading all the objects
on watch errors or timeouts.

See https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
2021-03-10 15:06:35 +02:00
Aliaksandr Valialkin
bc5c4add89 lib/httpserver: export vm_available_memory_bytes and vm_available_cpu_cores metrics
These metrics are useful for tracking the available memory and CPU cores for VictoriaMetrics apps.
2021-03-10 12:02:42 +02:00
Brensted
e02265cfa7 Add files via upload 2021-03-10 10:16:59 +02:00
Brensted
314f28fb38 Add files via upload 2021-03-10 10:14:34 +02:00
Brensted
45ede6ba98 Add files via upload 2021-03-10 10:08:38 +02:00
Brensted
09dc49e942 Add files via upload 2021-03-10 10:06:49 +02:00
Brensted
96dbe9bcbd Add files via upload 2021-03-10 10:05:18 +02:00
Ihor Borodin
d212f35ae8 Fixing examples of external.alert.source in documentation (#1120)
* Fixing examples of external.alert.source in documentation
2021-03-09 20:49:50 +00:00
Aliaksandr Valialkin
48fb0d1c4b vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.13 to v1.0.14 2021-03-09 21:34:07 +02:00
Aliaksandr Valialkin
2728b25783 docs/CHANGELOG.md: mention about the bugfix from 787242d7b0 2021-03-09 20:56:06 +02:00
Aliaksandr Valialkin
787242d7b0 lib/proxy: pass proxy hostname in Host header of the CONNECT request
This should resolve the following issue when connecting to tls proxy:

  cannot validate certificate for ... because it doesn't contain any IP SANs
2021-03-09 20:39:40 +02:00
Aliaksandr Valialkin
36fd007247 lib/proxy: set missing ServerName in TLS config for proxy_url.
While at it, allow setting Proxy-Authorization for `proxy_url` via `basic_auth` and `bearer_token` configs.
2021-03-09 18:58:18 +02:00
Nikolay
ad34f42467 Changes tlsConfig init for proxy connections (#1121)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
2021-03-09 18:51:00 +02:00
Aliaksandr Valialkin
3fd8653b40 lib/promscrape: apply sample_limit after metric relabeling is applied as Prometheus does
See the description for `sample_limit` option from Prometheus docs:

Per-scrape limit on number of scraped samples that will be accepted.
If more than this number of samples are present after metric relabeling
the entire scrape will be treated as failed. 0 means no limit.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
2021-03-09 15:47:18 +02:00
Aliaksandr Valialkin
4b18a4f026 lib/promscrape/discovery/kubernetes: remove too verbose logs about starting and stopping the watchers
Log the number of objects loaded per each watch url

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-09 15:05:14 +02:00
Aliaksandr Valialkin
e44532b760 docs/CHANGELOG.md: mention about improved query performance at 18fe0ff14b 2021-03-09 13:12:25 +02:00
Aliaksandr Valialkin
fe8b12fbad app/vmselect/promql: follow up for 433fff0006 2021-03-09 12:48:44 +02:00
Nikolay
433fff0006 duplicate timeseries fix for prometheus_buckets function (#1119)
* try fix for prometheus_buckets

* merge possible end of the bucket collision
2021-03-09 12:26:23 +02:00
Aliaksandr Valialkin
534944b671 vendor: make vendor-update 2021-03-09 12:01:55 +02:00
Aliaksandr Valialkin
f6878eac36 vendor: update github.com/VictoriaMetrics/fasthttp from 1.0.12 to 1.0.13
This should fix a bug in vmagent with high CPU usage during failed scrapes with small `scrape_timeout`.
2021-03-09 11:45:02 +02:00
John Belmonte
364fdf4a56 spelling fix: adjacent (#1115) 2021-03-09 09:18:19 +02:00
Aliaksandr Valialkin
14a399dd06 lib/promscrape: add scrape_offset option to scrape_config
This option can be used for specifying the particular offset per each scrape interval for target scraping
2021-03-08 12:03:33 +02:00
Aliaksandr Valialkin
345980f78f lib/storage: go fmt 2021-03-08 12:03:31 +02:00
Aliaksandr Valialkin
18fe0ff14b lib/storage: tune loopsCount estimations in getMetricIDsForTagFilterSlow
The adjusted estmations give up to 2x lower median response times on 200qps /api/v1/query_range workload
2021-03-07 21:12:35 +02:00
Aliaksandr Valialkin
ab4f090c63 lib/promscrape/discovery/kubernetes: reduce memory usage further when big number of scrape jobs are configured for the same kubernetes_sd_config role
Serialize reloading per-role objects, so they don't occupy too much memory when objects for many scrape jobs are simultaneously refreshed.
Do not reload per-role objects if they were already refreshed by concurrent goroutines. This should reduce load on Kubernetes API server
when big number of scrape jobs are configured for the same Kubernetes role.

This is a follow-up for 17b87725ed

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
2021-03-07 19:51:03 +02:00
Aliaksandr Valialkin
1187ee5e16 lib/decimal: prevent exponent overflow when processing values close to zero
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1114
2021-03-05 18:52:47 +02:00
Aliaksandr Valialkin
47ac2051bb app/vmauth: allow using regexps in url_map paths
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1112
2021-03-05 18:21:36 +02:00
Aliaksandr Valialkin
17b87725ed lib/promscrape/discovery/kubernetes: reduce memory usage when Kubernetes service discovery is configured on a big number of scrape jobs
Previously vmagent was creating a separate Kubernetes object cache per each scrape job.
This could result in increased memory usage when monitoring a Kubernetes cluster with big number of objects (pods / nodes / services, etc.)
as seen at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113

Now it uses a shared map of scrape objects across multiple scrape jobs.
2021-03-05 17:29:55 +02:00
Aliaksandr Valialkin
c9a25e931b lib/promscrape/discovery/kubernetes: move apiWatcher code to a separate file 2021-03-05 12:36:05 +02:00
Aliaksandr Valialkin
536d59914a deployment/docker: update base Docker image from alpine:3.13.1 to alpine:3.13.2
See https://www.alpinelinux.org/posts/Alpine-3.13.2-released.html
2021-03-05 10:35:10 +02:00
Aliaksandr Valialkin
68b8c48c86 docs/Articles.md: add https://dalefro.medium.com/prometheus-victoria-metrics-on-aws-ecs-62448e266090 2021-03-05 09:06:44 +02:00
Aliaksandr Valialkin
444fca89f8 lib/promscrape: make cluster membership calculations consistent across 32-bit and 64-bit architectures 2021-03-05 09:06:17 +02:00
Aliaksandr Valialkin
a14053ffa0 app/vmselect/promql: add histogram_avg(), histogram_stddev() and histogram_stdvar() functions to MetricsQL 2021-03-04 14:12:07 +02:00
Aliaksandr Valialkin
423cd981fb lib/promscrape: add -promscrape.cluster.replicationFactor command-line flag for replicating scrape targets among vmagent instances in the cluster 2021-03-04 10:20:15 +02:00
Aliaksandr Valialkin
d962cdbc13 docs/Single-server-VictoriaMetrics.md: mention that VictoriaMetrics needs free resources for handling workload spikes 2021-03-04 09:57:53 +02:00
Aliaksandr Valialkin
9a48c1b53d lib/promscrape/discovery/kubernetes: fix tests after e154f4a644 2021-03-03 22:41:30 +02:00
Aliaksandr Valialkin
201b685b13 all: bump minimum supported Go version from 1.13 to 1.14 2021-03-03 15:57:13 +02:00
Aliaksandr Valialkin
90d6e94e5b vendor: update github.com/VictoriaMetrics/fastcache from v1.5.7 to v1.5.8 2021-03-03 15:52:51 +02:00
Aliaksandr Valialkin
f391e5a3a0 docs/vmagent.md: remove outdated suggestion for determining labels that lead to duplicate targets
The original labels for duplicate targets is already printed in the error message starting from 71ea4935de
2021-03-03 12:26:11 +02:00
Aliaksandr Valialkin
3a68b94487 docs/CHANGELOG.md: cut v1.55.1 release 2021-03-03 11:49:10 +02:00
Aliaksandr Valialkin
467cb9e3d1 docs/vmagent.md: make docs-sync after the commit 621bf03745 2021-03-03 10:52:40 +02:00
Roman Khavronenko
621bf03745 Vmagent docs upd (#1104)
* vmagent: port changes from https://github.com/VictoriaMetrics/VictoriaMetrics.github.io/pull/1

Thanks to @dereksfoster99 for this patch!

* vmagent: reword to make the meaning clear
2021-03-03 10:51:51 +02:00
Aliaksandr Valialkin
4c3ef78c05 docs/CHANGELOG.md: mention recent bugfixes from commits 7906316741 and e154f4a644 2021-03-03 10:50:31 +02:00
Aliaksandr Valialkin
e09a245b2b app/vmalert/README.md: sync with docs/vmalert.md 2021-03-03 10:43:59 +02:00
Nikolay
e154f4a644 Fix ingress discovery api (#1110) 2021-03-03 10:43:39 +02:00
Aliaksandr Valialkin
7906316741 lib/promscrape/discovery/kubernetes: properly check for nil pointer inside interface
See https://mangatmodi.medium.com/go-check-nil-interface-the-right-way-d142776edef1

This fixes a panic when the ScrapeWork is filtered out in swcFunc.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1108
2021-03-03 10:33:17 +02:00
Alexandros Orfanos
f4f8f21875 vmalert: docs update - status endpoint needs group ID, not group name (#1106) 2021-03-03 07:46:10 +00:00
Aliaksandr Valialkin
1252ca44d8 docs/CHANGELOG.md: cut v1.55.0 2021-03-02 21:37:58 +02:00
Aliaksandr Valialkin
d741794fab vendor: make vendor-update 2021-03-02 21:37:57 +02:00
Aliaksandr Valialkin
03dceb700d lib/promscrape: go fmt 2021-03-02 21:20:43 +02:00
Aliaksandr Valialkin
4de4da1e2a lib/storage: typo fix: umarshal -> unmarshal 2021-03-02 20:47:59 +02:00
Aliaksandr Valialkin
062211c61c lib/promscrape: pre-allocate space for a map in mergeLabels
This should reduce the number of memory allocations when discovering big number of targets
2021-03-02 18:41:58 +02:00
Aliaksandr Valialkin
d1d34664b5 lib/promscrape/discovery: properly track vm_promscrape_discovery_kubernetes_objects_removed_total metric 2021-03-02 18:32:54 +02:00
Aliaksandr Valialkin
a939667ce0 lib/promrelabel: remove unneded optimizations for labeldrop and labelkeep actions
These optimizations may slow down code execution by matching the same label against regexp two times instead of a single time
2021-03-02 17:55:43 +02:00
Aliaksandr Valialkin
6a7ef768ff lib/promscrape/discovery/kubernetes: cache ScrapeWork objects as soon as the corresponding k8s objects are changed
This should reduce CPU usage and memory usage when Kubernetes contains tens of thousands of objects
2021-03-02 16:42:55 +02:00
Aliaksandr Valialkin
22b1941cfc lib/promscrape/discovery/ec2: follow-up after f6114345de 2021-03-02 13:46:26 +02:00
Nikolay
f6114345de Adds webIndentity token for aws (#1099)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1080
2021-03-02 13:27:09 +02:00
Aliaksandr Valialkin
937f382938 lib/protoparser/prometheus: properly unescape label values in Prometheus exposition format
Unescape only `\n`, `\"` and `\\` sequences as Prometheus does. Other escape sequences shouldn't be unescaped.
2021-03-02 13:21:43 +02:00
Aliaksandr Valialkin
019d8e88d8 lib/protoparser/graphite: fix parsing of a Graphite line with empty tags such as foo; 1 2
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1100
2021-03-01 17:16:35 +02:00
Aliaksandr Valialkin
baefe5a8ad docs: actualize -help output 2021-03-01 17:01:27 +02:00
Aliaksandr Valialkin
2c43e846a9 docs/CHANGELOG.md: mention the out of range panic bugfix d6a41b6ea2 2021-03-01 16:54:24 +02:00
Aliaksandr Valialkin
d6a41b6ea2 vendor: update github.com/VictoriaMetrics/metrics from v1.15.1 to v1.15.2
This should fix an edge case panic when 1e-9 value is passed to Histogram.Update

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1096
2021-03-01 16:49:57 +02:00
Aliaksandr Valialkin
1a3689af9a lib/promscrape/discovery/kubernetes: deflake tests; a follow-up for 05fb08713c 2021-03-01 14:32:12 +02:00
Aliaksandr Valialkin
62ebf5c88e lib/promscrape: explicitly stop and cleanup service discovery routines when new config is read from -promscrape.config
This should reduce memory usage when `-promscrape.config` file frequently changes
2021-03-01 14:14:00 +02:00
Aliaksandr Valialkin
e32ad9e923 lib/promscrape: use target arg in ScrapeWork cache 2021-03-01 12:29:09 +02:00
Aliaksandr Valialkin
3792ea4065 .github/workflows/main.yml: update Go version from v1.15 to v1.16 2021-03-01 12:14:20 +02:00
Aliaksandr Valialkin
f5d77a7081 lib/promscrape: typo fix, which prevented from caching ScrapeWork entries 2021-03-01 12:12:56 +02:00
Aliaksandr Valialkin
e84153d5ca lib/promscrape: add vm_promscrape_scrapework_cache_* metrics for tracking ScrapeWork cache effectiveness 2021-03-01 12:05:45 +02:00
Aliaksandr Valialkin
4e3cfe8461 app/vmagent/remotewrite: clarify -remoteWrite.flushInterval flag description 2021-03-01 11:50:54 +02:00
Aliaksandr Valialkin
732e729ef9 docs/CHANGELOG.md: mention the issue related to using Kubernetes watch API for service discovery 2021-03-01 01:42:11 +02:00
Aliaksandr Valialkin
369f01c738 app/vmagent/remotewrite: fix rate limiting logic for -remoteWrite.url 2021-03-01 00:58:34 +02:00
Aliaksandr Valialkin
7f15cd7161 lib/httpserver: make make errcheck happy after the commit 9fc7726d84 2021-03-01 00:34:43 +02:00
Aliaksandr Valialkin
cb943f35c7 app/vmagent: remove data race when applying rate limits to -remoteWrite.url with multiple queues 2021-03-01 00:29:07 +02:00
Aliaksandr Valialkin
530e9904af lib/promscrape: reduce CPU usage an memory allocations when constructing scrapeWorkKey 2021-02-28 22:29:58 +02:00
Aliaksandr Valialkin
8d021b73b5 docs/vmbackup.md: clarify docs on vmagent clustering 2021-02-28 22:00:15 +02:00
Aliaksandr Valialkin
2b53add6b2 app/vmselect/querystats: show the number of matching queries in the top by average duration and in the top by summary duration
This should help debugging slow queries.
2021-02-28 19:40:19 +02:00
Aliaksandr Valialkin
1da1d502a8 docs/CHANGELOG.md: mention about https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1074 2021-02-28 19:31:44 +02:00
Aliaksandr Valialkin
9fc7726d84 lib/httpserver: make sure the gzipResponseWriter.Write() is called on Flush() and Close() calls
This should fix the `http: superfluous response.WriteHeader call` issue

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1078
2021-02-28 19:22:50 +02:00
Aliaksandr Valialkin
e5ca8ac0db lib/promscrape: add ability to spread scrape targets among multiple vmagent instances
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1084
2021-02-28 18:41:08 +02:00
Aliaksandr Valialkin
51bf577431 vendor: update github.com/VictoriaMetrics/metrics from v1.15.0 to v1.15.1
This can help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1096
2021-02-28 17:43:32 +02:00
Aliaksandr Valialkin
e02d1ef93c lib/promscrape/discovery/kubernetes: properly account the number of objects when watcher is stopped
A follow-up for b21b110b7a
2021-02-28 17:06:02 +02:00
Aliaksandr Valialkin
b21b110b7a lib/promscrape/discovery/kubernetes: add vm_promscrape_discovery_kubernetes_* metrics for monitoring internal state of k8s service discovery 2021-02-28 16:57:40 +02:00
Aliaksandr Valialkin
c459600346 lib/promscrape/discovery/kubernetes: remove resourceVersionMatch=NotOlderThan query arg when watching for k8s object changes, since it cannot be used when watch=1 query arg is passed 2021-02-28 16:07:14 +02:00
Aliaksandr Valialkin
59a31171e3 lib/promscrape: fix possible deadlock in parallel execution of target relabeling 2021-02-28 16:05:13 +02:00
Aliaksandr Valialkin
68a0f5ce12 lib/promscrape/discovery/kubernetes: fix deadlock in startWatcherForURL
reloadObjects must be called without holding aw.mu lock
2021-02-28 15:26:30 +02:00
Aliaksandr Valialkin
b523e0369c lib/promscrape/discovery/kubernetes: typo fix after 241ffd1f3b 2021-02-28 15:12:17 +02:00
Aliaksandr Valialkin
241ffd1f3b lib/promscrape/discovery/kubernetes: pre-populate labelsByKey in reloadObject() 2021-02-28 15:09:49 +02:00
Aliaksandr Valialkin
4281c5ed14 vendor: make vendor-update 2021-02-28 14:47:50 +02:00
Aliaksandr Valialkin
05fb08713c lib/promscrape/discovery/kubernetes: compare sorted sets of labels in tests
This should deflake tests where the order of labels isn't stable
2021-02-28 14:10:19 +02:00
Aliaksandr Valialkin
03903c1176 docs/CHANGELOG.md: mention 317b0cbed2 2021-02-28 14:02:49 +02:00
Aliaksandr Valialkin
af8b7e8391 lib/promscrape: add missing startWatchersForRole() call at the beginning of apiWatcher.getLabelsForRole 2021-02-28 14:00:17 +02:00
Nikolay
317b0cbed2 adds query params for vmalert (#1094)
remoteWrite.url now accepts query params at provided url
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1087
2021-02-27 10:04:58 +00:00
Aliaksandr Valialkin
4f8a72806a app/vmbackupmanager: add docs; the vmbackupmanager is available as a part of enterprise subscription 2021-02-27 02:17:55 +02:00
Aliaksandr Valialkin
422b31de40 lib/promscrape/discovery/kubernetes: reload k8s resources on every error
This is needed for obtaining fresh resourceVersion
2021-02-27 01:47:27 +02:00
Aliaksandr Valialkin
7cc3d96a41 lib/fs: follow-up after f3a03c4164 2021-02-27 01:01:47 +02:00
Nikolay
f3a03c4164 Adds windows build (#1040)
* fixes windows compilation,
adds signal impl for windows,
adds free space usage for windows,
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1036

NOTE victoria metrics database still CANNOT work under windows system,
only vmagent is supported.
To completly port victoria metrics, you have to fix issues with separators,
parsing and posix file removall

* rollback separator

* Adds windows setInformation api,
it must behave like unix, need to test it.
changes procutil

* check for invlaid param

* Fixes posix delete semantic

* refactored a bit

* fixes openbsd build

* removed windows api call

* Fixes code after windows add

* Update lib/procutil/signal_windows.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-02-27 00:37:07 +02:00
Aliaksandr Valialkin
975dac9086 docs/CHANGELOG.md: mentioned a bugfix with extra_label handling during caching query results
Related to 186c078fac
2021-02-27 00:20:29 +02:00
Nikolay
186c078fac adds enforced tag filters into cache key (#1095) 2021-02-27 00:15:53 +02:00
Aliaksandr Valialkin
a78948ae8b lib/promscrape: yet another typo fix after ed8441ec52 2021-02-26 23:35:47 +02:00
Aliaksandr Valialkin
8683ea85e6 lib/fs: properly handle stale NFS file handle error during file deletion
This error can appear when -storageDataPath points to NFS volume and the given file has been already removed.
2021-02-26 23:25:14 +02:00
Aliaksandr Valialkin
9fa2632ac3 lib/promscrape: typo fix after ed8441ec52 2021-02-26 23:04:05 +02:00
Aliaksandr Valialkin
d86e9b49c4 app/vmselect/promql: increase accuracy for buckets_limit() function for small limits by skipping the first and the last buckets during merge
The first and the last buckets are usually `[0 ... leMin]` and `(leMax ... +Inf)`. If they are merged with adjancent buckets,
then the resulting accuracy can suffer.
2021-02-26 22:56:36 +02:00
Aliaksandr Valialkin
ed8441ec52 lib/promscrape: cache ScrapeWork
This should reduce the time needed for updating big number of scrape targets.
2021-02-26 21:43:22 +02:00
Aliaksandr Valialkin
815666e6a6 lib/promscrape/discovery/kubernetes: cache target labels
This should reduce CPU usage on repeated SDConfig.GetLabels() calls.
2021-02-26 20:23:28 +02:00
Aliaksandr Valialkin
19712fc2bd lib/promscrape/discovery/kubernetes: errcheck fix 2021-02-26 17:00:08 +02:00
Aliaksandr Valialkin
c8f2f9b2e8 lib/promscrape: cleanup after 9b2246c29b
Main points:

* Revert changes outside lib/promscrape/discovery/kuberntes . These changes can be applied later in a separate commit
* Minimize changes in lib/promscrape/discovery/kubernetes compared to a93e644001
* Corner case fixes.
2021-02-26 16:54:05 +02:00
Nikolay
9b2246c29b vmagent kubernetes watch stream discovery. (#1082)
* started work on sd for k8s

* continue work on watch sd

* fixes

* continue work

* continue work on sd k8s

* disable gzip

* fixes typos

* log errror

* minor fix

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-02-26 16:46:13 +02:00
Aliaksandr Valialkin
a93e644001 lib/promscrape: remove duplicate code a bit 2021-02-26 16:39:56 +02:00
Aliaksandr Valialkin
f7b242540b lib/promscrape: reduce processing time for big number of discovered targets by processing them in parallel 2021-02-26 16:39:56 +02:00
dereksfoster99
438428b5b0 Alphabetized names and added "native speaker" spin. (#1093)
Do we have the dates that each of these were written so that can be added next to each brand?
2021-02-26 16:39:15 +02:00
dereksfoster99
d5f21f3f4b Native speaker edits. (#1088)
I made an effort to not change anything substantive.
2021-02-25 15:57:09 +02:00
Aliaksandr Valialkin
f7049e2af7 lib/promrelabel: optimize labeldrop and labelkeep relabeling for prefix.* and prefix.+ regexps 2021-02-24 17:58:28 +02:00
Aliaksandr Valialkin
98854e5f2b app/vmselect: add sign(q) and clamp(q, min, max) functions, which will be added in the upcoming Prometheus release
See https://twitter.com/roidelapluie/status/1363428376162295811

The `last_over_time(m[d])` function already exists in MetricsQL.
2021-02-24 17:24:56 +02:00
Aliaksandr Valialkin
2cd23362f5 README.md: sync with docs/Single-server-Victoria-Metrics.md 2021-02-24 17:24:18 +02:00
Aliaksandr Valialkin
f050e3f492 docs/CHANGELOG.md: mention about a bugfix from 4805b80977 2021-02-24 11:49:10 +02:00
Aliaksandr Valialkin
f4135b0d14 app/vmselect/promql: properly calculate histogram_quantile() over zero buckets and only a single non-zero le="+Inf"` bucket like Prometheus does 2021-02-24 00:42:22 +02:00
Aliaksandr Valialkin
2c44178645 lib/storage: consistency renaming: durationsPerDateTagFilterCache -> loopsPerDateTagFilterCache 2021-02-23 15:47:19 +02:00
faceair
15d61c4879 lib/storage: correct tagfilter match cost (#1079) 2021-02-22 21:46:56 +02:00
Aliaksandr Valialkin
fa03e0d210 app/vmselect/promql: add increase_pure() function to MetricsQL
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/962
2021-02-22 19:14:15 +02:00
Aliaksandr Valialkin
2549469d5d app/vmselect/graphite: support Storage-Step header value and storage_step query arg at /render API 2021-02-22 18:33:26 +02:00
Aliaksandr Valialkin
d136081040 lib/promrelabel: add more optimizations for relabeling for common cases 2021-02-22 16:33:55 +02:00
Aliaksandr Valialkin
dd1e53b119 lib/promrelabel: optimize relabeling performance for common cases 2021-02-22 00:51:13 +02:00
Aliaksandr Valialkin
ff5bbc4b88 lib/promscrape: export vm_promscrape_target_relabel_duration_seconds metric 2021-02-21 23:21:42 +02:00
Aliaksandr Valialkin
901e12024d vendor: update github.com/VictoriaMetrics/metrics from v1.14.0 to v1.15.0
The v1.15.0 exports the following additional metrics:

    process_io_read_bytes_total - the number of bytes read via io syscalls such as read and pread
    process_io_written_bytes_total - the number of bytes written via io syscalls such as write and pwrite
    process_io_read_syscalls_total - the number of read syscalls such as read and pread
    process_io_write_syscalls_total - the number of write syscalls such as write and pwrite
    process_io_storage_read_bytes_total - the number of bytes read from storage layer
    process_io_storage_written_bytes_total - the number of bytes written to storage layer

These metrics can be used for monitoring process io
2021-02-21 22:54:00 +02:00
Aliaksandr Valialkin
636c55b526 lib/mergeset: reduce memory usage for inmemoryBlock by using more compact items representation
This also should reduce CPU time spent by GC, since inmemoryBlock.items don't have pointers now,
so GC doesn't need visiting them.
2021-02-21 22:06:47 +02:00
Aliaksandr Valialkin
388cdb1980 lib/storage: do not re-calculate stats for heavy tag filters
This should reduce the number of slow queries when stats for heavy tag filters was recalculated.
2021-02-21 21:39:01 +02:00
Aliaksandr Valialkin
48656dcc38 lib/{mergeset,storage}: allow merging smaller number of small parts
While this may increase CPU and disk IO usage needed for background merge,
this also recudes CPU usage during queries in production. This is because
such queries tend to read recently added data and it is better to have lower number
of parts for such data in order to reduce CPU usage.

This partially reverts ebf8da3730
2021-02-21 21:28:36 +02:00
Aliaksandr Valialkin
cb311bb156 lib/{mergeset,storage}: do not use pools for indexBlock and inmemoryBlock during their caching, since this results in higher memory usage in production without any performance gains 2021-02-21 21:18:59 +02:00
Aliaksandr Valialkin
2cfb376945 lib/promscrape: typo fix after the commit f26162ec99 2021-02-19 00:33:37 +02:00
Aliaksandr Valialkin
c2678754e4 app/vmagent: properly perform graceful shutdown, which was broken in the commit 1d1ba889fe
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1065
2021-02-19 00:31:34 +02:00
Aliaksandr Valialkin
49e36e8d9d app/vmagent: fix scrape config example for scrape_align_interval option 2021-02-18 23:57:23 +02:00
Aliaksandr Valialkin
f26162ec99 lib/promscrape: add scrape_align_interval config option into scrape config
This option allows aligning scrapes to a particular intervals.
2021-02-18 23:53:44 +02:00
Aliaksandr Valialkin
9c70c1f21f app/vmselect/promql: reduce the probability of duplicate time series errors when querying Kubernetes metrics 2021-02-18 22:07:29 +02:00
Aliaksandr Valialkin
5e341ccb59 docs/CHANGELOG.md: cut v1.54.1 2021-02-18 19:09:59 +02:00
Aliaksandr Valialkin
f9084611bd lib/storage: use composite index for a query with a name filter and negative filters 2021-02-18 18:57:23 +02:00
Aliaksandr Valialkin
a537c4f602 lib/storage: properly handle queries containing a filter on metric name plus any number of negative filters and zero non-negative filters
Example: `node_cpu_seconds_total{mode!="idle"}`
2021-02-18 18:46:36 +02:00
Aliaksandr Valialkin
ae1238fe5c docs/Articles.md: add a link to https://www.percona.com/blog/2021/02/12/tame-kubernetes-costs-with-percona-monitoring-and-management-and-prometheus-operator/ 2021-02-18 16:36:47 +02:00
Aliaksandr Valialkin
03ebc028f7 app/vmalert: add missing multiarch Dockerfile 2021-02-18 15:23:17 +02:00
Aliaksandr Valialkin
a7697cc88b docs/CHANGELOG.md: cut v1.54.0 2021-02-18 14:52:38 +02:00
Aliaksandr Valialkin
e540c02014 lib/storage: prevent from running identical heavy tag filters in concurrent queries when measuring the number of loops for such tag filter.
This should reduce CPU usage spikes when measuring the number of loops needed for heavy tag filters
2021-02-18 13:58:18 +02:00
Aliaksandr Valialkin
711f8a5b8d lib/storage: sort tag filters by the number of loops they need for the execution
This metric should work better than the filter execution duration, since it cannot be distorted
by concurrently running queries.
2021-02-18 12:47:38 +02:00
Aliaksandr Valialkin
f95dd67a22 docs/Articles.md: add a link to https://medium.com/alteos-tech-blog/observability-availability-and-doras-research-program-85deb6680e78 2021-02-18 01:25:57 +02:00
Aliaksandr Valialkin
d7de4807e1 app/victoria-metrics/testdata: add a test for {__graphite__="foo.*.bar"} selector 2021-02-17 21:52:43 +02:00
Aliaksandr Valialkin
edcdc39eb3 app/vmagent/remotewrite: cleanup after 1d1ba889fe
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1065
2021-02-17 21:42:55 +02:00
Nikolay
1d1ba889fe adds pushback for fastqueue, (#1075)
during shutdown currently sending block was lost,
now its pushed back to fast queue and will be flushed on disk,
it may lead to data duplication.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1065
2021-02-17 21:23:38 +02:00
Aliaksandr Valialkin
4b110fa21c docs/CHANGELOG.md: mention that prod binaries are built now with Go1.16
This is a follow-up for 4edfe76bef
2021-02-17 21:06:59 +02:00
Aliaksandr Valialkin
efe1e0cff0 docs/CaseStudies.md: actualize Wix numbers 2021-02-17 21:03:53 +02:00
Aliaksandr Valialkin
ce99b48a9a Revert "lib/mergeset: tune lifetime for entries inside block caches"
This reverts commit 458c89324d.

Production testing revealed zero improvements for memory usage with reduced lifetime for entries in block caches.
2021-02-17 20:42:21 +02:00
Aliaksandr Valialkin
939d5ffc2b lib/storage: move composite filters to the top during sorting 2021-02-17 20:26:51 +02:00
Aliaksandr Valialkin
faad6f84a4 lib/storage: return back filter arg to getMetricIDsForTagFilter function
The filter arg has been removed in the commit c7ee2fabb8
because it was preventing from caching the number of matching time series per each tf.

Now the cache contains duration for tf execution, so the filter shouldn't break such caching.
2021-02-17 19:33:22 +02:00
Aliaksandr Valialkin
d4849561ef app/vmstorage: export vm_composite_filter_success_conversions_total and vm_composite_filter_missing_conversions_total metrics 2021-02-17 19:13:38 +02:00
Aliaksandr Valialkin
33806264ec lib/storage: revert ecf132933e, since negative filters require the same amount of work as non-negative filters 2021-02-17 18:55:04 +02:00
Aliaksandr Valialkin
63fc140624 lib/storage: tag filters sorting... 2021-02-17 17:55:29 +02:00
Aliaksandr Valialkin
74424b55ee lib/storage: further tune tag filters sorting 2021-02-17 17:28:15 +02:00
Aliaksandr Valialkin
4edfe76bef deployment/dm: update Go builder image from v1.15.8 to v1.16.0
See release notes for Go1.16 at https://golang.org/doc/go1.16
2021-02-17 15:19:00 +02:00
Aliaksandr Valialkin
442fcfec5a lib/storage: tune the logic for sorting tag filters according the their exeuction times 2021-02-17 15:00:08 +02:00
Aliaksandr Valialkin
4a07820048 lib/storage: make sure that nobody uses partitions when closing the table 2021-02-17 14:59:04 +02:00
Roman Khavronenko
9ca7d76b25 Add Labels limit exceeded panel to dashboard (#1072)
New panel supposed to display events when VM drops extra label
on exceeding `maxLabelsPerTimeseries` limit.
2021-02-16 23:38:20 +02:00
Aliaksandr Valialkin
bea2f86b7b docs/CHANGELOG.md: document new per-tenant metrics 2021-02-16 23:34:24 +02:00
Aliaksandr Valialkin
1a9a6b560f vendor: make vendor-update 2021-02-16 22:24:52 +02:00
Aliaksandr Valialkin
1256931aee lib/httpserver: make errcheck happy 2021-02-16 22:05:32 +02:00
Aliaksandr Valialkin
7f2a6c7b54 docs: rename vmbackuper to vmbackupmanager 2021-02-16 22:00:39 +02:00
Aliaksandr Valialkin
d61f7b7279 lib/storage: more tuning for tag filters sorting according the time they take 2021-02-16 21:22:23 +02:00
Aliaksandr Valialkin
458c89324d lib/mergeset: tune lifetime for entries inside block caches
This should reduce memory usage in general case without significant CPU usage increase
2021-02-16 18:11:51 +02:00
Aliaksandr Valialkin
2824856691 lib/mergeset: clarify comments in the code a bit 2021-02-16 18:02:57 +02:00
Aliaksandr Valialkin
83a1a889ec deployment/docker: properly publish latest tag during make publish-via-docker
This has been broken in f9902b3372
2021-02-16 17:42:43 +02:00
Aliaksandr Valialkin
c4756f94da app/vmselect/netstorage: reuse timeseriesWork objects in order to reduce memory allocations 2021-02-16 16:08:53 +02:00
Aliaksandr Valialkin
5a401225c7 app/vmselect/netstorage: use unsafe string as a key for a map when the map already contains the given key
This should prevent from a memory allocation and a string copy.
2021-02-16 15:43:10 +02:00
Aliaksandr Valialkin
1bf6cd814d lib/uint64set: remove memory allocation in bucket16.appendTo when sorting smallPool 2021-02-16 15:31:49 +02:00
Aliaksandr Valialkin
8ec45ff335 lib/httpserver: cache /metrics output for a second
This should reduce CPU load when `/metrics` output is scraped with a frequency exceeding a request per second
2021-02-16 14:56:36 +02:00
Aliaksandr Valialkin
b861a64510 lib/protoparser/influx: make sure that escaped whitespace can be put in measurement, tag names and field names 2021-02-16 13:59:18 +02:00
Aliaksandr Valialkin
7faa762021 lib/mergeset: remove unused code after a4140de9e6 2021-02-16 13:40:09 +02:00
Aliaksandr Valialkin
ca191696fe lib/storage: tune sorting for tag filters 2021-02-16 13:04:49 +02:00
Aliaksandr Valialkin
ecf132933e lib/storage: increase match cost for negative tag filters, since they need to scan all the label pairs 2021-02-15 16:34:23 +02:00
Aliaksandr Valialkin
4e39bf148c vendor: update github.com/VictoriaMetrics/metrics from v1.13.1 to v1.14.0
The new version switches from log-linear histograms to log-based histograms,
which provide up to 3.6 times better accuracy.
2021-02-15 15:12:29 +02:00
Aliaksandr Valialkin
9f5ac603a7 lib/storage: reduce the minimum supported retention for inverted index from one month to one day 2021-02-15 15:12:29 +02:00
Aliaksandr Valialkin
2e30202dc7 lib/flagutil: prevent from integer overflow when parsing duration 2021-02-15 15:12:29 +02:00
Aliaksandr Valialkin
38d7e96602 lib/promscrape/discovery/kubernetes: add __meta_kubernetes_endpoints_label_* and __meta_kuberntes_endpoints_annotation_* labels to role: endpoints
This syncs kubernetes SD with Prometheus 2.25
See 617c56f55a
2021-02-15 02:51:16 +02:00
Aliaksandr Valialkin
01d82cbf21 docs/Cluster-VictoriaMetrics.md: clarify replication docs 2021-02-15 01:45:09 +02:00
Aliaksandr Valialkin
74963f71c6 lib/logger: explicitly import "time/tzdata" package for embedding tzdata into the app
The approach with `timetzdata` build tag didn't work for GOARCH=arm and GOARCH=ppc64le
due to the issue https://github.com/golang/go/issues/44073#issuecomment-778854298
2021-02-15 01:00:01 +02:00
Aliaksandr Valialkin
71c417427c lib/storage: sort tag filters by actual execution time instead of by the number of matching time series
This should improve query speed for queries with regexp filters matching small number of time series
on a label with big number of unique values.
2021-02-15 00:18:13 +02:00
Aliaksandr Valialkin
c727d2219b lib/storage: properly hanle regexp tag filters with dots, which can be converted to full string match filters.
For example `{label=~"foo\.bar"}` should be converted to `{label="foo.bar"}`. Previously it has was mistakenly conveted to `{label="foo\.bar"}` .
This could result in missing time series for such tag filters.
2021-02-14 23:38:14 +02:00
Aliaksandr Valialkin
73c95c4e5b docs/CHANGELOG.md: mention about fixed multiarch build for Docker images
Related commit: f9902b3372
2021-02-12 15:23:49 +02:00
Aliaksandr Valialkin
80dc74dbc1 lib/promscrape: remove vm_promscrape_scrapes_failed_per_url_total and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total metrics
These metrics may result in big number of time series when vmagent scrapes thousands of targets and these targets constantly changes.

* It is better using `up == 0` query for determining failing targets.
* It is better using the following query for determining targets with exceeded limit on the number of metrics:

  scrape_samples_scraped > 0 if up == 0
2021-02-12 05:26:04 +02:00
Aliaksandr Valialkin
802fabf0d7 deployment/docker: embed tzdata into prod Go app instead of installing it into base docker image
While this increases app size by 700Kb, this allows using -loggerTimezone in a scratch base image
See https://github.com/golang/go/issues/38017
2021-02-12 04:54:27 +02:00
Aliaksandr Valialkin
f9902b3372 deployment/docker: use docker buildx for creating multiarch builds
See https://github.com/docker/buildx/
2021-02-12 04:31:22 +02:00
Aliaksandr Valialkin
baa36354e0 docs/Cluster-VictoriaMetrics.md: mention that /api/v1/import/prometheus supports OpenMetrics data 2021-02-12 00:59:54 +02:00
Aliaksandr Valialkin
5219c0f474 docs/Single-server-VictoriaMetrics.md: mention that VictoriaMetrics support data ingestion in OpenMetrics format 2021-02-12 00:57:14 +02:00
Aliaksandr Valialkin
acdb401585 app/vmselect/prometheus: treat match query arg in the same way as match[] query arg 2021-02-11 15:02:21 +02:00
Aliaksandr Valialkin
1e38ad6d20 app/vmauth: add ability to route requests from a single users to multiple targets depending on the requested path
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1064
2021-02-11 12:41:16 +02:00
Aliaksandr Valialkin
2d33230793 app/vmselect/promql: properly make copies of EvalConfig 2021-02-11 12:41:15 +02:00
Aliaksandr Valialkin
7a3a9421f3 app/vmselect/promql: make a copy of EvalConfig when executing q1 and q2 in parallel for q1 binary_op q2
This should prevent from data races if the underlying functions modify EvalConfig contents.
2021-02-10 23:05:55 +02:00
Aliaksandr Valialkin
04faea8b45 app/vmselect: parallelize q1 <binary_op> q2 queries by running q1 and q2 in parallel
This should reduce query execution times.
2021-02-10 22:59:39 +02:00
Aliaksandr Valialkin
0e26b7168a lib/storage: return back in-order applying of tag filters, since concurrently executing tag filters can result in CPU and RAM waste in common case 2021-02-10 22:41:04 +02:00
Aliaksandr Valialkin
b51c23dc5b lib/storage: parallelize tag filters execution a bit
This should reduce execution time when a query contains multiple tag filters and each such filter matches big number of time series.
2021-02-10 18:12:25 +02:00
Aliaksandr Valialkin
c7ee2fabb8 lib/storage: remove filter arg from getMetricIDsForDateTagFilter function
The `filter` arg breaks the logic for sorting tag filters by the matching metrics,
which may result in non-optimal performance during time series search.
2021-02-10 18:12:20 +02:00
Aliaksandr Valialkin
57cac289e0 lib/storage: fix inconsistencies in error logs 2021-02-10 18:12:16 +02:00
Aliaksandr Valialkin
5d5f0b0627 lib/storage: load metadata before loading indexdb, since indexdb depends on the metadata 2021-02-10 17:55:40 +02:00
Aliaksandr Valialkin
cdecf83ce5 app/vmstorage: export vm_composite_index_min_timestamp metric 2021-02-10 17:14:08 +02:00
Aliaksandr Valialkin
553016ea99 lib/storage: disable composite index usage when querying old data 2021-02-10 14:57:50 +02:00
Aliaksandr Valialkin
fcb7655d1e lib/storage: fix metric name match for composite filter 2021-02-10 01:27:45 +02:00
Aliaksandr Valialkin
c7dccebaef lib/storage: optimize search by label filters matching big number of time series 2021-02-10 00:44:54 +02:00
Aliaksandr Valialkin
6b4e6c229c lib/storage: reduce lock contention in dateMetricIDCache when registering new time series for the current day
This should help systems with multiple CPU cores
2021-02-10 00:01:13 +02:00
Aliaksandr Valialkin
31f6b9c977 lib/fs: remove the code for tracking whether the given memory region is in page cache
This code didn't give performance gains under production workload, so let's remove it in order to simplify the code.
2021-02-09 16:49:03 +02:00
Aliaksandr Valialkin
0a69122d81 lib/mergeset: remove dead code left after a4140de9e6 2021-02-09 16:33:52 +02:00
Aliaksandr Valialkin
d56390b925 optimize Storage.updatePerDateData() 2021-02-09 02:55:36 +02:00
Aliaksandr Valialkin
fda61e8e96 lib/storage: skip deduplication when creating inmemory data blocks
The deduplication will be performed later during merging such blocks.
2021-02-09 02:25:32 +02:00
Aliaksandr Valialkin
3b146a9976 vendor: make vendor-update 2021-02-09 01:08:56 +02:00
Aliaksandr Valialkin
947333bfa2 deployment/dm: update Go builder image from v1.15.7 to v1.15.8
See https://github.com/golang/go/issues?q=milestone%3AGo1.15.8+label%3ACherryPickApproved
2021-02-09 00:58:48 +02:00
Aliaksandr Valialkin
a4140de9e6 lib/mergeset: unconditionally cache indexdb blocks
Production workloads show that indexdb blocks must be cached unconditionally for reducing CPU usage.
This shouldn't increase memory usage too much, since unused blocks are removed from the cache every two minutes.
2021-02-09 00:47:50 +02:00
Aliaksandr Valialkin
cb96a1865b app/vmstorage: export missing vm_cache_size_bytes metrics for indexdb and data caches 2021-02-09 00:47:00 +02:00
Aliaksandr Valialkin
4dca03501b docs/CHANGELOG.md: mention about a bugfix for timezone data from df0cda3ab9 2021-02-08 15:58:25 +02:00
Aliaksandr Valialkin
c5770600a2 lib/cgroup: follow-up after b9bf3cbe3e 2021-02-08 15:54:38 +02:00
Aliaksandr Valialkin
3f689561d5 vendor: update github.com/VictoriaMetrics/metrics from v1.13.0 to v1.13.1 2021-02-08 15:46:52 +02:00
Nikolay
b9bf3cbe3e refactored cgroups limits, (#1061)
adds tests, remove os.Exec
2021-02-08 15:46:22 +02:00
Aliaksandr Valialkin
c2f2e5c0a0 deployment/docker: bump local/base image tag from 1.1.1 to 1.1.2, so it is built with new timezone info after the commit df0cda3ab9 2021-02-08 14:04:44 +02:00
Nikolay
df0cda3ab9 adds zoneinfo to base docker image, (#1062)
NOTE clean up local cache with docker rmi local/base:1.1.1-alpine_3.13.1-alpine_3.13.1
2021-02-08 14:00:57 +02:00
Aliaksandr Valialkin
2242647a04 lib/storage: optimize data ingestion in the beginning of every hour
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1046
2021-02-08 12:01:12 +02:00
Aliaksandr Valialkin
8f28a578d3 lib/logger: exit the app if unsupported timezone value has been passed to -loggerTimezone
While at it, clarify descrption for `-loggerTimezone` command-line flag.
2021-02-07 23:35:37 +02:00
Aliaksandr Valialkin
9b9cb04511 docs/vmctl.md: fix title, so it is properly displayed in the header of https://victoriametrics.github.io/ 2021-02-04 20:20:08 +02:00
Aliaksandr Valialkin
803a00102a app/{vminsert,vmselect}: accept requests to paths with /graphite and /prometheus prefixes
This should improve compatibility with path prefixes from VictoriaMetrics cluster.
See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
2021-02-04 20:01:18 +02:00
Aliaksandr Valialkin
4d43ab0875 app/vmselect: typo fix when stripping url path prefixes 2021-02-04 19:29:25 +02:00
Aliaksandr Valialkin
83d3e582ab lib/storage: check for prevHourMetricIDs cache before falling back to checking for (date, metricID) entries during data ingestion
This should reduce possible CPU usage spikes at the beginning of every hour.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1046
2021-02-04 18:48:13 +02:00
Aliaksandr Valialkin
b44edc7832 docs/CHANGELOG.md: mention recently added changes 2021-02-04 16:41:34 +02:00
Aliaksandr Valialkin
9fb38569eb lib/httpserver: expose process_open_fds and process_max_fds metrics
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/402
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1037
2021-02-04 16:40:50 +02:00
Nikolay
48c8c5093b fixes dockerswarm (#1053)
fixes improper usage of host network services
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1028
2021-02-04 15:56:42 +02:00
Aliaksandr Valialkin
814455138a docs/CHANGELOG.md: cut v1.53.1 2021-02-03 23:45:21 +02:00
Roman Khavronenko
2ff038d841 vmalert: mention -datasource.appendTypePrefix in README (#1052) 2021-02-03 23:44:37 +02:00
Karsonito
8001d4ccc9 change carbonapi link (#1051)
Co-authored-by: Konstantin Lesnichenko <konstantin.lesnichenko@together.com>
2021-02-03 23:44:05 +02:00
Dmitry Shevchuk
2baf98082b Adds ability to query right vmselect endpoint based on the query type (#1050)
* Adds ability to query right vmselect endpoint based on the query type

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2021-02-03 21:26:30 +00:00
Aliaksandr Valialkin
a2344ef4b7 docs/vmalert.md: mention that type option can be set at group level additionally to rule level 2021-02-03 21:13:13 +02:00
Aliaksandr Valialkin
8dc6095749 app/vmagent: add Advanced usage section with the description for all the command-line flags 2021-02-03 21:02:59 +02:00
Aliaksandr Valialkin
ea00dac35f docs/CHANGELOG.md: mention that {__graphite__="foo.*.bar"} syntax deprecates -search.treatDotsAsIsInRegexps command-line flag 2021-02-03 20:41:29 +02:00
Aliaksandr Valialkin
35242831f5 docs/Single-server-VictoriaMetrics.md: remove misleading section about native format in How to export data in JSON line format chapter 2021-02-03 20:41:17 +02:00
Aliaksandr Valialkin
8629fd8a72 app/vmselect: deprecate -search.treatDotsAsIsInRegexps in favor to {__graphite__="foo.*.bar"} syntax 2021-02-03 20:36:01 +02:00
Aliaksandr Valialkin
d16f22f3a1 app/vmselect,lib/storage: properly parse Graphite selectors with inner wildcards
Example: foo{bar{x,yz},a[b-c],*de}
2021-02-03 20:14:22 +02:00
Aliaksandr Valialkin
a5a1b9bd66 lib/storage: fix a bug, which breaks searching by Graphite wildcard filters 2021-02-03 20:14:22 +02:00
Aliaksandr Valialkin
6123aa3e75 sort orSuffixes in tagFilter.InitFromGraphiteQuery for faster seeks 2021-02-03 20:14:22 +02:00
Aliaksandr Valialkin
9d41c06db1 docs/CHANGELOG.md: fix a link to Graphite Render API usage docs 2021-02-03 12:29:37 +02:00
Aliaksandr Valialkin
4abd71402e docs/Cluster-VictoriaMetrics.md: sync with cluster branch 2021-02-03 12:14:58 +02:00
Aliaksandr Valialkin
c7e03f30d8 docs: mention about Graphite render API implementation 2021-02-03 12:12:04 +02:00
Aliaksandr Valialkin
b739ff194b docs/Single-server-VictoriaMetrics.md: mention about {__graphite__="foo.*.bar"} syntax under Querying Graphite data section 2021-02-03 11:49:42 +02:00
Aliaksandr Valialkin
8c568b13b2 docs/CHANGELOG.md: cut v1.53.0 2021-02-03 03:42:31 +02:00
Aliaksandr Valialkin
7388479a07 deployment/docker: update base alpine image from v3.13.0 to v3.13.1
See release notes for v3.13.1 - https://www.alpinelinux.org/posts/Alpine-3.13.1-released.html
2021-02-03 03:40:28 +02:00
Aliaksandr Valialkin
157c02622b app/vmselect: add ability to set Graphite-compatible filter via {__graphite__="foo.*.bar"} syntax 2021-02-03 01:21:54 +02:00
Aliaksandr Valialkin
4068f8d590 lib/promscrape: add vm_promscrape_service_discovery_duration_seconds metric 2021-02-02 16:15:25 +02:00
Aliaksandr Valialkin
bd11fd8f1d lib/promscrape: add vm_promscrape_scrape_retries_total, vm_promscrape_discovery_retries_total and vm_promscrape_discovery_requests_total metrics 2021-02-01 20:06:27 +02:00
Aliaksandr Valialkin
b577cdd855 docs: increase heading sizes in vmagent, vmauth, vmbackup and vmrestore docs, so they match the heading sizes in VictoriaMetrics docs 2021-02-01 19:44:00 +02:00
Aliaksandr Valialkin
b39d5ef656 vendor: make vendor-update 2021-02-01 19:39:10 +02:00
Aliaksandr Valialkin
8164cd8932 docs/vmctl.md: update build instructions after the migration from github.com/VictoriaMetrics/vmctl to github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl 2021-02-01 19:39:08 +02:00
Aliaksandr Valialkin
b43b498fd8 app/vmselect: add ability to pass extra_label=<label>=<value> query arg to Prometheus Querying API
This enforced `{label="value"}` label filter to the query.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1021
2021-02-01 18:04:17 +02:00
Aliaksandr Valialkin
5d87dbfd65 docs: document ability to query Graphite datasource from vmalert 2021-02-01 15:26:33 +02:00
Nikolay
195341a7cf Graphite vmalert wip (#112)
* init implementation for graphite alerts

* adds graphite support for vmalert

* small fix

* changes vmalert graphite api with type

* updates tests

* small fix

* fixes graphite parse

* Fixes graphite from time
2021-02-01 15:05:32 +02:00
Aliaksandr Valialkin
f0087f0dbb lib/flagutil: typo fix in comment to ArrayInt.GetOptionalArgOrDefault() func 2021-02-01 14:35:39 +02:00
Aliaksandr Valialkin
a4ae945a79 app/victoria-metrics: fix tests after 8749c2dd92 2021-02-01 14:34:11 +02:00
Aliaksandr Valialkin
b2aa80e74b app/vmagent: add -remoteWrite.roundDigits command-line option for limiting the number of digits after the point for stored values
This commit also adds --vm-round-digits command-line option to vmctl tool.
2021-02-01 14:27:09 +02:00
Aliaksandr Valialkin
29a7067827 app/vmctl: fix make check-all warnings 2021-02-01 01:31:25 +02:00
Aliaksandr Valialkin
d5c180e680 app/vmctl: move vmctl code from github.com/VictoriaMetrics/vmctl
It is better developing vmctl tool in VictoriaMetrics repository, so it could be released
together with the rest of vmutils tools such as vmalert, vmagent, vmbackup, vmrestore and vmauth.
2021-02-01 01:10:20 +02:00
Aliaksandr Valialkin
2a7b1cc668 docs/Cluster-VictoriaMetrics.md: mention about -search.denyPartialResponse command-line flag and deny_partial_response query arg 2021-01-27 14:07:00 +02:00
Aliaksandr Valialkin
929f09b90d docs/CHANGELOG.md: typo fixes 2021-01-27 01:18:48 +02:00
Aliaksandr Valialkin
d6347a3e56 lib/logger: initialize timezone by UTC in order to fix failing tests 2021-01-27 00:59:12 +02:00
Aliaksandr Valialkin
fc5b26d856 lib/promscrape: export vm_promscrape_scrapes_failed_per_url_total and vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total metrics
These metrics could be useful for determining imporperly working scrape targets.
Note that these metrics are exported only for failing scrape targets. They aren't exposed for normally working targets.
2021-01-27 00:39:26 +02:00
Aliaksandr Valialkin
de3c662e8a all: consistently use timers from timerpool 2021-01-27 00:39:26 +02:00
Aliaksandr Valialkin
3149ac7a7e lib/fs: properly initialize cleaner for pageCache bitmaps
Previously it wasnt working because the timer was fired only once
2021-01-27 00:39:26 +02:00
Aliaksandr Valialkin
419ad74269 app/vmagent: add -remoteWrite.rateLimit command-line flag for limiting data rate to remote storage
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1035
2021-01-27 00:39:26 +02:00
Aliaksandr Valialkin
3fe848cdd7 lib/logger: add -loggerTimezone command-line flag for adjusting timezone for timestamps in log messages 2021-01-26 22:51:54 +02:00
Aliaksandr Valialkin
5481906db6 docs/CHANGELOG.md: mention about https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1027 2021-01-26 16:37:36 +02:00
weng zhao
cc3e69e963 vmalert: add option datasource.queryStep to allow user to address the inconsistency between grafana dashboards(query_range with step 15s usually) and ALERTS (#1027)
Co-authored-by: zhao.weng <zhao.weng@shopee.com>
2021-01-26 08:12:04 +00:00
Aliaksandr Valialkin
8cea3c3cc4 lib/promscrape: retry scrape and service discovery requests when the remote server closes http keep-alive connection 2021-01-22 13:22:33 +02:00
Aliaksandr Valialkin
c164a8d231 app/vmselect/promql: improve documentation for -search.maxPointsPertimeseries command-line flag
This should reduce incorrect usage and assumptions for this flag.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1020
2021-01-22 13:00:10 +02:00
Aliaksandr Valialkin
3caac3d12c docs/CHANGELOG.md: mention about the fix with too big HTTP reconnection rate to targets
This has been fixed in 0a45220b0a
2021-01-22 12:09:16 +02:00
Aliaksandr Valialkin
054fe1c198 deployment/docker: update Go builder from v1.15.6 to v1.15.7
See https://groups.google.com/g/golang-nuts/c/ufLjEY_AJ0I/m/smSHpGXiDQAJ for details
2021-01-21 18:39:49 +02:00
Aliaksandr Valialkin
0a45220b0a vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.11 to v1.0.12 2021-01-21 12:00:21 +02:00
Aliaksandr Valialkin
8749c2dd92 app/vmselect: add -search.maxStepForPointsAdjustment command-line flag, which can be used for disabling adjustment for points returned from /api/v1/query_range handler if they have timestamps closer than -search.latencyOffset to the current time 2021-01-19 22:56:32 +02:00
Aliaksandr Valialkin
011c5da785 app/vmselect/graphite: extract getCanonicalPath() function from loop body inside getCanonicalPaths() 2021-01-18 17:30:26 +02:00
Aliaksandr Valialkin
fcbefc15d0 LICENSE: bump the last year from 2020 to 2021 2021-01-16 13:00:16 +02:00
Aliaksandr Valialkin
485d43ef21 deployment/docker: upgrade alpine base Docker image from v3.12.3 to v3.13.0
See release notes for v3.13.0 - https://www.alpinelinux.org/posts/Alpine-3.13.0-released.html
2021-01-15 22:50:40 +02:00
faceair
b638c1eed5 lib/mergeset: add missing shouldCacheBlock (#1019) 2021-01-15 11:46:01 +02:00
Aliaksandr Valialkin
cc379f95c2 Makefile: add release-victoria-metrics-arm64 build rule 2021-01-13 18:13:18 +02:00
Aliaksandr Valialkin
689d769b4d Makefile: release vmutils for amd64 and arm64
Follow-up for 0d03855787
2021-01-13 18:04:37 +02:00
Robert Edström
0d03855787 Arch consistent filenames (#1015)
* Include individual binary checksums for vmutils

* Consistent archive/binary artefacts between arm64/amd64 for vmutils

* architecture in arhcive, checksums
* not in binaries
2021-01-13 17:31:08 +02:00
Aliaksandr Valialkin
75f7c51cab docs/vmagent.md: follow-up for 184a659c5f 2021-01-13 13:53:14 +02:00
mancubus77
184a659c5f Doco vmagent fix (#1014)
* Update section with remote_write.url for clustered version

* fix typo

Co-authored-by: mancubus <dont@write.me>
2021-01-13 13:50:37 +02:00
Aliaksandr Valialkin
7ce87ebcb2 docs/CHANGELOG.md: cut v1.52.0 2021-01-13 12:58:51 +02:00
Aliaksandr Valialkin
1051d8aa2d app/vmselect/promql: add ability to pass multiple labels to sort_by_label and sort_by_label_desc functions
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/992
2021-01-13 12:44:51 +02:00
Aliaksandr Valialkin
689cf88eb2 vendor: make vendor-update 2021-01-13 12:19:39 +02:00
Aliaksandr Valialkin
bdd0a1cdb2 lib/backup: increase backup chunk size from 128MB to 1GB
This should reduce costs for object storage API calls by 8x. See https://cloud.google.com/storage/pricing#operations-pricing
2021-01-13 12:16:35 +02:00
Aliaksandr Valialkin
acf1a2c72b app/vmselect/promql: properly parse escaped multibyte utf8 code sequences in metric names and labels names
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/990
2021-01-13 10:59:42 +02:00
Aliaksandr Valialkin
89315d719d docs/CHANGELOG.md: document updated extra_label query arg behavior
Follow-up for dc9d7aedd5
2021-01-13 00:58:20 +02:00
Nikolay
dc9d7aedd5 adds extra_label to all import apis (#1007)
* adds extra_label to all import apis,
changes priority for extra_label - now it has priority over original labels

* Update README.md

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>

* Update README.md

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>

* adds extra labels to vmagent  import api
changes order for adding labels, now its added after user values

* adds tests for extra_label

* import fix

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-01-13 00:52:50 +02:00
Aliaksandr Valialkin
7373986f9e docs/CHANGELOG.md: mention that the minimum supported TLS version now is v1.2
Follow-up for 7bf5d48315
2021-01-13 00:44:39 +02:00
Nikolay
7bf5d48315 bumps minimal tls version (#1012) 2021-01-13 00:35:47 +02:00
Aliaksandr Valialkin
3e451ccdda docs/Single-server-VictoriaMetrics.md: typo fix 2021-01-12 22:02:55 +02:00
Aliaksandr Valialkin
fe3444b124 deployment/docker: upgrade base image for Docker packages from Alpine 3.13.1 to Alpine 3.12.3 in order to fix potential security issues
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1010
2021-01-12 21:57:01 +02:00
Robert Edström
77be066ee8 add release-vmutils-arm64,release-vmutils-arm64 make targets (#1011) 2021-01-12 21:50:04 +02:00
Aliaksandr Valialkin
1837f2f7d3 app/vmselect/promql: add tfirst_over_time(m[d]) and tlast_over_time(m[d]) MetricsQL functions for returning timestamps for the first and the last samples in m over d 2021-01-12 16:12:12 +02:00
Aliaksandr Valialkin
f5d52b51f1 docs/Articles.md: add https://cer6erus.medium.com/cloud-native-model-driven-telemetry-stack-on-openshift-80712621f5bc 2021-01-12 15:36:27 +02:00
Aliaksandr Valialkin
31ec79eaf6 lib/storage: inline marshalTags function and remove the code for handling duplicate tags from here
This is a follow-up commit after c8ea697db8
2021-01-12 15:13:30 +02:00
Aliaksandr Valialkin
c8ea697db8 lib/storage: de-duplicate tags in MetricName.sortTags
Leave only the last tag among tags with duplicate keys. This is needed for reliable addition of extra_labels
during data ingestion. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1007 for details.
2021-01-12 15:03:42 +02:00
Aliaksandr Valialkin
2140ccbdcc docs/CHANGELOG.md: document big fixes from the commit 7976c22797 2021-01-12 13:44:17 +02:00
Nikolay
7976c22797 Fixes error handling for promscrape.streamParse (#1009)
properly return error if client cannot read data,
properly suppress scraper errors
2021-01-12 13:31:47 +02:00
Aliaksandr Valialkin
2c44f9989a lib/promscrape: properly show scrape duration on /targets page
Previously it has been shown as 0.000s for any scrape duration.
2021-01-11 21:14:46 +02:00
Aliaksandr Valialkin
e61e3bf174 docs/Single-server-VictoriaMetrics.md: mention about https://github.com/aorfanos/vmalert-cli in Integrations section 2021-01-11 18:52:08 +02:00
Aliaksandr Valialkin
89611fa48c docs/CHANGELOG.md: mention about a bugfix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/989 2021-01-11 13:11:41 +02:00
Roman Khavronenko
14f0f90507 docker-compose: provide the example list of alerting rules for vm components (#1005)
List contains examples for the alerting rules which might be executed
via `vmalert` to track the health state of VM components. It is assumed
that list will be revised and calibrated for each system individually.
2021-01-11 13:03:15 +02:00
Aliaksandr Valialkin
24ffad74c1 all: use net.Dial instead of fasthttp.Dial, because fasthttp.Dial limits the number of concurrent dials to 1000 2021-01-11 12:53:30 +02:00
Aliaksandr Valialkin
6740294ebb vendor: update github.com/VictoriaMetrics/fasthttp 2021-01-11 12:53:30 +02:00
Roman Khavronenko
2e2e4f7e21 vmalert-989: return non-empty result in template func query stub to pass validation (#1002)
On templates validation stage vmalert does not acutally send queries, so for complex
chained expression validation may fail. To avoid this, we add a blank sample in response
so validation can pass successfully. Later, during the rule execution, stub will be replaced
with real `query` function.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/989
2021-01-10 02:56:11 +03:00
Aliaksandr Valialkin
9dcb18e03d app/vmstorage: disable final merge by default, since it may result in high disk IO and CPU usage without measurable benefits such as increased query performance and reduced disk space usage 2021-01-08 00:16:05 +02:00
Aliaksandr Valialkin
0477991b4d vendor: make vendor-update 2021-01-07 23:55:02 +02:00
Aliaksandr Valialkin
b1f9b39c4b docs/Single-server-VictoriaMetrics.md: sync with upstream 2021-01-07 23:37:31 +02:00
Dan Dascalescu
39b11b3ff4 Tiny typo fix (#997) 2021-01-07 23:35:46 +02:00
Roman Khavronenko
7bd420cbfe docker-compose: add blackhole receiver for alertmanager (#999)
Currently, alertmanager spams logs with `Notify attempt failed, will retry later` message
because default receiver is unreachable. The change updates default configuration with
blackhole receiver which means alertmanager will continue to accept alerts but won't make
attempts to send them anywhere.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/995
2021-01-07 23:33:53 +02:00
Nikolay
85962b459f Snap docs change (#986)
* adds snap docs,
adds release information for snap package,
adds docs notes about configuration management with snap package.

* adds release page mention

* version fix for snap, its awful

* revert version

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2020-12-29 11:43:09 +02:00
Aliaksandr Valialkin
f6ca776c75 README.md: mention about -search.queryStats.lastQueriesCount and -search.queryStats.minQueryDuration command-line flags in docs about query stats 2020-12-29 11:38:57 +02:00
Aliaksandr Valialkin
70df5f4975 docs/CHANGELOG.md: cut v1.51.0 2020-12-27 14:21:29 +02:00
Aliaksandr Valialkin
c86286ec1d app/vmselect/promql: do not ajdust offset value provided in the query
Previously it could be modified in order to improve response cache hit ratio.
This is unneeded, since cache hit ratio should remain good because the query time range
should be already aligned to multiple of `step` values.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/976
2020-12-27 14:09:25 +02:00
Aliaksandr Valialkin
261535b32d docs/Articles.md: add a link to https://www.percona.com/blog/2020/12/23/observations-on-better-resource-usage-with-percona-monitoring-and-management-v2-12-0/ 2020-12-27 13:01:30 +02:00
Aliaksandr Valialkin
4b7105a65b app/vmselect: sync query stats handling with cluster version 2020-12-27 13:00:29 +02:00
Aliaksandr Valialkin
df0309eae0 app/vmselect/promql: simplify defer call for querystats.RegisterQuery 2020-12-27 12:06:04 +02:00
Aliaksandr Valialkin
ad4e6a9283 app/vmselect/querystats: reduce the default number of last queries to track from 100K to 20K
This should reduce memory usage in constrained environments
2020-12-25 17:40:47 +02:00
Aliaksandr Valialkin
59183f66d0 app/vmselect: refactor /api/v1/stats/top_queries
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/907
2020-12-25 16:44:29 +02:00
Aliaksandr Valialkin
fb338c50a3 app/victoria-metrics: show usage info when incorrect command-line flag is passed to executable 2020-12-25 16:42:21 +02:00
Nikolay
86630350bf Adds query stats handler (#945)
* Adds query stat handler,
for query and query_range api, victoriametrics tracks query execution time,
stats are expored at /api/v1/status/queries endpoint with topN param
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/907

* fixed query stats bugs

* improves queryStats tracker

* improves query stat

* small fix

* fix tests

* added more tests

* fixes 386 tests

* naming fixes

* adds drop for outdated records
2020-12-25 16:42:05 +02:00
Aliaksandr Valialkin
490c69c64e lib/storage: wait for pending transactions before closing and dropping the partition
This deflakes `make test-full-386` test
2020-12-25 11:45:53 +02:00
Aliaksandr Valialkin
932e53522d docs/CHANGELOG.md: mention that vmalert now properly escapes multi-line queries when passing to Grafana
A follow-up for 1de15ad490

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
2020-12-25 11:12:06 +02:00
Nikolay
1de15ad490 adds escape for CRLF (#984)
at external.alert.source - \n and \r symbols was url encoded, instead of direct usage.
replace it from "\n" to `\n`  allows to skip url encoding.
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
2020-12-25 11:03:13 +02:00
Aliaksandr Valialkin
1f2944a9d0 vendor: make vendor-update 2020-12-24 17:19:41 +02:00
Aliaksandr Valialkin
cab7e936a3 lib/storage: physically remove stale parts
Previously they were removed from partition struct, but the corresponding directories weren't removed.

This is a follow-up for 46dba00756
2020-12-24 16:51:36 +02:00
Aliaksandr Valialkin
0326638c90 app/vmalert: typo fix in descriptions for notifier.basicAuth.username and notifier.basicAuth.password command-line flags 2020-12-24 12:48:59 +02:00
Aliaksandr Valialkin
4eb520a342 docs/CHANGELOG.md: mention about adding missing __meta_kubernetes_service_* labels for endpoints and endpointslices roles in kubernetes_sd_config
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/982
2020-12-24 11:33:00 +02:00
Nikolay
b21e16ad0c fixes kubernetes_sd (#983)
* fixes kubernetes_sd,
adds missing service metadata for pod ports without endpoint
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/982

* fix test
2020-12-24 11:26:14 +02:00
Aliaksandr Valialkin
820669da69 lib/promscrape: code prettifying for 8dd03ecf19 2020-12-24 10:56:10 +02:00
Nikolay
8dd03ecf19 adds proxy_url support, (#980)
* adds proxy_url support,
adds proxy_url to the dockerswarm, eureka, kubernetes and consul service discovery,
adds proxy_url to the scrape_config for targets scrapping,
http based proxy is supported atm,
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/503

* fixes imports
2020-12-24 10:52:37 +02:00
Aliaksandr Valialkin
9e4ed5e591 lib/storage: do not remove parts outside the configured retention if they are currently merged
These parts are automatically removed after the merge is complete.
2020-12-24 08:51:28 +02:00
Aliaksandr Valialkin
9df60518bb docs: mention that it is possible to set multiple -notifier.tlsInsecureSkipVerify command-line flags for vmalert
See c3a92968343c2b3619f1ab935702d0e9b3a46733
2020-12-22 22:32:13 +02:00
Nikolay
c270f8f3e6 changes vmalert notifier flag, (#978)
fixes issue with notifier insecure setting, now its possible to use multiple notifier.tlsInsecureSkipVerify multiple time.
2020-12-22 23:23:04 +03:00
Aliaksandr Valialkin
46dba00756 lib/storage: remove stale parts as soon as they go outside the configured retention
Previously such parts could remain undeleted for long durations until they are merged with other parts.
This should help for `-retentionPeriod` values smaller than one month.
2020-12-22 19:54:31 +02:00
Aliaksandr Valialkin
de89bcddae vendor: upgrade github.com/klauspost/compress from v1.11.3 to v1.11.4 2020-12-21 08:56:02 +02:00
Artem Navoiev
0f99c1afb1 add linkedin to release announcement 2020-12-20 20:06:48 +02:00
Artem Navoiev
750daa04d1 Announcement guide 2020-12-19 21:58:03 +02:00
Aliaksandr Valialkin
e4f856e900 vendor: make vendor-update 2020-12-19 17:00:20 +02:00
1451 changed files with 164480 additions and 27385 deletions

View File

@@ -9,9 +9,9 @@ assignees: ''
**Describe the bug**
A clear and concise description of what the bug is.
It would be great [upgrading](https://victoriametrics.github.io/#how-to-upgrade) to [the latest avaialble release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
It would be a great [upgrading](https://docs.victoriametrics.com/#how-to-upgrade) to [the latest avaialble release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
and verifying whether the bug is reproducible there.
It is also recommended reading [troubleshooting docs](https://victoriametrics.github.io/#troubleshooting).
It is also recommended reading [troubleshooting docs](https://docs.victoriametrics.com/#troubleshooting).
**To Reproduce**
Steps to reproduce the behavior.
@@ -30,11 +30,11 @@ victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
```
**Used command-line flags**
Command-line flags are listed as `flag{name="httpListenAddr", value=":443"} 1` lines at `/metrics` page.
Command-line flags are listed as `flag{name="httpListenAddr", value=":443"} 1` lines at the `/metrics` page.
See the following docs for details:
* [monitoring for single-node VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#monitoring)
* [montioring for VictoriaMetrics cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/cluster/README.md#monitoring)
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [montioring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
**Additional context**
Add any other context about the problem here such as error logs from VictoriaMetrics and Prometheus,

6
.github/dependabot.yml vendored Normal file
View File

@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"

23
.github/workflows/check-licenses.yml vendored Normal file
View File

@@ -0,0 +1,23 @@
name: license-check
on:
push:
paths:
- 'vendor'
pull_request:
paths:
- 'vendor'
jobs:
build:
name: Build
runs-on: ubuntu-latest
steps:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.16
id: go
- name: Code checkout
uses: actions/checkout@master
- name: Check License
run: |
make check-licenses

View File

@@ -16,7 +16,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.15
go-version: 1.16
id: go
- name: Dependencies
run: |
@@ -45,19 +45,22 @@ jobs:
GOOS=freebsd go build -mod=vendor ./app/vmalert
GOOS=freebsd go build -mod=vendor ./app/vmbackup
GOOS=freebsd go build -mod=vendor ./app/vmrestore
GOOS=freebsd go build -mod=vendor ./app/vmctl
GOOS=openbsd go build -mod=vendor ./app/victoria-metrics
GOOS=openbsd go build -mod=vendor ./app/vmagent
GOOS=openbsd go build -mod=vendor ./app/vmalert
GOOS=openbsd go build -mod=vendor ./app/vmbackup
GOOS=openbsd go build -mod=vendor ./app/vmrestore
GOOS=openbsd go build -mod=vendor ./app/vmctl
GOOS=darwin go build -mod=vendor ./app/victoria-metrics
GOOS=darwin go build -mod=vendor ./app/vmagent
GOOS=darwin go build -mod=vendor ./app/vmalert
GOOS=darwin go build -mod=vendor ./app/vmbackup
GOOS=darwin go build -mod=vendor ./app/vmrestore
GOOS=darwin go build -mod=vendor ./app/vmctl
CGO_ENABLED=0 GOOS=windows go build -mod=vendor ./app/vmagent
- name: Publish coverage
uses: codecov/codecov-action@v1.0.6
uses: codecov/codecov-action@v1.5.0
with:
token: ${{secrets.CODECOV_TOKEN}}
file: ./coverage.txt

1
.gitignore vendored
View File

@@ -15,3 +15,4 @@
/package/temp-rpm-*
/package/*.deb
/package/*.rpm
.DS_store

5
.wwhrd.yml Normal file
View File

@@ -0,0 +1,5 @@
allowlist:
- Apache-2.0
- MIT
- BSD-3-Clause
- BSD-2-Clause

View File

@@ -7,7 +7,7 @@ contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
appearance, race, religion or sexual identity and orientation.
## Our Standards
@@ -24,9 +24,9 @@ Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Trolling, insulting/derogatory comments and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
* Publishing others' private information, such as physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
@@ -38,26 +38,26 @@ behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
reject comments, commits, code, wiki edits, issues and other contributions
that are not aligned to this Code of Conduct or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
threatening, offensive or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
address, posting via an official social media account or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
Instances of abusive, harassing or otherwise unacceptable behavior may be
reported by contacting the project team at info@victoriametrics.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
is deemed necessary and appropriate for the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

View File

@@ -175,7 +175,7 @@
END OF TERMS AND CONDITIONS
Copyright 2019-2020 VictoriaMetrics, Inc.
Copyright 2019-2021 VictoriaMetrics, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

178
Makefile
View File

@@ -1,5 +1,6 @@
PKG_PREFIX := github.com/VictoriaMetrics/VictoriaMetrics
DATEINFO_TAG ?= $(shell date -u +'%Y%m%d-%H%M%S')
BUILDINFO_TAG ?= $(shell echo $$(git describe --long --all | tr '/' '-')$$( \
git diff-index --quiet HEAD -- || echo '-dirty-'$$(git diff-index -u HEAD | openssl sha1 | cut -c 10-17)))
@@ -8,7 +9,7 @@ ifeq ($(PKG_TAG),)
PKG_TAG := $(BUILDINFO_TAG)
endif
GO_BUILDINFO = -X '$(PKG_PREFIX)/lib/buildinfo.Version=$(APP_NAME)-$(shell date -u +'%Y%m%d-%H%M%S')-$(BUILDINFO_TAG)'
GO_BUILDINFO = -X '$(PKG_PREFIX)/lib/buildinfo.Version=$(APP_NAME)-$(DATEINFO_TAG)-$(BUILDINFO_TAG)'
.PHONY: $(MAKECMDGOALS)
@@ -18,7 +19,8 @@ all: \
vmalert-prod \
vmauth-prod \
vmbackup-prod \
vmrestore-prod
vmrestore-prod \
vmctl-prod
include app/*/Makefile
include deployment/*/Makefile
@@ -32,7 +34,8 @@ publish: \
publish-vmalert \
publish-vmauth \
publish-vmbackup \
publish-vmrestore
publish-vmrestore \
publish-vmctl
package: \
package-victoria-metrics \
@@ -40,31 +43,136 @@ package: \
package-vmalert \
package-vmauth \
package-vmbackup \
package-vmrestore
package-vmrestore \
package-vmctl
vmutils: \
vmagent \
vmalert \
vmauth \
vmbackup \
vmrestore
vmrestore \
vmctl
vmutils-pure: \
vmagent-pure \
vmalert-pure \
vmauth-pure \
vmbackup-pure \
vmrestore-pure \
vmctl-pure
vmutils-arm64: \
vmagent-arm64 \
vmalert-arm64 \
vmauth-arm64 \
vmbackup-arm64 \
vmrestore-arm64 \
vmctl-arm64
vmutils-arm: \
vmagent-arm \
vmalert-arm \
vmauth-arm \
vmbackup-arm \
vmrestore-arm \
vmctl-arm
vmutils-windows-amd64: \
vmagent-windows-amd64 \
vmalert-windows-amd64 \
vmauth-windows-amd64 \
vmctl-windows-amd64
release-snap:
snapcraft
snapcraft upload "victoriametrics_$(PKG_TAG)_multi.snap" --release beta,edge,candidate
release: \
release-victoria-metrics \
release-vmutils
release-victoria-metrics: victoria-metrics-prod
cd bin && tar czf victoria-metrics-$(PKG_TAG).tar.gz victoria-metrics-prod && \
sha256sum victoria-metrics-$(PKG_TAG).tar.gz > victoria-metrics-$(PKG_TAG)_checksums.txt
release-victoria-metrics: \
release-victoria-metrics-amd64 \
release-victoria-metrics-arm \
release-victoria-metrics-arm64
release-victoria-metrics-amd64:
GOARCH=amd64 $(MAKE) release-victoria-metrics-generic
release-victoria-metrics-arm:
GOARCH=arm $(MAKE) release-victoria-metrics-generic
release-victoria-metrics-arm64:
GOARCH=arm64 $(MAKE) release-victoria-metrics-generic
release-victoria-metrics-generic: victoria-metrics-$(GOARCH)-prod
cd bin && \
tar --transform="flags=r;s|-$(GOARCH)||" -czf victoria-metrics-$(GOARCH)-$(PKG_TAG).tar.gz \
victoria-metrics-$(GOARCH)-prod \
&& sha256sum victoria-metrics-$(GOARCH)-$(PKG_TAG).tar.gz \
victoria-metrics-$(GOARCH)-prod \
| sed s/-$(GOARCH)-prod/-prod/ > victoria-metrics-$(GOARCH)-$(PKG_TAG)_checksums.txt
release-vmutils: \
vmagent-prod \
vmalert-prod \
vmauth-prod \
vmbackup-prod \
vmrestore-prod
cd bin && tar czf vmutils-$(PKG_TAG).tar.gz vmagent-prod vmalert-prod vmauth-prod vmbackup-prod vmrestore-prod && \
sha256sum vmutils-$(PKG_TAG).tar.gz > vmutils-$(PKG_TAG)_checksums.txt
release-vmutils-amd64 \
release-vmutils-arm64 \
release-vmutils-arm \
release-vmutils-windows-amd64
release-vmutils-amd64:
GOARCH=amd64 $(MAKE) release-vmutils-generic
release-vmutils-arm64:
GOARCH=arm64 $(MAKE) release-vmutils-generic
release-vmutils-arm:
GOARCH=arm $(MAKE) release-vmutils-generic
release-vmutils-windows-amd64:
GOARCH=amd64 $(MAKE) release-vmutils-windows-generic
release-vmutils-generic: \
vmagent-$(GOARCH)-prod \
vmalert-$(GOARCH)-prod \
vmauth-$(GOARCH)-prod \
vmbackup-$(GOARCH)-prod \
vmrestore-$(GOARCH)-prod \
vmctl-$(GOARCH)-prod
cd bin && \
tar --transform="flags=r;s|-$(GOARCH)||" -czf vmutils-$(GOARCH)-$(PKG_TAG).tar.gz \
vmagent-$(GOARCH)-prod \
vmalert-$(GOARCH)-prod \
vmauth-$(GOARCH)-prod \
vmbackup-$(GOARCH)-prod \
vmrestore-$(GOARCH)-prod \
vmctl-$(GOARCH)-prod \
&& sha256sum vmutils-$(GOARCH)-$(PKG_TAG).tar.gz \
vmagent-$(GOARCH)-prod \
vmalert-$(GOARCH)-prod \
vmauth-$(GOARCH)-prod \
vmbackup-$(GOARCH)-prod \
vmrestore-$(GOARCH)-prod \
vmctl-$(GOARCH)-prod \
| sed s/-$(GOARCH)-prod/-prod/ > vmutils-$(GOARCH)-$(PKG_TAG)_checksums.txt
release-vmutils-windows-generic: \
vmagent-windows-$(GOARCH)-prod \
vmalert-windows-$(GOARCH)-prod \
vmauth-windows-$(GOARCH)-prod \
vmctl-windows-$(GOARCH)-prod
cd bin && \
zip vmutils-windows-$(GOARCH)-$(PKG_TAG).zip \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe \
&& sha256sum vmutils-windows-$(GOARCH)-$(PKG_TAG).zip \
vmagent-windows-$(GOARCH)-prod.exe \
vmalert-windows-$(GOARCH)-prod.exe \
vmauth-windows-$(GOARCH)-prod.exe \
vmctl-windows-$(GOARCH)-prod.exe \
> vmutils-windows-$(GOARCH)-$(PKG_TAG)_checksums.txt
pprof-cpu:
go tool pprof -trim_path=github.com/VictoriaMetrics/VictoriaMetrics@ $(PPROF_FILE)
@@ -82,7 +190,7 @@ lint: install-golint
golint app/...
install-golint:
which golint || go install golang.org/x/lint/golint
which golint || GO111MODULE=off go get golang.org/x/lint/golint
errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./lib/...
@@ -94,9 +202,10 @@ errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./app/vmauth/...
errcheck -exclude=errcheck_excludes.txt ./app/vmbackup/...
errcheck -exclude=errcheck_excludes.txt ./app/vmrestore/...
errcheck -exclude=errcheck_excludes.txt ./app/vmctl/...
install-errcheck:
which errcheck || go install github.com/kisielk/errcheck
which errcheck || GO111MODULE=off go get github.com/kisielk/errcheck
check-all: fmt vet lint errcheck golangci-lint
@@ -138,23 +247,42 @@ app-local-pure:
app-local-with-goarch:
GO111MODULE=on go build $(RACE) -mod=vendor -ldflags "$(GO_BUILDINFO)" -o bin/$(APP_NAME)-$(GOARCH)$(RACE) $(PKG_PREFIX)/app/$(APP_NAME)
app-local-windows-with-goarch:
CGO_ENABLED=0 GO111MODULE=on go build $(RACE) -mod=vendor -ldflags "$(GO_BUILDINFO)" -o bin/$(APP_NAME)-windows-$(GOARCH)$(RACE).exe $(PKG_PREFIX)/app/$(APP_NAME)
quicktemplate-gen: install-qtc
qtc
install-qtc:
which qtc || go install github.com/valyala/quicktemplate/qtc
which qtc || GO111MODULE=off go get github.com/valyala/quicktemplate/qtc
golangci-lint: install-golangci-lint
golangci-lint run --exclude '(SA4003|SA1019|SA5011):' -D errcheck -D structcheck --timeout 2m
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.29.0
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.40.1
install-wwhrd:
which wwhrd || GO111MODULE=off go get github.com/frapposelli/wwhrd
check-licenses: install-wwhrd
wwhrd check -f .wwhrd.yml
copy-docs:
echo "---\nsort: ${ORDER}\n---\n" > ${DST}
cat ${SRC} >> ${DST}
# Copies docs for all components and adds the order tag.
# Cluster docs are supposed to be ordered as 9th.
# For The rest of docs is ordered manually.t
docs-sync:
cp app/vmagent/README.md docs/vmagent.md
cp app/vmalert/README.md docs/vmalert.md
cp app/vmauth/README.md docs/vmauth.md
cp app/vmbackup/README.md docs/vmbackup.md
cp app/vmrestore/README.md docs/vmrestore.md
cp README.md docs/Single-server-VictoriaMetrics.md
SRC=README.md DST=docs/Single-server-VictoriaMetrics.md ORDER=1 $(MAKE) copy-docs
SRC=app/vmagent/README.md DST=docs/vmagent.md ORDER=2 $(MAKE) copy-docs
SRC=app/vmalert/README.md DST=docs/vmalert.md ORDER=3 $(MAKE) copy-docs
SRC=app/vmauth/README.md DST=docs/vmauth.md ORDER=4 $(MAKE) copy-docs
SRC=app/vmbackup/README.md DST=docs/vmbackup.md ORDER=5 $(MAKE) copy-docs
SRC=app/vmrestore/README.md DST=docs/vmrestore.md ORDER=6 $(MAKE) copy-docs
SRC=app/vmctl/README.md DST=docs/vmctl.md ORDER=7 $(MAKE) copy-docs
SRC=app/vmgateway/README.md DST=docs/vmgateway.md ORDER=8 $(MAKE) copy-docs
SRC=app/vmbackupmanager/README.md DST=docs/vmbackupmanager.md ORDER=9 $(MAKE) copy-docs

568
README.md
View File

@@ -1,3 +1,5 @@
# VictoriaMetrics
[![Latest Release](https://img.shields.io/github/release/VictoriaMetrics/VictoriaMetrics.svg?style=flat-square)](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
[![Docker Pulls](https://img.shields.io/docker/pulls/victoriametrics/victoria-metrics.svg?maxAge=604800)](https://hub.docker.com/r/victoriametrics/victoria-metrics)
[![Slack](https://img.shields.io/badge/join%20slack-%23victoriametrics-brightgreen.svg)](http://slack.victoriametrics.com/)
@@ -6,11 +8,9 @@
[![Build Status](https://github.com/VictoriaMetrics/VictoriaMetrics/workflows/main/badge.svg)](https://github.com/VictoriaMetrics/VictoriaMetrics/actions)
[![codecov](https://codecov.io/gh/VictoriaMetrics/VictoriaMetrics/branch/master/graph/badge.svg)](https://codecov.io/gh/VictoriaMetrics/VictoriaMetrics)
![Victoria Metrics logo](logo.png "Victoria Metrics")
<img src="logo.png" width="300" alt="VictoriaMetrics logo">
## VictoriaMetrics
VictoriaMetrics is fast, cost-effective and scalable monitoring solution and time series database.
VictoriaMetrics is a fast, cost-effective and scalable monitoring solution and time series database.
It is available in [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases),
[docker images](https://hub.docker.com/r/victoriametrics/victoria-metrics/), [Snap package](https://snapcraft.io/victoriametrics)
@@ -30,28 +30,28 @@ See [features available for enterprise customers](https://victoriametrics.com/en
Alphabetically sorted links to case studies:
* [adidas](https://victoriametrics.github.io/CaseStudies.html#adidas)
* [Adsterra](https://victoriametrics.github.io/CaseStudies.html#adsterra)
* [ARNES](https://victoriametrics.github.io/CaseStudies.html#arnes)
* [Brandwatch](https://victoriametrics.github.io/CaseStudies.html#brandwatch)
* [CERN](https://victoriametrics.github.io/CaseStudies.html#cern)
* [COLOPL](https://victoriametrics.github.io/CaseStudies.html#colopl)
* [Dreamteam](https://victoriametrics.github.io/CaseStudies.html#dreamteam)
* [Idealo.de](https://victoriametrics.github.io/CaseStudies.html#idealode)
* [MHI Vestas Offshore Wind](https://victoriametrics.github.io/CaseStudies.html#mhi-vestas-offshore-wind)
* [Synthesio](https://victoriametrics.github.io/CaseStudies.html#synthesio)
* [Wedos.com](https://victoriametrics.github.io/CaseStudies.html#wedoscom)
* [Wix.com](https://victoriametrics.github.io/CaseStudies.html#wixcom)
* [Zerodha](https://victoriametrics.github.io/CaseStudies.html#zerodha)
* [zhihu](https://victoriametrics.github.io/CaseStudies.html#zhihu)
* [adidas](https://docs.victoriametrics.com/CaseStudies.html#adidas)
* [Adsterra](https://docs.victoriametrics.com/CaseStudies.html#adsterra)
* [ARNES](https://docs.victoriametrics.com/CaseStudies.html#arnes)
* [Brandwatch](https://docs.victoriametrics.com/CaseStudies.html#brandwatch)
* [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern)
* [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl)
* [Dreamteam](https://docs.victoriametrics.com/CaseStudies.html#dreamteam)
* [Idealo.de](https://docs.victoriametrics.com/CaseStudies.html#idealode)
* [MHI Vestas Offshore Wind](https://docs.victoriametrics.com/CaseStudies.html#mhi-vestas-offshore-wind)
* [Synthesio](https://docs.victoriametrics.com/CaseStudies.html#synthesio)
* [Wedos.com](https://docs.victoriametrics.com/CaseStudies.html#wedoscom)
* [Wix.com](https://docs.victoriametrics.com/CaseStudies.html#wixcom)
* [Zerodha](https://docs.victoriametrics.com/CaseStudies.html#zerodha)
* [zhihu](https://docs.victoriametrics.com/CaseStudies.html#zhihu)
## Prominent features
* VictoriaMetrics can be used as long-term storage for Prometheus or for [vmagent](https://victoriametrics.github.io/vmagent.html).
* VictoriaMetrics can be used as long-term storage for Prometheus or for [vmagent](https://docs.victoriametrics.com/vmagent.html).
See [these docs](#prometheus-setup) for details.
* VictoriaMetrics supports [Prometheus querying API](https://prometheus.io/docs/prometheus/latest/querying/api/), so it can be used as Prometheus drop-in replacement in Grafana.
* VictoriaMetrics implements [MetricsQL](https://victoriametrics.github.io/MetricsQL.html) query language backwards compatible with PromQL.
* VictoriaMetrics implements [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) query language backwards compatible with PromQL.
* VictoriaMetrics provides global query view. Multiple Prometheus instances or any other data sources may ingest data into VictoriaMetrics.
Later this data may be queried via a single query.
* High performance and good scalability for both [inserts](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b)
@@ -76,7 +76,7 @@ Alphabetically sorted links to case studies:
* All the configuration is done via explicit command-line flags with reasonable defaults.
* All the data is stored in a single directory pointed by `-storageDataPath` command-line flag.
* Easy and fast backups from [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
to S3 or GCS with [vmbackup](https://victoriametrics.github.io/vmbackup.html) / [vmrestore](https://victoriametrics.github.io/vmrestore.html).
to S3 or GCS with [vmbackup](https://docs.victoriametrics.com/vmbackup.html) / [vmrestore](https://docs.victoriametrics.com/vmrestore.html).
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
* Storage is protected from corruption on unclean shutdown (i.e. OOM, hardware reset or `kill -9`) thanks to [the storage architecture](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
* Supports metrics' scraping, ingestion and [backfilling](#backfilling) via the following protocols:
@@ -93,9 +93,10 @@ Alphabetically sorted links to case studies:
* [Prometheus exposition format](#how-to-import-data-in-prometheus-exposition-format).
* [Arbitrary CSV data](#how-to-import-csv-data).
* Supports metrics' relabeling. See [these docs](#relabeling) for details.
* Can deal with high cardinality and high churn rate issues using [series limiter](#cardinality-limiter).
* Ideally works with big amounts of time series data from Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data and various Enterprise workloads.
* Has open source [cluster version](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
* See also technical [Articles about VictoriaMetrics](https://victoriametrics.github.io/Articles.html).
* See also technical [Articles about VictoriaMetrics](https://docs.victoriametrics.com/Articles.html).
## Operation
@@ -104,6 +105,7 @@ Alphabetically sorted links to case studies:
* [How to start VictoriaMetrics](#how-to-start-victoriametrics)
* [Environment variables](#environment-variables)
* [Configuration with snap package](#configuration-with-snap-package)
* [Prometheus setup](#prometheus-setup)
* [Grafana setup](#grafana-setup)
* [How to upgrade VictoriaMetrics](#how-to-upgrade-victoriametrics)
@@ -116,6 +118,7 @@ Alphabetically sorted links to case studies:
* [Prometheus querying API usage](#prometheus-querying-api-usage)
* [Prometheus querying API enhancements](#prometheus-querying-api-enhancements)
* [Graphite API usage](#graphite-api-usage)
* [Graphite Render API usage](#graphite-render-api-usage)
* [Graphite Metrics API usage](#graphite-metrics-api-usage)
* [Graphite Tags API usage](#graphite-tags-api-usage)
* [How to build from sources](#how-to-build-from-sources)
@@ -152,7 +155,10 @@ Alphabetically sorted links to case studies:
* [Security](#security)
* [Tuning](#tuning)
* [Monitoring](#monitoring)
* [TSDB stats](#tsdb-stats)
* [Cardinality limiter](#cardinality-limiter)
* [Troubleshooting](#troubleshooting)
* [Data migration](#data-migration)
* [Backfilling](#backfilling)
* [Data updates](#data-updates)
* [Replication](#replication)
@@ -163,11 +169,12 @@ Alphabetically sorted links to case studies:
* [Contacts](#contacts)
* [Community and contributions](#community-and-contributions)
* [Reporting bugs](#reporting-bugs)
* [Victoria Metrics Logo](#victoria-metrics-logo)
* [VictoriaMetrics Logo](#victoria-metrics-logo)
* [Logo Usage Guidelines](#logo-usage-guidelines)
* [Font used](#font-used)
* [Color Palette](#color-palette)
* [We kindly ask](#we-kindly-ask)
* [List of command-line flags](#list-of-command-line-flags)
## How to start VictoriaMetrics
@@ -180,7 +187,7 @@ The following command-line flags are used the most:
* `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory. Default path is `victoria-metrics-data` in the current working directory.
* `-retentionPeriod` - retention for stored data. Older data is automatically deleted. Default retention is 1 month. See [these docs](#retention) for more details.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see all the available flags with description and default values.
Other flags have good enough default values, so set them only if you really need this. Pass `-help` to see [all the available flags with description and default values](#list-of-command-line-flags).
See how to [ingest data to VictoriaMetrics](#how-to-import-time-series-data), how to [query VictoriaMetrics](#grafana-setup)
and how to [handle alerts](#alerting).
@@ -197,6 +204,26 @@ Each flag value can be set via environment variables according to these rules:
* For repeating flags an alternative syntax can be used by joining the different values into one using `,` char as separator (for example `-storageNode <nodeA> -storageNode <nodeB>` will translate to `storageNode=<nodeA>,<nodeB>`)
* It is possible setting prefix for environment vars with `-envflag.prefix`. For instance, if `-envflag.prefix=VM_`, then env vars must be prepended with `VM_`
### Configuration with snap package
Command-line flags can be changed with following command:
```text
echo 'FLAGS="-selfScrapeInterval=10s -search.logSlowQueryDuration=20s"' > $SNAP_DATA/var/snap/victoriametrics/current/extra_flags
snap restart victoriametrics
```
Or add needed command-line flags to the file `$SNAP_DATA/var/snap/victoriametrics/current/extra_flags`.
Note you cannot change value for `-storageDataPath` flag, for safety snap package has limited access to host system.
Changing scrape configuration is possible with text editor:
```text
vi $SNAP_DATA/var/snap/victoriametrics/current/etc/victoriametrics-scrape-config.yaml
```
After changes was made, trigger config re-read with command `curl 127.0.0.1:8248/-/reload`.
## Prometheus setup
@@ -250,8 +277,8 @@ Read more about tuning remote write for Prometheus [here](https://prometheus.io/
It is recommended upgrading Prometheus to [v2.12.0](https://github.com/prometheus/prometheus/releases) or newer, since previous versions may have issues with `remote_write`.
Take a look also at [vmagent](https://victoriametrics.github.io/vmagent.html)
and [vmalert](https://victoriametrics.github.io/vmalert.html),
Take a look also at [vmagent](https://docs.victoriametrics.com/vmagent.html)
and [vmalert](https://docs.victoriametrics.com/vmalert.html),
which can be used as faster and less resource-hungry alternative to Prometheus.
@@ -266,7 +293,7 @@ http://<victoriametrics-addr>:8428
Substitute `<victoriametrics-addr>` with the hostname or IP address of VictoriaMetrics.
Then build graphs with the created datasource using [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/)
or [MetricsQL](https://victoriametrics.github.io/MetricsQL.html). VictoriaMetrics supports [Prometheus querying API](#prometheus-querying-api-usage),
or [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). VictoriaMetrics supports [Prometheus querying API](#prometheus-querying-api-usage),
which is used by Grafana.
@@ -324,7 +351,7 @@ The file pointed by `-promscrape.config` may contain `%{ENV_VAR}` placeholders,
VictoriaMetrics also supports [importing data in Prometheus exposition format](#how-to-import-data-in-prometheus-exposition-format).
See also [vmagent](https://victoriametrics.github.io/vmagent.html), which can be used as drop-in replacement for Prometheus.
See also [vmagent](https://docs.victoriametrics.com/vmagent.html), which can be used as drop-in replacement for Prometheus.
## How to send data from InfluxDB-compatible agents such as [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/)
@@ -388,6 +415,12 @@ The `/api/v1/export` endpoint should return the following response:
Note that Influx line protocol expects [timestamps in *nanoseconds* by default](https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_tutorial/#timestamp),
while VictoriaMetrics stores them with *milliseconds* precision.
Extra labels may be added to all the written time series by passing `extra_label=name=value` query args.
For example, `/write?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
Some plugins for Telegraf such as [fluentd](https://github.com/fangli/fluent-plugin-influxdb), [Juniper/open-nti](https://github.com/Juniper/open-nti)
or [Juniper/jitmon](https://github.com/Juniper/jtimon) send `SHOW DATABASES` query to `/query` and expect a particular database name in the response.
Comma-separated list of expected databases can be passed to VictoriaMetrics via `-influx.databaseNames` command-line flag.
## How to send data from Graphite-compatible agents such as [StatsD](https://github.com/etsy/statsd)
@@ -425,9 +458,12 @@ The `/api/v1/export` endpoint should return the following response:
Data sent to VictoriaMetrics via `Graphite plaintext protocol` may be read via the following APIs:
* [Prometheus querying API](#prometheus-querying-api-usage)
* Metric names can be explored via [Graphite metrics API](#graphite-metrics-api-usage)
* Tags can be explored via [Graphite tags API](#graphite-tags-api-usage)
* [Graphite API](#graphite-api-usage)
* [Prometheus querying API](#prometheus-querying-api-usage). Graphite metric names may special chars such as `-`, which may clash
with [MetricsQL operations](https://docs.victoriametrics.com/MetricsQL.html). Such metrics can be queries via `{__name__="foo-bar.baz"}`.
VictoriaMetrics supports `__graphite__` pseudo-label for selecting time series with Graphite-compatible filters in [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html).
For example, `{__graphite__="foo.*.bar"}` is equivalent to `{__name__=~"foo[.][^.]*[.]bar"}`, but it works faster
and it is easier to use when migrating from Graphite to VictoriaMetrics.
* [go-graphite/carbonapi](https://github.com/go-graphite/carbonapi/blob/main/cmd/carbonapi/carbonapi.example.victoriametrics.yaml)
## How to send data from OpenTSDB-compatible agents
@@ -503,6 +539,9 @@ The `/api/v1/export` endpoint should return the following response:
{"metric":{"__name__":"x.y.z","t1":"v1","t2":"v2"},"values":[45.34],"timestamps":[1566464763000]}
```
Extra labels may be added to all the imported time series by passing `extra_label=name=value` query args.
For example, `/api/put?extra_label=foo=bar` would add `{foo="bar"}` label to all the ingested metrics.
## Prometheus querying API usage
@@ -513,42 +552,76 @@ VictoriaMetrics supports the following handlers from [Prometheus querying API](h
* [/api/v1/series](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers)
* [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names)
* [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values)
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats). VictoriaMetrics accepts optional `topN=N` and `date=YYYY-MM-DD`
query args for this handler, where `N` is the number of top entries to return in the response and `YYYY-MM-DD` is the date for collecting the stats.
By default top 10 entries are returned and the stats is collected for the current day.
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats). See [these docs](#tsdb-stats) for details.
* [/api/v1/targets](https://prometheus.io/docs/prometheus/latest/querying/api/#targets) - see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter) for more details.
These handlers can be queried from Prometheus-compatible clients such as Grafana or curl.
All the Prometheus querying API handlers can be prepended with `/prometheus` prefix. For example, both `/prometheus/api/v1/query` and `/api/v1/query` should work.
### Prometheus querying API enhancements
Additionally to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) VictoriaMetrics accepts relative times in `time`, `start` and `end` query args.
VictoriaMetrics accepts optional `extra_label=<label_name>=<label_value>` query arg, which can be used for enforcing additional label filters for queries. For example,
`/api/v1/query_range?extra_label=user_id=123&query=<query>` would automatically add `{user_id="123"}` label filter to the given `<query>`. This functionality can be used
for limiting the scope of time series visible to the given tenant. It is expected that the `extra_label` query arg is automatically set by auth proxy sitting
in front of VictoriaMetrics. See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args additionally to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
By default, VictoriaMetrics returns time series for the last 5 minutes from /api/v1/series, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v1/query_range` handlers. It can be used for rounding response values to the given number of digits after the decimal point. For example, `/api/v1/query?query=avg_over_time(temperature[1h])&round_digits=2` would round response values to up to two digits after the decimal point.
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details:
* Any number [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
Additionally VictoriaMetrics provides the following handlers:
* `/api/v1/series/count` - it returns the total number of time series in the database. Some notes:
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
* the handler scans all the inverted index, so it can be slow if the database contains tens of millions of time series;
* the handler may count [deleted time series](#how-to-delete-time-series) additionally to normal time series due to internal implementation restrictions;
* `/api/v1/labels/count` - it returns a list of `label: values_count` entries. It can be used for determining labels with the maximum number of values.
* `/api/v1/status/active_queries` - it returns a list of currently running queries.
* `/api/v1/labels/count` - returns a list of `label: values_count` entries. It can be used for determining labels with the maximum number of values.
* `/api/v1/status/active_queries` - returns a list of currently running queries.
* `/api/v1/status/top_queries` - returns the following query lists:
* the most frequently executed queries - `topByCount`
* queries with the biggest average execution duration - `topByAvgDuration`
* queries that took the most time for execution - `topBySumDuration`
The number of returned queries can be limited via `topN` query arg. Old queries can be filtered out with `maxLifetime` query arg.
For example, request to `/api/v1/status/top_queries?topN=5&maxLifetime=30s` would return up to 5 queries per list, which were executed during the last 30 seconds.
VictoriaMetrics tracks the last `-search.queryStats.lastQueriesCount` queries with durations at least `-search.queryStats.minQueryDuration`.
## Graphite API usage
VictoriaMetrics supports the following Graphite APIs:
VictoriaMetrics supports the following Graphite APIs, which are needed for [Graphite datasource in Grafana](https://grafana.com/docs/grafana/latest/datasources/graphite/):
* Render API - see [these docs](#graphite-render-api-usage).
* Metrics API - see [these docs](#graphite-metrics-api-usage).
* Tags API - see [these docs](#graphite-tags-api-usage).
All the Graphite handlers can be pre-pended with `/graphite` prefix. For example, both `/graphite/metrics/find` and `/metrics/find` should work.
VictoriaMetrics accepts optional `extra_label=<label_name>=<label_value>` query arg for all the Graphite APIs. This arg can be used for limiting the scope of time series
visible to the given tenant. It is expected that the `extra_label` query arg is automatically set by auth proxy sitting in front of VictoriaMetrics.
[Contact us](mailto:sales@victoriametrics.com) if you need assistance with such a proxy.
VictoriaMetrics supports `__graphite__` pseudo-label for filtering time series with Graphite-compatible filters in [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html).
For example, `{__graphite__="foo.*.bar"}` is equivalent to `{__name__=~"foo[.][^.]*[.]bar"}`, but it works faster
and it is easier to use when migrating from Graphite to VictoriaMetrics.
### Graphite Render API usage
[VictoriaMetrics Enterprise](https://victoriametrics.com/enterprise.html) supports [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) subset
at `/render` endpoint, which is used by [Graphite datasource in Grafana](https://grafana.com/docs/grafana/latest/datasources/graphite/).
It supports `Storage-Step` http request header, which must be set to a step between data points stored in VictoriaMetrics when configuring Graphite datasource in Grafana.
### Graphite Metrics API usage
@@ -587,14 +660,14 @@ to your needs or when testing bugfixes.
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make victoria-metrics` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make victoria-metrics` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics` binary and puts it into the `bin` folder.
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make victoria-metrics-prod` from the root folder of the repository.
2. Run `make victoria-metrics-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics-prod` binary and puts it into the `bin` folder.
### ARM build
@@ -603,24 +676,22 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make victoria-metrics-arm` or `make victoria-metrics-arm64` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make victoria-metrics-arm` or `make victoria-metrics-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics-arm` or `victoria-metrics-arm64` binary respectively and puts it into the `bin` folder.
### Production ARM build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make victoria-metrics-arm-prod` or `make victoria-metrics-arm64-prod` from the root folder of the repository.
2. Run `make victoria-metrics-arm-prod` or `make victoria-metrics-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics-arm-prod` or `victoria-metrics-arm64-prod` binary respectively and puts it into the `bin` folder.
### Pure Go build (CGO_ENABLED=0)
`Pure Go` mode builds only Go code without [cgo](https://golang.org/cmd/cgo/) dependencies.
This is an experimental mode, which may result in a lower compression ratio and slower decompression performance.
Use it with caution!
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make victoria-metrics-pure` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make victoria-metrics-pure` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics-pure` binary and puts it into the `bin` folder.
### Building docker images
@@ -640,7 +711,7 @@ ROOT_IMAGE=scratch make package-victoria-metrics
## Start with docker-compose
[Docker-compose](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/docker-compose.yml)
helps to spin up VictoriaMetrics, [vmagent](https://victoriametrics.github.io/vmagent.html) and Grafana with one command.
helps to spin up VictoriaMetrics, [vmagent](https://docs.victoriametrics.com/vmagent.html) and Grafana with one command.
More details may be found [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#folder-contains-basic-images-and-tools-for-building-and-running-victoria-metrics-in-docker).
@@ -663,7 +734,7 @@ The page will return the following JSON response:
Snapshots are created under `<-storageDataPath>/snapshots` directory, where `<-storageDataPath>`
is the command-line flag value. Snapshots can be archived to backup storage at any time
with [vmbackup](https://victoriametrics.github.io/vmbackup.html).
with [vmbackup](https://docs.victoriametrics.com/vmbackup.html).
The `http://<victoriametrics-addr>:8428/snapshot/list` page contains the list of available snapshots.
@@ -675,7 +746,7 @@ Navigate to `http://<victoriametrics-addr>:8428/snapshot/delete_all` in order to
Steps for restoring from a snapshot:
1. Stop VictoriaMetrics with `kill -INT`.
2. Restore snapshot contents from backup with [vmrestore](https://victoriametrics.github.io/vmrestore.html)
2. Restore snapshot contents from backup with [vmrestore](https://docs.victoriametrics.com/vmrestore.html)
to the directory pointed by `-storageDataPath`.
3. Start VictoriaMetrics.
@@ -755,14 +826,13 @@ Optional `start` and `end` args may be added to the request in order to limit th
unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values.
The exported data can be imported to VictoriaMetrics via [/api/v1/import/native](#how-to-import-data-in-native-format).
The native export format may change in incompatible way between VictoriaMetrics releases, so the data exported from the release X
can fail to be imported into VictoriaMetrics release Y.
### How to export data in JSON line format
Consider [exporting data in native format](#how-to-export-data-in-native-format) if big amounts of data must be migrated between VictoriaMetrics instances,
since exporting in native format usually consumes lower amounts of CPU and memory resources, while the resulting exported data occupies lower amounts of disk space.
In order to export data in JSON line format, send a request to `http://<victoriametrics-addr>:8428/api/v1/export?match[]=<timeseries_selector_for_export>`,
Send a request to `http://<victoriametrics-addr>:8428/api/v1/export?match[]=<timeseries_selector_for_export>`,
where `<timeseries_selector_for_export>` may contain any [time series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors)
for metrics to export. Use `{__name__!=""}` selector for fetching all the time series.
The response would contain all the data for the selected time series in [JSON streaming format](https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON).
@@ -890,6 +960,8 @@ For example, `/api/v1/import?extra_label=foo=bar` would add `"foo":"bar"` label
Note that it could be required to flush response cache after importing historical data. See [these docs](#backfilling) for detail.
VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line in memory, then parses it and then saves the parsed samples into persistent storage. This means that VictoriaMetrics can occupy big amounts of RAM when importing too long JSON lines. The solution is to split too long JSON lines into smaller lines. It is OK if samples for a single time series are split among multiple JSON lines.
### How to import CSV data
@@ -947,6 +1019,7 @@ Note that it could be required to flush response cache after importing historica
### How to import data in Prometheus exposition format
VictoriaMetrics accepts data in [Prometheus exposition format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format)
and in [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md)
via `/api/v1/import/prometheus` path. For example, the following line imports a single line in Prometheus exposition format into VictoriaMetrics:
```bash
@@ -1004,7 +1077,7 @@ VictoriaMetrics provides the following extra actions for relabeling rules:
* `keep_if_equal`: keeps the entry if all label values from `source_labels` are equal.
* `drop_if_equal`: drops the entry if all the label values from `source_labels` are equal.
See also [relabeling in vmagent](https://victoriametrics.github.io/vmagent.html#relabeling).
See also [relabeling in vmagent](https://docs.victoriametrics.com/vmagent.html#relabeling).
## Federation
@@ -1036,7 +1109,8 @@ A rough estimation of the required resources for ingestion path:
* Storage space: less than a byte per data point on average. So, ~260GB is required for storing a month-long insert stream
of 100K data points per second.
The actual storage size heavily depends on data randomness (entropy). Higher randomness means higher storage size requirements.
The actual storage size heavily depends on data randomness (entropy) and the average number of samples per time series.
Higher randomness means higher storage size requirements. Lower average number of samples per time series means higher storage requirement.
Read [this article](https://medium.com/faun/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932)
for details.
@@ -1062,7 +1136,7 @@ The required resources for query path:
## High availability
* Install multiple VictoriaMetrics instances in distinct datacenters (availability zones).
* Pass addresses of these instances to [vmagent](https://victoriametrics.github.io/vmagent.html) via `-remoteWrite.url` command-line flag:
* Pass addresses of these instances to [vmagent](https://docs.victoriametrics.com/vmagent.html) via `-remoteWrite.url` command-line flag:
```bash
/path/to/vmagent -remoteWrite.url=http://<victoriametrics-addr-1>:8428/api/v1/write -remoteWrite.url=http://<victoriametrics-addr-2>:8428/api/v1/write
@@ -1087,7 +1161,7 @@ remote_write:
kill -HUP `pidof prometheus`
```
It is recommended to use [vmagent](https://victoriametrics.github.io/vmagent.html) instead of Prometheus for highly loaded setups.
It is recommended to use [vmagent](https://docs.victoriametrics.com/vmagent.html) instead of Prometheus for highly loaded setups.
* Now Prometheus should write data into all the configured `remote_write` urls in parallel.
* Set up [Promxy](https://github.com/jacksontj/promxy) in front of all the VictoriaMetrics replicas.
@@ -1108,8 +1182,8 @@ on the same time series if they fall within the same discrete 60s bucket. The e
The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs.
The de-duplication reduces disk space usage if multiple identically configured Prometheus instances in HA pair
write data to the same VictoriaMetrics instance. Note that these Prometheus instances must have identical
The de-duplication reduces disk space usage if multiple identically configured [vmagent](https://docs.victoriametrics.com/vmagent.html) or Prometheus instances in HA pair
write data to the same VictoriaMetrics instance. These vmagent or Prometheus instances must have identical
`external_labels` section in their configs, so they write data to the same time series.
@@ -1136,9 +1210,9 @@ Just start multiple VictoriaMetrics instances with distinct values for the follo
* `-storageDataPath`, so the data for each retention period is saved in a separate directory
* `-httpListenAddr`, so clients may reach VictoriaMetrics instance with proper retention
Then set up [vmauth](https://victoriametrics.github.io/vmauth.html) in front of VictoriaMetrics instances,
Then set up [vmauth](https://docs.victoriametrics.com/vmauth.html) in front of VictoriaMetrics instances,
so it could route requests from particular user to VictoriaMetrics with the desired retention.
The same scheme could be implemented for multiple tenants in [VictoriaMetrics cluster](https://victoriametrics.github.io/Cluster-VictoriaMetrics.html).
The same scheme could be implemented for multiple tenants in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
## Downsampling
@@ -1176,7 +1250,7 @@ horizontally scalable long-term remote storage for really large Prometheus deplo
## Alerting
It is recommended using [vmalert](https://victoriametrics.github.io/vmalert.html) for alerting.
It is recommended using [vmalert](https://docs.victoriametrics.com/vmalert.html) for alerting.
Additionally, alerting can be set up with the following tools:
@@ -1201,7 +1275,7 @@ Consider setting the following command-line flags:
Explicitly set internal network interface for TCP and UDP ports for data ingestion with Graphite and OpenTSDB formats.
For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<internal_iface_ip>:2003`.
Prefer authorizing all the incoming requests from untrusted networks with [vmauth](https://victoriametrics.github.io/vmauth.html)
Prefer authorizing all the incoming requests from untrusted networks with [vmauth](https://docs.victoriametrics.com/vmauth.html)
or similar auth proxy.
@@ -1224,7 +1298,7 @@ mkfs.ext4 ... -O 64bit,huge_file,extent -T huge
## Monitoring
VictoriaMetrics exports internal metrics in Prometheus format at `/metrics` page.
These metrics may be collected by [vmagent](https://victoriametrics.github.io/vmagent.html)
These metrics may be collected by [vmagent](https://docs.victoriametrics.com/vmagent.html)
or Prometheus by adding the corresponding scrape config to it.
Alternatively they can be self-scraped by setting `-selfScrapeInterval` command-line flag to duration greater than 0.
For example, `-selfScrapeInterval=10s` would enable self-scraping of `/metrics` page with 10 seconds interval.
@@ -1232,6 +1306,8 @@ For example, `-selfScrapeInterval=10s` would enable self-scraping of `/metrics`
There are officials Grafana dashboards for [single-node VictoriaMetrics](https://grafana.com/dashboards/10229) and [clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11176).
There is also an [alternative dashboard for clustered VictoriaMetrics](https://grafana.com/grafana/dashboards/11831).
It is recommended setting up alerts in [vmalert](https://docs.victoriametrics.com/vmalert.html) or in Prometheus from [this config](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts.yml).
The most interesting metrics are:
* `vm_cache_entries{type="storage/hour_metric_ids"}` - the number of time series with new data points during the last hour
@@ -1250,16 +1326,55 @@ The most interesting metrics are:
VictoriaMetrics also exposes currently running queries with their execution times at `/api/v1/status/active_queries` page.
See the example of alerting rules for VM components [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts.yml).
## TSDB stats
VictoriaMetrics returns TSDB stats at `/api/v1/status/tsdb` page in the way similar to Prometheus - see [these Prometheus docs](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats). VictoriaMetrics accepts the following optional query args at `/api/v1/status/tsdb` page:
* `topN=N` where `N` is the number of top entries to return in the response. By default top 10 entries are returned.
* `date=YYYY-MM-DD` where `YYYY-MM-DD` is the date for collecting the stats. By default the stats is collected for the current day.
* `match[]=SELECTOR` where `SELECTOR` is an arbitrary [time series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) for series to take into account during stats calculation. By default all the series are taken into account.
* `extra_label=LABEL=VALUE`. See [these docs](#prometheus-querying-api-enhancements) for more details.
## Cardinality limiter
By default VictoriaMetrics doesn't limit the number of stored time series. The limit can be enforced by setting the following command-line flags:
* `-storage.maxHourlySeries` - limits the number of time series that can be added during the last hour. Useful for limiting the number of active time series.
* `-storage.maxDailySeries` - limits the number of time series that can be added during the last day. Useful for limiting daily churn rate.
Both limits can be set simultaneously. If any of these limits is reached, then incoming samples for new time series are dropped. A sample of dropped series is put in the log with `WARNING` level.
The exceeded limits can be [monitored](#monitoring) with the following metrics:
* `vm_hourly_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded hourly limit on the number of unique time series.
* `vm_daily_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded daily limit on the number of unique time series.
These limits are approximate, so VictoriaMetrics can underflow/overflow the limit by a small percentage (usually less than 1%).
## Troubleshooting
* It is recommended to use default command-line flag values (i.e. don't set them explicitly) until the need
of tweaking these flag values arises.
* It is recommended inspecting logs during troubleshooting, since they may contain useful information.
* It is recommended upgrading to the latest available release from [this page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases),
since the encountered issue could be already fixed there.
* It is recommended inspecting logs during troubleshooting, since they may contain useful information.
* It is recommended to have at least 50% of spare resources for CPU, disk IO and RAM, so VictoriaMetrics could handle short spikes in the workload without performance issues.
* VictoriaMetrics requires free disk space for [merging data files to bigger ones](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
It may slow down when there is no enough free space left. So make sure `-storageDataPath` directory
has at least 20% of free space. The remaining amount of free space
can be [monitored](#monitoring) via `vm_free_disk_space_bytes` metric. The total size of data
stored on the disk can be monitored via sum of `vm_data_size_bytes` metrics.
See also `vm_merge_need_free_disk_space` metrics, which are set to values higher than 0
if background merge cannot be initiated due to free disk space shortage. The value shows the number of per-month partitions,
which would start background merge if they had more free disk space.
* VictoriaMetrics buffers incoming data in memory for up to a few seconds before flushing it to persistent storage.
This may lead to the following "issues":
@@ -1270,22 +1385,16 @@ VictoriaMetrics also exposes currently running queries with their execution time
* If VictoriaMetrics works slowly and eats more than a CPU core per 100K ingested data points per second,
then it is likely you have too many active time series for the current amount of RAM.
VictoriaMetrics [exposes](#monitoring) `vm_slow_*` metrics, which could be used as an indicator of low amounts of RAM.
It is recommended increasing the amount of RAM on the node with VictoriaMetrics in order to improve
VictoriaMetrics [exposes](#monitoring) `vm_slow_*` metrics such as `vm_slow_row_inserts_total` and `vm_slow_metric_name_loads_total`, which could be used
as an indicator of low amounts of RAM. It is recommended increasing the amount of RAM on the node with VictoriaMetrics in order to improve
ingestion and query performance in this case.
* If the order of labels for the same metrics can change over time (e.g. if `metric{k1="v1",k2="v2"}` may become `metric{k2="v2",k1="v1"}`),
then it is recommended running VictoriaMetrics with `-sortLabels` command-line flag in order to reduce memory usage and CPU usage.
* VictoriaMetrics prioritizes data ingestion over data querying. So if it has no enough resources for data ingestion,
then data querying may slow down significantly.
* VictoriaMetrics requires free disk space for [merging data files to bigger ones](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
It may slow down when there is no enough free space left. So make sure `-storageDataPath` directory
has at least 20% of free space comparing to disk size. The remaining amount of free space
can be [monitored](#monitoring) via `vm_free_disk_space_bytes` metric. The total size of data
stored on the disk can be monitored via sum of `vm_data_size_bytes` metrics.
See also `vm_merge_need_free_disk_space` metrics, which are set to values higher than 0
if background merge cannot be initiated due to free disk space shortage. The value shows the number of per-month partitions,
which would start background merge if they had more free disk space.
* If VictoriaMetrics doesn't work because of certain parts are corrupted due to disk errors,
then just remove directories with broken parts. It is safe removing subdirectories under `<-storageDataPath>/data/{big,small}/YYYY_MM` directories
when VictoriaMetrics isn't running. This recovers VictoriaMetrics at the cost of data loss stored in the deleted broken parts.
@@ -1302,23 +1411,31 @@ VictoriaMetrics also exposes currently running queries with their execution time
It may be needed in order to suppress default gap filling algorithm used by VictoriaMetrics - by default it assumes
each time series is continuous instead of discrete, so it fills gaps between real samples with regular intervals.
* Metrics and labels leading to high cardinality or high churn rate can be determined at `/api/v1/status/tsdb` page.
See [these docs](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats) for details.
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default `date` equals to the current date,
while `topN` equals to 10.
* Metrics and labels leading to high cardinality or high churn rate can be determined at `/api/v1/status/tsdb` page. See [these docs](#tsdb-stats) for details.
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
* VictoriaMetrics limits the number of labels per each metric with `-maxLabelsPerTimeseries` command-line flag.
This prevents from ingesting metrics with too many labels. It is recommended [monitoring](#monitoring) `vm_metrics_with_dropped_labels_total`
metric in order to determine whether `-maxLabelsPerTimeseries` must be adjusted for your workload.
* If you store Graphite metrics like `foo.bar.baz` in VictoriaMetrics, then `-search.treatDotsAsIsInRegexps` command-line flag could be useful.
By default `.` chars in regexps match any char. If you need matching only dots, then the `\\.` must be used in regexp filters.
When `-search.treatDotsAsIsInRegexps` option is enabled, then dots in regexps are automatically escaped in order to match only dots instead of arbitrary chars.
This may significantly increase performance when locating time series for the given label filters.
* If you store Graphite metrics like `foo.bar.baz` in VictoriaMetrics, then use `{__graphite__="foo.*.baz"}` syntax for selecting such metrics.
This expression is equivalent to `{__name__=~"foo[.][^.]*[.]baz"}`, but it works faster and it is easier to use when migrating from Graphite.
* VictoriaMetrics ignores `NaN` values during data ingestion.
## Data migration
Use [vmctl](https://docs.victoriametrics.com/vmctl.html) for data migration. It supports the following data migration types:
* From Prometheus to VictoriaMetrics
* From InfluxDB to VictoriaMetrics
* From VictoriaMetrics to VictoriaMetrics
See [vmctl docs](https://docs.victoriametrics.com/vmctl.html) for more details.
## Backfilling
VictoriaMetrics accepts historical data in arbitrary order of time via [any supported ingestion method](#how-to-import-time-series-data).
@@ -1339,7 +1456,7 @@ cache when samples with timestamps older than `now - search.cacheTimestampOffset
## Data updates
VictoriaMetrics doesn't support updating already existing sample values to new ones. It stores all the ingested data points
for the same time series with identical timestamps. While is possible substituting old time series with new time series via
for the same time series with identical timestamps. While it is possible substituting old time series with new time series via
[removal of old time series](#how-to-delete-timeseries) and then [writing new time series](#backfilling), this approach
should be used only for one-off updates. It shouldn't be used for frequent updates because of non-zero overhead related to data removal.
@@ -1347,7 +1464,7 @@ should be used only for one-off updates. It shouldn't be used for frequent updat
## Replication
Single-node VictoriaMetrics doesn't support application-level replication. Use cluster version instead.
See [these docs](https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#replication-and-data-safety) for details.
See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety) for details.
Storage-level replication may be offloaded to durable persistent storage such as [Google Cloud disks](https://cloud.google.com/compute/docs/disks#pdspecs).
@@ -1356,9 +1473,9 @@ See also [high availability docs](#high-availability) and [backup docs](#backups
## Backups
VictoriaMetrics supports backups via [vmbackup](https://victoriametrics.github.io/vmbackup.html)
and [vmrestore](https://victoriametrics.github.io/vmrestore.html) tools.
We also provide `vmbackuper` tool for paid enterprise subscribers - see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for details.
VictoriaMetrics supports backups via [vmbackup](https://docs.victoriametrics.com/vmbackup.html)
and [vmrestore](https://docs.victoriametrics.com/vmrestore.html) tools.
We also provide `vmbackupmanager` tool for paid enterprise subscribers - see [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for details.
## Profiling
@@ -1386,14 +1503,14 @@ The collected profiles may be analyzed with [go tool pprof](https://github.com/g
* [Helm charts for single-node and cluster versions of VictoriaMetrics](https://github.com/VictoriaMetrics/helm-charts).
* [Kubernetes operator for VictoriaMetrics](https://github.com/VictoriaMetrics/operator).
* [vmctl tool for data migration to VictoriaMetrics](https://github.com/VictoriaMetrics/vmctl).
* [netdata](https://github.com/netdata/netdata) can push data into VictoriaMetrics via `Prometheus remote_write API`.
See [these docs](https://github.com/netdata/netdata#integrations).
* [go-graphite/carbonapi](https://github.com/go-graphite/carbonapi) can use VictoriaMetrics as time series backend.
See [this example](https://github.com/go-graphite/carbonapi/blob/master/cmd/carbonapi/carbonapi.example.prometheus.yaml).
See [this example](https://github.com/go-graphite/carbonapi/blob/main/cmd/carbonapi/carbonapi.example.victoriametrics.yaml).
* [Ansible role for installing single-node VictoriaMetrics](https://github.com/dreamteam-gg/ansible-victoriametrics-role).
* [Ansible role for installing cluster VictoriaMetrics](https://github.com/Slapper/ansible-victoriametrics-cluster-role).
* [Snap package for VictoriaMetrics](https://snapcraft.io/victoriametrics).
* [vmalert-cli](https://github.com/aorfanos/vmalert-cli) - a CLI application for managing [vmalert](https://docs.victoriametrics.com/vmalert.html).
## Third-party contributions
@@ -1441,7 +1558,7 @@ Adhering `KISS` principle simplifies the resulting code and architecture, so it
Report bugs and propose new features [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
## Victoria Metrics Logo
## VictoriaMetrics Logo
[Zip](VM_logo.zip) contains three folders with different image orientations (main color and inverted version).
@@ -1469,3 +1586,260 @@ Files included in each folder:
* There should be sufficient clear space around the logo.
* Do not change spacing, alignment, or relative locations of the design elements.
* Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
## List of command-line flags
Pass `-help` to VictoriaMetrics in order to see the list of supported command-line flags with their description:
```
-bigMergeConcurrency int
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-dedup.minScrapeInterval duration
Remove superflouos samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
-denyQueriesOutsideRetention
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
-dryRun
Whether to check only -promscrape.config and then exit. Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-finalMergeDelay duration
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
-forceFlushAuthKey string
authKey, which must be passed in query string to /internal/force_flush pages
-forceMergeAuthKey string
authKey, which must be passed in query string to /internal/force_merge pages
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports an array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if Influx line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
-logNewSeries
Whether to log new series. This option is for debug purposes only. It can lead to performance issues when big number of new series are ingested into VictoriaMetrics
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-maxLabelsPerTimeseries int
The maximum number of labels accepted per time series. Superfluous labels are dropped (default 30)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-precisionBits int
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
-promscrape.cluster.memberNum int
The number of number in the cluster of scrapers. It must be an unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 2, then the deduplication must be enabled at remote storage side. See https://docs.victoriametrics.com/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
-promscrape.config.strictParse
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consul.waitTime duration
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kuberntes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://docs.victoriametrics.com/#relabeling for details
-retentionPeriod value
Data with timestamps outside the retentionPeriod is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
-search.cacheTimestampOffset duration
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
-search.disableCache
Whether to disable response caching. This may be useful during data backfilling
-search.latencyOffset duration
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
-search.maxExportDuration duration
The maximum duration for /api/v1/export call (default 720h0m0s)
-search.maxLookback duration
Synonim to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaining due to historical reasons
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values bigger than the horizontal resolution of the graph (default 30000)
-search.maxQueryDuration duration
The maximum duration for query execution (default 30s)
-search.maxQueryLen size
The maximum search query length in bytes
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
-search.maxQueueDuration duration
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
-search.maxStalenessInterval duration
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
-search.maxStatusRequestDuration duration
The maximum duration for /api/v1/status/* requests (default 5m0s)
-search.maxStepForPointsAdjustment duration
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
-search.maxTagKeys int
The maximum number of tag keys returned from /api/v1/labels (default 100000)
-search.maxTagValueSuffixesPerSearch int
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
-search.maxTagValues int
The maximum number of tag values returned from /api/v1/label/<label_name>/values (default 100000)
-search.maxUniqueTimeseries int
The maximum number of unique time series each search can scan (default 300000)
-search.minStalenessInterval duration
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
-search.queryStats.lastQueriesCount int
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
-search.queryStats.minQueryDuration duration
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats
-search.resetCacheAuthKey string
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
-search.treatDotsAsIsInRegexps
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
-selfScrapeInstance string
Value for 'instance' label, which is added to self-scraped metrics (default "self")
-selfScrapeInterval duration
Interval for self-scraping own metrics at /metrics page
-selfScrapeJob string
Value for 'job' label, which is added to self-scraped metrics (default "victoria-metrics")
-smallMergeConcurrency int
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-sortLabels
Whether to sort labels for incoming samples before writing them to storage. This may be needed for reducing memory usage at storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}. Enabled sorting for labels can slow down ingestion performance a bit
-storage.maxDailySeries int
The maximum number of unique series can be added to the storage during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -storage.maxHourlySeries
-storage.maxHourlySeries int
The maximum number of unique series can be added to the storage during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -storage.maxDailySeries
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```

Binary file not shown.

View File

@@ -3,10 +3,8 @@ package main
import (
"flag"
"fmt"
"io"
"net/http"
"os"
"path"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert"
@@ -15,6 +13,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -35,6 +34,7 @@ var (
func main() {
// Write flags and help message to stdout, since it is easier to grep or pipe.
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
envflag.Parse()
buildinfo.Init()
logger.Init()
@@ -84,14 +84,19 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
fmt.Fprintf(w, "<h2>Single-node VictoriaMetrics.</h2></br>")
fmt.Fprintf(w, "See docs at <a href='https://victoriametrics.github.io/'>https://victoriametrics.github.io/</a></br>")
fmt.Fprintf(w, "Useful endpoints: </br>")
writeAPIHelp(w, [][]string{
if r.Method != "GET" {
return false
}
fmt.Fprintf(w, "<h2>Single-node VictoriaMetrics</h2></br>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/'>https://docs.victoriametrics.com/</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"/targets", "discovered targets list"},
{"/api/v1/targets", "advanced information about discovered targets in JSON format"},
{"/metrics", "available service metrics"},
{"/api/v1/status/tsdb", "tsdb status page"},
{"/api/v1/status/top_queries", "top queries"},
{"/api/v1/status/active_queries", "active queries"},
})
return true
}
@@ -107,11 +112,11 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return false
}
func writeAPIHelp(w io.Writer, pathList [][]string) {
pathPrefix := httpserver.GetPathPrefix()
for _, p := range pathList {
p, doc := p[0], p[1]
p = path.Join(pathPrefix, p)
fmt.Fprintf(w, "<a href='%s'>%q</a> - %s<br/>", p, p, doc)
}
func usage() {
const s = `
victoria-metrics is a time series database and monitoring solution.
See the docs at https://docs.victoriametrics.com/
`
flagutil.Usage(s)
}

View File

@@ -22,7 +22,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -58,6 +57,7 @@ var (
type test struct {
Name string `json:"name"`
Data []string `json:"data"`
InsertQuery string `json:"insert_query"`
Query []string `json:"query"`
ResultMetrics []Metric `json:"result_metrics"`
ResultSeries Series `json:"result_series"`
@@ -148,7 +148,7 @@ func setUp() {
}
func processFlags() {
envflag.Parse()
flag.Parse()
for _, fv := range []struct {
flag string
value string
@@ -209,7 +209,7 @@ func testWrite(t *testing.T) {
t.Errorf("error compressing %v %s", r, err)
t.Fail()
}
httpWrite(t, testPromWriteHTTPPath, bytes.NewBuffer(data))
httpWrite(t, testPromWriteHTTPPath, test.InsertQuery, bytes.NewBuffer(data))
}
})
@@ -218,7 +218,7 @@ func testWrite(t *testing.T) {
test := x
t.Run(test.Name, func(t *testing.T) {
t.Parallel()
httpWrite(t, testWriteHTTPPath, bytes.NewBufferString(strings.Join(test.Data, "\n")))
httpWrite(t, testWriteHTTPPath, test.InsertQuery, bytes.NewBufferString(strings.Join(test.Data, "\n")))
})
}
})
@@ -246,7 +246,7 @@ func testWrite(t *testing.T) {
t.Run(test.Name, func(t *testing.T) {
t.Parallel()
logger.Infof("writing %s", test.Data)
httpWrite(t, testOpenTSDBWriteHTTPPath, bytes.NewBufferString(strings.Join(test.Data, "\n")))
httpWrite(t, testOpenTSDBWriteHTTPPath, test.InsertQuery, bytes.NewBufferString(strings.Join(test.Data, "\n")))
})
}
})
@@ -324,10 +324,10 @@ func readIn(readFor string, t *testing.T, insertTime time.Time) []test {
return tt
}
func httpWrite(t *testing.T, address string, r io.Reader) {
func httpWrite(t *testing.T, address, query string, r io.Reader) {
t.Helper()
s := newSuite(t)
resp, err := http.Post(address, "", r)
resp, err := http.Post(address+query, "", r)
s.noError(err)
s.noError(resp.Body.Close())
s.equalInt(resp.StatusCode, 204)

View File

@@ -0,0 +1,12 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 8428
ENTRYPOINT ["/victoria-metrics-prod"]
ARG TARGETARCH
COPY victoria-metrics-${TARGETARCH}-prod ./victoria-metrics-prod

View File

@@ -11,6 +11,6 @@
"status":"success",
"data":{"resultType":"matrix",
"result":[
{"metric":{"item":"y"},"values":[["{TIME_S-1m}","0.5"],["{TIME_S}","0.5"]]}
{"metric":{"item":"y"},"values":[["{TIME_S-1m}","0.5"]]}
]}}
}

View File

@@ -0,0 +1,17 @@
{
"name": "graphite-selector",
"issue": "",
"data": [
"graphite-selector.bar.baz 1 {TIME_S-1m}",
"graphite-selector.xxx.yy 2 {TIME_S-1m}",
"graphite-selector.bb.cc 3 {TIME_S-1m}",
"graphite-selector.a.baz 4 {TIME_S-1m}"],
"query": ["/api/v1/query?query=sort({__graphite__='graphite-selector.*.baz'})&time={TIME_S-1m}"],
"result_query": {
"status":"success",
"data":{"resultType":"vector","result":[
{"metric":{"__name__":"graphite-selector.bar.baz"},"value":["{TIME_S-1m}","1"]},
{"metric":{"__name__":"graphite-selector.a.baz"},"value":["{TIME_S-1m}","4"]}
]}
}
}

View File

@@ -0,0 +1,16 @@
{
"name": "name-plus-negative-filter",
"issue": "",
"data": [
"name-plus-negative-filter;foo=123 1 {TIME_S-1m}",
"name-plus-negative-filter;bar=123 2 {TIME_S-1m}",
"name-plus-negative-filter;foo=qwe 3 {TIME_S-1m}"
],
"query": ["/api/v1/query?query={__name__='name-plus-negative-filter',foo!='123'}&time={TIME_S-1m}"],
"result_query": {
"status":"success",
"data":{"resultType":"vector","result":[
{"metric":{"__name__":"name-plus-negative-filter","foo":"qwe"},"value":["{TIME_S-1m}","3"]}
]}
}
}

View File

@@ -13,6 +13,6 @@
"data":{"resultType":"matrix",
"result":[
{"metric":{"__name__":"not_nan_as_missing_data","item":"x"},"values":[["{TIME_S-2m}","2"]]},
{"metric":{"__name__":"not_nan_as_missing_data","item":"y"},"values":[["{TIME_S-2m}","4"],["{TIME_S-1m}","3"],["{TIME_S}","3"]]}
{"metric":{"__name__":"not_nan_as_missing_data","item":"y"},"values":[["{TIME_S-2m}","4"],["{TIME_S-1m}","3"]]}
]}}
}

View File

@@ -0,0 +1,10 @@
{
"name": "insert_with_extra_labels",
"data": ["measurement,tag1=value1,tag2=value2 field6=1.23,field5=123 {TIME_NS}"],
"insert_query": "?extra_label=job=test&extra_label=tag2=value10",
"query": ["/api/v1/export?match={__name__!=''}"],
"result_metrics": [
{"metric":{"__name__":"measurement_field5","tag1":"value1","job": "test","tag2":"value10"},"values":[123], "timestamps": ["{TIME_MS}"]},
{"metric":{"__name__":"measurement_field6","tag1":"value1","job": "test","tag2":"value10"},"values":[1.23], "timestamps": ["{TIME_MS}"]}
]
}

View File

@@ -0,0 +1,9 @@
{
"name": "insert_with_extra_labels",
"data": ["{\"metric\": \"opentsdbhttp.foobar\", \"value\": 1001, \"timestamp\": {TIME_S}, \"tags\": {\"bar\":\"baz\", \"x\": \"y\"}}"],
"insert_query": "?extra_label=job=open-test&extra_label=x=z",
"query": ["/api/v1/export?match={__name__!=''}"],
"result_metrics": [
{"metric":{"__name__":"opentsdbhttp.foobar","bar":"baz","x":"z","job": "open-test"},"values":[1001], "timestamps": ["{TIME_MSZ}"]}
]
}

View File

@@ -0,0 +1,9 @@
{
"name": "basic_insertion_with_extra_labels",
"insert_query": "?extra_label=job=prom-test&extra_label=baz=bar",
"data": ["[{\"labels\":[{\"name\":\"__name__\",\"value\":\"prometheus.foobar\"},{\"name\":\"baz\",\"value\":\"qux\"}],\"samples\":[{\"value\":100000,\"timestamp\":\"{TIME_MS}\"}]}]"],
"query": ["/api/v1/export?match={__name__!=''}"],
"result_metrics": [
{"metric":{"__name__":"prometheus.foobar","baz":"bar","job": "prom-test"},"values":[100000], "timestamps": ["{TIME_MS}"]}
]
}

View File

@@ -0,0 +1,8 @@
{
"name": "basic_select_with_extra_labels",
"data": ["[{\"labels\":[{\"name\":\"__name__\",\"value\":\"prometheus.tenant.limits\"},{\"name\":\"baz\",\"value\":\"qux\"},{\"name\":\"tenant\",\"value\":\"dev\"}],\"samples\":[{\"value\":100000,\"timestamp\":\"{TIME_MS}\"}]},{\"labels\":[{\"name\":\"__name__\",\"value\":\"prometheus.up\"},{\"name\":\"baz\",\"value\":\"qux\"}],\"samples\":[{\"value\":100000,\"timestamp\":\"{TIME_MS}\"}]}]"],
"query": ["/api/v1/export?match={__name__!=''}&extra_label=tenant=dev"],
"result_metrics": [
{"metric":{"__name__":"prometheus.tenant.limits","baz":"qux","tenant": "dev"},"values":[100000], "timestamps": ["{TIME_MS}"]}
]
}

View File

@@ -78,3 +78,9 @@ vmagent-local-with-goarch:
vmagent-pure:
APP_NAME=vmagent $(MAKE) app-local-pure
vmagent-windows-amd64:
GOARCH=amd64 APP_NAME=vmagent $(MAKE) app-local-windows-with-goarch
vmagent-windows-amd64-prod:
APP_NAME=vmagent $(MAKE) app-via-docker-windows-amd64

View File

@@ -1,48 +1,51 @@
## vmagent
# vmagent
`vmagent` is a tiny but brave agent, which helps you collect metrics from various sources
and stores them in [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)
or any other Prometheus-compatible storage system that supports the `remote_write` protocol.
`vmagent` is a tiny but mighty agent which helps you collect metrics from various sources
and store them in [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)
or any other Prometheus-compatible storage systems that support the `remote_write` protocol.
<img alt="vmagent" src="vmagent.png">
### Motivation
## Motivation
While VictoriaMetrics provides an efficient solution to store and observe metrics, our users needed something fast
and RAM friendly to scrape metrics from Prometheus-compatible exporters to VictoriaMetrics.
Also, we found that users infrastructure are snowflakes - no two are alike, and we decided to add more flexibility
to `vmagent` (like the ability to push metrics instead of pulling them). We did our best and plan to do even more.
and RAM friendly to scrape metrics from Prometheus-compatible exporters into VictoriaMetrics.
Also, we found that our user's infrastructure are like snowflakes in that no two are alike. Therefore we decided to add more flexibility
to `vmagent` such as the ability to push metrics instead of pulling them. We did our best and will continue to improve vmagent.
### Features
## Features
* Can be used as drop-in replacement for Prometheus for scraping targets such as [node_exporter](https://github.com/prometheus/node_exporter).
* Can be used as a drop-in replacement for Prometheus for scraping targets such as [node_exporter](https://github.com/prometheus/node_exporter).
See [Quick Start](#quick-start) for details.
* Can add, remove and modify labels (aka tags) via Prometheus relabeling. Can filter data before sending it to remote storage. See [these docs](#relabeling) for details.
* Accepts data via all the ingestion protocols supported by VictoriaMetrics:
* Influx line protocol via `http://<vmagent>:8429/write`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf).
* Graphite plaintext protocol if `-graphiteListenAddr` command-line flag is set. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-graphite-compatible-agents-such-as-statsd).
* OpenTSDB telnet and http protocols if `-opentsdbListenAddr` command-line flag is set. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-opentsdb-compatible-agents).
* Accepts data via all ingestion protocols supported by VictoriaMetrics:
* Influx line protocol via `http://<vmagent>:8429/write`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf).
* Graphite plaintext protocol if `-graphiteListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-graphite-compatible-agents-such-as-statsd).
* OpenTSDB telnet and http protocols if `-opentsdbListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-opentsdb-compatible-agents).
* Prometheus remote write protocol via `http://<vmagent>:8429/api/v1/write`.
* JSON lines import protocol via `http://<vmagent>:8429/api/v1/import`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-data-in-json-line-format).
* Native data import protocol via `http://<vmagent>:8429/api/v1/import/native`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-data-in-native-format).
* Data in Prometheus exposition format. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-data-in-prometheus-exposition-format) for details.
* Arbitrary CSV data via `http://<vmagent>:8429/api/v1/import/csv`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-csv-data).
* JSON lines import protocol via `http://<vmagent>:8429/api/v1/import`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-json-line-format).
* Native data import protocol via `http://<vmagent>:8429/api/v1/import/native`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-native-format).
* Data in Prometheus exposition format. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-prometheus-exposition-format) for details.
* Arbitrary CSV data via `http://<vmagent>:8429/api/v1/import/csv`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-csv-data).
* Can replicate collected metrics simultaneously to multiple remote storage systems.
* Works in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as connection
to remote storage is recovered. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
* Uses lower amounts of RAM, CPU, disk IO and network bandwidth compared to Prometheus.
* Works smoothly in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
* Uses lower amounts of RAM, CPU, disk IO and network bandwidth compared with Prometheus.
* Scrape targets can be spread among multiple `vmagent` instances when big number of targets must be scraped. See [these docs](#scraping-big-number-of-targets).
* Can efficiently scrape targets that expose millions of time series such as [/federate endpoint in Prometheus](https://prometheus.io/docs/prometheus/latest/federation/). See [these docs](#stream-parsing-mode).
* Can deal with high cardinality and high churn rate issues by limiting the number of unique time series sent to remote storage systems. See [these docs](#cardinality-limiter).
### Quick Start
## Quick Start
Just download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases), unpack it
and pass the following flags to `vmagent` binary in order to start scraping Prometheus targets:
Please download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases), unpack it
and configure the following flags to the `vmagent` binary in order to start scraping Prometheus targets:
* `-promscrape.config` with the path to Prometheus config file (it is usually located at `/etc/prometheus/prometheus.yml`)
* `-remoteWrite.url` with the remote storage endpoint such as VictoriaMetrics. The `-remoteWrite.url` argument can be specified multiple times in order to replicate data concurrently to an arbitrary number of remote storage systems.
* `-promscrape.config` with the path to Prometheus config file (usually located at `/etc/prometheus/prometheus.yml`)
* `-remoteWrite.url` with the remote storage endpoint such as VictoriaMetrics, the `-remoteWrite.url` argument can be specified multiple times to replicate data concurrently to an arbitrary number of remote storage systems.
Example command line:
@@ -50,20 +53,20 @@ Example command line:
/path/to/vmagent -promscrape.config=/path/to/prometheus.yml -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
If you only need to collect Influx data, then the following is sufficient:
If you only need to collect Influx data, then the following command is sufficient:
```
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
Then send Influx data to `http://vmagent-host:8429`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf) for more details.
Then send Influx data to `http://vmagent-host:8429`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf) for more details.
`vmagent` is also available in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags).
Pass `-help` to `vmagent` in order to see the full list of supported command-line flags with their descriptions.
Pass `-help` to `vmagent` in order to see [the full list of supported command-line flags with their descriptions](#advanced-usage).
### Configuration update
## Configuration update
`vmagent` should be restarted in order to update config options set via command-line args.
@@ -79,141 +82,149 @@ Pass `-help` to `vmagent` in order to see the full list of supported command-lin
There is also `-promscrape.configCheckInterval` command-line option, which can be used for automatic reloading configs from updated `-promscrape.config` file.
### Use cases
## Use cases
#### IoT and Edge monitoring
### IoT and Edge monitoring
`vmagent` can run and collect metrics in IoT and industrial networks with unreliable or scheduled connections to the remote storage.
`vmagent` can run and collect metrics in IoT and industrial networks with unreliable or scheduled connections to their remote storage.
It buffers the collected data in local files until the connection to remote storage becomes available and then sends the buffered
data to the remote storage. It re-tries sending the data to remote storage on any errors.
data to the remote storage. It re-tries sending the data to remote storage until any errors are resolved.
The maximum buffer size can be limited with `-remoteWrite.maxDiskUsagePerURL`.
`vmagent` works on various architectures from IoT world - 32-bit arm, 64-bit arm, ppc64, 386, amd64.
`vmagent` works on various architectures from the IoT world - 32-bit arm, 64-bit arm, ppc64, 386, amd64.
See [the corresponding Makefile rules](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmagent/Makefile) for details.
#### Drop-in replacement for Prometheus
### Drop-in replacement for Prometheus
If you use Prometheus only for scraping metrics from various targets and forwarding these metrics to remote storage,
then `vmagent` can replace such Prometheus setup. Usually `vmagent` requires lower amounts of RAM, CPU and network bandwidth comparing to Prometheus for such a setup.
If you use Prometheus only for scraping metrics from various targets and forwarding those metrics to remote storage
then `vmagent` can replace Prometheus. Typically, `vmagent` requires lower amounts of RAM, CPU and network bandwidth compared with Prometheus.
See [these docs](#how-to-collect-metrics-in-prometheus-format) for details.
#### Replication and high availability
### Replication and high availability
`vmagent` replicates the collected metrics among multiple remote storage instances configured via `-remoteWrite.url` args.
If a single remote storage instance temporarily is out of service, then the collected data remains available in another remote storage instances.
`vmagent` buffers the collected data in files at `-remoteWrite.tmpDataPath` until the remote storage becomes available again.
Then it sends the buffered data to the remote storage in order to prevent data gaps in the remote storage.
If a single remote storage instance temporarily is out of service, then the collected data remains available in another remote storage instance.
`vmagent` buffers the collected data in files at `-remoteWrite.tmpDataPath` until the remote storage becomes available again and then it sends the buffered data to the remote storage in order to prevent data gaps.
#### Relabeling and filtering
### Relabeling and filtering
`vmagent` can add, remove or update labels on the collected data before sending it to remote storage. Additionally,
`vmagent` can add, remove or update labels on the collected data before sending it to the remote storage. Additionally,
it can remove unwanted samples via Prometheus-like relabeling before sending the collected data to remote storage.
See [these docs](#relabeling) for details.
Please see [these docs](#relabeling) for details.
#### Splitting data streams among multiple systems
### Splitting data streams among multiple systems
`vmagent` supports splitting the collected data between muliple destinations with the help of `-remoteWrite.urlRelabelConfig`,
which is applied independently for each configured `-remoteWrite.url` destination. For instance, it is possible to replicate or split
data among long-term remote storage, short-term remote storage and real-time analytical system [built on top of Kafka](https://github.com/Telefonica/prometheus-kafka-adapter).
Note that each destination can receive its own subset of the collected data thanks to per-destination relabeling via `-remoteWrite.urlRelabelConfig`.
which is applied independently for each configured `-remoteWrite.url` destination. For example, it is possible to replicate or split
data among long-term remote storage, short-term remote storage and a real-time analytical system [built on top of Kafka](https://github.com/Telefonica/prometheus-kafka-adapter).
Note that each destination can receive it's own subset of the collected data due to per-destination relabeling via `-remoteWrite.urlRelabelConfig`.
#### Prometheus remote_write proxy
### Prometheus remote_write proxy
`vmagent` may be used as a proxy for Prometheus data sent via Prometheus `remote_write` protocol. It can accept data via `remote_write` API
at `/api/v1/write` endpoint, apply relabeling and filtering and then proxy it to another `remote_write` systems.
`vmagent` can be used as a proxy for Prometheus data sent via Prometheus `remote_write` protocol. It can accept data via the `remote_write` API
at the`/api/v1/write` endpoint. Then apply relabeling and filtering and proxy it to another `remote_write` system .
The `vmagent` can be configured to encrypt the incoming `remote_write` requests with `-tls*` command-line flags.
Additionally, Basic Auth can be enabled for the incoming `remote_write` requests with `-httpAuth.*` command-line flags.
Also, Basic Auth can be enabled for the incoming `remote_write` requests with `-httpAuth.*` command-line flags.
### remote_write for clustered version
### How to collect metrics in Prometheus format
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always peformed in Promethes remote_write protocol. Therefore for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), `-remoteWrite.url` the command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<customer-id>/prometheus/api/v1/write`
Pass the path to `prometheus.yml` to `-promscrape.config` command-line flag. `vmagent` takes into account the following
## How to collect metrics in Prometheus format
Specify the path to `prometheus.yml` file via `-promscrape.config` command-line flag. `vmagent` takes into account the following
sections from [Prometheus config file](https://prometheus.io/docs/prometheus/latest/configuration/configuration/):
* `global`
* `scrape_configs`
All the other sections are ignored, including [remote_write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) section.
Use `-remoteWrite.*` command-line flags instead for configuring remote write settings.
All other sections are ignored, including the [remote_write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) section.
Use `-remoteWrite.*` command-line flag instead for configuring remote write settings.
The following scrape types in [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) section are supported:
* `static_configs` - for scraping statically defined targets. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#static_config) for details.
* `file_sd_configs` - for scraping targets defined in external files aka file-based service discover.
See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config) for details.
* `static_configs` - is for scraping statically defined targets. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#static_config) for details.
* `file_sd_configs` - is for scraping targets defined in external files (aka file-based service discover).
See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config) for details
* `kubernetes_sd_configs` - for scraping targets in Kubernetes (k8s).
See [kubernetes_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) for details.
* `ec2_sd_configs` - for scraping targets in Amazon EC2.
* `ec2_sd_configs` - is for scraping targets in Amazon EC2.
See [ec2_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config) for details.
`vmagent` doesn't support `profile` config param and aws credentials file yet.
* `gce_sd_configs` - for scraping targets in Google Compute Engine (GCE).
`vmagent` doesn't support the `profile` config param yet.
* `gce_sd_configs` - is for scraping targets in Google Compute Engine (GCE).
See [gce_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config) for details.
`vmagent` provides the following additional functionality for `gce_sd_config`:
* if `project` arg is missing, then `vmagent` uses the project for the instance where it runs;
* if `zone` arg is missing, then `vmagent` uses the zone for the instance where it runs;
* if `zone` arg equals to `"*"`, then `vmagent` discovers all the zones for the given project;
* `zone` may contain arbitrary number of zones, i.e. `zone: [us-east1-a, us-east1-b]`.
* `consul_sd_configs` - for scraping targets registered in Consul.
* if `project` arg is missing then `vmagent` uses the project for the instance where it runs;
* if `zone` arg is missing then `vmagent` uses the zone for the instance where it runs;
* if `zone` arg is equal to `"*"`, then `vmagent` discovers all the zones for the given project;
* `zone` may contain an arbitrary number of zones, i.e. `zone: [us-east1-a, us-east1-b]`.
* `consul_sd_configs` - is for scraping the targets registered in Consul.
See [consul_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config) for details.
* `dns_sd_configs` - for scraping targets discovered from DNS records (SRV, A and AAAA).
* `dns_sd_configs` - is for scraping targets discovered from DNS records (SRV, A and AAAA).
See [dns_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config) for details.
* `openstack_sd_configs` - for scraping OpenStack targets.
* `openstack_sd_configs` - is for scraping OpenStack targets.
See [openstack_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config) for details.
[OpenStack identity API v3](https://docs.openstack.org/api-ref/identity/v3/) is supported only.
* `dockerswarm_sd_configs` - for scraping Docker Swarm targets.
* `dockerswarm_sd_configs` - is for scraping Docker Swarm targets.
See [dockerswarm_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config) for details.
* `eureka_sd_configs` - for scraping targets registered in [Netflix Eureka](https://github.com/Netflix/eureka).
* `eureka_sd_configs` - is for scraping targets registered in [Netflix Eureka](https://github.com/Netflix/eureka).
See [eureka_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config) for details.
File feature requests at [our issue tracker](https://github.com/VictoriaMetrics/VictoriaMetrics/issues) if you need other service discovery mechanisms to be supported by `vmagent`.
Please file feature requests to [our issue tracker](https://github.com/VictoriaMetrics/VictoriaMetrics/issues) if you need other service discovery mechanisms to be supported by `vmagent`.
`vmagent` also support the following additional options in `scrape_config` section:
`vmagent` also support the following additional options in `scrape_configs` section:
* `disable_compression: true` - for disabling response compression on a per-job basis. By default `vmagent` requests compressed responses from scrape targets
in order to save network bandwidth.
* `disable_keepalive: true` - for disabling [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection) on a per-job basis.
By default `vmagent` uses keep-alive connections to scrape targets in order to reduce overhead on connection re-establishing.
* `disable_compression: true` - to disable response compression on a per-job basis. By default `vmagent` requests compressed responses from scrape targets
to save network bandwidth.
* `disable_keepalive: true` - to disable [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection) on a per-job basis.
By default, `vmagent` uses keep-alive connections to scrape targets to reduce overhead on connection re-establishing.
* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics. See [these docs](#stream-parsing-mode).
Note that `vmagent` doesn't support `refresh_interval` option these scrape configs. Use the corresponding `-promscrape.*CheckInterval`
Note that `vmagent` doesn't support `refresh_interval` option for these scrape configs. Use the corresponding `-promscrape.*CheckInterval`
command-line flag instead. For example, `-promscrape.consulSDCheckInterval=60s` sets `refresh_interval` for all the `consul_sd_configs`
entries to 60s. Run `vmagent -help` in order to see default values for `-promscrape.*CheckInterval` flags.
entries to 60s. Run `vmagent -help` in order to see default values for the `-promscrape.*CheckInterval` flags.
The file pointed by `-promscrape.config` may contain `%{ENV_VAR}` placeholders, which are substituted by the corresponding `ENV_VAR` environment variable values.
The file pointed by `-promscrape.config` may contain `%{ENV_VAR}` placeholders which are substituted by the corresponding `ENV_VAR` environment variable values.
### Adding labels to metrics
## Adding labels to metrics
Labels can be added to metrics via the following mechanisms:
Labels can be added to metrics by the following mechanisms:
* Via `global -> external_labels` section in `-promscrape.config` file. These labels are added only to metrics scraped from targets configured in `-promscrape.config` file.
* Via `-remoteWrite.label` command-line flag. These labels are added to all the collected metrics before sending them to `-remoteWrite.url`.
* The `global -> external_labels` section in `-promscrape.config` file. These labels are added only to metrics scraped from targets configured in the `-promscrape.config` file. They aren't added to metrics collected via other [data ingestion protocols](https://docs.victoriametrics.com/#how-to-import-time-series-data).
* The `-remoteWrite.label` command-line flag. These labels are added to all the collected metrics before sending them to `-remoteWrite.url`. For example, the following command will start `vmagent`, which will add `{datacenter="foobar"}` label to all the metrics pushed to all the configured remote storage systems (all the `-remoteWrite.url` flag values):
```
/path/to/vmagent -remoteWrite.label=datacenter=foobar ...
```
### Relabeling
## Relabeling
`vmagent` supports [Prometheus relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config).
Additionally it provides the following extra actions:
and also provides the following actions:
* `replace_all`: replaces all the occurences of `regex` in the values of `source_labels` with the `replacement` and stores the result in the `target_label`.
* `labelmap_all`: replaces all the occurences of `regex` in all the label names with the `replacement`.
* `keep_if_equal`: keeps the entry if all label values from `source_labels` are equal.
* `replace_all`: replaces all of the occurences of `regex` in the values of `source_labels` with the `replacement` and stores the results in the `target_label`.
* `labelmap_all`: replaces all of the occurences of `regex` in all the label names with the `replacement`.
* `keep_if_equal`: keeps the entry if all the label values from `source_labels` are equal.
* `drop_if_equal`: drops the entry if all the label values from `source_labels` are equal.
The relabeling can be defined in the following places:
* At `scrape_config -> relabel_configs` section in `-promscrape.config` file. This relabeling is applied to target labels.
* At `scrape_config -> metric_relabel_configs` section in `-promscrape.config` file. This relabeling is applied to all the scraped metrics in the given `scrape_config`.
* At `-remoteWrite.relabelConfig` file. This relabeling is aplied to all the collected metrics before sending them to remote storage.
* At `-remoteWrite.urlRelabelConfig` files. This relabeling is applied to metrics before sending them to the corresponding `-remoteWrite.url`.
* At the `scrape_config -> relabel_configs` section in `-promscrape.config` file. This relabeling is applied to target labels.
* At the `scrape_config -> metric_relabel_configs` section in `-promscrape.config` file. This relabeling is applied to all the scraped metrics in the given `scrape_config`.
* At the `-remoteWrite.relabelConfig` file. This relabeling is aplied to all the collected metrics before sending them to remote storage.
* At the `-remoteWrite.urlRelabelConfig` files. This relabeling is applied to metrics before sending them to the corresponding `-remoteWrite.url`.
Read more about relabeling in the following articles:
You can read more about relabeling in the following articles:
* [How to use Relabeling in Prometheus and VictoriaMetrics](https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2)
* [Life of a label](https://www.robustperception.io/life-of-a-label)
@@ -223,45 +234,9 @@ Read more about relabeling in the following articles:
* [relabel_configs vs metric_relabel_configs](https://www.robustperception.io/relabel_configs-vs-metric_relabel_configs)
### Monitoring
## Stream parsing mode
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. It is recommended setting up regular scraping of this page
either via `vmagent` itself or via Prometheus, so the exported metrics could be analyzed later.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for `vmagent` state overview.
If you have suggestions, improvements or found a bug - feel free to open an issue on github or add review to the dashboard.
`vmagent` also exports target statuses at the following handlers:
* `http://vmagent-host:8429/targets`. This handler returns human-readable plaintext status for every active target.
This page is convenient to query from command line with `wget`, `curl` or similar tools.
It accepts optional `show_original_labels=1` query arg, which shows the original labels per each target before applying relabeling.
This information may be useful for debugging target relabeling.
* `http://vmagent-host:8429/api/v1/targets`. This handler returns data compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
* `http://vmagent-host:8429/ready`. This handler returns http 200 status code when `vmagent` finishes initialization for all service_discovery configs.
It may be useful for performing `vmagent` rolling update without scrape loss.
### Troubleshooting
* It is recommended [setting up the official Grafana dashboard](#monitoring) in order to monitor `vmagent` state.
* It is recommended increasing the maximum number of open files in the system (`ulimit -n`) when scraping big number of targets,
since `vmagent` establishes at least a single TCP connection per each target.
* When `vmagent` scrapes many unreliable targets, it can flood error log with scrape errors. These errors can be suppressed
by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
and `http://vmagent-host:8429/api/v1/targets`.
* The `/api/v1/targets` page could be useful for debugging relabeling process for scrape targets.
This page contains original labels for targets dropped during relabeling (see "droppedTargets" section in the page output). By default up to `-promscrape.maxDroppedTargets` targets are shown here. If your setup drops more targets during relabeling, then increase `-promscrape.maxDroppedTargets` command-line flag value in order to see all the dropped targets. Note that tracking each dropped target requires up to 10Kb of RAM, so big values for `-promscrape.maxDroppedTargets` may result in increased memory usage if big number of scrape targets are dropped during relabeling.
* If `vmagent` scrapes big number of targets, then `-promscrape.dropOriginalLabels` command-line option may be passed to `vmagent` in order to reduce memory usage.
This option drops `"discoveredLabels"` and `"droppedTargets"` lists at `/api/v1/targets` page, which may result in reduced debuggability for improperly configured per-target relabeling.
* If `vmagent` scrapes targets with millions of metrics per each target (for instance, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
then it is recommended enabling `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all the scrape targets
by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with `stream_parse: true` option. For example:
By default `vmagent` reads the full response from scrape target into memory, then parses it, applies [relabeling](#relabeling) and then pushes the resulting metrics to the configured `-remoteWrite.url`. This mode works good for the majority of cases when the scrape target exposes small number of metrics (e.g. less than 10 thousand). But this mode may take big amounts of memory when the scrape target exposes big number of metrics. In this case it is recommended enabling stream parsing mode. When this mode is enabled, then `vmagent` reads response from scrape target in chunks, then immediately processes every chunk and pushes the processed metrics to remote storage. This allows saving memory when scraping targets that expose millions of metrics. Stream parsing mode may be enabled either globally for all of the scrape targets by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with `stream_parse: true` option. For example:
```yml
scrape_configs:
@@ -277,33 +252,176 @@ It may be useful for performing `vmagent` rolling update without scrape loss.
'match[]': ['{__name__!=""}']
```
Note that `sample_limit` option doesn't work if stream parsing is enabled, since the parsed data is pushed to remote storage as soon as it is parsed. So `sample_limit` option
has no sense during stream parsing.
Note that `sample_limit` option doesn't work if stream parsing is enabled because the parsed data is pushed to remote storage as soon as it is parsed. Therefore the `sample_limit` option doesn't make sense during stream parsing.
* It is recommended to increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page constantly grows.
* If you see gaps on the data pushed by `vmagent` to remote storage when `-remoteWrite.maxDiskUsagePerURL` is set, then try increasing `-remoteWrite.queues`.
Such gaps may appear because `vmagent` cannot keep up with sending the collected data to remote storage, so it starts dropping the buffered data
## Scraping big number of targets
A single `vmagent` instance can scrape tens of thousands of scrape targets. Sometimes this isn't enough due to limitations on CPU, network, RAM, etc.
In this case scrape targets can be split among multiple `vmagent` instances (aka `vmagent` horizontal scaling and clustering).
Each `vmagent` instance in the cluster must use identical `-promscrape.config` files with distinct `-promscrape.cluster.memberNum` values.
The flag value must be in the range `0 ... N-1`, where `N` is the number of `vmagent` instances in the cluster.
The number of `vmagent` instances in the cluster must be passed to `-promscrape.cluster.membersCount` command-line flag. For example, the following commands
spread scrape targets among a cluster of two `vmagent` instances:
```
/path/to/vmagent -promscrape.cluster.membersCount=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
```
By default each scrape target is scraped only by a single `vmagent` instance in the cluster. If there is a need for replicating scrape targets among multiple `vmagent` instances,
then `-promscrape.cluster.replicationFactor` command-line flag must be set to the desired number of replicas. For example, the following commands
start a cluster of three `vmagent` instances, where each target is scraped by two `vmagent` instances:
```
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...
```
If each target is scraped by multiple `vmagent` instances, then data deduplication must be enabled at remote storage pointed by `-remoteWrite.url`.
See [these docs](https://docs.victoriametrics.com/#deduplication) for details.
## Scraping targets via a proxy
`vmagent` supports scraping targets via http, https and socks5 proxies. Proxy address must be specified in `proxy_url` option. For example, the following scrape config instructs
target scraping via https proxy at `https://proxy-addr:1234`:
```yml
scrape_configs:
- job_name: foo
proxy_url: https://proxy-addr:1234
```
Proxy can be configured with the following optional settings:
* `proxy_authorization` for generic token authorization. See [Prometheus docs for details on authorization section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config)
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
For example:
```yml
scrape_configs:
- job_name: foo
proxy_url: https://proxy-addr:1234
proxy_basic_auth:
username: foobar
password: secret
proxy_tls_config:
insecure_skip_verify: true
cert_file: /path/to/cert
key_file: /path/to/key
ca_file: /path/to/ca
server_name: real-server-name
```
## Cardinality limiter
By default `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`. The limit can be enforced by setting the following command-line flags:
* `-remoteWrite.maxHourlySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last hour. Useful for limiting the number of active time series.
* `-remoteWrite.maxDailySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last day. Useful for limiting daily churn rate.
Both limits can be set simultaneously. If any of these limits is reached, then samples for new time series are dropped instead of sending them to remote storage systems. A sample of dropped series is put in the log with `WARNING` level.
The exceeded limits can be [monitored](#monitoring) with the following metrics:
* `vmagent_hourly_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded hourly limit on the number of unique time series.
* `vmagent_daily_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded daily limit on the number of unique time series.
These limits are approximate, so `vmagent` can underflow/overflow the limit by a small percentage (usually less than 1%).
## Monitoring
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
either through `vmagent` itself or by Prometheus so that the exported metrics may be analyzed later.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for `vmagent` state overview.
If you have suggestions for improvements or have found a bug - please open an issue on github or add a review to the dashboard.
`vmagent` also exports the status for various targets at the following handlers:
* `http://vmagent-host:8429/targets`. This handler returns human-readable status for every active target.
This page is easy to query from the command line with `wget`, `curl` or similar tools.
It accepts optional `show_original_labels=1` query arg which shows the original labels per each target before applying the relabeling.
This information may be useful for debugging target relabeling.
* `http://vmagent-host:8429/api/v1/targets`. This handler returns data compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
* `http://vmagent-host:8429/ready`. This handler returns http 200 status code when `vmagent` finishes it's initialization for all service_discovery configs.
It may be useful to perform `vmagent` rolling update without any scrape loss.
## Troubleshooting
* We recommend you [set up the official Grafana dashboard](#monitoring) in order to monitor the state of `vmagent'.
* We recommend you increase the maximum number of open files in the system (`ulimit -n`) when scraping a big number of targets,
as `vmagent` establishes at least a single TCP connection per target.
* If `vmagent` uses too big amounts of memory, then the following options can help:
* Enabling stream parsing. See [these docs](#stream-parsing-mode).
* Reducing the number of output queues with `-remoteWrite.queues` command-line option.
* Reducing the amounts of RAM vmagent can use for in-memory buffering with `-memory.allowedPercent` or `-memory.allowedBytes` command-line option. Another option is to reduce memory limits in Docker and/or Kuberntes if `vmagent` runs under these systems.
* Reducing the number of CPU cores vmagent can use by passing `GOMAXPROCS=N` environment variable to `vmagent`, where `N` is the desired limit on CPU cores. Another option is to reduce CPU limits in Docker or Kubernetes if `vmagent` runs under these systems.
* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
and `http://vmagent-host:8429/api/v1/targets`.
* The `/api/v1/targets` page could be useful for debugging relabeling process for scrape targets.
This page contains original labels for targets dropped during relabeling (see "droppedTargets" section in the page output). By default the `-promscrape.maxDroppedTargets` targets are shown here. If your setup drops more targets during relabeling, then increase `-promscrape.maxDroppedTargets` command-line flag value to see all the dropped targets. Note that tracking each dropped target requires up to 10Kb of RAM. Therefore big values for `-promscrape.maxDroppedTargets` may result in increased memory usage if a big number of scrape targets are dropped during relabeling.
* If `vmagent` scrapes a big number of targets then the `-promscrape.dropOriginalLabels` command-line option may be passed to `vmagent` in order to reduce memory usage.
This option drops `"discoveredLabels"` and `"droppedTargets"` lists at `/api/v1/targets` page, which may result in reduced debuggability for improperly configured per-target relabeling.
* If `vmagent` scrapes targets with millions of metrics per target (for example, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
we recommend enabling [stream parsing mode](#stream-parsing-mode) in order to reduce memory usage during scraping.
* We recommend you increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page grows constantly.
* If you see gaps in the data pushed by `vmagent` to remote storage when `-remoteWrite.maxDiskUsagePerURL` is set, try increasing `-remoteWrite.queues`.
Such gaps may appear because `vmagent` cannot keep up with sending the collected data to remote storage. Therefore it starts dropping the buffered data
if the on-disk buffer size exceeds `-remoteWrite.maxDiskUsagePerURL`.
* `vmagent` buffers scraped data at `-remoteWrite.tmpDataPath` directory until it is sent to `-remoteWrite.url`.
* `vmagent` drops data blocks if remote storage replies with `400 Bad Request` and `409 Conflict` HTTP responses. The number of dropped blocks can be monitored via `vmagent_remotewrite_packets_dropped_total` metric exported at [/metrics page](#monitoring).
* Use `-remoteWrite.queues=1` when `-remoteWrite.url` points to remote storage, which doesn't accept out-of-order samples (aka data backfilling). Such storage systems include Prometheus, Cortex and Thanos.
* `vmagent` buffers scraped data at the `-remoteWrite.tmpDataPath` directory until it is sent to `-remoteWrite.url`.
The directory can grow large when remote storage is unavailable for extended periods of time and if `-remoteWrite.maxDiskUsagePerURL` isn't set.
If you don't want to send all the data from the directory to remote storage, simply stop `vmagent` and delete the directory.
If you don't want to send all the data from the directory to remote storage then simply stop `vmagent` and delete the directory.
* By default `vmagent` masks `-remoteWrite.url` with `secret-url` values in logs and at `/metrics` page because
the url may contain sensitive information such as auth tokens or passwords.
Pass `-remoteWrite.showURL` command-line flag when starting `vmagent` in order to see all the valid urls.
* If you see `skipping duplicate scrape target with identical labels` errors when scraping Kubernetes pods, then it is likely these pods listen multiple ports
or they use init container. These errors can be either fixed or suppressed with `-promscrape.suppressDuplicateScrapeTargetErrors` command-line flag.
See available options below if you prefer fixing the root cause of the error:
* By default `vmagent` evenly spreads scrape load in time. If a particular scrape target must be scraped at the beginning of some interval,
then `scrape_align_interval` option must be used. For example, the following config aligns hourly scrapes to the beginning of hour:
The following `relabel_configs` section may help determining `__meta_*` labels resulting in duplicate targets:
```yml
- action: labelmap
regex: __meta_(.*)
scrape_configs:
- job_name: foo
scrape_interval: 1h
scrape_align_interval: 1h
```
* By default `vmagent` evenly spreads scrape load in time. If a particular scrape target must be scraped at specific offset, then `scrape_offset` option must be used.
For example, the following config instructs `vmagent` to scrape the target at 10 seconds of every minute:
```yml
scrape_configs:
- job_name: foo
scrape_interval: 1m
scrape_offset: 10s
```
* If you see `skipping duplicate scrape target with identical labels` errors when scraping Kubernetes pods, then it is likely these pods listen to multiple ports
or they use an init container. These errors can either be fixed or suppressed with the `-promscrape.suppressDuplicateScrapeTargetErrors` command-line flag.
See the available options below if you prefer fixing the root cause of the error:
The following relabeling rule may be added to `relabel_configs` section in order to filter out pods with unneeded ports:
```yml
- action: keep_if_equal
@@ -318,27 +436,27 @@ It may be useful for performing `vmagent` rolling update without scrape loss.
```
### How to build from sources
## How to build from sources
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmagent` is located in `vmutils-*` archives there.
We recommend using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmagent` is located in the `vmutils-*` archives .
#### Development build
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make vmagent` from the root folder of the repository.
It builds `vmagent` binary and puts it into the `bin` folder.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmagent` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds the `vmagent` binary and puts it into the `bin` folder.
#### Production build
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmagent-prod` from the root folder of the repository.
2. Run `make vmagent-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmagent-prod` binary and puts it into the `bin` folder.
#### Building docker images
### Building docker images
Run `make package-vmagent`. It builds `victoriametrics/vmagent:<PKG_TAG>` docker image locally.
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
`<PKG_TAG>` is an auto-generated image tag, which depends on source code in [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmagent`.
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
@@ -348,34 +466,34 @@ by setting it via `<ROOT_IMAGE>` environment variable. For example, the followin
ROOT_IMAGE=scratch make package-vmagent
```
#### ARM build
### ARM build
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
#### Development ARM build
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make vmagent-arm` or `make vmagent-arm64` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmagent-arm` or `make vmagent-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics)
It builds `vmagent-arm` or `vmagent-arm64` binary respectively and puts it into the `bin` folder.
#### Production ARM build
### Production ARM build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmagent-arm-prod` or `make vmagent-arm64-prod` from the root folder of the repository.
2. Run `make vmagent-arm-prod` or `make vmagent-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmagent-arm-prod` or `vmagent-arm64-prod` binary respectively and puts it into the `bin` folder.
### Profiling
## Profiling
`vmagent` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):
* Memory profile. It can be collected with the following command:
* Memory profile can be collected with the following command:
```bash
curl -s http://<vmagent-host>:8429/debug/pprof/heap > mem.pprof
```
* CPU profile. It can be collected with the following command:
* CPU profile can be collected with the following command:
```bash
curl -s http://<vmagent-host>:8429/debug/pprof/profile > cpu.pprof
@@ -384,3 +502,267 @@ curl -s http://<vmagent-host>:8429/debug/pprof/profile > cpu.pprof
The command for collecting CPU profile waits for 30 seconds before returning.
The collected profiles may be analyzed with [go tool pprof](https://github.com/google/pprof).
## Advanced usage
`vmagent` can be fine-tuned with various command-line flags. Run `./vmagent -help` in order to see the full list of these flags with their desciptions and default values:
```
./vmagent -help
vmagent collects metrics data via popular data ingestion protocols and routes them to VictoriaMetrics.
See the docs at https://docs.victoriametrics.com/vmagent.html .
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-dryRun
Whether to check only config files without running vmagent. The following files are checked: -promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig . Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
-import.maxLineLen size
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports an array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if Influx line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
-opentsdbTrimTimestamp duration
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-opentsdbhttp.maxInsertRequestSize size
The maximum size of OpenTSDB HTTP put request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-opentsdbhttpTrimTimestamp duration
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-promscrape.cluster.memberNum int
The number of number in the cluster of scrapers. It must be an unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 2, then the deduplication must be enabled at remote storage side. See https://docs.victoriametrics.com/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
-promscrape.config.strictParse
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
-promscrape.configCheckInterval duration
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-promscrape.consul.waitTime duration
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
-promscrape.discovery.concurrency int
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
-promscrape.discovery.concurrentWaitTime duration
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
-promscrape.ec2SDCheckInterval duration
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
-promscrape.eurekaSDCheckInterval duration
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
-promscrape.fileSDCheckInterval duration
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kuberntes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
-promscrape.maxDroppedTargets int
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-remoteWrite.basicAuth.password array
Optional basic auth password to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.basicAuth.passwordFile array
Optional path to basic auth password to use for -remoteWrite.url. The file is re-read every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.basicAuth.username array
Optional basic auth username to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.bearerToken array
Optional bearer auth token to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.bearerTokenFile array
Optional path to bearer token file to use for -remoteWrite.url. The token is re-read from the file every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.flushInterval duration
Interval for flushing the data to remote storage. This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url (default 1s)
-remoteWrite.label array
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.maxBlockSize size
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDailySeries int
The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -remoteWrite.maxHourlySeries
-remoteWrite.maxDiskUsagePerURL size
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-remoteWrite.maxHourlySeries int
The maximum number of unique series vmagent can send to remote storage systems during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -remoteWrite.maxDailySeries
-remoteWrite.oauth2.clientID array
Optional OAuth2 clientID to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.clientSecret array
Optional OAuth2 clientSecret to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.clientSecretFile array
Optional OAuth2 clientSecretFile to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.scopes array
Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.tokenUrl array
Optional OAuth2 tokenURL to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.proxyURL array
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.queues int
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage (default 4)
-remoteWrite.rateLimit array
Optional rate limit in bytes per second for data sent to -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.relabelConfig string
Optional path to file with relabel_config entries. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details
-remoteWrite.roundDigits array
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.sendTimeout array
Timeout for sending a single block of data to -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.showURL
Whether to show -remoteWrite.url in the exported metrics. It is hidden by default, since it can contain sensitive info such as auth key
-remoteWrite.significantFigures array
The number of significant figures to leave in metric values before writing them to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. This option may be used for improving data compression for the stored metrics. See also -remoteWrite.roundDigits
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsInsecureSkipVerify array
Whether to skip tls verification when connecting to -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsKeyFile array
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsServerName array
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tmpDataPath string
Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
-remoteWrite.url array
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelConfig array
Optional path to relabel config for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-sortLabels
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}Enabled sorting for labels can slow down ingestion performance a bit
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```

View File

@@ -12,6 +12,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
@@ -33,7 +34,9 @@ var (
// See https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener/
func InsertHandlerForReader(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, false, "", "", insertRows)
return parser.ParseStream(r, false, "", "", func(db string, rows []parser.Row) error {
return insertRows(db, rows, nil)
})
})
}
@@ -41,17 +44,23 @@ func InsertHandlerForReader(r io.Reader) error {
//
// See https://github.com/influxdata/influxdb/blob/4cbdc197b8117fee648d62e2e5be75c6575352f0/tsdb/README.md
func InsertHandlerForHTTP(req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
q := req.URL.Query()
precision := q.Get("precision")
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, insertRows)
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(db, rows, extraLabels)
})
})
}
func insertRows(db string, rows []parser.Row) error {
func insertRows(db string, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx := getPushCtx()
defer putPushCtx(ctx)
@@ -82,11 +91,13 @@ func insertRows(db string, rows []parser.Row) error {
Value: db,
})
}
commonLabels = append(commonLabels, extraLabels...)
ctx.metricGroupBuf = ctx.metricGroupBuf[:0]
if !*skipMeasurement {
ctx.metricGroupBuf = append(ctx.metricGroupBuf, r.Measurement...)
}
skipFieldKey := len(r.Fields) == 1 && *skipSingleField
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1139
skipFieldKey := len(r.Measurement) > 0 && len(r.Fields) == 1 && *skipSingleField
if len(ctx.metricGroupBuf) > 0 && !skipFieldKey {
ctx.metricGroupBuf = append(ctx.metricGroupBuf, *measurementFieldSeparator...)
}

View File

@@ -23,6 +23,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
influxserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/influx"
opentsdbserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/opentsdb"
@@ -40,7 +41,7 @@ var (
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to `http://<vmagent>:8429/write`")
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
"Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. "+
@@ -144,7 +145,18 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
fmt.Fprintf(w, "vmagent - see docs at https://victoriametrics.github.io/vmagent.html")
if r.Method != "GET" {
return false
}
fmt.Fprintf(w, "<h2>vmagent</h2>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/vmagent.html'>https://docs.victoriametrics.com/vmagent.html</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"/targets", "discovered targets list"},
{"/api/v1/targets", "advanced information about discovered targets in JSON format"},
{"/metrics", "available service metrics"},
{"/-/reload", "reload configuration"},
})
return true
}
path := strings.Replace(r.URL.Path, "//", "/", -1)
@@ -204,10 +216,8 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(http.StatusNoContent)
return true
case "/query":
// Emulate fake response for influx query.
// This is required for TSBS benchmark.
influxQueryRequests.Inc()
fmt.Fprintf(w, `{"results":[{"series":[{"values":[]}]}]}`)
influxutils.WriteDatabaseNames(w)
return true
case "/targets":
promscrapeTargetsRequests.Inc()
@@ -269,7 +279,7 @@ func usage() {
const s = `
vmagent collects metrics data via popular data ingestion protocols and routes it to VictoriaMetrics.
See the docs at https://victoriametrics.github.io/vmagent.html .
See the docs at https://docs.victoriametrics.com/vmagent.html .
`
flagutil.Usage(s)
}

View File

@@ -0,0 +1,12 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 8429
ENTRYPOINT ["/vmagent-prod"]
ARG TARGETARCH
COPY vmagent-${TARGETARCH}-prod ./vmagent-prod

View File

@@ -6,6 +6,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/opentsdbhttp"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
@@ -19,12 +20,18 @@ var (
// InsertHandler processes HTTP OpenTSDB put requests.
// See http://opentsdb.net/docs/build/html/api_http/put.html
func InsertHandler(req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, insertRows)
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
})
}
func insertRows(rows []parser.Row) error {
func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -45,6 +52,7 @@ func insertRows(rows []parser.Row) error {
Value: tag.Value,
})
}
labels = append(labels, extraLabels...)
samples = append(samples, prompbmarshal.Sample{
Value: r.Value,
Timestamp: r.Timestamp,

View File

@@ -31,7 +31,7 @@ func InsertHandler(req *http.Request) error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
})
}, nil)
})
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
@@ -20,12 +21,18 @@ var (
// InsertHandler processes remote write for prometheus.
func InsertHandler(req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, insertRows)
return parser.ParseStream(req, func(tss []prompb.TimeSeries) error {
return insertRows(tss, extraLabels)
})
})
}
func insertRows(timeseries []prompb.TimeSeries) error {
func insertRows(timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -44,6 +51,7 @@ func insertRows(timeseries []prompb.TimeSeries) error {
Value: bytesutil.ToUnsafeString(label.Value),
})
}
labels = append(labels, extraLabels...)
samplesLen := len(samples)
for i := range ts.Samples {
sample := &ts.Samples[i]

View File

@@ -2,8 +2,6 @@ package remotewrite
import (
"bytes"
"crypto/tls"
"encoding/base64"
"fmt"
"io/ioutil"
"net/http"
@@ -16,10 +14,14 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/metrics"
)
var (
rateLimit = flagutil.NewArrayInt("remoteWrite.rateLimit", "Optional rate limit in bytes per second for data sent to -remoteWrite.url. "+
"By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data "+
"is sent after temporary unavailability of the remote storage")
sendTimeout = flagutil.NewArrayDuration("remoteWrite.sendTimeout", "Timeout for sending a single block of data to -remoteWrite.url")
proxyURL = flagutil.NewArray("remoteWrite.proxyURL", "Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. "+
"Example: -remoteWrite.proxyURL=socks5://proxy:1234")
@@ -38,17 +40,37 @@ var (
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPassword = flagutil.NewArray("remoteWrite.basicAuth.password", "Optional basic auth password to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPasswordFile = flagutil.NewArray("remoteWrite.basicAuth.passwordFile", "Optional path to basic auth password to use for -remoteWrite.url. "+
"The file is re-read every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerToken = flagutil.NewArray("remoteWrite.bearerToken", "Optional bearer auth token to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerTokenFile = flagutil.NewArray("remoteWrite.bearerTokenFile", "Optional path to bearer token file to use for -remoteWrite.url. "+
"The token is re-read from the file every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientID = flagutil.NewArray("remoteWrite.oauth2.clientID", "Optional OAuth2 clientID to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecret = flagutil.NewArray("remoteWrite.oauth2.clientSecret", "Optional OAuth2 clientSecret to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecretFile = flagutil.NewArray("remoteWrite.oauth2.clientSecretFile", "Optional OAuth2 clientSecretFile to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2TokenURL = flagutil.NewArray("remoteWrite.oauth2.tokenUrl", "Optional OAuth2 tokenURL to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2Scopes = flagutil.NewArray("remoteWrite.oauth2.scopes", "Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
)
type client struct {
sanitizedURL string
remoteWriteURL string
authHeader string
fq *persistentqueue.FastQueue
hc *http.Client
authCfg *promauth.Config
rl rateLimiter
bytesSent *metrics.Counter
blocksSent *metrics.Counter
requestDuration *metrics.Histogram
@@ -62,10 +84,11 @@ type client struct {
}
func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int) *client {
tlsCfg, err := getTLSConfig(argIdx)
authCfg, err := getAuthConfig(argIdx)
if err != nil {
logger.Panicf("FATAL: cannot initialize TLS config: %s", err)
logger.Panicf("FATAL: cannot initialize auth config: %s", err)
}
tlsCfg := authCfg.NewTLSConfig()
tr := &http.Transport{
Dial: statDial,
TLSClientConfig: tlsCfg,
@@ -86,26 +109,10 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
}
tr.Proxy = http.ProxyURL(urlProxy)
}
authHeader := ""
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
if len(username) > 0 || len(password) > 0 {
// See https://en.wikipedia.org/wiki/Basic_access_authentication
token := username + ":" + password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
authHeader = "Basic " + token64
}
token := bearerToken.GetOptionalArg(argIdx)
if len(token) > 0 {
if authHeader != "" {
logger.Fatalf("`-remoteWrite.bearerToken`=%q cannot be set when `-remoteWrite.basicAuth.*` flags are set", token)
}
authHeader = "Bearer " + token
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authHeader: authHeader,
authCfg: authCfg,
fq: fq,
hc: &http.Client{
Transport: tr,
@@ -113,6 +120,12 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
},
stopCh: make(chan struct{}),
}
if bytesPerSec := rateLimit.GetOptionalArgOrDefault(argIdx, 0); bytesPerSec > 0 {
logger.Infof("applying %d bytes per second rate limit for -remoteWrite.url=%q", bytesPerSec, sanitizedURL)
c.rl.perSecondLimit = int64(bytesPerSec)
}
c.rl.limitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remote_write_rate_limit_reached_total{url=%q}`, c.sanitizedURL))
c.bytesSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_bytes_sent_total{url=%q}`, c.sanitizedURL))
c.blocksSent = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_blocks_sent_total{url=%q}`, c.sanitizedURL))
c.requestDuration = metrics.GetOrCreateHistogram(fmt.Sprintf(`vmagent_remotewrite_duration_seconds{url=%q}`, c.sanitizedURL))
@@ -137,58 +150,93 @@ func (c *client) MustStop() {
logger.Infof("stopped client for -remoteWrite.url=%q", c.sanitizedURL)
}
func getTLSConfig(argIdx int) (*tls.Config, error) {
c := &promauth.TLSConfig{
func getAuthConfig(argIdx int) (*promauth.Config, error) {
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
passwordFile := basicAuthPasswordFile.GetOptionalArg(argIdx)
var basicAuthCfg *promauth.BasicAuthConfig
if username != "" || password != "" || passwordFile != "" {
basicAuthCfg = &promauth.BasicAuthConfig{
Username: username,
Password: password,
PasswordFile: passwordFile,
}
}
token := bearerToken.GetOptionalArg(argIdx)
tokenFile := bearerTokenFile.GetOptionalArg(argIdx)
var oauth2Cfg *promauth.OAuth2Config
clientSecret := oauth2ClientSecret.GetOptionalArg(argIdx)
clientSecretFile := oauth2ClientSecretFile.GetOptionalArg(argIdx)
if clientSecretFile != "" || clientSecret != "" {
oauth2Cfg = &promauth.OAuth2Config{
ClientID: oauth2ClientID.GetOptionalArg(argIdx),
ClientSecret: clientSecret,
ClientSecretFile: clientSecretFile,
TokenURL: oauth2TokenURL.GetOptionalArg(argIdx),
Scopes: strings.Split(oauth2Scopes.GetOptionalArg(argIdx), ";"),
}
}
tlsCfg := &promauth.TLSConfig{
CAFile: tlsCAFile.GetOptionalArg(argIdx),
CertFile: tlsCertFile.GetOptionalArg(argIdx),
KeyFile: tlsKeyFile.GetOptionalArg(argIdx),
ServerName: tlsServerName.GetOptionalArg(argIdx),
InsecureSkipVerify: tlsInsecureSkipVerify.GetOptionalArg(argIdx),
}
if c.CAFile == "" && c.CertFile == "" && c.KeyFile == "" && c.ServerName == "" && !c.InsecureSkipVerify {
return nil, nil
}
cfg, err := promauth.NewConfig(".", nil, "", "", c)
authCfg, err := promauth.NewConfig(".", nil, basicAuthCfg, token, tokenFile, oauth2Cfg, tlsCfg)
if err != nil {
return nil, fmt.Errorf("cannot populate TLS config: %w", err)
return nil, fmt.Errorf("cannot populate OAuth2 config for remoteWrite idx: %d, err: %w", argIdx, err)
}
tlsCfg := cfg.NewTLSConfig()
return tlsCfg, nil
return authCfg, nil
}
func (c *client) runWorker() {
var ok bool
var block []byte
ch := make(chan struct{})
ch := make(chan bool, 1)
for {
block, ok = c.fq.MustReadBlock(block[:0])
if !ok {
return
}
go func() {
c.sendBlock(block)
ch <- struct{}{}
ch <- c.sendBlock(block)
}()
select {
case <-ch:
// The block has been sent successfully
continue
case ok := <-ch:
if ok {
// The block has been sent successfully
continue
}
// Return unsent block to the queue.
c.fq.MustWriteBlock(block)
return
case <-c.stopCh:
// c must be stopped. Wait for a while in the hope the block will be sent.
graceDuration := 5 * time.Second
select {
case <-ch:
// The block has been sent successfully.
case ok := <-ch:
if !ok {
// Return unsent block to the queue.
c.fq.MustWriteBlock(block)
}
case <-time.After(graceDuration):
logger.Errorf("couldn't sent block with size %d bytes to %q in %.3f seconds during shutdown; dropping it",
len(block), c.sanitizedURL, graceDuration.Seconds())
// Return unsent block to the queue.
c.fq.MustWriteBlock(block)
}
return
}
}
}
func (c *client) sendBlock(block []byte) {
// sendBlock returns false only if c.stopCh is closed.
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlock(block []byte) bool {
c.rl.register(len(block), c.stopCh)
retryDuration := time.Second
retriesCount := 0
c.bytesSent.Add(len(block))
@@ -204,8 +252,8 @@ again:
h.Set("Content-Type", "application/x-protobuf")
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
if c.authHeader != "" {
req.Header.Set("Authorization", c.authHeader)
if ah := c.authCfg.GetAuthHeader(); ah != "" {
req.Header.Set("Authorization", ah)
}
startTime := time.Now()
@@ -217,14 +265,15 @@ again:
if retryDuration > time.Minute {
retryDuration = time.Minute
}
logger.Errorf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
logger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
t := time.NewTimer(retryDuration)
t := timerpool.Get(retryDuration)
select {
case <-c.stopCh:
t.Stop()
return
timerpool.Put(t)
return false
case <-t.C:
timerpool.Put(t)
}
c.retriesCount.Inc()
goto again
@@ -233,18 +282,16 @@ again:
if statusCode/100 == 2 {
_ = resp.Body.Close()
c.requestsOKCount.Inc()
return
return true
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
if statusCode == 409 {
// Just drop block on 409 status code like Prometheus does.
if statusCode == 409 || statusCode == 400 {
// Just drop block on 409 and 400 status codes like Prometheus does.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
body, _ := ioutil.ReadAll(resp.Body)
// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
_ = resp.Body.Close()
logger.Errorf("unexpected status code received when sending a block with size %d bytes to %q: #%d; dropping the block like Prometheus does; "+
"response body=%q", len(block), c.sanitizedURL, statusCode, body)
c.packetsDropped.Inc()
return
return true
}
// Unexpected status code returned
@@ -261,13 +308,56 @@ again:
logger.Errorf("unexpected status code received after sending a block with size %d bytes to %q during retry #%d: %d; response body=%q; "+
"re-sending the block in %.3f seconds", len(block), c.sanitizedURL, retriesCount, statusCode, body, retryDuration.Seconds())
}
t := time.NewTimer(retryDuration)
t := timerpool.Get(retryDuration)
select {
case <-c.stopCh:
t.Stop()
return
timerpool.Put(t)
return false
case <-t.C:
timerpool.Put(t)
}
c.retriesCount.Inc()
goto again
}
type rateLimiter struct {
perSecondLimit int64
// mu protects budget and deadline from concurrent access.
mu sync.Mutex
// The current budget. It is increased by perSecondLimit every second.
budget int64
// The next deadline for increasing the budget by perSecondLimit
deadline time.Time
limitReached *metrics.Counter
}
func (rl *rateLimiter) register(dataLen int, stopCh <-chan struct{}) {
limit := rl.perSecondLimit
if limit <= 0 {
return
}
rl.mu.Lock()
defer rl.mu.Unlock()
for rl.budget <= 0 {
if d := time.Until(rl.deadline); d > 0 {
rl.limitReached.Inc()
t := timerpool.Get(d)
select {
case <-stopCh:
timerpool.Put(t)
return
case <-t.C:
timerpool.Put(t)
}
}
rl.budget += limit
rl.deadline = time.Now().Add(time.Second)
}
rl.budget -= int64(dataLen)
}

View File

@@ -7,6 +7,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
@@ -18,8 +19,7 @@ import (
var (
flushInterval = flag.Duration("remoteWrite.flushInterval", time.Second, "Interval for flushing the data to remote storage. "+
"Higher value reduces network bandwidth usage at the cost of delayed push of scraped data to remote storage. "+
"Minimum supported interval is 1 second")
"This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url")
maxUnpackedBlockSize = flagutil.NewBytes("remoteWrite.maxBlockSize", 8*1024*1024, "The maximum size in bytes of unpacked request to send to remote storage. "+
"It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics")
)
@@ -27,6 +27,9 @@ var (
// the maximum number of rows to send per each block.
const maxRowsPerBlock = 10000
// the maximum number of labels to send per each block.
const maxLabelsPerBlock = 10 * maxRowsPerBlock
type pendingSeries struct {
mu sync.Mutex
wr writeRequest
@@ -35,9 +38,11 @@ type pendingSeries struct {
periodicFlusherWG sync.WaitGroup
}
func newPendingSeries(pushBlock func(block []byte)) *pendingSeries {
func newPendingSeries(pushBlock func(block []byte), significantFigures, roundDigits int) *pendingSeries {
var ps pendingSeries
ps.wr.pushBlock = pushBlock
ps.wr.significantFigures = significantFigures
ps.wr.roundDigits = roundDigits
ps.stopCh = make(chan struct{})
ps.periodicFlusherWG.Add(1)
go func() {
@@ -85,9 +90,17 @@ type writeRequest struct {
// Move lastFlushTime to the top of the struct in order to guarantee atomic access on 32-bit architectures.
lastFlushTime uint64
wr prompbmarshal.WriteRequest
// pushBlock is called when whe write request is ready to be sent.
pushBlock func(block []byte)
// How many significant figures must be left before sending the writeRequest to pushBlock.
significantFigures int
// How many decimal digits after point must be left before sending the writeRequest to pushBlock.
roundDigits int
wr prompbmarshal.WriteRequest
tss []prompbmarshal.TimeSeries
labels []prompbmarshal.Label
@@ -96,6 +109,8 @@ type writeRequest struct {
}
func (wr *writeRequest) reset() {
// Do not reset pushBlock, significantFigures and roundDigits, since they are re-used.
wr.wr.Timeseries = nil
for i := range wr.tss {
@@ -114,17 +129,34 @@ func (wr *writeRequest) reset() {
func (wr *writeRequest) flush() {
wr.wr.Timeseries = wr.tss
wr.adjustSampleValues()
atomic.StoreUint64(&wr.lastFlushTime, fasttime.UnixTimestamp())
pushWriteRequest(&wr.wr, wr.pushBlock)
wr.reset()
}
func (wr *writeRequest) adjustSampleValues() {
samples := wr.samples
if n := wr.significantFigures; n > 0 {
for i := range samples {
s := &samples[i]
s.Value = decimal.RoundToSignificantFigures(s.Value, n)
}
}
if n := wr.roundDigits; n < 100 {
for i := range samples {
s := &samples[i]
s.Value = decimal.RoundToDecimalDigits(s.Value, n)
}
}
}
func (wr *writeRequest) push(src []prompbmarshal.TimeSeries) {
tssDst := wr.tss
for i := range src {
tssDst = append(tssDst, prompbmarshal.TimeSeries{})
wr.copyTimeSeries(&tssDst[len(tssDst)-1], &src[i])
if len(wr.samples) >= maxRowsPerBlock {
if len(wr.samples) >= maxRowsPerBlock || len(wr.labels) >= maxLabelsPerBlock {
wr.tss = tssDst
wr.flush()
tssDst = wr.tss

View File

@@ -14,9 +14,9 @@ import (
var (
unparsedLabelsGlobal = flagutil.NewArray("remoteWrite.label", "Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. "+
"Pass multiple -remoteWrite.label flags in order to add multiple flags to metrics before sending them to remote storage")
"Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage")
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabel_config entries. These entries are applied to all the metrics "+
"before sending them to -remoteWrite.url. See https://victoriametrics.github.io/vmagent.html#relabeling for details")
"before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details")
relabelConfigPaths = flagutil.NewArray("remoteWrite.urlRelabelConfig", "Optional path to relabel config for the corresponding -remoteWrite.url")
)
@@ -41,7 +41,7 @@ func loadRelabelConfigs() (*relabelConfigs, error) {
return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url args: %d",
len(*relabelConfigPaths), len(*remoteWriteURLs))
}
rcs.perURL = make([][]promrelabel.ParsedRelabelConfig, len(*remoteWriteURLs))
rcs.perURL = make([]*promrelabel.ParsedConfigs, len(*remoteWriteURLs))
for i, path := range *relabelConfigPaths {
if len(path) == 0 {
// Skip empty relabel config.
@@ -57,8 +57,8 @@ func loadRelabelConfigs() (*relabelConfigs, error) {
}
type relabelConfigs struct {
global []promrelabel.ParsedRelabelConfig
perURL [][]promrelabel.ParsedRelabelConfig
global *promrelabel.ParsedConfigs
perURL []*promrelabel.ParsedConfigs
}
// initLabelsGlobal must be called after parsing command-line flags.
@@ -79,8 +79,8 @@ func initLabelsGlobal() {
}
}
func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, extraLabels []prompbmarshal.Label, prcs []promrelabel.ParsedRelabelConfig) []prompbmarshal.TimeSeries {
if len(extraLabels) == 0 && len(prcs) == 0 {
func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, extraLabels []prompbmarshal.Label, pcs *promrelabel.ParsedConfigs) []prompbmarshal.TimeSeries {
if len(extraLabels) == 0 && pcs.Len() == 0 {
// Nothing to change.
return tss
}
@@ -100,7 +100,7 @@ func (rctx *relabelCtx) applyRelabeling(tss []prompbmarshal.TimeSeries, extraLab
labels = append(labels, *extraLabel)
}
}
labels = promrelabel.ApplyRelabelConfigs(labels, labelsLen, prcs, true)
labels = pcs.Apply(labels, labelsLen, true)
if len(labels) == labelsLen {
// Drop the current time series, since relabeling removed all the labels.
continue

View File

@@ -3,17 +3,21 @@ package remotewrite
import (
"flag"
"fmt"
"strconv"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bloomfilter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/memory"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/metrics"
xxhash "github.com/cespare/xxhash/v2"
)
@@ -22,8 +26,9 @@ var (
remoteWriteURLs = flagutil.NewArray("remoteWrite.url", "Remote storage URL to write data to. It must support Prometheus remote_write API. "+
"It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored")
queues = flag.Int("remoteWrite.queues", 4, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored. "+
"See also -remoteWrite.maxDiskUsagePerURL")
queues = flag.Int("remoteWrite.queues", 4, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
@@ -31,9 +36,21 @@ var (
"for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. "+
"Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. "+
"Disk usage is unlimited if the value is set to 0")
significantFigures = flag.Int("remoteWrite.significantFigures", 0, "The number of significant figures to leave in metric values before writing them to remote storage. "+
"See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. "+
"This option may be used for increasing on-disk compression level for the stored metrics")
significantFigures = flagutil.NewArrayInt("remoteWrite.significantFigures", "The number of significant figures to leave in metric values before writing them "+
"to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. "+
"This option may be used for improving data compression for the stored metrics. See also -remoteWrite.roundDigits")
roundDigits = flagutil.NewArrayInt("remoteWrite.roundDigits", "Round metric values to this number of decimal digits after the point before writing them to remote storage. "+
"Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. "+
"By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. "+
"This option may be used for improving data compression for the stored metrics")
sortLabels = flag.Bool("sortLabels", false, `Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. `+
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+
`Enabled sorting for labels can slow down ingestion performance a bit`)
maxHourlySeries = flag.Int("remoteWrite.maxHourlySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -remoteWrite.maxDailySeries")
maxDailySeries = flag.Int("remoteWrite.maxDailySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -remoteWrite.maxHourlySeries")
)
var rwctxs []*remoteWriteCtx
@@ -43,7 +60,7 @@ var allRelabelConfigs atomic.Value
// maxQueues limits the maximum value for `-remoteWrite.queues`. There is no sense in setting too high value,
// since it may lead to high memory usage due to big number of buffers.
var maxQueues = cgroup.AvailableCPUs() * 4
var maxQueues = cgroup.AvailableCPUs() * 16
// InitSecretFlags must be called after flag.Parse and before any logging.
func InitSecretFlags() {
@@ -62,6 +79,24 @@ func Init() {
if len(*remoteWriteURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
}
if *maxHourlySeries > 0 {
hourlySeriesLimiter = bloomfilter.NewLimiter(*maxHourlySeries, time.Hour)
_ = metrics.NewGauge(`vmagent_hourly_series_limit_max_series`, func() float64 {
return float64(hourlySeriesLimiter.MaxItems())
})
_ = metrics.NewGauge(`vmagent_hourly_series_limit_current_series`, func() float64 {
return float64(hourlySeriesLimiter.CurrentItems())
})
}
if *maxDailySeries > 0 {
dailySeriesLimiter = bloomfilter.NewLimiter(*maxDailySeries, 24*time.Hour)
_ = metrics.NewGauge(`vmagent_daily_series_limit_max_series`, func() float64 {
return float64(dailySeriesLimiter.MaxItems())
})
_ = metrics.NewGauge(`vmagent_daily_series_limit_current_series`, func() float64 {
return float64(dailySeriesLimiter.CurrentItems())
})
}
if *queues > maxQueues {
*queues = maxQueues
}
@@ -69,6 +104,12 @@ func Init() {
*queues = 1
}
initLabelsGlobal()
// Register SIGHUP handler for config reload before loadRelabelConfigs.
// This guarantees that the config will be re-read if the signal arrives just after loadRelabelConfig.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
rcs, err := loadRelabelConfigs()
if err != nil {
logger.Fatalf("cannot load relabel configs: %s", err)
@@ -76,11 +117,11 @@ func Init() {
allRelabelConfigs.Store(rcs)
maxInmemoryBlocks := memory.Allowed() / len(*remoteWriteURLs) / maxRowsPerBlock / 100
if maxInmemoryBlocks > 200 {
if maxInmemoryBlocks > 400 {
// There is no much sense in keeping higher number of blocks in memory,
// since this means that the producer outperforms consumer and the queue
// will continue growing. It is better storing the queue to file.
maxInmemoryBlocks = 200
maxInmemoryBlocks = 400
}
if maxInmemoryBlocks < 2 {
maxInmemoryBlocks = 2
@@ -95,7 +136,6 @@ func Init() {
}
// Start config reloader.
sighupCh := procutil.NewSighupChan()
configReloaderWG.Add(1)
go func() {
defer configReloaderWG.Done()
@@ -137,32 +177,23 @@ func Stop() {
//
// Note that wr may be modified by Push due to relabeling and rounding.
func Push(wr *prompbmarshal.WriteRequest) {
if *significantFigures > 0 {
// Round values according to significantFigures
for i := range wr.Timeseries {
samples := wr.Timeseries[i].Samples
for j := range samples {
s := &samples[j]
s.Value = decimal.Round(s.Value, *significantFigures)
}
}
}
var rctx *relabelCtx
rcs := allRelabelConfigs.Load().(*relabelConfigs)
prcsGlobal := rcs.global
if len(prcsGlobal) > 0 || len(labelsGlobal) > 0 {
pcsGlobal := rcs.global
if pcsGlobal.Len() > 0 || len(labelsGlobal) > 0 {
rctx = getRelabelCtx()
}
tss := wr.Timeseries
for len(tss) > 0 {
// Process big tss in smaller blocks in order to reduce the maximum memory usage
samplesCount := 0
labelsCount := 0
i := 0
for i < len(tss) {
samplesCount += len(tss[i].Samples)
labelsCount += len(tss[i].Labels)
i++
if samplesCount > maxRowsPerBlock {
if samplesCount >= maxRowsPerBlock || labelsCount >= maxLabelsPerBlock {
break
}
}
@@ -175,11 +206,15 @@ func Push(wr *prompbmarshal.WriteRequest) {
}
if rctx != nil {
tssBlockLen := len(tssBlock)
tssBlock = rctx.applyRelabeling(tssBlock, labelsGlobal, prcsGlobal)
tssBlock = rctx.applyRelabeling(tssBlock, labelsGlobal, pcsGlobal)
globalRelabelMetricsDropped.Add(tssBlockLen - len(tssBlock))
}
for _, rwctx := range rwctxs {
rwctx.Push(tssBlock)
sortLabelsIfNeeded(tssBlock)
tssBlock = limitSeriesCardinality(tssBlock)
if len(tssBlock) > 0 {
for _, rwctx := range rwctxs {
rwctx.Push(tssBlock)
}
}
if rctx != nil {
rctx.reset()
@@ -190,6 +225,87 @@ func Push(wr *prompbmarshal.WriteRequest) {
}
}
// sortLabelsIfNeeded sorts labels if -sortLabels command-line flag is set.
func sortLabelsIfNeeded(tss []prompbmarshal.TimeSeries) {
if !*sortLabels {
return
}
for i := range tss {
promrelabel.SortLabels(tss[i].Labels)
}
}
func limitSeriesCardinality(tss []prompbmarshal.TimeSeries) []prompbmarshal.TimeSeries {
if hourlySeriesLimiter == nil && dailySeriesLimiter == nil {
return tss
}
dst := make([]prompbmarshal.TimeSeries, 0, len(tss))
for i := range tss {
labels := tss[i].Labels
h := getLabelsHash(labels)
if hourlySeriesLimiter != nil && !hourlySeriesLimiter.Add(h) {
hourlySeriesLimitRowsDropped.Add(len(tss[i].Samples))
logSkippedSeries(labels, "-remoteWrite.maxHourlySeries", hourlySeriesLimiter.MaxItems())
continue
}
if dailySeriesLimiter != nil && !dailySeriesLimiter.Add(h) {
dailySeriesLimitRowsDropped.Add(len(tss[i].Samples))
logSkippedSeries(labels, "-remoteWrite.maxDailySeries", dailySeriesLimiter.MaxItems())
continue
}
dst = append(dst, tss[i])
}
return dst
}
var (
hourlySeriesLimiter *bloomfilter.Limiter
dailySeriesLimiter *bloomfilter.Limiter
hourlySeriesLimitRowsDropped = metrics.NewCounter(`vmagent_hourly_series_limit_rows_dropped_total`)
dailySeriesLimitRowsDropped = metrics.NewCounter(`vmagent_daily_series_limit_rows_dropped_total`)
)
func getLabelsHash(labels []prompbmarshal.Label) uint64 {
bb := labelsHashBufPool.Get()
b := bb.B[:0]
for _, label := range labels {
b = append(b, label.Name...)
b = append(b, label.Value...)
}
h := xxhash.Sum64(b)
bb.B = b
labelsHashBufPool.Put(bb)
return h
}
var labelsHashBufPool bytesutil.ByteBufferPool
func logSkippedSeries(labels []prompbmarshal.Label, flagName string, flagValue int) {
select {
case <-logSkippedSeriesTicker.C:
logger.Warnf("skip series %s because %s=%d reached", labelsToString(labels), flagName, flagValue)
default:
}
}
var logSkippedSeriesTicker = time.NewTicker(5 * time.Second)
func labelsToString(labels []prompbmarshal.Label) string {
var b []byte
b = append(b, '{')
for i, label := range labels {
b = append(b, label.Name...)
b = append(b, '=')
b = strconv.AppendQuote(b, label.Value)
if i+1 < len(labels) {
b = append(b, ',')
}
}
b = append(b, '}')
return string(b)
}
var globalRelabelMetricsDropped = metrics.NewCounter("vmagent_remotewrite_global_relabel_metrics_dropped_total")
type remoteWriteCtx struct {
@@ -213,9 +329,17 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL string, maxInmemoryBlocks int,
return float64(fq.GetInmemoryQueueLen())
})
c := newClient(argIdx, remoteWriteURL, sanitizedURL, fq, *queues)
pss := make([]*pendingSeries, *queues)
sf := significantFigures.GetOptionalArgOrDefault(argIdx, 0)
rd := roundDigits.GetOptionalArgOrDefault(argIdx, 100)
pssLen := *queues
if n := cgroup.AvailableCPUs(); pssLen > n {
// There is no sense in running more than availableCPUs concurrent pendingSeries,
// since every pendingSeries can saturate up to a single CPU.
pssLen = n
}
pss := make([]*pendingSeries, pssLen)
for i := range pss {
pss[i] = newPendingSeries(fq.MustWriteBlock)
pss[i] = newPendingSeries(fq.MustWriteBlock, sf, rd)
}
return &remoteWriteCtx{
idx: argIdx,
@@ -233,10 +357,11 @@ func (rwctx *remoteWriteCtx) MustStop() {
}
rwctx.idx = 0
rwctx.pss = nil
rwctx.fq.MustClose()
rwctx.fq = nil
rwctx.fq.UnblockAllReaders()
rwctx.c.MustStop()
rwctx.c = nil
rwctx.fq.MustClose()
rwctx.fq = nil
rwctx.relabelMetricsDropped = nil
}
@@ -245,8 +370,8 @@ func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
var rctx *relabelCtx
var v *[]prompbmarshal.TimeSeries
rcs := allRelabelConfigs.Load().(*relabelConfigs)
prcs := rcs.perURL[rwctx.idx]
if len(prcs) > 0 {
pcs := rcs.perURL[rwctx.idx]
if pcs.Len() > 0 {
rctx = getRelabelCtx()
// Make a copy of tss before applying relabeling in order to prevent
// from affecting time series for other remoteWrite.url configs.
@@ -255,7 +380,7 @@ func (rwctx *remoteWriteCtx) Push(tss []prompbmarshal.TimeSeries) {
v = tssRelabelPool.Get().(*[]prompbmarshal.TimeSeries)
tss = append(*v, tss...)
tssLen := len(tss)
tss = rctx.applyRelabeling(tss, nil, prcs)
tss = rctx.applyRelabeling(tss, nil, pcs)
rwctx.relabelMetricsDropped.Add(tssLen - len(tss))
}
pss := rwctx.pss

View File

@@ -1,25 +1,17 @@
package remotewrite
import (
"fmt"
"net"
"strings"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
"github.com/VictoriaMetrics/fasthttp"
"github.com/VictoriaMetrics/metrics"
)
func statDial(network, addr string) (conn net.Conn, err error) {
if !strings.HasPrefix(network, "tcp") {
return nil, fmt.Errorf("unexpected network passed to statDial: %q; it must start from `tcp`", network)
}
if netutil.TCP6Enabled() {
conn, err = fasthttp.DialDualStack(addr)
} else {
conn, err = fasthttp.Dial(addr)
}
func statDial(networkUnused, addr string) (conn net.Conn, err error) {
network := netutil.GetTCPNetwork()
conn, err = net.DialTimeout(network, addr, 5*time.Second)
dialsTotal.Inc()
if err != nil {
dialErrors.Inc()

View File

@@ -88,3 +88,9 @@ vmalert-local-with-goarch:
vmalert-pure:
APP_NAME=vmalert $(MAKE) app-local-pure
vmalert-windows-amd64:
GOARCH=amd64 APP_NAME=vmalert $(MAKE) app-local-windows-with-goarch
vmalert-windows-amd64-prod:
APP_NAME=vmalert $(MAKE) app-via-docker-windows-amd64

View File

@@ -1,21 +1,22 @@
## vmalert
# vmalert
`vmalert` executes a list of given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
rules against configured address.
### Features:
## Features
* Integration with [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) TSDB;
* VictoriaMetrics [MetricsQL](https://victoriametrics.github.io/MetricsQL.html)
* VictoriaMetrics [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html)
support and expressions validation;
* Prometheus [alerting rules definition format](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#defining-alerting-rules)
support;
* Integration with [Alertmanager](https://github.com/prometheus/alertmanager);
* Keeps the alerts [state on restarts](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmalert#alerts-state-on-restarts);
* Keeps the alerts [state on restarts](#alerts-state-on-restarts);
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite) for details.
* Lightweight without extra dependencies.
### Limitations:
* `vmalert` execute queries against remote datasource which has reliability risks because of network.
## Limitations
* `vmalert` execute queries against remote datasource which has reliability risks because of network.
It is recommended to configure alerts thresholds and rules expressions with understanding that network request
may fail;
* by default, rules execution is sequential within one group, but persisting of execution results to remote
@@ -23,7 +24,7 @@ storage is asynchronous. Hence, user shouldn't rely on recording rules chaining
recording rule is reused in next one;
* `vmalert` has no UI, just an API for getting groups and rules statuses.
### QuickStart
## QuickStart
To build `vmalert` from sources:
```
@@ -36,7 +37,7 @@ The build binary will be placed to `VictoriaMetrics/bin` folder.
To start using `vmalert` you will need the following things:
* list of rules - PromQL/MetricsQL expressions to execute;
* datasource address - reachable VictoriaMetrics instance for rules execution;
* notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
* notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
aggregating alerts and sending notifications.
* remote write address - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage address for storing recording rules results and alerts state in for of timeseries. This is optional.
@@ -50,16 +51,15 @@ Then configure `vmalert` accordingly:
-remoteWrite.url=http://localhost:8428 \ # remote write compatible storage to persist rules
-remoteRead.url=http://localhost:8428 \ # PromQL compatible datasource to restore alerts state from
-external.label=cluster=east-1 \ # External label to be applied for each rule
-external.label=replica=a \ # Multiple external labels may be set
-evaluationInterval=3s # Default evaluation interval if not specified in rules group
-external.label=replica=a # Multiple external labels may be set
```
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
similar to Prometheus rules and configured using YAML. Configuration examples may be found
Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
similar to Prometheus rules and configured using YAML. Configuration examples may be found
in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder.
Every `rule` belongs to `group` and every configuration file may contain arbitrary number of groups:
```yaml
@@ -67,7 +67,7 @@ groups:
[ - <rule_group> ]
```
#### Groups
### Groups
Each group has following attributes:
```yaml
@@ -78,34 +78,50 @@ name: <string>
[ interval: <duration> | default = global.evaluation_interval ]
# How many rules execute at once. Increasing concurrency may speed
# up round execution speed.
# up round execution speed.
[ concurrency: <integer> | default = 1 ]
# Optional type for expressions inside the rules. Supported values: "graphite" and "prometheus".
# By default "prometheus" rule type is used.
[ type: <string> ]
# Optional list of label filters applied to every rule's
# request withing a group. Is compatible only with VM datasource.
# See more details at https://docs.victoriametrics.com#prometheus-querying-api-enhancements
extra_filter_labels:
[ <labelname>: <labelvalue> ... ]
rules:
[ - <rule> ... ]
```
#### Rules
### Rules
There are two types of Rules:
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules allows to define alert conditions via [MetricsQL](https://victoriametrics.github.io/MetricsQL.html)
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules allows to define alert conditions via [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html)
and to send notifications about firing alerts to [Alertmanager](https://github.com/prometheus/alertmanager).
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules allow you to precompute frequently needed or computationally expensive expressions
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules allow you to precompute frequently needed or computationally expensive expressions
and save their result as a new set of time series.
`vmalert` forbids to define duplicates - rules with the same combination of name, expression and labels
within one group.
within one group.
##### Alerting rules
#### Alerting rules
The syntax for alerting rule is following:
```yaml
# The name of the alert. Must be a valid metric name.
alert: <string>
# The MetricsQL expression to evaluate.
# Optional type for the rule. Supported values: "graphite", "prometheus".
# By default "prometheus" rule type is used.
[ type: <string> ]
# The expression to evaluate. The expression language depends on the type value.
# By default MetricsQL expression is used. If type="graphite", then the expression
# must contain valid Graphite expression.
expr: <string>
# Alerts are considered firing once they have been returned for this long.
@@ -119,16 +135,27 @@ labels:
# Annotations to add to each alert.
annotations:
[ <labelname>: <tmpl_string> ]
```
```
##### Recording rules
It is allowed to use [Go templating](https://golang.org/pkg/text/template/) in annotations
to format data, iterate over it or execute expressions.
Additionally, `vmalert` provides some extra templating functions
listed [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/notifier/template_func.go).
#### Recording rules
The syntax for recording rules is following:
```yaml
# The name of the time series to output to. Must be a valid metric name.
record: <string>
# The MetricsQL expression to evaluate.
# Optional type for the rule. Supported values: "graphite", "prometheus".
# By default "prometheus" rule type is used.
[ type: <string> ]
# The expression to evaluate. The expression language depends on the type value.
# By default MetricsQL expression is used. If type="graphite", then the expression
# must contain valid Graphite expression.
expr: <string>
# Labels to add or overwrite before storing the result.
@@ -139,47 +166,88 @@ labels:
For recording rules to work `-remoteWrite.url` must specified.
#### Alerts state on restarts
### Alerts state on restarts
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after reloading of `vmalert`
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after reloading of `vmalert`
the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags:
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or VMInsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
These are regular time series and may be queried from VM just as any other time series.
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or vminsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
These are regular time series and may be queried from VM just as any other time series.
The state stored to the configured address on every rule evaluation.
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or VMSelect (Cluster). `vmalert` will try to restore alerts state
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or vmselect (Cluster). `vmalert` will try to restore alerts state
from configured address by querying time series with name `ALERTS_FOR_STATE`.
Both flags are required for the proper state restoring. Restore process may fail if time series are missing
in configured `-remoteRead.url`, weren't updated in the last `1h` or received state doesn't match current `vmalert`
in configured `-remoteRead.url`, weren't updated in the last `1h` or received state doesn't match current `vmalert`
rules configuration.
#### WEB
### Multitenancy
There are the following approaches for alerting and recording rules across [multiple tenants](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy) exist:
* To run a separate `vmalert` instance per each tenant. The corresponding tenant must be specified in `-datasource.url` command-line flag according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format). For example, `/path/to/vmalert -datasource.url=http://vmselect:8481/select/123/prometheus` would run alerts against `AccountID=123`. For recording rules the `-remoteWrite.url` command-line flag must contain the url for the specific tenant as well. For example, `-remoteWrite.url=http://vminsert:8480/insert/123/prometheus` would write recording rules to `AccountID=123`.
* To specify `tenant` parameter per each alerting and recording group if [enterprise version of vmalert](https://victoriametrics.com/enterprise.html) is used with `-clusterMode` command-line flag. For example:
```yaml
groups:
- name: rules_for_tenant_123
tenant: "123"
rules:
# Rules for accountID=123
- name: rules_for_tenant_456:789
tenant: "456:789"
rules:
# Rules for accountID=456, projectID=789
```
If `-clusterMode` is enabled, then `-datasource.url`, `-remoteRead.url` and `-remoteWrite.url` must contain only the hostname without tenant id. For example: `-datasource.url=http://vmselect:8481` . `vmselect` automatically adds the specified tenant to urls per each recording rule in this case.
The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz` files at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise` tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
### WEB
`vmalert` runs a web-server (`-httpListenAddr`) for serving metrics and alerts endpoints:
* `http://<vmalert-addr>/api/v1/groups` - list of all loaded groups and rules;
* `http://<vmalert-addr>/api/v1/alerts` - list of all active alerts;
* `http://<vmalert-addr>/api/v1/<groupName>/<alertID>/status" ` - get alert status by ID.
* `http://<vmalert-addr>/api/v1/<groupID>/<alertID>/status" ` - get alert status by ID.
Used as alert source in AlertManager.
* `http://<vmalert-addr>/metrics` - application metrics.
* `http://<vmalert-addr>/-/reload` - hot configuration reload.
### Configuration
## Graphite
vmalert sends requests to `<-datasource.url>/render?format=json` during evaluation of alerting and recording rules
if the corresponding group or rule contains `type: "graphite"` config option. It is expected that the `<-datasource.url>/render`
implements [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) for `format=json`.
When using vmalert with both `graphite` and `prometheus` rules configured against cluster version of VM do not forget
to set `-datasource.appendTypePrefix` flag to `true`, so vmalert can adjust URL prefix automatically based on query type.
## Configuration
The shortlist of configuration flags is the following:
```
-datasource.appendTypePrefix
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
-datasource.basicAuth.password string
Optional basic auth password for -datasource.url
-datasource.basicAuth.username string
Optional basic auth username for -datasource.url
-datasource.lookback duration
Lookback defines how far to look into past when evaluating queries. For example, if datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
-datasource.maxIdleConnections int
Defines the number of idle (keep-alive connections) to configured datasource.Consider to set this value equal to the value: groups_total * group.concurrency. Too low value may result into high number of sockets in TIME_WAIT state. (default 100)
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
-datasource.queryStep duration
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query.If queryStep isn't specified, rule's evaluationInterval will be used instead.
-datasource.roundDigits int
Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
-datasource.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default system CA is used
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
-datasource.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
-datasource.tlsInsecureSkipVerify
@@ -187,11 +255,13 @@ The shortlist of configuration flags is the following:
-datasource.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
-datasource.tlsServerName string
Optional TLS server name to use for connections to -datasource.url. By default the server name from -datasource.url is used
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
-datasource.url string
Victoria Metrics or VMSelect url. Required parameter. E.g. http://127.0.0.1:8428
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
-dryRun -rule
Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
@@ -200,74 +270,85 @@ The shortlist of configuration flags is the following:
How often to evaluate the rules (default 1m0s)
-external.alert.source string
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
-external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-external.url string
External URL is used as alert's source for sent alerts to the notifier
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealy the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
Address to listen for http connections (default ":8880")
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit (default 10)
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-memory.allowedBytes value
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-notifier.basicAuth.password array
Optional basic auth password for -datasource.url
Supports array of values separated by comma or specified via multiple flags.
Optional basic auth password for -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.basicAuth.username array
Optional basic auth username for -datasource.url
Supports array of values separated by comma or specified via multiple flags.
Optional basic auth username for -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
-notifier.tlsInsecureSkipVerify
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsInsecureSkipVerify array
Whether to skip tls verification when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
-notifier.tlsKeyFile array
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsServerName array
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-remoteRead.basicAuth.password string
Optional basic auth password for -remoteRead.url
-remoteRead.basicAuth.username string
Optional basic auth username for -remoteRead.url
-remoteRead.ignoreRestoreErrors
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
-remoteRead.lookback duration
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.tlsCAFile string
@@ -281,7 +362,7 @@ The shortlist of configuration flags is the following:
-remoteRead.tlsServerName string
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
-remoteRead.url vmalert
Optional URL to Victoria Metrics or VMSelect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428
Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428
-remoteWrite.basicAuth.password string
Optional basic auth password for -remoteWrite.url
-remoteWrite.basicAuth.username string
@@ -305,16 +386,16 @@ The shortlist of configuration flags is the following:
-remoteWrite.tlsServerName string
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
-remoteWrite.url string
Optional URL to Victoria Metrics or VMInsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
-rule array
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-rule.validateExpressions
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
@@ -322,57 +403,57 @@ The shortlist of configuration flags is the following:
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
Pass `-help` to `vmalert` in order to see the full list of supported
Pass `-help` to `vmalert` in order to see the full list of supported
command-line flags with their descriptions.
To reload configuration without `vmalert` restart send SIGHUP signal
or send GET request to `/-/reload` endpoint.
### Contributing
## Contributing
`vmalert` is mostly designed and built by VictoriaMetrics community.
Feel free to share your experience and ideas for improving this
Feel free to share your experience and ideas for improving this
software. Please keep simplicity as the main priority.
### How to build from sources
## How to build from sources
It is recommended using
[binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
It is recommended using
[binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
- `vmalert` is located in `vmutils-*` archives there.
#### Development build
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make vmalert` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmalert` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert` binary and puts it into the `bin` folder.
#### Production build
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmalert-prod` from the root folder of the repository.
2. Run `make vmalert-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert-prod` binary and puts it into the `bin` folder.
#### ARM build
### ARM build
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
#### Development ARM build
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make vmalert-arm` or `make vmalert-arm64` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmalert-arm` or `make vmalert-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert-arm` or `vmalert-arm64` binary respectively and puts it into the `bin` folder.
#### Production ARM build
### Production ARM build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmalert-arm-prod` or `make vmalert-arm64-prod` from the root folder of the repository.
2. Run `make vmalert-arm-prod` or `make vmalert-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert-arm-prod` or `vmalert-arm64-prod` binary respectively and puts it into the `bin` folder.

View File

@@ -19,6 +19,7 @@ import (
// AlertingRule is basic alert entity
type AlertingRule struct {
Type datasource.Type
RuleID uint64
Name string
Expr string
@@ -28,6 +29,8 @@ type AlertingRule struct {
GroupID uint64
GroupName string
q datasource.Querier
// guard status fields
mu sync.RWMutex
// stores list of active alerts
@@ -48,8 +51,9 @@ type alertingRuleMetrics struct {
active *gauge
}
func newAlertingRule(group *Group, cfg config.Rule) *AlertingRule {
func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule) *AlertingRule {
ar := &AlertingRule{
Type: cfg.Type,
RuleID: cfg.ID,
Name: cfg.Alert,
Expr: cfg.Expr,
@@ -58,8 +62,13 @@ func newAlertingRule(group *Group, cfg config.Rule) *AlertingRule {
Annotations: cfg.Annotations,
GroupID: group.ID(),
GroupName: group.Name,
alerts: make(map[uint64]*notifier.Alert),
metrics: &alertingRuleMetrics{},
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: &cfg.Type,
EvaluationInterval: group.Interval,
ExtraLabels: group.ExtraFilterLabels,
}),
alerts: make(map[uint64]*notifier.Alert),
metrics: &alertingRuleMetrics{},
}
labels := fmt.Sprintf(`alertname=%q, group=%q, id="%d"`, ar.Name, group.Name, ar.ID())
@@ -119,8 +128,8 @@ func (ar *AlertingRule) ID() uint64 {
// Exec executes AlertingRule expression via the given Querier.
// Based on the Querier results AlertingRule maintains notifier.Alerts
func (ar *AlertingRule) Exec(ctx context.Context, q datasource.Querier, series bool) ([]prompbmarshal.TimeSeries, error) {
qMetrics, err := q.Query(ctx, ar.Expr)
func (ar *AlertingRule) Exec(ctx context.Context, series bool) ([]prompbmarshal.TimeSeries, error) {
qMetrics, err := ar.q.Query(ctx, ar.Expr)
ar.mu.Lock()
defer ar.mu.Unlock()
@@ -137,7 +146,7 @@ func (ar *AlertingRule) Exec(ctx context.Context, q datasource.Querier, series b
}
}
qFn := func(query string) ([]datasource.Metric, error) { return q.Query(ctx, query) }
qFn := func(query string) ([]datasource.Metric, error) { return ar.q.Query(ctx, query) }
updated := make(map[uint64]struct{})
// update list of active alerts
for _, m := range qMetrics {
@@ -242,6 +251,7 @@ func (ar *AlertingRule) UpdateWith(r Rule) error {
ar.For = nr.For
ar.Labels = nr.Labels
ar.Annotations = nr.Annotations
ar.q = nr.q
return nil
}
@@ -310,6 +320,7 @@ func (ar *AlertingRule) RuleAPI() APIAlertingRule {
// encode as strings to avoid rounding
ID: fmt.Sprintf("%d", ar.ID()),
GroupID: fmt.Sprintf("%d", ar.GroupID),
Type: ar.Type.String(),
Name: ar.Name,
Expression: ar.Expr,
For: ar.For.String(),
@@ -404,7 +415,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
return fmt.Errorf("querier is nil")
}
qFn := func(query string) ([]datasource.Metric, error) { return q.Query(ctx, query) }
qFn := func(query string) ([]datasource.Metric, error) { return ar.q.Query(ctx, query) }
// account for external labels in filter
var labelsFilter string

View File

@@ -294,11 +294,12 @@ func TestAlertingRule_Exec(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.q = fq
tc.rule.GroupID = fakeGroup.ID()
for _, step := range tc.steps {
fq.reset()
fq.add(step...)
if _, err := tc.rule.Exec(context.TODO(), fq, false); err != nil {
if _, err := tc.rule.Exec(context.TODO(), false); err != nil {
t.Fatalf("unexpected err: %s", err)
}
// artificial delay between applying steps
@@ -410,6 +411,7 @@ func TestAlertingRule_Restore(t *testing.T) {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if err := tc.rule.Restore(context.TODO(), fq, time.Hour, nil); err != nil {
t.Fatalf("unexpected err: %s", err)
@@ -437,17 +439,18 @@ func TestAlertingRule_Exec_Negative(t *testing.T) {
fq := &fakeQuerier{}
ar := newTestAlertingRule("test", 0)
ar.Labels = map[string]string{"job": "test"}
ar.q = fq
// successful attempt
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "bar"))
_, err := ar.Exec(context.TODO(), fq, false)
_, err := ar.Exec(context.TODO(), false)
if err != nil {
t.Fatal(err)
}
// label `job` will collide with rule extra label and will make both time series equal
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "baz"))
_, err = ar.Exec(context.TODO(), fq, false)
_, err = ar.Exec(context.TODO(), false)
if !errors.Is(err, errDuplicate) {
t.Fatalf("expected to have %s error; got %s", errDuplicate, err)
}
@@ -456,7 +459,7 @@ func TestAlertingRule_Exec_Negative(t *testing.T) {
expErr := "connection reset by peer"
fq.setErr(errors.New(expErr))
_, err = ar.Exec(context.TODO(), fq, false)
_, err = ar.Exec(context.TODO(), false)
if err == nil {
t.Fatalf("expected to get err; got nil")
}
@@ -544,8 +547,9 @@ func TestAlertingRule_Template(t *testing.T) {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if _, err := tc.rule.Exec(context.TODO(), fq, false); err != nil {
if _, err := tc.rule.Exec(context.TODO(), false); err != nil {
t.Fatalf("unexpected err: %s", err)
}
for hash, expAlert := range tc.expAlerts {

View File

@@ -10,6 +10,8 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
@@ -21,11 +23,16 @@ import (
// Group contains list of Rules grouped into
// entity with one name and evaluation interval
type Group struct {
Type datasource.Type `yaml:"type,omitempty"`
File string
Name string `yaml:"name"`
Interval time.Duration `yaml:"interval,omitempty"`
Rules []Rule `yaml:"rules"`
Concurrency int `yaml:"concurrency"`
// ExtraFilterLabels is a list label filters applied to every rule
// request withing a group. Is compatible only with VM datasources.
// See https://docs.victoriametrics.com#prometheus-querying-api-enhancements
ExtraFilterLabels map[string]string `yaml:"extra_filter_labels"`
// Checksum stores the hash of yaml definition for this group.
// May be used to detect any changes like rules re-ordering etc.
Checksum string
@@ -44,6 +51,19 @@ func (g *Group) UnmarshalYAML(unmarshal func(interface{}) error) error {
if err != nil {
return fmt.Errorf("failed to marshal group configuration for checksum: %w", err)
}
// change default value to prometheus datasource.
if g.Type.Get() == "" {
g.Type.Set(datasource.NewPrometheusType())
}
// update rules with empty type.
for i, r := range g.Rules {
if r.Type.Get() == "" {
r.Type.Set(g.Type)
r.ID = HashRule(r)
g.Rules[i] = r
}
}
h := md5.New()
h.Write(b)
g.Checksum = fmt.Sprintf("%x", h.Sum(nil))
@@ -58,6 +78,7 @@ func (g *Group) Validate(validateAnnotations, validateExpressions bool) error {
if len(g.Rules) == 0 {
return fmt.Errorf("group %q can't contain no rules", g.Name)
}
uniqueRules := map[uint64]struct{}{}
for _, r := range g.Rules {
ruleName := r.Record
@@ -72,7 +93,13 @@ func (g *Group) Validate(validateAnnotations, validateExpressions bool) error {
return fmt.Errorf("invalid rule %q.%q: %w", g.Name, ruleName, err)
}
if validateExpressions {
if _, err := metricsql.Parse(r.Expr); err != nil {
// its needed only for tests.
// because correct types must be inherited after unmarshalling.
exprValidator := g.Type.ValidateExpr
if r.Type.Get() != "" {
exprValidator = r.Type.ValidateExpr
}
if err := exprValidator(r.Expr); err != nil {
return fmt.Errorf("invalid expression for rule %q.%q: %w", g.Name, ruleName, err)
}
}
@@ -92,6 +119,7 @@ func (g *Group) Validate(validateAnnotations, validateExpressions bool) error {
// recording rule or alerting rule.
type Rule struct {
ID uint64
Type datasource.Type `yaml:"type,omitempty"`
Record string `yaml:"record,omitempty"`
Alert string `yaml:"alert,omitempty"`
Expr string `yaml:"expr"`
@@ -169,6 +197,7 @@ func HashRule(r Rule) uint64 {
h.Write([]byte("alerting"))
h.Write([]byte(r.Alert))
}
h.Write([]byte(r.Type.Get()))
kv := sortMap(r.Labels)
for _, i := range kv {
h.Write([]byte(i.key))

View File

@@ -7,6 +7,8 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
@@ -53,6 +55,10 @@ func TestParseBad(t *testing.T) {
[]string{"testdata/dir/rules4-bad.rules"},
"either `record` or `alert` must be set",
},
{
[]string{"testdata/rules1-bad.rules"},
"bad graphite expr",
},
}
for _, tc := range testCases {
_, err := Parse(tc.path, true, true)
@@ -215,6 +221,75 @@ func TestGroup_Validate(t *testing.T) {
},
expErr: "",
},
{
group: &Group{Name: "test thanos",
Type: datasource.NewRawType("thanos"),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"description": "{{ value|query }}",
}},
},
},
validateExpressions: true,
expErr: "unknown datasource type",
},
{
group: &Group{Name: "test graphite",
Type: datasource.NewGraphiteType(),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"description": "some-description",
}},
},
},
validateExpressions: true,
expErr: "",
},
{
group: &Group{Name: "test prometheus",
Type: datasource.NewPrometheusType(),
Rules: []Rule{
{Alert: "alert", Expr: "up == 1", Labels: map[string]string{
"description": "{{ value|query }}",
}},
},
},
validateExpressions: true,
expErr: "",
},
{
group: &Group{
Name: "test graphite inherit",
Type: datasource.NewGraphiteType(),
Rules: []Rule{
{
Expr: "sumSeries(time('foo.bar',10))",
For: PromDuration{milliseconds: 10},
},
{
Expr: "sum(up == 0 ) by (host)",
Type: datasource.NewPrometheusType(),
},
},
},
},
{
group: &Group{
Name: "test graphite prometheus bad expr",
Type: datasource.NewGraphiteType(),
Rules: []Rule{
{
Expr: "sum(up == 0 ) by (host)",
For: PromDuration{milliseconds: 10},
},
{
Expr: "sumSeries(time('foo.bar',10))",
Type: datasource.NewPrometheusType(),
},
},
},
expErr: "invalid rule",
},
}
for _, tc := range testCases {
err := tc.group.Validate(tc.validateAnnotations, tc.validateExpressions)

View File

@@ -0,0 +1,13 @@
groups:
- name: TestUpdateGroup
interval: 2s
concurrency: 2
type: prometheus
rules:
- alert: up
expr: up == 0
for: 30s
- alert: up graphite
expr: filterSeries(time('host.1',20),'>','0')
for: 30s
type: graphite

View File

@@ -0,0 +1,12 @@
groups:
- name: TestUpdateGroup
interval: 30s
type: graphite
rules:
- alert: up
expr: filterSeries(time('host.2',20),'>','0')
for: 30s
- alert: up graphite
expr: filterSeries(time('host.1',20),'>','0')
for: 30s
type: graphite

View File

@@ -0,0 +1,15 @@
groups:
- name: alertmanager.rules
rules:
- alert: AlertmanagerConfigInconsistent
annotations:
message: |
The configuration of the instances of the Alertmanager cluster `{{ $labels.namespace }}/{{ $labels.service }}` are out of sync.
{{ range printf "alertmanager_config_hash{namespace=\"%s\",service=\"%s\"}" $labels.namespace $labels.service | query }}
Configuration hash for pod {{ .Labels.pod }} is "{{ printf "%.f" .Value }}"
{{ end }}
expr: |
count by(namespace,service) (count_values by(namespace,service) ("config_hash", alertmanager_config_hash{job="alertmanager-main",namespace="openshift-monitoring"})) != 1
for: 5m
labels:
severity: critical

View File

@@ -0,0 +1,12 @@
groups:
- name: TestGraphiteBadGroup
interval: 2s
concurrency: 2
type: graphite
rules:
- alert: Conns
expr: filterSeries(sumSeries(host.receiver.interface.cons),'last','>', 500) by instance
for: 3m
annotations:
summary: Too high connection number for {{$labels.instance}}
description: "It is {{ $value }} connections for {{$labels.instance}}"

View File

@@ -2,6 +2,8 @@ groups:
- name: TestGroup
interval: 2s
concurrency: 2
extra_filter_labels:
job: victoriametrics
rules:
- alert: Conns
expr: sum(vm_tcplistener_conns) by(instance) > 1
@@ -17,6 +19,7 @@ groups:
(up == 1)
labels:
job: '{{ $labels.job }}'
dynamic: '{{ $x := query "up" | first | value }}{{ if eq 1.0 $x }}one{{ else }}unknown{{ end }}'
annotations:
description: Job {{ $labels.job }} is up!
summary: All instances up {{ range query "up" }}

View File

@@ -0,0 +1,30 @@
groups:
- name: TestGroup
interval: 2s
concurrency: 2
type: graphite
rules:
- alert: Conns
expr: filterSeries(sumSeries(host.receiver.interface.cons),'last','>', 500)
for: 3m
annotations:
summary: Too high connection number for {{$labels.instance}}
description: "It is {{ $value }} connections for {{$labels.instance}}"
- name: TestGroupPromMixed
interval: 2s
concurrency: 2
type: prometheus
rules:
- alert: Conns
expr: sum(vm_tcplistener_conns) by (instance) > 1
for: 3m
annotations:
summary: Too high connection number for {{$labels.instance}}
description: "It is {{ $value }} connections for {{$labels.instance}}"
- alert: HostDown
type: graphite
expr: filterSeries(sumSeries(host.receiver.interface.up),'last','=', 0)
for: 3m
annotations:
summary: Too high connection number for {{$labels.instance}}
description: "It is {{ $value }} connections for {{$labels.instance}}"

View File

@@ -1,6 +1,13 @@
package datasource
import "context"
import (
"context"
)
// QuerierBuilder builds Querier with given params.
type QuerierBuilder interface {
BuildWithParams(params QuerierParams) Querier
}
// Querier interface wraps Query method which
// executes given query and returns list of Metrics

View File

@@ -4,40 +4,59 @@ import (
"flag"
"fmt"
"net/http"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
)
var (
addr = flag.String("datasource.url", "", "Victoria Metrics or VMSelect url. Required parameter."+
" E.g. http://127.0.0.1:8428")
addr = flag.String("datasource.url", "", "VictoriaMetrics or vmselect url. Required parameter. "+
"E.g. http://127.0.0.1:8428")
appendTypePrefix = flag.Bool("datasource.appendTypePrefix", false, "Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.")
basicAuthUsername = flag.String("datasource.basicAuth.username", "", "Optional basic auth username for -datasource.url")
basicAuthPassword = flag.String("datasource.basicAuth.password", "", "Optional basic auth password for -datasource.url")
tlsInsecureSkipVerify = flag.Bool("datasource.tlsInsecureSkipVerify", false, "Whether to skip tls verification when connecting to -datasource.url")
tlsCertFile = flag.String("datasource.tlsCertFile", "", "Optional path to client-side TLS certificate file to use when connecting to -datasource.url")
tlsKeyFile = flag.String("datasource.tlsKeyFile", "", "Optional path to client-side TLS certificate key to use when connecting to -datasource.url")
tlsCAFile = flag.String("datasource.tlsCAFile", "", "Optional path to TLS CA file to use for verifying connections to -datasource.url. "+
"By default system CA is used")
tlsServerName = flag.String("datasource.tlsServerName", "", "Optional TLS server name to use for connections to -datasource.url. "+
"By default the server name from -datasource.url is used")
tlsCAFile = flag.String("datasource.tlsCAFile", "", `Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used`)
tlsServerName = flag.String("datasource.tlsServerName", "", `Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used`)
lookBack = flag.Duration("datasource.lookback", 0, "Lookback defines how far to look into past when evaluating queries. "+
"For example, if datasource.lookback=5m then param \"time\" with value now()-5m will be added to every query.")
maxIdleConnections = flag.Int("datasource.maxIdleConnections", 100, "Defines the number of idle (keep-alive connections) to configured datasource."+
"Consider to set this value equal to the value: groups_total * group.concurrency. Too low value may result into high number of sockets in TIME_WAIT state.")
lookBack = flag.Duration("datasource.lookback", 0, `Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
queryStep = flag.Duration("datasource.queryStep", 0, "queryStep defines how far a value can fallback to when evaluating queries. "+
"For example, if datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query."+
"If queryStep isn't specified, rule's evaluationInterval will be used instead.")
maxIdleConnections = flag.Int("datasource.maxIdleConnections", 100, `Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state.`)
roundDigits = flag.Int("datasource.roundDigits", 0, `Adds "round_digits" GET param to datasource requests. `+
`In VM "round_digits" limits the number of digits after the decimal point in response values.`)
)
// Init creates a Querier from provided flag values.
func Init() (Querier, error) {
func Init() (QuerierBuilder, error) {
if *addr == "" {
return nil, fmt.Errorf("datasource.url is empty")
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
tr.MaxIdleConns = *maxIdleConnections
c := &http.Client{Transport: tr}
return NewVMStorage(*addr, *basicAuthUsername, *basicAuthPassword, *lookBack, c), nil
tr.MaxIdleConnsPerHost = *maxIdleConnections
var rd string
if *roundDigits > 0 {
rd = fmt.Sprintf("%d", *roundDigits)
}
return &VMStorage{
c: &http.Client{Transport: tr},
basicAuthUser: *basicAuthUsername,
basicAuthPass: *basicAuthPassword,
datasourceURL: strings.TrimSuffix(*addr, "/"),
appendTypePrefix: *appendTypePrefix,
lookBack: *lookBack,
queryStep: *queryStep,
roundDigits: rd,
dataSourceType: NewPrometheusType(),
}, nil
}

View File

@@ -0,0 +1,89 @@
package datasource
import (
"fmt"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/graphiteql"
"github.com/VictoriaMetrics/metricsql"
)
const graphiteType = "graphite"
const prometheusType = "prometheus"
// Type represents data source type
type Type struct {
name string
}
// NewPrometheusType returns prometheus datasource type
func NewPrometheusType() Type {
return Type{name: prometheusType}
}
// NewGraphiteType returns graphite datasource type
func NewGraphiteType() Type {
return Type{name: graphiteType}
}
// NewRawType returns datasource type from raw string
// without validation.
func NewRawType(d string) Type {
return Type{name: d}
}
// Get returns datasource type
func (t *Type) Get() string {
return t.name
}
// Set changes datasource type
func (t *Type) Set(d Type) {
t.name = d.name
}
// String implements String interface with default value.
func (t Type) String() string {
if t.name == "" {
return prometheusType
}
return t.name
}
// ValidateExpr validates query expression with datasource ql.
func (t *Type) ValidateExpr(expr string) error {
switch t.name {
case graphiteType:
if _, err := graphiteql.Parse(expr); err != nil {
return fmt.Errorf("bad graphite expr: %q, err: %w", expr, err)
}
case "", prometheusType:
if _, err := metricsql.Parse(expr); err != nil {
return fmt.Errorf("bad prometheus expr: %q, err: %w", expr, err)
}
default:
return fmt.Errorf("unknown datasource type=%q", t.name)
}
return nil
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (t *Type) UnmarshalYAML(unmarshal func(interface{}) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
}
switch s {
case "":
s = prometheusType
case graphiteType, prometheusType:
default:
return fmt.Errorf("unknown datasource type=%q, want %q or %q", s, prometheusType, graphiteType)
}
t.name = s
return nil
}
// MarshalYAML implements the yaml.Unmarshaler interface.
func (t Type) MarshalYAML() (interface{}, error) {
return t.name, nil
}

View File

@@ -6,7 +6,6 @@ import (
"fmt"
"io/ioutil"
"net/http"
"net/url"
"strconv"
"strings"
"time"
@@ -46,39 +45,132 @@ func (r response) metrics() ([]Metric, error) {
return ms, nil
}
type graphiteResponse []graphiteResponseTarget
type graphiteResponseTarget struct {
Target string `json:"target"`
Tags map[string]string `json:"tags"`
DataPoints [][2]float64 `json:"datapoints"`
}
func (r graphiteResponse) metrics() []Metric {
var ms []Metric
for _, res := range r {
if len(res.DataPoints) < 1 {
continue
}
var m Metric
// add only last value to the result.
last := res.DataPoints[len(res.DataPoints)-1]
m.Value = last[0]
m.Timestamp = int64(last[1])
for k, v := range res.Tags {
m.AddLabel(k, v)
}
ms = append(ms, m)
}
return ms
}
// VMStorage represents vmstorage entity with ability to read and write metrics
type VMStorage struct {
c *http.Client
queryURL string
basicAuthUser string
basicAuthPass string
lookBack time.Duration
c *http.Client
datasourceURL string
basicAuthUser string
basicAuthPass string
appendTypePrefix bool
lookBack time.Duration
queryStep time.Duration
roundDigits string
dataSourceType Type
evaluationInterval time.Duration
extraLabels []string
}
const queryPath = "/api/v1/query?query="
const queryPath = "/api/v1/query"
const graphitePath = "/render"
const prometheusPrefix = "/prometheus"
const graphitePrefix = "/graphite"
// QuerierParams params for Querier.
type QuerierParams struct {
DataSourceType *Type
EvaluationInterval time.Duration
// see https://docs.victoriametrics.com/#prometheus-querying-api-enhancements
ExtraLabels map[string]string
}
// Clone makes clone of VMStorage, shares http client.
func (s *VMStorage) Clone() *VMStorage {
return &VMStorage{
c: s.c,
datasourceURL: s.datasourceURL,
basicAuthUser: s.basicAuthUser,
basicAuthPass: s.basicAuthPass,
lookBack: s.lookBack,
queryStep: s.queryStep,
appendTypePrefix: s.appendTypePrefix,
dataSourceType: s.dataSourceType,
}
}
// ApplyParams - changes given querier params.
func (s *VMStorage) ApplyParams(params QuerierParams) *VMStorage {
if params.DataSourceType != nil {
s.dataSourceType = *params.DataSourceType
}
s.evaluationInterval = params.EvaluationInterval
for k, v := range params.ExtraLabels {
s.extraLabels = append(s.extraLabels, fmt.Sprintf("%s=%s", k, v))
}
return s
}
// BuildWithParams - implements interface.
func (s *VMStorage) BuildWithParams(params QuerierParams) Querier {
return s.Clone().ApplyParams(params)
}
// NewVMStorage is a constructor for VMStorage
func NewVMStorage(baseURL, basicAuthUser, basicAuthPass string, lookBack time.Duration, c *http.Client) *VMStorage {
func NewVMStorage(baseURL, basicAuthUser, basicAuthPass string, lookBack time.Duration, queryStep time.Duration, appendTypePrefix bool, c *http.Client) *VMStorage {
return &VMStorage{
c: c,
basicAuthUser: basicAuthUser,
basicAuthPass: basicAuthPass,
queryURL: strings.TrimSuffix(baseURL, "/") + queryPath,
lookBack: lookBack,
c: c,
basicAuthUser: basicAuthUser,
basicAuthPass: basicAuthPass,
datasourceURL: strings.TrimSuffix(baseURL, "/"),
appendTypePrefix: appendTypePrefix,
lookBack: lookBack,
queryStep: queryStep,
dataSourceType: NewPrometheusType(),
}
}
// Query reads metrics from datasource by given query
// Query executes the given query and returns parsed response
func (s *VMStorage) Query(ctx context.Context, query string) ([]Metric, error) {
const (
statusSuccess, statusError, rtVector = "success", "error", "vector"
)
q := s.queryURL + url.QueryEscape(query)
if s.lookBack > 0 {
lookBack := time.Now().Add(-s.lookBack)
q += fmt.Sprintf("&time=%d", lookBack.Unix())
req, err := s.prepareReq(query, time.Now())
if err != nil {
return nil, err
}
req, err := http.NewRequest("POST", q, nil)
resp, err := s.do(ctx, req)
if err != nil {
return nil, err
}
defer func() {
_ = resp.Body.Close()
}()
parseFn := parsePrometheusResponse
if s.dataSourceType.name != prometheusType {
parseFn = parseGraphiteResponse
}
return parseFn(req, resp)
}
func (s *VMStorage) prepareReq(query string, timestamp time.Time) (*http.Request, error) {
req, err := http.NewRequest("POST", s.datasourceURL, nil)
if err != nil {
return nil, err
}
@@ -86,18 +178,88 @@ func (s *VMStorage) Query(ctx context.Context, query string) ([]Metric, error) {
if s.basicAuthPass != "" {
req.SetBasicAuth(s.basicAuthUser, s.basicAuthPass)
}
switch s.dataSourceType.name {
case "", prometheusType:
s.setPrometheusReqParams(req, query, timestamp)
case graphiteType:
s.setGraphiteReqParams(req, query, timestamp)
default:
return nil, fmt.Errorf("engine not found: %q", s.dataSourceType.name)
}
return req, nil
}
func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response, error) {
resp, err := s.c.Do(req.WithContext(ctx))
if err != nil {
return nil, fmt.Errorf("error getting response from %s: %w", req.URL, err)
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
body, _ := ioutil.ReadAll(resp.Body)
return nil, fmt.Errorf("datasource returns unexpected response code %d for %s. Response body %s", resp.StatusCode, req.URL, body)
_ = resp.Body.Close()
return nil, fmt.Errorf("unexpected response code %d for %s. Response body %s", resp.StatusCode, req.URL, body)
}
return resp, nil
}
func (s *VMStorage) setPrometheusReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += prometheusPrefix
}
r.URL.Path += queryPath
q := r.URL.Query()
q.Set("query", query)
if s.lookBack > 0 {
timestamp = timestamp.Add(-s.lookBack)
}
if s.evaluationInterval > 0 {
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
timestamp = timestamp.Truncate(s.evaluationInterval)
// set step as evaluationInterval by default
q.Set("step", s.evaluationInterval.String())
}
q.Set("time", fmt.Sprintf("%d", timestamp.Unix()))
if s.queryStep > 0 {
// override step with user-specified value
q.Set("step", s.queryStep.String())
}
if s.roundDigits != "" {
q.Set("round_digits", s.roundDigits)
}
for _, l := range s.extraLabels {
q.Add("extra_label", l)
}
r.URL.RawQuery = q.Encode()
}
func (s *VMStorage) setGraphiteReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += graphitePrefix
}
r.URL.Path += graphitePath
q := r.URL.Query()
q.Set("format", "json")
q.Set("target", query)
from := "-5min"
if s.lookBack > 0 {
lookBack := timestamp.Add(-s.lookBack)
from = strconv.FormatInt(lookBack.Unix(), 10)
}
q.Set("from", from)
q.Set("until", "now")
r.URL.RawQuery = q.Encode()
}
const (
statusSuccess, statusError, rtVector = "success", "error", "vector"
)
func parsePrometheusResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &response{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing metrics for %s: %w", req.URL, err)
return nil, fmt.Errorf("error parsing prometheus metrics for %s: %w", req.URL, err)
}
if r.Status == statusError {
return nil, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL, r.ErrorType, r.Error)
@@ -110,3 +272,11 @@ func (s *VMStorage) Query(ctx context.Context, query string) ([]Metric, error) {
}
return r.metrics()
}
func parseGraphiteResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &graphiteResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing graphite metrics for %s: %w", req.URL, err)
}
return r.metrics(), nil
}

View File

@@ -2,8 +2,10 @@ package datasource
import (
"context"
"fmt"
"net/http"
"net/http/httptest"
"reflect"
"strconv"
"testing"
"time"
@@ -14,6 +16,7 @@ var (
basicAuthName = "foo"
basicAuthPass = "bar"
query = "vm_rows"
queryRender = "constantLine(10)"
)
func TestVMSelectQuery(t *testing.T) {
@@ -22,6 +25,13 @@ func TestVMSelectQuery(t *testing.T) {
t.Errorf("should not be called")
})
c := -1
mux.HandleFunc("/render", func(w http.ResponseWriter, request *http.Request) {
c++
switch c {
case 7:
w.Write([]byte(`[{"target":"constantLine(10)","tags":{"name":"constantLine(10)"},"datapoints":[[10,1611758343],[10,1611758373],[10,1611758403]]}]`))
}
})
mux.HandleFunc("/api/v1/query", func(w http.ResponseWriter, r *http.Request) {
c++
if r.Method != http.MethodPost {
@@ -61,26 +71,31 @@ func TestVMSelectQuery(t *testing.T) {
srv := httptest.NewServer(mux)
defer srv.Close()
am := NewVMStorage(srv.URL, basicAuthName, basicAuthPass, time.Minute, srv.Client())
if _, err := am.Query(ctx, query); err == nil {
s := NewVMStorage(srv.URL, basicAuthName, basicAuthPass, time.Minute, 0, false, srv.Client())
p := NewPrometheusType()
pq := s.BuildWithParams(QuerierParams{DataSourceType: &p, EvaluationInterval: 15 * time.Second})
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected connection error got nil")
}
if _, err := am.Query(ctx, query); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected invalid response status error got nil")
}
if _, err := am.Query(ctx, query); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected response body error got nil")
}
if _, err := am.Query(ctx, query); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected error status got nil")
}
if _, err := am.Query(ctx, query); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected unknown status got nil")
}
if _, err := am.Query(ctx, query); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected non-vector resultType error got nil")
}
m, err := am.Query(ctx, query)
m, err := pq.Query(ctx, query)
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -92,10 +107,181 @@ func TestVMSelectQuery(t *testing.T) {
Timestamp: 1583786142,
Value: 13763,
}
if m[0].Timestamp != expected.Timestamp &&
m[0].Value != expected.Value &&
m[0].Labels[0].Value != expected.Labels[0].Value &&
m[0].Labels[0].Name != expected.Labels[0].Name {
if !reflect.DeepEqual(m[0], expected) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
}
g := NewGraphiteType()
gq := s.BuildWithParams(QuerierParams{DataSourceType: &g})
m, err = gq.Query(ctx, queryRender)
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
}
expected = Metric{
Labels: []Label{{Value: "constantLine(10)", Name: "name"}},
Timestamp: 1611758403,
Value: 10,
}
if !reflect.DeepEqual(m[0], expected) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
}
}
func TestPrepareReq(t *testing.T) {
query := "up"
timestamp := time.Date(2001, 2, 3, 4, 5, 6, 0, time.UTC)
testCases := []struct {
name string
vm *VMStorage
checkFn func(t *testing.T, r *http.Request)
}{
{
"prometheus path",
&VMStorage{
dataSourceType: NewPrometheusType(),
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, queryPath, r.URL.Path)
},
},
{
"prometheus prefix",
&VMStorage{
dataSourceType: NewPrometheusType(),
appendTypePrefix: true,
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusPrefix+queryPath, r.URL.Path)
},
},
{
"graphite path",
&VMStorage{
dataSourceType: NewGraphiteType(),
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, graphitePath, r.URL.Path)
},
},
{
"graphite prefix",
&VMStorage{
dataSourceType: NewGraphiteType(),
appendTypePrefix: true,
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, graphitePrefix+graphitePath, r.URL.Path)
},
},
{
"default params",
&VMStorage{},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"basic auth",
&VMStorage{
basicAuthUser: "foo",
basicAuthPass: "bar",
},
func(t *testing.T, r *http.Request) {
u, p, _ := r.BasicAuth()
checkEqualString(t, "foo", u)
checkEqualString(t, "bar", p)
},
},
{
"lookback",
&VMStorage{
lookBack: time.Minute,
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&time=%d", query, timestamp.Add(-time.Minute).Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"evaluation interval",
&VMStorage{
evaluationInterval: 15 * time.Second,
},
func(t *testing.T, r *http.Request) {
evalInterval := 15 * time.Second
tt := timestamp.Truncate(evalInterval)
exp := fmt.Sprintf("query=%s&step=%v&time=%d", query, evalInterval, tt.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"lookback + evaluation interval",
&VMStorage{
lookBack: time.Minute,
evaluationInterval: 15 * time.Second,
},
func(t *testing.T, r *http.Request) {
evalInterval := 15 * time.Second
tt := timestamp.Add(-time.Minute)
tt = tt.Truncate(evalInterval)
exp := fmt.Sprintf("query=%s&step=%v&time=%d", query, evalInterval, tt.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"step override",
&VMStorage{
queryStep: time.Minute,
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&step=%v&time=%d", query, time.Minute, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"round digits",
&VMStorage{
roundDigits: "10",
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&round_digits=10&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"extra labels",
&VMStorage{
extraLabels: []string{
"env=prod",
"query=es=cape",
},
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("extra_label=env%%3Dprod&extra_label=query%%3Des%%3Dcape&query=%s&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
req, err := tc.vm.prepareReq(query, timestamp)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
tc.checkFn(t, req)
})
}
}
func checkEqualString(t *testing.T, exp, got string) {
t.Helper()
if got != exp {
t.Errorf("expected to get %q; got %q", exp, got)
}
}

View File

@@ -18,13 +18,15 @@ import (
// Group is an entity for grouping rules
type Group struct {
mu sync.RWMutex
Name string
File string
Rules []Rule
Interval time.Duration
Concurrency int
Checksum string
mu sync.RWMutex
Name string
File string
Rules []Rule
Type datasource.Type
Interval time.Duration
Concurrency int
Checksum string
ExtraFilterLabels map[string]string
doneCh chan struct{}
finishedCh chan struct{}
@@ -48,16 +50,19 @@ func newGroupMetrics(name, file string) *groupMetrics {
return m
}
func newGroup(cfg config.Group, defaultInterval time.Duration, labels map[string]string) *Group {
func newGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval time.Duration, labels map[string]string) *Group {
g := &Group{
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval,
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
doneCh: make(chan struct{}),
finishedCh: make(chan struct{}),
updateCh: make(chan *Group),
Type: cfg.Type,
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval,
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
ExtraFilterLabels: cfg.ExtraFilterLabels,
doneCh: make(chan struct{}),
finishedCh: make(chan struct{}),
updateCh: make(chan *Group),
}
g.metrics = newGroupMetrics(g.Name, g.File)
if g.Interval == 0 {
@@ -79,17 +84,17 @@ func newGroup(cfg config.Group, defaultInterval time.Duration, labels map[string
}
r.Labels[k] = v
}
rules[i] = g.newRule(r)
rules[i] = g.newRule(qb, r)
}
g.Rules = rules
return g
}
func (g *Group) newRule(rule config.Rule) Rule {
func (g *Group) newRule(qb datasource.QuerierBuilder, rule config.Rule) Rule {
if rule.Alert != "" {
return newAlertingRule(g, rule)
return newAlertingRule(qb, g, rule)
}
return newRecordingRule(g, rule)
return newRecordingRule(qb, g, rule)
}
// ID return unique group ID that consists of
@@ -99,11 +104,12 @@ func (g *Group) ID() uint64 {
hash.Write([]byte(g.File))
hash.Write([]byte("\xff"))
hash.Write([]byte(g.Name))
hash.Write([]byte(g.Type.Get()))
return hash.Sum64()
}
// Restore restores alerts state for group rules
func (g *Group) Restore(ctx context.Context, q datasource.Querier, lookback time.Duration, labels map[string]string) error {
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, lookback time.Duration, labels map[string]string) error {
for _, rule := range g.Rules {
rr, ok := rule.(*AlertingRule)
if !ok {
@@ -112,6 +118,9 @@ func (g *Group) Restore(ctx context.Context, q datasource.Querier, lookback time
if rr.For < 1 {
continue
}
// ignore g.ExtraFilterLabels on purpose, so it
// won't affect the restore procedure.
q := qb.BuildWithParams(datasource.QuerierParams{})
if err := rr.Restore(ctx, q, lookback, labels); err != nil {
return fmt.Errorf("error while restoring rule %q: %w", rule, err)
}
@@ -157,7 +166,9 @@ func (g *Group) updateWith(newGroup *Group) error {
for _, nr := range rulesRegistry {
newRules = append(newRules, nr)
}
g.Type = newGroup.Type
g.Concurrency = newGroup.Concurrency
g.ExtraFilterLabels = newGroup.ExtraFilterLabels
g.Checksum = newGroup.Checksum
g.Rules = newRules
return nil
@@ -185,7 +196,7 @@ func (g *Group) close() {
var skipRandSleepOnGroupStart bool
func (g *Group) start(ctx context.Context, querier datasource.Querier, nts []notifier.Notifier, rw *remotewrite.Client) {
func (g *Group) start(ctx context.Context, nts []notifier.Notifier, rw *remotewrite.Client) {
defer func() { close(g.finishedCh) }()
// Spread group rules evaluation over time in order to reduce load on VictoriaMetrics.
@@ -209,7 +220,7 @@ func (g *Group) start(ctx context.Context, querier datasource.Querier, nts []not
}
logger.Infof("group %q started; interval=%v; concurrency=%d", g.Name, g.Interval, g.Concurrency)
e := &executor{querier, nts, rw}
e := &executor{nts, rw}
t := time.NewTicker(g.Interval)
defer t.Stop()
for {
@@ -252,7 +263,6 @@ func (g *Group) start(ctx context.Context, querier datasource.Querier, nts []not
}
type executor struct {
querier datasource.Querier
notifiers []notifier.Notifier
rw *remotewrite.Client
}
@@ -306,7 +316,7 @@ func (e *executor) exec(ctx context.Context, rule Rule, returnSeries bool, inter
execDuration.UpdateDuration(execStart)
}()
tss, err := rule.Exec(ctx, e.querier, returnSeries)
tss, err := rule.Exec(ctx, returnSeries)
if err != nil {
execErrors.Inc()
return fmt.Errorf("rule %q: failed to execute: %w", rule, err)

View File

@@ -6,6 +6,8 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
)
@@ -105,20 +107,32 @@ func TestUpdateWith(t *testing.T) {
{Record: "foo5"},
},
},
{
"update datasource type",
[]config.Rule{
{Alert: "foo1", Type: datasource.NewPrometheusType()},
{Alert: "foo3", Type: datasource.NewGraphiteType()},
},
[]config.Rule{
{Alert: "foo1", Type: datasource.NewGraphiteType()},
{Alert: "foo10", Type: datasource.NewPrometheusType()},
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
g := &Group{Name: "test"}
qb := &fakeQuerier{}
for _, r := range tc.currentRules {
r.ID = config.HashRule(r)
g.Rules = append(g.Rules, g.newRule(r))
g.Rules = append(g.Rules, g.newRule(qb, r))
}
ng := &Group{Name: "test"}
for _, r := range tc.newRules {
r.ID = config.HashRule(r)
ng.Rules = append(ng.Rules, ng.newRule(r))
ng.Rules = append(ng.Rules, ng.newRule(qb, r))
}
err := g.updateWith(ng)
@@ -156,11 +170,11 @@ func TestGroupStart(t *testing.T) {
t.Fatalf("failed to parse rules: %s", err)
}
const evalInterval = time.Millisecond
g := newGroup(groups[0], evalInterval, map[string]string{"cluster": "east-1"})
g.Concurrency = 2
fn := &fakeNotifier{}
fs := &fakeQuerier{}
fn := &fakeNotifier{}
g := newGroup(groups[0], fs, evalInterval, map[string]string{"cluster": "east-1"})
g.Concurrency = 2
const inst1, inst2, job = "foo", "bar", "baz"
m1 := metricWithLabels(t, "instance", inst1, "job", job)
@@ -195,7 +209,7 @@ func TestGroupStart(t *testing.T) {
fs.add(m1)
fs.add(m2)
go func() {
g.start(context.Background(), fs, []notifier.Notifier{fn}, nil)
g.start(context.Background(), []notifier.Notifier{fn}, nil)
close(finished)
}()

View File

@@ -38,6 +38,10 @@ func (fq *fakeQuerier) add(metrics ...datasource.Metric) {
fq.Unlock()
}
func (fq *fakeQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fq
}
func (fq *fakeQuerier) Query(_ context.Context, _ string) ([]datasource.Metric, error) {
fq.Lock()
defer fq.Unlock()
@@ -160,6 +164,9 @@ func compareAlertingRules(t *testing.T, a, b *AlertingRule) error {
if !reflect.DeepEqual(a.Labels, b.Labels) {
return fmt.Errorf("expected to have labels %#v; got %#v", a.Labels, b.Labels)
}
if a.Type.String() != b.Type.String() {
return fmt.Errorf("expected to have Type %#v; got %#v", a.Type.String(), b.Type.String())
}
return nil
}

View File

@@ -26,11 +26,11 @@ import (
)
var (
rulePath = flagutil.NewArray("rule", `Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
rulePath = flagutil.NewArray("rule", `Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
@@ -41,12 +41,13 @@ Rule files may contain %{ENV_VAR} placeholders, which are substituted by the cor
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|pathEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`)
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`)
externalLabels = flagutil.NewArray("external.label", "Optional label in the form 'name=value' to add to all generated recording rules and alerts. "+
"Pass multiple -label flags in order to add multiple label sets.")
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup.")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmalert. The rules file are validated. The `-rule` flag must be specified.")
)
@@ -76,6 +77,12 @@ func main() {
if err != nil {
logger.Fatalf("failed to init: %s", err)
}
// Register SIGHUP handler for config re-read just before manager.start call.
// This guarantees that the config will be re-read if the signal arrives during manager.start call.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
if err := manager.start(ctx, *rulePath, *validateTemplates, *validateExpressions); err != nil {
logger.Fatalf("failed to start: %s", err)
}
@@ -84,9 +91,8 @@ func main() {
// init reload metrics with positive values to improve alerting conditions
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
sigHup := procutil.NewSighupChan()
for {
<-sigHup
<-sighupCh
configReloads.Inc()
logger.Infof("SIGHUP received. Going to reload rules %q ...", *rulePath)
if err := manager.update(ctx, *rulePath, *validateTemplates, *validateExpressions, false); err != nil {
@@ -140,10 +146,10 @@ func newManager(ctx context.Context) (*manager, error) {
}
manager := &manager{
groups: make(map[uint64]*Group),
querier: q,
notifiers: nts,
labels: map[string]string{},
groups: make(map[uint64]*Group),
querierBuilder: q,
notifiers: nts,
labels: map[string]string{},
}
rw, err := remotewrite.Init(ctx)
if err != nil {
@@ -218,7 +224,7 @@ func usage() {
const s = `
vmalert processes alerts and recording rules.
See the docs at https://victoriametrics.github.io/vmalert.html .
See the docs at https://docs.victoriametrics.com/vmalert.html .
`
flagutil.Usage(s)
}

View File

@@ -15,11 +15,12 @@ import (
// manager controls group states
type manager struct {
querier datasource.Querier
notifiers []notifier.Notifier
querierBuilder datasource.QuerierBuilder
notifiers []notifier.Notifier
rw *remotewrite.Client
rr datasource.Querier
// remote read builder.
rr datasource.QuerierBuilder
wg sync.WaitGroup
labels map[string]string
@@ -63,10 +64,13 @@ func (m *manager) close() {
m.wg.Wait()
}
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) {
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) error {
if restore && m.rr != nil {
err := group.Restore(ctx, m.rr, *remoteReadLookBack, m.labels)
if err != nil {
if !*remoteReadIgnoreRestoreErrors {
return fmt.Errorf("failed to restore state for group %q: %w", group.Name, err)
}
logger.Errorf("error while restoring state for group %q: %s", group.Name, err)
}
}
@@ -74,10 +78,11 @@ func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) {
m.wg.Add(1)
id := group.ID()
go func() {
group.start(ctx, m.querier, m.notifiers, m.rw)
group.start(ctx, m.notifiers, m.rw)
m.wg.Done()
}()
m.groups[id] = group
return nil
}
func (m *manager) update(ctx context.Context, path []string, validateTpl, validateExpr, restore bool) error {
@@ -89,7 +94,7 @@ func (m *manager) update(ctx context.Context, path []string, validateTpl, valida
groupsRegistry := make(map[uint64]*Group)
for _, cfg := range groupsCfg {
ng := newGroup(cfg, *evaluationInterval, m.labels)
ng := newGroup(cfg, m.querierBuilder, *evaluationInterval, m.labels)
groupsRegistry[ng.ID()] = ng
}
@@ -116,7 +121,9 @@ func (m *manager) update(ctx context.Context, path []string, validateTpl, valida
}
}
for _, ng := range groupsRegistry {
m.startGroup(ctx, ng, restore)
if err := m.startGroup(ctx, ng, restore); err != nil {
return err
}
}
m.groupsMu.Unlock()
@@ -140,11 +147,14 @@ func (g *Group) toAPI() APIGroup {
ag := APIGroup{
// encode as string to avoid rounding
ID: fmt.Sprintf("%d", g.ID()),
Name: g.Name,
File: g.File,
Interval: g.Interval.String(),
Concurrency: g.Concurrency,
ID: fmt.Sprintf("%d", g.ID()),
Name: g.Name,
Type: g.Type.String(),
File: g.File,
Interval: g.Interval.String(),
Concurrency: g.Concurrency,
ExtraFilterLabels: g.ExtraFilterLabels,
}
for _, r := range g.Rules {
switch v := r.(type) {

View File

@@ -9,6 +9,8 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
)
@@ -35,9 +37,9 @@ func TestManagerEmptyRulesDir(t *testing.T) {
// Should be executed with -race flag
func TestManagerUpdateConcurrent(t *testing.T) {
m := &manager{
groups: make(map[uint64]*Group),
querier: &fakeQuerier{},
notifiers: []notifier.Notifier{&fakeNotifier{}},
groups: make(map[uint64]*Group),
querierBuilder: &fakeQuerier{},
notifiers: []notifier.Notifier{&fakeNotifier{}},
}
paths := []string{
"config/testdata/dir/rules0-good.rules",
@@ -106,6 +108,18 @@ func TestManagerUpdate(t *testing.T) {
Name: "ExampleAlertAlwaysFiring",
Expr: "sum by(job) (up == 1)",
}
ExampleAlertGraphite = &AlertingRule{
Name: "up graphite",
Expr: "filterSeries(time('host.1',20),'>','0')",
Type: datasource.NewGraphiteType(),
For: defaultEvalInterval,
}
ExampleAlertGraphite2 = &AlertingRule{
Name: "up",
Expr: "filterSeries(time('host.2',20),'>','0')",
Type: datasource.NewGraphiteType(),
For: defaultEvalInterval,
}
)
testCases := []struct {
@@ -122,6 +136,7 @@ func TestManagerUpdate(t *testing.T) {
{
File: "config/testdata/dir/rules1-good.rules",
Name: "duplicatedGroupDiffFiles",
Type: datasource.NewPrometheusType(),
Interval: defaultEvalInterval,
Rules: []Rule{
&AlertingRule{
@@ -146,12 +161,14 @@ func TestManagerUpdate(t *testing.T) {
{
File: "config/testdata/rules0-good.rules",
Name: "groupGorSingleAlert",
Type: datasource.NewPrometheusType(),
Rules: []Rule{VMRows},
Interval: defaultEvalInterval,
},
{
File: "config/testdata/rules0-good.rules",
Interval: defaultEvalInterval,
Type: datasource.NewPrometheusType(),
Name: "TestGroup", Rules: []Rule{
Conns,
ExampleAlertAlwaysFiring,
@@ -166,13 +183,16 @@ func TestManagerUpdate(t *testing.T) {
{
File: "config/testdata/rules0-good.rules",
Name: "groupGorSingleAlert",
Type: datasource.NewPrometheusType(),
Interval: defaultEvalInterval,
Rules: []Rule{VMRows},
},
{
File: "config/testdata/rules0-good.rules",
Interval: defaultEvalInterval,
Name: "TestGroup", Rules: []Rule{
Name: "TestGroup",
Type: datasource.NewPrometheusType(),
Rules: []Rule{
Conns,
ExampleAlertAlwaysFiring,
}},
@@ -186,12 +206,14 @@ func TestManagerUpdate(t *testing.T) {
{
File: "config/testdata/rules0-good.rules",
Name: "groupGorSingleAlert",
Type: datasource.NewPrometheusType(),
Interval: defaultEvalInterval,
Rules: []Rule{VMRows},
},
{
File: "config/testdata/rules0-good.rules",
Interval: defaultEvalInterval,
Type: datasource.NewPrometheusType(),
Name: "TestGroup", Rules: []Rule{
Conns,
ExampleAlertAlwaysFiring,
@@ -199,11 +221,28 @@ func TestManagerUpdate(t *testing.T) {
},
},
},
{
name: "update prometheus to graphite type",
initPath: "config/testdata/dir/rules-update0-good.rules",
updatePath: "config/testdata/dir/rules-update1-good.rules",
want: []*Group{
{
File: "config/testdata/dir/rules-update1-good.rules",
Interval: defaultEvalInterval,
Type: datasource.NewGraphiteType(),
Name: "TestUpdateGroup",
Rules: []Rule{
ExampleAlertGraphite2,
ExampleAlertGraphite,
},
},
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, cancel := context.WithCancel(context.TODO())
m := &manager{groups: make(map[uint64]*Group), querier: &fakeQuerier{}}
m := &manager{groups: make(map[uint64]*Group), querierBuilder: &fakeQuerier{}}
path := []string{tc.initPath}
if err := m.update(ctx, path, true, true, false); err != nil {
t.Fatalf("failed to complete initial rules update: %s", err)

View File

@@ -0,0 +1,12 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 8880
ENTRYPOINT ["/vmalert-prod"]
ARG TARGETARCH
COPY vmalert-${TARGETARCH}-prod ./vmalert-prod

View File

@@ -1,7 +1,6 @@
package notifier
import (
"flag"
"fmt"
"net/http"
@@ -11,10 +10,10 @@ import (
var (
addrs = flagutil.NewArray("notifier.url", "Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093")
basicAuthUsername = flagutil.NewArray("notifier.basicAuth.username", "Optional basic auth username for -datasource.url")
basicAuthPassword = flagutil.NewArray("notifier.basicAuth.password", "Optional basic auth password for -datasource.url")
basicAuthUsername = flagutil.NewArray("notifier.basicAuth.username", "Optional basic auth username for -notifier.url")
basicAuthPassword = flagutil.NewArray("notifier.basicAuth.password", "Optional basic auth password for -notifier.url")
tlsInsecureSkipVerify = flag.Bool("notifier.tlsInsecureSkipVerify", false, "Whether to skip tls verification when connecting to -notifier.url")
tlsInsecureSkipVerify = flagutil.NewArrayBool("notifier.tlsInsecureSkipVerify", "Whether to skip tls verification when connecting to -notifier.url")
tlsCertFile = flagutil.NewArray("notifier.tlsCertFile", "Optional path to client-side TLS certificate file to use when connecting to -notifier.url")
tlsKeyFile = flagutil.NewArray("notifier.tlsKeyFile", "Optional path to client-side TLS certificate key to use when connecting to -notifier.url")
tlsCAFile = flagutil.NewArray("notifier.tlsCAFile", "Optional path to TLS CA file to use for verifying connections to -notifier.url. "+
@@ -33,7 +32,7 @@ func Init(gen AlertURLGenerator) ([]Notifier, error) {
for i, addr := range *addrs {
cert, key := tlsCertFile.GetOptionalArg(i), tlsKeyFile.GetOptionalArg(i)
ca, serverName := tlsCAFile.GetOptionalArg(i), tlsServerName.GetOptionalArg(i)
tr, err := utils.Transport(addr, cert, key, ca, serverName, *tlsInsecureSkipVerify)
tr, err := utils.Transport(addr, cert, key, ca, serverName, tlsInsecureSkipVerify.GetOptionalArg(i))
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -28,44 +28,73 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
)
// metric is private copy of datasource.Metric,
// it is used for templating annotations,
// Labels as map simplifies templates evaluation.
type metric struct {
Labels map[string]string
Timestamp int64
Value float64
}
// datasourceMetricsToTemplateMetrics converts Metrics from datasource package to private copy for templating.
func datasourceMetricsToTemplateMetrics(ms []datasource.Metric) []metric {
mss := make([]metric, 0, len(ms))
for _, m := range ms {
labelsMap := make(map[string]string, len(m.Labels))
for _, labelValue := range m.Labels {
labelsMap[labelValue.Name] = labelValue.Value
}
mss = append(mss, metric{
Labels: labelsMap,
Timestamp: m.Timestamp,
Value: m.Value})
}
return mss
}
// QueryFn is used to wrap a call to datasource into simple-to-use function
// for templating functions.
type QueryFn func(query string) ([]datasource.Metric, error)
func funcsWithQuery(query QueryFn) textTpl.FuncMap {
fm := make(textTpl.FuncMap)
for k, fn := range tmplFunc {
fm[k] = fn
}
fm["query"] = func(q string) ([]datasource.Metric, error) {
return query(q)
}
return fm
}
var tmplFunc textTpl.FuncMap
// InitTemplateFunc initiates template helper functions
func InitTemplateFunc(externalURL *url.URL) {
tmplFunc = textTpl.FuncMap{
"args": func(args ...interface{}) map[string]interface{} {
result := make(map[string]interface{})
for i, a := range args {
result[fmt.Sprintf("arg%d", i)] = a
}
return result
},
/* Strings */
// reReplaceAll ReplaceAllString returns a copy of src, replacing matches of the Regexp with
// the replacement string repl. Inside repl, $ signs are interpreted as in Expand,
// so for instance $1 represents the text of the first submatch.
// alias for https://golang.org/pkg/regexp/#Regexp.ReplaceAllString
"reReplaceAll": func(pattern, repl, text string) string {
re := regexp.MustCompile(pattern)
return re.ReplaceAllString(text, repl)
},
"safeHtml": func(text string) htmlTpl.HTML {
return htmlTpl.HTML(text)
},
"match": regexp.MatchString,
"title": strings.Title,
// match reports whether the string s
// contains any match of the regular expression pattern.
// alias for https://golang.org/pkg/regexp/#MatchString
"match": regexp.MatchString,
// title returns a copy of the string s with all Unicode letters
// that begin words mapped to their Unicode title case.
// alias for https://golang.org/pkg/strings/#Title
"title": strings.Title,
// toUpper returns s with all Unicode letters mapped to their upper case.
// alias for https://golang.org/pkg/strings/#ToUpper
"toUpper": strings.ToUpper,
// toLower returns s with all Unicode letters mapped to their lower case.
// alias for https://golang.org/pkg/strings/#ToLower
"toLower": strings.ToLower,
/* Numbers */
// humanize converts given number to a human readable format
// by adding metric prefixes https://en.wikipedia.org/wiki/Metric_prefix
"humanize": func(v float64) string {
if v == 0 || math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -91,6 +120,8 @@ func InitTemplateFunc(externalURL *url.URL) {
}
return fmt.Sprintf("%.4g%s", v, prefix)
},
// humanize1024 converts given number to a human readable format with 1024 as base
"humanize1024": func(v float64) string {
if math.Abs(v) <= 1 || math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -105,6 +136,8 @@ func InitTemplateFunc(externalURL *url.URL) {
}
return fmt.Sprintf("%.4g%s", v, prefix)
},
// humanizeDuration converts given seconds to a human readable duration
"humanizeDuration": func(v float64) string {
if math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -145,9 +178,13 @@ func InitTemplateFunc(externalURL *url.URL) {
}
return fmt.Sprintf("%.4g%ss", v, prefix)
},
// humanizePercentage converts given ratio value to a fraction of 100
"humanizePercentage": func(v float64) string {
return fmt.Sprintf("%.4g%%", v*100)
},
// humanizeTimestamp converts given timestamp to a human readable time equivalent
"humanizeTimestamp": func(v float64) string {
if math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -155,42 +192,115 @@ func InitTemplateFunc(externalURL *url.URL) {
t := TimeFromUnixNano(int64(v * 1e9)).Time().UTC()
return fmt.Sprint(t)
},
"pathPrefix": func() string {
return externalURL.Path
},
/* URLs */
// externalURL returns value of `external.url` flag
"externalURL": func() string {
return externalURL.String()
},
// pathPrefix returns a Path segment from the URL value in `external.url` flag
"pathPrefix": func() string {
return externalURL.Path
},
// pathEscape escapes the string so it can be safely placed inside a URL path segment,
// replacing special characters (including /) with %XX sequences as needed.
// alias for https://golang.org/pkg/net/url/#PathEscape
"pathEscape": func(u string) string {
return url.PathEscape(u)
},
// queryEscape escapes the string so it can be safely placed
// inside a URL query.
// alias for https://golang.org/pkg/net/url/#QueryEscape
"queryEscape": func(q string) string {
return url.QueryEscape(q)
},
// crlfEscape replaces new line chars to skip URL encoding.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
"crlfEscape": func(q string) string {
q = strings.Replace(q, "\n", `\n`, -1)
return strings.Replace(q, "\r", `\r`, -1)
},
// quotesEscape escapes quote char
"quotesEscape": func(q string) string {
return strings.Replace(q, `"`, `\"`, -1)
},
// query function supposed to be substituted at funcsWithQuery().
// it is present here only for validation purposes, when there is no
// provided datasource.
"query": func(q string) ([]datasource.Metric, error) {
return nil, nil
// query executes the MetricsQL/PromQL query against
// configured `datasource.url` address.
// For example, {{ query "foo" | first | value }} will
// execute "/api/v1/query?query=foo" request and will return
// the first value in response.
"query": func(q string) ([]metric, error) {
// query function supposed to be substituted at funcsWithQuery().
// it is present here only for validation purposes, when there is no
// provided datasource.
//
// return non-empty slice to pass validation with chained functions in template
// see issue #989 for details
return []metric{{}}, nil
},
"first": func(metrics []datasource.Metric) (datasource.Metric, error) {
// first returns the first by order element from the given metrics list.
// usually used alongside with `query` template function.
"first": func(metrics []metric) (metric, error) {
if len(metrics) > 0 {
return metrics[0], nil
}
return datasource.Metric{}, errors.New("first() called on vector with no elements")
return metric{}, errors.New("first() called on vector with no elements")
},
"label": func(label string, m datasource.Metric) string {
return m.Label(label)
// label returns the value of the given label name for the given metric.
// usually used alongside with `query` template function.
"label": func(label string, m metric) string {
return m.Labels[label]
},
"value": func(m datasource.Metric) float64 {
// value returns the value of the given metric.
// usually used alongside with `query` template function.
"value": func(m metric) float64 {
return m.Value
},
/* Helpers */
// Converts a list of objects to a map with keys arg0, arg1 etc.
// This is intended to allow multiple arguments to be passed to templates.
"args": func(args ...interface{}) map[string]interface{} {
result := make(map[string]interface{})
for i, a := range args {
result[fmt.Sprintf("arg%d", i)] = a
}
return result
},
// safeHtml marks string as HTML not requiring auto-escaping.
"safeHtml": func(text string) htmlTpl.HTML {
return htmlTpl.HTML(text)
},
}
}
func funcsWithQuery(query QueryFn) textTpl.FuncMap {
fm := make(textTpl.FuncMap)
for k, fn := range tmplFunc {
fm[k] = fn
}
fm["query"] = func(q string) ([]metric, error) {
result, err := query(q)
if err != nil {
return nil, err
}
return datasourceMetricsToTemplateMetrics(result), nil
}
return fm
}
// Time is the number of milliseconds since the epoch
// (1970-01-01 00:00 UTC) excluding leap seconds.
type Time int64

View File

@@ -3,8 +3,8 @@ package main
import (
"context"
"fmt"
"hash/fnv"
"sort"
"strings"
"sync"
"time"
@@ -18,12 +18,15 @@ import (
// to evaluate configured Expression and
// return TimeSeries as result.
type RecordingRule struct {
Type datasource.Type
RuleID uint64
Name string
Expr string
Labels map[string]string
GroupID uint64
q datasource.Querier
// guard status fields
mu sync.RWMutex
// stores last moment of time Exec was called
@@ -51,15 +54,22 @@ func (rr *RecordingRule) ID() uint64 {
return rr.RuleID
}
func newRecordingRule(group *Group, cfg config.Rule) *RecordingRule {
func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule) *RecordingRule {
rr := &RecordingRule{
Type: cfg.Type,
RuleID: cfg.ID,
Name: cfg.Record,
Expr: cfg.Expr,
Labels: cfg.Labels,
GroupID: group.ID(),
metrics: &recordingRuleMetrics{},
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: &cfg.Type,
EvaluationInterval: group.Interval,
ExtraLabels: group.ExtraFilterLabels,
}),
}
labels := fmt.Sprintf(`recording=%q, group=%q, id="%d"`, rr.Name, group.Name, rr.ID())
rr.metrics.errors = getOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_error{%s}`, labels),
func() float64 {
@@ -79,13 +89,12 @@ func (rr *RecordingRule) Close() {
}
// Exec executes RecordingRule expression via the given Querier.
func (rr *RecordingRule) Exec(ctx context.Context, q datasource.Querier, series bool) ([]prompbmarshal.TimeSeries, error) {
func (rr *RecordingRule) Exec(ctx context.Context, series bool) ([]prompbmarshal.TimeSeries, error) {
if !series {
return nil, nil
}
qMetrics, err := q.Query(ctx, rr.Expr)
qMetrics, err := rr.q.Query(ctx, rr.Expr)
rr.mu.Lock()
defer rr.mu.Unlock()
@@ -95,33 +104,38 @@ func (rr *RecordingRule) Exec(ctx context.Context, q datasource.Querier, series
return nil, fmt.Errorf("failed to execute query %q: %w", rr.Expr, err)
}
duplicates := make(map[uint64]prompbmarshal.TimeSeries, len(qMetrics))
duplicates := make(map[string]struct{}, len(qMetrics))
var tss []prompbmarshal.TimeSeries
for _, r := range qMetrics {
ts := rr.toTimeSeries(r, rr.lastExecTime)
h := hashTimeSeries(ts)
if _, ok := duplicates[h]; ok {
ts := rr.toTimeSeries(r, time.Unix(r.Timestamp, 0))
key := stringifyLabels(ts)
if _, ok := duplicates[key]; ok {
rr.lastExecError = errDuplicate
return nil, errDuplicate
return nil, fmt.Errorf("original metric %v; resulting labels %q: %w", r, key, errDuplicate)
}
duplicates[h] = ts
duplicates[key] = struct{}{}
tss = append(tss, ts)
}
return tss, nil
}
func hashTimeSeries(ts prompbmarshal.TimeSeries) uint64 {
hash := fnv.New64a()
func stringifyLabels(ts prompbmarshal.TimeSeries) string {
labels := ts.Labels
sort.Slice(labels, func(i, j int) bool {
return labels[i].Name < labels[j].Name
})
for _, l := range labels {
hash.Write([]byte(l.Name))
hash.Write([]byte(l.Value))
hash.Write([]byte("\xff"))
if len(labels) > 1 {
sort.Slice(labels, func(i, j int) bool {
return labels[i].Name < labels[j].Name
})
}
return hash.Sum64()
b := strings.Builder{}
for i, l := range labels {
b.WriteString(l.Name)
b.WriteString("=")
b.WriteString(l.Value)
if i != len(labels)-1 {
b.WriteString(",")
}
}
return b.String()
}
func (rr *RecordingRule) toTimeSeries(m datasource.Metric, timestamp time.Time) prompbmarshal.TimeSeries {
@@ -138,8 +152,6 @@ func (rr *RecordingRule) toTimeSeries(m datasource.Metric, timestamp time.Time)
}
// UpdateWith copies all significant fields.
// alerts state isn't copied since
// it should be updated in next 2 Execs
func (rr *RecordingRule) UpdateWith(r Rule) error {
nr, ok := r.(*RecordingRule)
if !ok {
@@ -147,6 +159,7 @@ func (rr *RecordingRule) UpdateWith(r Rule) error {
}
rr.Expr = nr.Expr
rr.Labels = nr.Labels
rr.q = nr.q
return nil
}
@@ -162,6 +175,7 @@ func (rr *RecordingRule) RuleAPI() APIRecordingRule {
ID: fmt.Sprintf("%d", rr.ID()),
GroupID: fmt.Sprintf("%d", rr.GroupID),
Name: rr.Name,
Type: rr.Type.String(),
Expression: rr.Expr,
LastError: lastErr,
LastExec: rr.lastExecTime,

View File

@@ -76,7 +76,8 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
fq.add(tc.metrics...)
tss, err := tc.rule.Exec(context.TODO(), fq, true)
tc.rule.q = fq
tss, err := tc.rule.Exec(context.TODO(), true)
if err != nil {
t.Fatalf("unexpected Exec err: %s", err)
}
@@ -95,8 +96,8 @@ func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
fq := &fakeQuerier{}
expErr := "connection reset by peer"
fq.setErr(errors.New(expErr))
_, err := rr.Exec(context.TODO(), fq, true)
rr.q = fq
_, err := rr.Exec(context.TODO(), true)
if err == nil {
t.Fatalf("expected to get err; got nil")
}
@@ -111,7 +112,7 @@ func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "foo"))
fq.add(metricWithValueAndLabels(t, 2, "__name__", "foo", "job", "bar"))
_, err = rr.Exec(context.TODO(), fq, true)
_, err = rr.Exec(context.TODO(), true)
if err == nil {
t.Fatalf("expected to get err; got nil")
}

View File

@@ -10,9 +10,9 @@ import (
)
var (
addr = flag.String("remoteRead.url", "", "Optional URL to Victoria Metrics or VMSelect that will be used to restore alerts"+
" state. This configuration makes sense only if `vmalert` was configured with `remoteWrite.url` before and has been successfully persisted its state."+
" E.g. http://127.0.0.1:8428")
addr = flag.String("remoteRead.url", "", "Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts "+
"state. This configuration makes sense only if `vmalert` was configured with `remoteWrite.url` before and has been successfully persisted its state. "+
"E.g. http://127.0.0.1:8428")
basicAuthUsername = flag.String("remoteRead.basicAuth.username", "", "Optional basic auth username for -remoteRead.url")
basicAuthPassword = flag.String("remoteRead.basicAuth.password", "", "Optional basic auth password for -remoteRead.url")
tlsInsecureSkipVerify = flag.Bool("remoteRead.tlsInsecureSkipVerify", false, "Whether to skip tls verification when connecting to -remoteRead.url")
@@ -26,7 +26,7 @@ var (
// Init creates a Querier from provided flag values.
// Returns nil if addr flag wasn't set.
func Init() (datasource.Querier, error) {
func Init() (datasource.QuerierBuilder, error) {
if *addr == "" {
return nil, nil
}
@@ -35,5 +35,5 @@ func Init() (datasource.Querier, error) {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
c := &http.Client{Transport: tr}
return datasource.NewVMStorage(*addr, *basicAuthUsername, *basicAuthPassword, 0, c), nil
return datasource.NewVMStorage(*addr, *basicAuthUsername, *basicAuthPassword, 0, 0, false, c), nil
}

View File

@@ -10,8 +10,8 @@ import (
)
var (
addr = flag.String("remoteWrite.url", "", "Optional URL to Victoria Metrics or VMInsert where to persist alerts state"+
" and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428")
addr = flag.String("remoteWrite.url", "", "Optional URL to VictoriaMetrics or vminsert where to persist alerts state "+
"and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428")
basicAuthUsername = flag.String("remoteWrite.basicAuth.username", "", "Optional basic auth username for -remoteWrite.url")
basicAuthPassword = flag.String("remoteWrite.basicAuth.password", "", "Optional basic auth password for -remoteWrite.url")

View File

@@ -94,7 +94,7 @@ func NewClient(ctx context.Context, cfg Config) (*Client, error) {
Timeout: cfg.WriteTimeout,
Transport: cfg.Transport,
},
addr: strings.TrimSuffix(cfg.Addr, "/") + writePath,
addr: strings.TrimSuffix(cfg.Addr, "/"),
baUser: cfg.BasicAuthUser,
baPass: cfg.BasicAuthPass,
flushInterval: cfg.FlushInterval,
@@ -231,6 +231,7 @@ func (c *Client) send(ctx context.Context, data []byte) error {
if c.baPass != "" {
req.SetBasicAuth(c.baUser, c.baPass)
}
req.URL.Path += writePath
resp, err := c.c.Do(req.WithContext(ctx))
if err != nil {
return fmt.Errorf("error while sending request to %s: %w; Data len %d(%d)",

View File

@@ -4,7 +4,6 @@ import (
"context"
"errors"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
@@ -18,7 +17,7 @@ type Rule interface {
// Exec executes the rule with given context
// and Querier. If returnSeries is true, Exec
// may return TimeSeries as result of execution
Exec(ctx context.Context, q datasource.Querier, returnSeries bool) ([]prompbmarshal.TimeSeries, error)
Exec(ctx context.Context, returnSeries bool) ([]prompbmarshal.TimeSeries, error)
// UpdateWith performs modification of current Rule
// with fields of the given Rule.
UpdateWith(Rule) error

View File

@@ -13,6 +13,7 @@ func TestTLSConfig(t *testing.T) {
}
if tlsCfg == nil {
t.Errorf("expected tlsConfig to be set, got nil")
return
}
if tlsCfg.ServerName != serverName {
t.Errorf("unexpected ServerName, want %s, got %s", serverName, tlsCfg.ServerName)

View File

@@ -17,22 +17,19 @@ type requestHandler struct {
m *manager
}
var pathList = [][]string{
{"/api/v1/groups", "list all loaded groups and rules"},
{"/api/v1/alerts", "list all active alerts"},
{"/api/v1/groupID/alertID/status", "get alert status by ID"},
// /metrics is served by httpserver by default
{"/metrics", "list of application metrics"},
{"/-/reload", "reload configuration"},
}
func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/":
for _, path := range pathList {
p, doc := path[0], path[1]
fmt.Fprintf(w, "<a href='%s'>%q</a> - %s<br/>", p, p, doc)
if r.Method != "GET" {
return false
}
httpserver.WriteAPIHelp(w, [][2]string{
{"/api/v1/groups", "list all loaded groups and rules"},
{"/api/v1/alerts", "list all active alerts"},
{"/api/v1/groupID/alertID/status", "get alert status by ID"},
{"/metrics", "list of application metrics"},
{"/-/reload", "reload configuration"},
})
return true
case "/api/v1/groups":
data, err := rh.listGroups()

View File

@@ -20,19 +20,22 @@ type APIAlert struct {
// APIGroup represents Group for WEB view
type APIGroup struct {
Name string `json:"name"`
ID string `json:"id"`
File string `json:"file"`
Interval string `json:"interval"`
Concurrency int `json:"concurrency"`
AlertingRules []APIAlertingRule `json:"alerting_rules"`
RecordingRules []APIRecordingRule `json:"recording_rules"`
Name string `json:"name"`
Type string `json:"type"`
ID string `json:"id"`
File string `json:"file"`
Interval string `json:"interval"`
Concurrency int `json:"concurrency"`
ExtraFilterLabels map[string]string `json:"extra_filter_labels"`
AlertingRules []APIAlertingRule `json:"alerting_rules"`
RecordingRules []APIRecordingRule `json:"recording_rules"`
}
// APIAlertingRule represents AlertingRule for WEB view
type APIAlertingRule struct {
ID string `json:"id"`
Name string `json:"name"`
Type string `json:"type"`
GroupID string `json:"group_id"`
Expression string `json:"expression"`
For string `json:"for"`
@@ -46,6 +49,7 @@ type APIAlertingRule struct {
type APIRecordingRule struct {
ID string `json:"id"`
Name string `json:"name"`
Type string `json:"type"`
GroupID string `json:"group_id"`
Expression string `json:"expression"`
LastError string `json:"last_error"`

View File

@@ -77,3 +77,9 @@ vmauth-local-with-goarch:
vmauth-pure:
APP_NAME=vmauth $(MAKE) app-local-pure
vmauth-windows-amd64:
GOARCH=amd64 APP_NAME=vmauth $(MAKE) app-local-windows-with-goarch
vmauth-windows-amd64-prod:
APP_NAME=vmauth $(MAKE) app-via-docker-windows-amd64

View File

@@ -1,11 +1,11 @@
## vmauth
# vmauth
`vmauth` is a simple auth proxy and router for [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics).
It reads username and password from [Basic Auth headers](https://en.wikipedia.org/wiki/Basic_access_authentication),
matches them against configs pointed by `-auth.config` command-line flag and proxies incoming HTTP requests to the configured per-user `url_prefix` on successful match.
### Quick start
## Quick start
Just download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases), unpack it
and pass the following flag to `vmauth` binary in order to start authorizing and routing requests:
@@ -17,16 +17,17 @@ and pass the following flag to `vmauth` binary in order to start authorizing and
After that `vmauth` starts accepting HTTP requests on port `8427` and routing them according to the provided [-auth.config](#auth-config).
The port can be modified via `-httpListenAddr` command-line flag.
The auth config can be reloaded by passing `SIGHUP` signal to `vmauth`.
The auth config can be reloaded either by passing `SIGHUP` signal to `vmauth` or by querying `/-/reload` http endpoint.
Docker images for `vmauth` are available [here](https://hub.docker.com/r/victoriametrics/vmauth/tags).
Pass `-help` to `vmauth` in order to see all the supported command-line flags with their descriptions.
Feel free [contacting us](mailto:info@victoriametrics.com) if you need customized auth proxy for VictoriaMetrics with the support of LDAP, SSO, RBAC, SAML, accounting, limits, etc.
Feel free [contacting us](mailto:info@victoriametrics.com) if you need customized auth proxy for VictoriaMetrics with the support of LDAP, SSO, RBAC, SAML,
accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.com/vmgateway.html).
### Auth config
## Auth config
Auth config is represented in the following simple `yml` format:
@@ -36,39 +37,64 @@ Auth config is represented in the following simple `yml` format:
# Usernames must be unique.
users:
# Requests with the 'Authorization: Bearer XXXX' header are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- bearer_token: "XXXX"
url_prefix: "http://localhost:8428"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query
# will be proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is routed to http://vmselect:8481/select/123/prometheus/api/v1/select
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8481/select/123/prometheus/api/v1/select
- username: "cluster-select-account-123"
password: "***"
url_prefix: "http://vmselect:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# All the reuqests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is routed to http://vminsert:8480/insert/42/prometheus/api/v1/write
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "cluster-insert-account-42"
password: "***"
url_prefix: "http://vminsert:8480/insert/42/prometheus"
# A single user for querying and inserting data:
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to http://vmselect:8481/select/42/prometheus.
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8480/select/42/prometheus/api/v1/query
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "foobar"
url_map:
- src_paths: ["/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^/]+/values"]
url_prefix: "http://vmselect:8481/select/42/prometheus"
- src_paths: ["/api/v1/write"]
url_prefix: "http://vminsert:8480/insert/42/prometheus"
```
The config may contain `%{ENV_VAR}` placeholders, which are substituted by the corresponding `ENV_VAR` environment variable values.
This may be useful for passing secrets to the config.
### Security
## Security
Do not transfer Basic Auth headers in plaintext over untrusted networks. Enable https. This can be done by passing the following `-tls*` command-line flags to `vmauth`:
@@ -83,31 +109,33 @@ Do not transfer Basic Auth headers in plaintext over untrusted networks. Enable
Alternatively, [https termination proxy](https://en.wikipedia.org/wiki/TLS_termination_proxy) may be put in front of `vmauth`.
It is recommended protecting `/-/reload` endpoint with `-reloadAuthKey` command-line flag, so external users couldn't trigger config reload.
### Monitoring
## Monitoring
`vmauth` exports various metrics in Prometheus exposition format at `http://vmauth-host:8427/metrics` page. It is recommended setting up regular scraping of this page
either via [vmagent](https://victoriametrics.github.io/vmagent.html) or via Prometheus, so the exported metrics could be analyzed later.
either via [vmagent](https://docs.victoriametrics.com/vmagent.html) or via Prometheus, so the exported metrics could be analyzed later.
### How to build from sources
## How to build from sources
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmauth` is located in `vmutils-*` archives there.
#### Development build
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make vmauth` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmauth` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmauth` binary and puts it into the `bin` folder.
#### Production build
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmauth-prod` from the root folder of the repository.
2. Run `make vmauth-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmauth-prod` binary and puts it into the `bin` folder.
#### Building docker images
### Building docker images
Run `make package-vmauth`. It builds `victoriametrics/vmauth:<PKG_TAG>` docker image locally.
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
@@ -121,7 +149,7 @@ ROOT_IMAGE=scratch make package-vmauth
```
### Profiling
## Profiling
`vmauth` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):
@@ -142,7 +170,7 @@ The command for collecting CPU profile waits for 30 seconds before returning.
The collected profiles may be analyzed with [go tool pprof](https://github.com/google/pprof).
### Advanced usage
## Advanced usage
Pass `-help` command-line arg to `vmauth` in order to see all the configuration options:
@@ -151,55 +179,67 @@ Pass `-help` command-line arg to `vmauth` in order to see all the configuration
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
See the docs at https://victoriametrics.github.io/vmauth.html .
See the docs at https://docs.victoriametrics.com/vmauth.html .
-auth.config string
Path to auth config. See https://victoriametrics.github.io/vmauth.html for details on the format of this auth config
Path to auth config. See https://docs.victoriametrics.com/vmauth.html for details on the format of this auth config
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealy the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8427")
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit (default 10)
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-memory.allowedBytes value
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxIdleConnsPerBackend int
The maximum number of idle connections vmauth can open per each backend host (default 100)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-reloadAuthKey string
Auth key for /-/reload http endpoint. It must be passed as authKey=...
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version

View File

@@ -1,10 +1,13 @@
package main
import (
"encoding/base64"
"flag"
"fmt"
"io/ioutil"
"net/url"
"os"
"regexp"
"strings"
"sync"
"sync/atomic"
@@ -17,7 +20,7 @@ import (
)
var (
authConfigPath = flag.String("auth.config", "", "Path to auth config. See https://victoriametrics.github.io/vmauth.html "+
authConfigPath = flag.String("auth.config", "", "Path to auth config. See https://docs.victoriametrics.com/vmauth.html "+
"for details on the format of this auth config")
)
@@ -28,17 +31,91 @@ type AuthConfig struct {
// UserInfo is user information read from authConfigPath
type UserInfo struct {
Username string `yaml:"username"`
Password string `yaml:"password"`
URLPrefix string `yaml:"url_prefix"`
BearerToken string `yaml:"bearer_token"`
Username string `yaml:"username"`
Password string `yaml:"password"`
URLPrefix *yamlURL `yaml:"url_prefix"`
URLMap []URLMap `yaml:"url_map"`
requests *metrics.Counter
}
// URLMap is a mapping from source paths to target urls.
type URLMap struct {
SrcPaths []*SrcPath `yaml:"src_paths"`
URLPrefix *yamlURL `yaml:"url_prefix"`
}
// SrcPath represents an src path
type SrcPath struct {
sOriginal string
re *regexp.Regexp
}
type yamlURL struct {
u *url.URL
}
func (yu *yamlURL) UnmarshalYAML(f func(interface{}) error) error {
var s string
if err := f(&s); err != nil {
return err
}
u, err := url.Parse(s)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", s, err)
}
yu.u = u
return nil
}
func (yu *yamlURL) MarshalYAML() (interface{}, error) {
return yu.u.String(), nil
}
func (sp *SrcPath) match(s string) bool {
prefix, ok := sp.re.LiteralPrefix()
if ok {
// Fast path - literal match
return s == prefix
}
if !strings.HasPrefix(s, prefix) {
return false
}
return sp.re.MatchString(s)
}
// UnmarshalYAML implements yaml.Unmarshaler
func (sp *SrcPath) UnmarshalYAML(f func(interface{}) error) error {
var s string
if err := f(&s); err != nil {
return err
}
sAnchored := "^(?:" + s + ")$"
re, err := regexp.Compile(sAnchored)
if err != nil {
return fmt.Errorf("cannot build regexp from %q: %w", s, err)
}
sp.sOriginal = s
sp.re = re
return nil
}
// MarshalYAML implements yaml.Marshaler.
func (sp *SrcPath) MarshalYAML() (interface{}, error) {
return sp.sOriginal, nil
}
func initAuthConfig() {
if len(*authConfigPath) == 0 {
logger.Fatalf("missing required `-auth.config` command-line flag")
}
// Register SIGHUP handler for config re-read just before readAuthConfig call.
// This guarantees that the config will be re-read if the signal arrives during readAuthConfig call.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
m, err := readAuthConfig(*authConfigPath)
if err != nil {
logger.Fatalf("cannot load auth config from `-auth.config=%s`: %s", *authConfigPath, err)
@@ -48,7 +125,7 @@ func initAuthConfig() {
authConfigWG.Add(1)
go func() {
defer authConfigWG.Done()
authConfigReloader()
authConfigReloader(sighupCh)
}()
}
@@ -57,8 +134,7 @@ func stopAuthConfig() {
authConfigWG.Wait()
}
func authConfigReloader() {
sighupCh := procutil.NewSighupChan()
func authConfigReloader(sighupCh <-chan os.Signal) {
for {
select {
case <-stopCh:
@@ -103,29 +179,86 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
if len(uis) == 0 {
return nil, fmt.Errorf("`users` section cannot be empty in AuthConfig")
}
m := make(map[string]*UserInfo, len(uis))
byAuthToken := make(map[string]*UserInfo, len(uis))
byUsername := make(map[string]bool, len(uis))
byBearerToken := make(map[string]bool, len(uis))
for i := range uis {
ui := &uis[i]
if m[ui.Username] != nil {
if ui.BearerToken == "" && ui.Username == "" {
return nil, fmt.Errorf("either bearer_token or username must be set")
}
if ui.BearerToken != "" && ui.Username != "" {
return nil, fmt.Errorf("bearer_token=%q and username=%q cannot be set simultaneously", ui.BearerToken, ui.Username)
}
if byBearerToken[ui.BearerToken] {
return nil, fmt.Errorf("duplicate bearer_token found; bearer_token: %q", ui.BearerToken)
}
if byUsername[ui.Username] {
return nil, fmt.Errorf("duplicate username found; username: %q", ui.Username)
}
urlPrefix := ui.URLPrefix
// Remove trailing '/' from urlPrefix
for strings.HasSuffix(urlPrefix, "/") {
urlPrefix = urlPrefix[:len(urlPrefix)-1]
authToken := getAuthToken(ui.BearerToken, ui.Username, ui.Password)
if byAuthToken[authToken] != nil {
return nil, fmt.Errorf("duplicate auth token found for bearer_token=%q, username=%q: %q", authToken, ui.BearerToken, ui.Username)
}
// Validate urlPrefix
target, err := url.Parse(urlPrefix)
if err != nil {
return nil, fmt.Errorf("invalid `url_prefix: %q`: %w", urlPrefix, err)
if ui.URLPrefix != nil {
urlPrefix, err := sanitizeURLPrefix(ui.URLPrefix.u)
if err != nil {
return nil, err
}
ui.URLPrefix.u = urlPrefix
}
if target.Scheme != "http" && target.Scheme != "https" {
return nil, fmt.Errorf("unsupported scheme for `url_prefix: %q`: %q; must be `http` or `https`", urlPrefix, target.Scheme)
for _, e := range ui.URLMap {
if len(e.SrcPaths) == 0 {
return nil, fmt.Errorf("missing `src_paths` in `url_map`")
}
if e.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix` in `url_map`")
}
urlPrefix, err := sanitizeURLPrefix(e.URLPrefix.u)
if err != nil {
return nil, err
}
e.URLPrefix.u = urlPrefix
}
ui.URLPrefix = urlPrefix
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, ui.Username))
m[ui.Username] = ui
if len(ui.URLMap) == 0 && ui.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix`")
}
if ui.BearerToken != "" {
if ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
ui.requests = metrics.GetOrCreateCounter(`vmauth_user_requests_total{username="bearer_token"}`)
byBearerToken[ui.BearerToken] = true
}
if ui.Username != "" {
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, ui.Username))
byUsername[ui.Username] = true
}
byAuthToken[authToken] = ui
}
return m, nil
return byAuthToken, nil
}
func getAuthToken(bearerToken, username, password string) string {
if bearerToken != "" {
return "Bearer " + bearerToken
}
token := username + ":" + password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
return "Basic " + token64
}
func sanitizeURLPrefix(urlPrefix *url.URL) (*url.URL, error) {
// Remove trailing '/' from urlPrefix
for strings.HasSuffix(urlPrefix.Path, "/") {
urlPrefix.Path = urlPrefix.Path[:len(urlPrefix.Path)-1]
}
// Validate urlPrefix
if urlPrefix.Scheme != "http" && urlPrefix.Scheme != "https" {
return nil, fmt.Errorf("unsupported scheme for `url_prefix: %q`: %q; must be `http` or `https`", urlPrefix, urlPrefix.Scheme)
}
if urlPrefix.Host == "" {
return nil, fmt.Errorf("missing hostname in `url_prefix %q`", urlPrefix.Host)
}
return urlPrefix, nil
}

View File

@@ -1,8 +1,13 @@
package main
import (
"reflect"
"bytes"
"fmt"
"net/url"
"regexp"
"testing"
"gopkg.in/yaml.v2"
)
func TestParseAuthConfigFailure(t *testing.T) {
@@ -46,6 +51,32 @@ users:
- username: foo
url_prefix: //bar
`)
f(`
users:
- username: foo
url_prefix: http:///bar
`)
f(`
users:
- username: foo
url_prefix: [bar]
`)
// Username and bearer_token in a single config
f(`
users:
- username: foo
bearer_token: bbb
url_prefix: http://foo.bar
`)
// Bearer_token and password in a single config
f(`
users:
- password: foo
bearer_token: bbb
url_prefix: http://foo.bar
`)
// Duplicate users
f(`
@@ -57,6 +88,51 @@ users:
- username: foo
url_prefix: https://sss.sss
`)
// Duplicate bearer_tokens
f(`
users:
- bearer_token: foo
url_prefix: http://foo.bar
- username: bar
url_prefix: http://xxx.yyy
- bearer_token: foo
url_prefix: https://sss.sss
`)
// Missing url_prefix in url_map
f(`
users:
- username: a
url_map:
- src_paths: ["/foo/bar"]
`)
// Invalid url_prefix in url_map
f(`
users:
- username: a
url_map:
- src_paths: ["/foo/bar"]
url_prefix: foo.bar
`)
// Missing src_paths in url_map
f(`
users:
- username: a
url_map:
- url_prefix: http://foobar
`)
// Invalid regexp in src_path.
f(`
users:
- username: a
url_map:
- src_paths: ['fo[obar']
url_prefix: http://foobar
`)
}
func TestParseAuthConfigSuccess(t *testing.T) {
@@ -67,8 +143,8 @@ func TestParseAuthConfigSuccess(t *testing.T) {
t.Fatalf("unexpected error: %s", err)
}
removeMetrics(m)
if !reflect.DeepEqual(m, expectedAuthConfig) {
t.Fatalf("unexpected auth config\ngot\n%v\nwant\n%v", m, expectedAuthConfig)
if err := areEqualConfigs(m, expectedAuthConfig); err != nil {
t.Fatal(err)
}
}
@@ -79,10 +155,10 @@ users:
password: bar
url_prefix: http://aaa:343/bbb
`, map[string]*UserInfo{
"foo": {
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: "http://aaa:343/bbb",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
},
})
@@ -94,15 +170,51 @@ users:
- username: bar
url_prefix: https://bar/x///
`, map[string]*UserInfo{
"foo": {
getAuthToken("", "foo", ""): {
Username: "foo",
URLPrefix: "http://foo",
URLPrefix: mustParseURL("http://foo"),
},
"bar": {
getAuthToken("", "bar", ""): {
Username: "bar",
URLPrefix: "https://bar/x",
URLPrefix: mustParseURL("https://bar/x"),
},
})
// non-empty URLMap
f(`
users:
- bearer_token: foo
url_map:
- src_paths: ["/api/v1/query","/api/v1/query_range","/api/v1/label/[^./]+/.+"]
url_prefix: http://vmselect/select/0/prometheus
- src_paths: ["/api/v1/write"]
url_prefix: http://vminsert/insert/0/prometheus
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
BearerToken: "foo",
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURL("http://vminsert/insert/0/prometheus"),
},
},
},
})
}
func getSrcPaths(paths []string) []*SrcPath {
var sps []*SrcPath
for _, path := range paths {
sps = append(sps, &SrcPath{
sOriginal: path,
re: regexp.MustCompile("^(?:" + path + ")$"),
})
}
return sps
}
func removeMetrics(m map[string]*UserInfo) {
@@ -110,3 +222,28 @@ func removeMetrics(m map[string]*UserInfo) {
info.requests = nil
}
}
func areEqualConfigs(a, b map[string]*UserInfo) error {
aData, err := yaml.Marshal(a)
if err != nil {
return fmt.Errorf("cannot marshal a: %w", err)
}
bData, err := yaml.Marshal(b)
if err != nil {
return fmt.Errorf("cannot marshal b: %w", err)
}
if !bytes.Equal(aData, bData) {
return fmt.Errorf("unexpected configs;\ngot\n%s\nwant\n%s", aData, bData)
}
return nil
}
func mustParseURL(u string) *yamlURL {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
}
return &yamlURL{
u: pu,
}
}

View File

@@ -2,30 +2,54 @@
# Usernames must be unique.
users:
# Requests with the 'Authorization: Bearer XXXX' header are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- bearer_token: "XXXX"
url_prefix: "http://localhost:8428"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query
# will be proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is routed to http://vmselect:8481/select/123/prometheus/api/v1/select
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8481/select/123/prometheus/api/v1/select
- username: "cluster-select-account-123"
password: "***"
url_prefix: "http://vmselect:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# All the reuqests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is routed to http://vminsert:8480/insert/42/prometheus/api/v1/write
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "cluster-insert-account-42"
password: "***"
url_prefix: "http://vminsert:8480/insert/42/prometheus"
# A single user for querying and inserting data:
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to http://vmselect:8481/select/42/prometheus.
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8480/select/42/prometheus/api/v1/query
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "foobar"
url_map:
- src_paths: ["/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^/]+/values"]
url_prefix: "http://vmselect:8481/select/42/prometheus"
- src_paths: ["/api/v1/write"]
url_prefix: "http://vminsert:8480/insert/42/prometheus"

View File

@@ -14,10 +14,13 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/metrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
)
func main() {
@@ -47,30 +50,43 @@ func main() {
}
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
username, password, ok := r.BasicAuth()
if !ok {
switch r.URL.Path {
case "/-/reload":
authKey := r.FormValue("authKey")
if authKey != *reloadAuthKey {
httpserver.Errorf(w, r, "invalid authKey %q. It must match the value from -reloadAuthKey command line flag", authKey)
return true
}
configReloadRequests.Inc()
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusOK)
return true
}
authToken := r.Header.Get("Authorization")
if authToken == "" {
w.Header().Set("WWW-Authenticate", `Basic realm="Restricted"`)
http.Error(w, "missing `Authorization: Basic *` header", http.StatusUnauthorized)
http.Error(w, "missing `Authorization` request header", http.StatusUnauthorized)
return true
}
ac := authConfig.Load().(map[string]*UserInfo)
info := ac[username]
if info == nil || info.Password != password {
httpserver.Errorf(w, r, "cannot find the provided username %q or password in config", username)
ui := ac[authToken]
if ui == nil {
httpserver.Errorf(w, r, "cannot find the provided auth token %q in config", authToken)
return true
}
info.requests.Inc()
targetURL := createTargetURL(info.URLPrefix, r.URL)
if _, err := url.Parse(targetURL); err != nil {
httpserver.Errorf(w, r, "invalid targetURL=%q: %s", targetURL, err)
ui.requests.Inc()
targetURL, err := createTargetURL(ui, r.URL)
if err != nil {
httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
return true
}
r.Header.Set("vm-target-url", targetURL)
r.Header.Set("vm-target-url", targetURL.String())
reverseProxy.ServeHTTP(w, r)
return true
}
var configReloadRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/-/reload"}`)
var reverseProxy = &httputil.ReverseProxy{
Director: func(r *http.Request) {
targetURL := r.Header.Get("vm-target-url")
@@ -86,6 +102,7 @@ var reverseProxy = &httputil.ReverseProxy{
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
return tr
}(),
FlushInterval: time.Second,
@@ -96,7 +113,7 @@ func usage() {
const s = `
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
See the docs at https://victoriametrics.github.io/vmauth.html .
See the docs at https://docs.victoriametrics.com/vmauth.html .
`
flagutil.Usage(s)
}

View File

@@ -0,0 +1,12 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 8427
ENTRYPOINT ["/vmauth-prod"]
ARG TARGETARCH
COPY vmauth-${TARGETARCH}-prod ./vmauth-prod

View File

@@ -1,16 +1,51 @@
package main
import (
"fmt"
"net/url"
"path"
"strings"
)
func createTargetURL(prefix string, u *url.URL) string {
func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
targetURL := *uiURL
targetURL.Path += requestURI.Path
requestParams := requestURI.Query()
// fast path
if len(requestParams) == 0 {
return &targetURL
}
// merge query parameters from requests.
uiParams := targetURL.Query()
for k, v := range requestParams {
// skip clashed query params from original request
if exist := uiParams.Get(k); len(exist) > 0 {
continue
}
for i := range v {
uiParams.Add(k, v[i])
}
}
targetURL.RawQuery = uiParams.Encode()
return &targetURL
}
func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, error) {
u := *uOrig
// Prevent from attacks with using `..` in r.URL.Path
u.Path = path.Clean(u.Path)
if !strings.HasPrefix(u.Path, "/") {
u.Path = "/" + u.Path
}
return prefix + u.RequestURI()
for _, e := range ui.URLMap {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return mergeURLs(e.URLPrefix.u, &u), nil
}
}
}
if ui.URLPrefix != nil {
return mergeURLs(ui.URLPrefix.u, &u), nil
}
return nil, fmt.Errorf("missing route for %q", u.String())
}

View File

@@ -5,22 +5,109 @@ import (
"testing"
)
func TestCreateTargetURL(t *testing.T) {
f := func(prefix, requestURI, expectedTarget string) {
func TestCreateTargetURLSuccess(t *testing.T) {
f := func(ui *UserInfo, requestURI, expectedTarget string) {
t.Helper()
u, err := url.Parse(requestURI)
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target := createTargetURL(prefix, u)
if target != expectedTarget {
target, err := createTargetURL(ui, u)
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
if target.String() != expectedTarget {
t.Fatalf("unexpected target; got %q; want %q", target, expectedTarget)
}
}
f("http://foo.bar", "", "http://foo.bar/.")
f("http://foo.bar", "/", "http://foo.bar/")
f("http://foo.bar", "a/b?c=d", "http://foo.bar/a/b?c=d")
f("https://sss:3894/x/y", "/z", "https://sss:3894/x/y/z")
f("https://sss:3894/x/y", "/../../aaa", "https://sss:3894/x/y/aaa")
f("https://sss:3894/x/y", "/./asd/../../aaa?a=d&s=s/../d", "https://sss:3894/x/y/aaa?a=d&s=s/../d")
// Simple routing with `url_prefix`
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "", "http://foo.bar/.")
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "/", "http://foo.bar/")
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "a/b?c=d", "http://foo.bar/a/b?c=d")
f(&UserInfo{
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/z", "https://sss:3894/x/y/z")
f(&UserInfo{
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/../../aaa", "https://sss:3894/x/y/aaa")
f(&UserInfo{
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/./asd/../../aaa?a=d&s=s/../d", "https://sss:3894/x/y/aaa?a=d&s=s%2F..%2Fd")
// Complex routing with `url_map`
ui := &UserInfo{
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURL("http://vminsert/0/prometheus"),
},
},
URLPrefix: mustParseURL("http://default-server"),
}
f(ui, "/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up")
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write")
f(ui, "/api/v1/query_range", "http://default-server/api/v1/query_range")
// Complex routing regexp paths in `url_map`
ui = &UserInfo{
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query(_range)?", "/api/v1/label/[^/]+/values"}),
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURL("http://vminsert/0/prometheus"),
},
},
URLPrefix: mustParseURL("http://default-server"),
}
f(ui, "/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up")
f(ui, "/api/v1/query_range?query=up", "http://vmselect/0/prometheus/api/v1/query_range?query=up")
f(ui, "/api/v1/label/foo/values", "http://vmselect/0/prometheus/api/v1/label/foo/values")
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write")
f(ui, "/api/v1/foo/bar", "http://default-server/api/v1/foo/bar")
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar?extra_label=team=dev"),
}, "/api/v1/query", "http://foo.bar/api/v1/query?extra_label=team=dev")
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar?extra_label=team=mobile"),
}, "/api/v1/query?extra_label=team=dev", "http://foo.bar/api/v1/query?extra_label=team%3Dmobile")
}
func TestCreateTargetURLFailure(t *testing.T) {
f := func(ui *UserInfo, requestURI string) {
t.Helper()
u, err := url.Parse(requestURI)
if err != nil {
t.Fatalf("cannot parse %q: %s", requestURI, err)
}
target, err := createTargetURL(ui, u)
if err == nil {
t.Fatalf("expecting non-nil error")
}
if target != nil {
t.Fatalf("unexpected target=%q; want empty string", target)
}
}
f(&UserInfo{}, "/foo/bar")
f(&UserInfo{
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: mustParseURL("http://foobar/baz"),
},
},
}, "/api/v1/write")
}

View File

@@ -1,6 +1,6 @@
## vmbackup
# vmbackup
`vmbackup` creates VictoriaMetrics data backups from [instant snapshots](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
`vmbackup` creates VictoriaMetrics data backups from [instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
Supported storage systems for backups:
@@ -15,17 +15,17 @@ data between the existing backup and new backup. It saves time and costs on data
Backup process can be interrupted at any time. It is automatically resumed from the interruption point when restarting `vmbackup` with the same args.
Backed up data can be restored with [vmrestore](https://victoriametrics.github.io/vmrestore.html).
Backed up data can be restored with [vmrestore](https://docs.victoriametrics.com/vmrestore.html).
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
See also [vmbackuper](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) tool built on top of `vmbackup`. This tool simplifies
See also [vmbackupmanager](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) tool built on top of `vmbackup`. This tool simplifies
creation of hourly, daily, weekly and monthly backups.
### Use cases
## Use cases
#### Regular backups
### Regular backups
Regular backup can be performed with the following command:
@@ -34,13 +34,13 @@ vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-
```
* `</path/to/victoria-metrics-data>` - path to VictoriaMetrics data pointed by `-storageDataPath` command-line flag in single-node VictoriaMetrics or in cluster `vmstorage`.
There is no need to stop VictoriaMetrics for creating backups, since they are performed from immutable [instant snapshots](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
* `<local-snapshot>` is the snapshot to back up. See [how to create instant snapshots](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
There is no need to stop VictoriaMetrics for creating backups, since they are performed from immutable [instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
* `<local-snapshot>` is the snapshot to back up. See [how to create instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots). `vmbackup` can create the snapshot on itself if `-snapshot.createURL` command-line flag is set to an url for creating snapshots. In this case `-snapshotName` flag isn't needed.
* `<bucket>` is an already existing name for [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets).
* `<path/to/new/backup>` is the destination path where new backup will be placed.
#### Regular backups with server-side copy from existing backup
### Regular backups with server-side copy from existing backup
If the destination GCS bucket already contains the previous backup at `-origin` path, then new backup can be sped up
with the following command:
@@ -52,7 +52,7 @@ vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-
It saves time and network bandwidth costs by performing server-side copy for the shared data from the `-origin` to `-dst`.
#### Incremental backups
### Incremental backups
Incremental backups performed if `-dst` points to an already existing backup. In this case only new data uploaded to remote storage.
It saves time and network bandwidth costs when working with big backups:
@@ -62,7 +62,7 @@ vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-
```
#### Smart backups
### Smart backups
Smart backups mean storing full daily backups into `YYYYMMDD` folders and creating incremental hourly backup into `latest` folder:
@@ -72,7 +72,7 @@ Smart backups mean storing full daily backups into `YYYYMMDD` folders and creati
vmbackup -snapshotName=<latest-snapshot> -dst=gcs://<bucket>/latest
```
Where `<latest-snapshot>` is the latest [snapshot](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
Where `<latest-snapshot>` is the latest [snapshot](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
The command will upload only changed data to `gcs://<bucket>/latest`.
* Run the following command once a day:
@@ -89,10 +89,10 @@ or from any day (`YYYYMMDD` backups). Note that hourly backup shouldn't run when
Do not forget removing old snapshots and backups when they are no longer needed for saving storage costs.
See also [vmbackuper tool](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for automating smart backups.
See also [vmbackupmanager tool](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for automating smart backups.
### How does it work?
## How does it work?
The backup algorithm is the following:
@@ -118,16 +118,16 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
`vmbackup` can work improperly or slowly when these properties are violated.
### Troubleshooting
## Troubleshooting
* If the backup is slow, then try setting higher value for `-concurrency` flag. This will increase the number of concurrent workers that upload data to backup storage.
* If `vmbackup` eats all the network bandwidth, then set `-maxBytesPerSecond` to the desired value.
* If `vmbackup` has been interrupted due to temporary error, then just restart it with the same args. It will resume the backup process.
* Backups created from [single-node VictoriaMetrics](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html) cannot be restored
at [cluster VictoriaMetrics](https://victoriametrics.github.io/Cluster-VictoriaMetrics.html) and vice versa.
* Backups created from [single-node VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) cannot be restored
at [cluster VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) and vice versa.
### Advanced usage
## Advanced usage
* Obtaining credentials from a file.
@@ -191,30 +191,36 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit (default 10)
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-maxBytesPerSecond value
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxBytesPerSecond size
The maximum upload speed. There is no limit if it is set to 0
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes value
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Supports the following optional suffixes for values: KB, MB, GB, KiB, MiB, GiB (default 0)
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-origin string
Optional origin directory on the remote storage with old backup for server-side copying when performing full backup. This speeds up full backups
-snapshot.createURL string
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snaphsot/create
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set
-snapshot.deleteURL string
VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snaphosts will be automatically deleted. Example: http://victoriametrics:8428/snaphsot/delete
VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete
-snapshotName string
Name for the snapshot to backup. See https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots
Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set
-storageDataPath string
Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage (default "victoria-metrics-data")
-version
@@ -222,24 +228,24 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
```
### How to build from sources
## How to build from sources
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - see `vmutils-*` archives there.
#### Development build
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.13.
2. Run `make vmbackup` from the root folder of the repository.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmbackup` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmbackup` binary and puts it into the `bin` folder.
#### Production build
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmbackup-prod` from the root folder of the repository.
2. Run `make vmbackup-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmbackup-prod` binary and puts it into the `bin` folder.
#### Building docker images
### Building docker images
Run `make package-vmbackup`. It builds `victoriametrics/vmbackup:<PKG_TAG>` docker image locally.
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.

View File

@@ -19,11 +19,11 @@ import (
var (
storageDataPath = flag.String("storageDataPath", "victoria-metrics-data", "Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage")
snapshotName = flag.String("snapshotName", "", "Name for the snapshot to backup. See https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots")
snapshotName = flag.String("snapshotName", "", "Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set")
snapshotCreateURL = flag.String("snapshot.createURL", "", "VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. "+
"Example: http://victoriametrics:8428/snaphsot/create")
"Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set")
snapshotDeleteURL = flag.String("snapshot.deleteURL", "", "VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. "+
"All created snaphosts will be automatically deleted. Example: http://victoriametrics:8428/snaphsot/delete")
"All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete")
dst = flag.String("dst", "", "Where to put the backup on the remote storage. "+
"Example: gcs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir\n"+
"-dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded")
@@ -41,7 +41,9 @@ func main() {
logger.Init()
if len(*snapshotCreateURL) > 0 {
logger.Infof("Snapshots enabled")
if len(*snapshotName) > 0 {
logger.Fatalf("-snapshotName shouldn't be set if -snapshot.createURL is set, since snapshots are created automatically in this case")
}
logger.Infof("Snapshot create url %s", *snapshotCreateURL)
if len(*snapshotDeleteURL) <= 0 {
err := flag.Set("snapshot.deleteURL", strings.Replace(*snapshotCreateURL, "/create", "/delete", 1))
@@ -99,7 +101,7 @@ func usage() {
vmbackup performs backups for VictoriaMetrics data from instant snapshots to gcs, s3
or local filesystem. Backed up data can be restored with vmrestore.
See the docs at https://victoriametrics.github.io/vbackup.html .
See the docs at https://docs.victoriametrics.com/vmbackup.html .
`
flagutil.Usage(s)
}

View File

@@ -0,0 +1,11 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
ENTRYPOINT ["/vmbackup-prod"]
ARG TARGETARCH
COPY vmbackup-${TARGETARCH}-prod ./vmbackup-prod

View File

@@ -0,0 +1,146 @@
## vmbackupmanager
VictoriaMetrics backup manager
This service automates regular backup procedures. It supports the following backup intervals: **hourly**, **daily**, **weekly** and **monthly**. Multiple backup intervals may be configured simultaneously. I.e. the backup manager creates hourly backups every hour, while it creates daily backups every day, etc. Backup manager must have read access to the storage data, so best practice is to install it on the same machine (or as a sidecar) where the storage node is installed.
The backup service makes a backup every hour and puts it to the latest folder and then copies data to the folders which represent the backup intervals (hourly, daily, weekly and monthly)
The required flags for running the service are as follows:
* -eula - should be true and means that you have the legal right to run a backup manager. That can either be a signed contract or an email with confirmation to run the service in a trial period
* -storageDataPath - path to VictoriaMetrics or vmstorage data path to make backup from
* -snapshot.createURL - VictoriaMetrics creates snapshot URL which will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create
* -dst - backup destination at s3, gcs or local filesystem
* -credsFilePath - path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set. See [https://cloud.google.com/iam/docs/creating-managing-service-account-keys](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) and [https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html](https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html)
Backup schedule is controlled by the following flags:
* -disableHourly - disable hourly run. Default false
* -disableDaily - disable daily run. Default false
* -disableWeekly - disable weekly run. Default false
* -disableMonthly - disable monthly run. Default false
By default, all flags are turned on and Backup Manager backups data every hour for every interval (hourly, daily, weekly and monthly).
The backup manager creates the following directory hierarchy at **-dst**:
* /latest/ - contains the latest backup
* /hourly/ - contains hourly backups. Each backup is named as *YYYY-MM-DD:HH*
* /daily/ - contains daily backups. Each backup is named as *YYYY-MM-DD*
* /weekly/ - contains weekly backups. Each backup is named as *YYYY-WW*
* /monthly/ - contains monthly backups. Each backup is named as *YYYY-MM*
To get the full list of supported flags please run the following command:
```console
./vmbackupmanager --help
```
The service creates a **full** backup each run. This means that the system can be restored fully from any particular backup using vmrestore. Backup manager uploads only the data that has been changed or created since the most recent backup (incremental backup).
*Please take into account that the first backup upload could take a significant amount of time as it needs to upload all of the data.*
There are two flags which could help with performance tuning:
* -maxBytesPerSecond - the maximum upload speed. There is no limit if it is set to 0
* -concurrency - The number of concurrent workers. Higher concurrency may improve upload speed (default 10)
### Example of Usage
GCS and cluster version. You need to have a credentials file in json format with following structure
credentials.json
```json
{
"type": "service_account",
"project_id": "<project>",
"private_key_id": "",
"private_key": "-----BEGIN PRIVATE KEY-----\-----END PRIVATE KEY-----\n",
"client_email": test@<project>.iam.gserviceaccount.com",
"client_id": "",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%40<project>.iam.gserviceaccount.com"
}
```
Backup manager launched with the following configuration:
```console
export NODE_IP=192.168.0.10
export VMSTORAGE_ENDPOINT=http://127.0.0.1:8428
./vmbackupmanager -dst=gcs://vmstorage-data/$NODE_IP -credsFilePath=credentials.json -storageDataPath=/vmstorage-data -snapshot.createURL=$VMSTORAGE_ENDPOINT/snapshot/create -eula
```
Expected logs in vmbackupmanager:
```console
info lib/backup/actions/backup.go:131 server-side copied 81 out of 81 parts from GCS{bucket: "vmstorage-data", dir: "192.168.0.10//latest/"} to GCS{bucket: "vmstorage-data", dir: "192.168.0.10//weekly/2020-34/"} in 2.549833008s
info lib/backup/actions/backup.go:169 backed up 853315 bytes in 2.882 seconds; deleted 0 bytes; server-side copied 853315 bytes; uploaded 0 bytes
```
Expected logs in vmstorage:
```console
info VictoriaMetrics/lib/storage/table.go:146 creating table snapshot of "/vmstorage-data/data"...
info VictoriaMetrics/lib/storage/storage.go:311 deleting snapshot "/vmstorage-data/snapshots/20200818201959-162C760149895DDA"...
info VictoriaMetrics/lib/storage/storage.go:319 deleted snapshot "/vmstorage-data/snapshots/20200818201959-162C760149895DDA" in 0.169 seconds
```
The result on the GCS bucket
- The root folder
![root](root.png)
- The latest folder
![latest](latest.png)
## Backup Retention Policy
Backup retention policy is controlled by:
* -keepLastHourly - keep the last N hourly backups. Disabled by default
* -keepLastDaily - keep the last N daily backups. Disabled by default
* -keepLastWeekly - keep the last N weekly backups. Disabled by default
* -keepLastMonthly - keep the last N monthly backups. Disabled by default
*Note*: 0 value in every keepLast flag results into deletion ALL backups for particular type (hourly, daily, weekly and monthly)
Lets assume we have a backup manager collecting daily backups for the past 10 days.
![daily](rp_daily_1.png)
We enable backup retention policy for backup manager by using following configuration:
```console
export NODE_IP=192.168.0.10
export VMSTORAGE_ENDPOINT=http://127.0.0.1:8428
./vmbackupmanager -dst=gcs://vmstorage-data/$NODE_IP -credsFilePath=credentials.json -storageDataPath=/vmstorage-data -snapshot.createURL=$VMSTORAGE_ENDPOINT/snapshot/create
-keepLastDaily=3 -eula
```
Expected logs in backup manager on start:
```console
info lib/logger/flag.go:20 flag "keepLastDaily" = "3"
```
Expected logs in backup manager during retention cycle:
```console
info app/vmbackupmanager/retention.go:106 daily backups to delete [daily/2021-02-13 daily/2021-02-12 daily/2021-02-11 daily/2021-02-10 daily/2021-02-09 daily/2021-02-08 daily/2021-02-07]
```
The result on the GCS bucket. We see only 3 daily backups:
![daily](rp_daily_2.png)

Binary file not shown.

After

Width:  |  Height:  |  Size: 29 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 25 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 99 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 64 KiB

79
app/vmctl/Makefile Normal file
View File

@@ -0,0 +1,79 @@
# All these commands must run from repository root.
vmctl:
APP_NAME=vmctl $(MAKE) app-local
vmctl-race:
APP_NAME=vmctl RACE=-race $(MAKE) app-local
vmctl-prod:
APP_NAME=vmctl $(MAKE) app-via-docker
vmctl-pure-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-pure
vmctl-amd64-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-amd64
vmctl-arm-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-arm
vmctl-arm64-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-arm64
vmctl-ppc64le-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-ppc64le
vmctl-386-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-386
package-vmctl:
APP_NAME=vmctl $(MAKE) package-via-docker
package-vmctl-pure:
APP_NAME=vmctl $(MAKE) package-via-docker-pure
package-vmctl-amd64:
APP_NAME=vmctl $(MAKE) package-via-docker-amd64
package-vmctl-arm:
APP_NAME=vmctl $(MAKE) package-via-docker-arm
package-vmctl-arm64:
APP_NAME=vmctl $(MAKE) package-via-docker-arm64
package-vmctl-ppc64le:
APP_NAME=vmctl $(MAKE) package-via-docker-ppc64le
package-vmctl-386:
APP_NAME=vmctl $(MAKE) package-via-docker-386
publish-vmctl:
APP_NAME=vmctl $(MAKE) publish-via-docker
vmctl-amd64:
CGO_ENABLED=1 GOARCH=amd64 $(MAKE) vmctl-local-with-goarch
vmctl-arm:
CGO_ENABLED=0 GOARCH=arm $(MAKE) vmctl-local-with-goarch
vmctl-arm64:
CGO_ENABLED=0 GOARCH=arm64 $(MAKE) vmctl-local-with-goarch
vmctl-ppc64le:
CGO_ENABLED=0 GOARCH=ppc64le $(MAKE) vmctl-local-with-goarch
vmctl-386:
CGO_ENABLED=0 GOARCH=386 $(MAKE) vmctl-local-with-goarch
vmctl-local-with-goarch:
APP_NAME=vmctl $(MAKE) app-local-with-goarch
vmctl-pure:
APP_NAME=vmctl $(MAKE) app-local-pure
vmctl-windows-amd64:
GOARCH=amd64 APP_NAME=vmctl $(MAKE) app-local-windows-with-goarch
vmctl-windows-amd64-prod:
APP_NAME=vmctl $(MAKE) app-via-docker-windows-amd64

554
app/vmctl/README.md Normal file
View File

@@ -0,0 +1,554 @@
# vmctl
VictoriaMetrics command-line tool
Features:
- [x] Prometheus: migrate data from Prometheus to VictoriaMetrics using snapshot API
- [x] Thanos: migrate data from Thanos to VictoriaMetrics
- [ ] ~~Prometheus: migrate data from Prometheus to VictoriaMetrics by query~~(discarded)
- [x] InfluxDB: migrate data from InfluxDB to VictoriaMetrics
- [x] OpenTSDB: migrate data from OpenTSDB to VictoriaMetrics
- [ ] Storage Management: data re-balancing between nodes
## Articles
* [How to migrate data from Prometheus](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043)
* [How to migrate data from Prometheus. Filtering and modifying time series](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-filtering-and-modifying-time-series-6d40cea4bf21)
## How to build
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmctl` is located in `vmutils-*` archives there.
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmctl` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl` binary and puts it into the `bin` folder.
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmctl-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-prod` binary and puts it into the `bin` folder.
### Building docker images
Run `make package-vmctl`. It builds `victoriametrics/vmctl:<PKG_TAG>` docker image locally.
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmctl`.
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
```bash
ROOT_IMAGE=scratch make package-vmctl
```
### ARM build
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
#### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
2. Run `make vmctl-arm` or `make vmctl-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-arm` or `vmctl-arm64` binary respectively and puts it into the `bin` folder.
#### Production ARM build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmctl-arm-prod` or `make vmctl-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-arm-prod` or `vmctl-arm64-prod` binary respectively and puts it into the `bin` folder.
## Migrating data from OpenTSDB
`vmctl` supports the `opentsdb` mode to migrate data from OpenTSDB to VictoriaMetrics time-series database.
See `./vmctl opentsdb --help` for details and full list of flags.
*OpenTSDB migration is not possible without a functioning [meta](http://opentsdb.net/docs/build/html/user_guide/metadata.html) table to search for metrics/series.*
OpenTSDB migration works like so:
1. Find metrics based on selected filters (or the default filter set ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'])
* e.g. `curl -Ss "http://opentsdb:4242/api/suggest?type=metrics&q=sys"`
2. Find series associated with each returned metric
* e.g. `curl -Ss "http://opentsdb:4242/api/search/lookup?m=system.load5&limit=1000000"`
3. Download data for each series in chunks defined in the CLI switches
* e.g. `-retention=sum-1m-avg:1h:90d` ==
* `curl -Ss "http://opentsdb:4242/api/query?start=1h-ago&end=now&m=sum:1m-avg-none:system.load5\{host=host1\}"`
* `curl -Ss "http://opentsdb:4242/api/query?start=2h-ago&end=1h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
* `curl -Ss "http://opentsdb:4242/api/query?start=3h-ago&end=2h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
* ...
* `curl -Ss "http://opentsdb:4242/api/query?start=2160h-ago&end=2159h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
This means that we must stream data from OpenTSDB to VictoriaMetrics in chunks. This is where concurrency for OpenTSDB comes in. We can query multiple chunks at once, but we shouldn't perform too many chunks at a time to avoid overloading the OpenTSDB cluster.
```
$ bin/vmctl opentsdb --otsdb-addr http://opentsdb:4242/ --otsdb-retentions sum-1m-avg:1h:1d --otsdb-filters system --otsdb-normalize --vm-addr http://victoria/
OpenTSDB import mode
2021/04/09 11:52:50 Will collect data starting at TS 1617990770
2021/04/09 11:52:50 Loading all metrics from OpenTSDB for filters: [system]
Found 9 metrics to import. Continue? [Y/n]
2021/04/09 11:52:51 Starting work on system.load1
23 / 402200 [>____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________] 0.01% 2 p/s
```
### Retention strings
Starting with a relatively simple retention string (`sum-1m-avg:1h:30d`), let's describe how this is converted into actual queries.
There are two essential parts of a retention string:
1. [aggregation](#aggregation)
2. [windows/time ranges](#windows)
#### Aggregation
Retention strings essentially define the two levels of aggregation for our collected series.
`sum-1m-avg` would become:
* First order: `sum`
* Second order: `1m-avg-none`
##### First Order Aggregations
First-order aggregation addresses how to aggregate any un-mentioned tags.
This is, conceptually, directly opposite to how PromQL deals with tags. In OpenTSDB, if a tag isn't explicitly mentioned, all values assocaited with that tag will be aggregated.
It is recommended to use `sum` for the first aggregation because it is relatively quick and should not cause any changes to the incoming data (because we collect each individual series).
##### Second Order Aggregations
Second-order aggregation (`1m-avg` in our example) defines any windowing that should occur before returning the data
It is recommended to match the stat collection interval so we again avoid transforming incoming data.
We do not allow for defining the "null value" portion of the rollup window (e.g. in the aggreagtion, `1m-avg-none`, the user cannot change `none`), as the goal of this tool is to avoid modifying incoming data.
#### Windows
There are two important windows we define in a retention string:
1. the "chunk" range of each query
2. The time range we will be querying on with that "chunk"
From our example, our windows are `1h:30d`.
##### Window "chunks"
The window `1h` means that each individual query to OpenTSDB should only span 1 hour of time (e.g. `start=2h-ago&end=1h-ago`).
It is important to ensure this window somewhat matches the row size in HBase to help improve query times.
For example, if the query is hitting a rollup table with a 4 hour row size, we should set a chunk size of a multiple of 4 hours (e.g. `4h`, `8h`, etc.) to avoid requesting data across row boundaries. Landing on row boundaries allows for more consistent request times to HBase.
The default table created in HBase for OpenTSDB has a 1 hour row size, so if you aren't sure on a correct row size to use, `1h` is a reasonable choice.
##### Time range
The time range `30d` simply means we are asking for the last 30 days of data. This time range can be written using `h`, `d`, `w`, or `y`. (We can't use `m` for month because it already means `minute` in time parsing).
#### Results of retention string
The resultant queries that will be created, based on our example retention string of `sum-1m-avg:1h:30d` look like this:
```
http://opentsdb:4242/api/query?start=1h-ago&end=now&m=sum:1m-avg-none:<series>
http://opentsdb:4242/api/query?start=2h-ago&end=1h-ago&m=sum:1m-avg-none:<series>
http://opentsdb:4242/api/query?start=3h-ago&end=2h-ago&m=sum:1m-avg-none:<series>
...
http://opentsdb:4242/api/query?start=721h-ago&end=720h-ago&m=sum:1m-avg-none:<series>
```
Chunking the data like this means each individual query returns faster, so we can start populating data into VictoriaMetrics quicker.
### Restarting OpenTSDB migrations
One important note for OpenTSDB migration: Queries/HBase scans can "get stuck" within OpenTSDB itself. This can cause instability and performance issues within an OpenTSDB cluster, so stopping the migrator to deal with it may be necessary. Because of this, we provide the timstamp we started collecting data from at thebeginning of the run. You can stop and restart the importer using this "hard timestamp" to ensure you collect data from the same time range over multiple runs.
## Migrating data from InfluxDB (1.x)
`vmctl` supports the `influx` mode to migrate data from InfluxDB to VictoriaMetrics time-series database.
See `./vmctl influx --help` for details and full list of flags.
To use migration tool please specify the InfluxDB address `--influx-addr`, the database `--influx-database` and VictoriaMetrics address `--vm-addr`.
Flag `--vm-addr` for single-node VM is usually equal to `--httpListenAddr`, and for cluster version
is equal to `--httpListenAddr` flag of vminsert component. Please note, that vmctl performs initial readiness check for the given address
by checking `/health` endpoint. For cluster version it is additionally required to specify the `--vm-account-id` flag.
See more details for cluster version [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
As soon as required flags are provided and all endpoints are accessible, `vmctl` will start the InfluxDB scheme exploration.
Basically, it just fetches all fields and timeseries from the provided database and builds up registry of all available timeseries.
Then `vmctl` sends fetch requests for each timeseries to InfluxDB one by one and pass results to VM importer.
VM importer then accumulates received samples in batches and sends import requests to VM.
The importing process example for local installation of InfluxDB(`http://localhost:8086`)
and single-node VictoriaMetrics(`http://localhost:8428`):
```
./vmctl influx --influx-database benchmark
InfluxDB import mode
2020/01/18 20:47:11 Exploring scheme for database "benchmark"
2020/01/18 20:47:11 fetching fields: command: "show field keys"; database: "benchmark"; retention: "autogen"
2020/01/18 20:47:11 found 10 fields
2020/01/18 20:47:11 fetching series: command: "show series "; database: "benchmark"; retention: "autogen"
Found 40000 timeseries to import. Continue? [Y/n] y
40000 / 40000 [-----------------------------------------------------------------------------------------------------------------------------------------------] 100.00% 21 p/s
2020/01/18 21:19:00 Import finished!
2020/01/18 21:19:00 VictoriaMetrics importer stats:
idle duration: 13m51.461434876s;
time spent while importing: 17m56.923899847s;
total samples: 345600000;
samples/s: 320914.04;
total bytes: 5.9 GB;
bytes/s: 5.4 MB;
import requests: 40001;
2020/01/18 21:19:00 Total time: 31m48.467044016s
```
### Data mapping
Vmctl maps Influx data the same way as VictoriaMetrics does by using the following rules:
* `influx-database` arg is mapped into `db` label value unless `db` tag exists in the Influx line.
* Field names are mapped to time series names prefixed with {measurement}{separator} value,
where {separator} equals to _ by default.
It can be changed with `--influx-measurement-field-separator` command-line flag.
* Field values are mapped to time series values.
* Tags are mapped to Prometheus labels format as-is.
For example, the following Influx line:
```
foo,tag1=value1,tag2=value2 field1=12,field2=40
```
is converted into the following Prometheus format data points:
```
foo_field1{tag1="value1", tag2="value2"} 12
foo_field2{tag1="value1", tag2="value2"} 40
```
### Configuration
The configuration flags should contain self-explanatory descriptions.
### Filtering
The filtering consists of two parts: timeseries and time.
The first step of application is to select all available timeseries
for given database and retention. User may specify additional filtering
condition via `--influx-filter-series` flag. For example:
```
./vmctl influx --influx-database benchmark \
--influx-filter-series "on benchmark from cpu where hostname='host_1703'"
InfluxDB import mode
2020/01/26 14:23:29 Exploring scheme for database "benchmark"
2020/01/26 14:23:29 fetching fields: command: "show field keys"; database: "benchmark"; retention: "autogen"
2020/01/26 14:23:29 found 12 fields
2020/01/26 14:23:29 fetching series: command: "show series on benchmark from cpu where hostname='host_1703'"; database: "benchmark"; retention: "autogen"
Found 10 timeseries to import. Continue? [Y/n]
```
The timeseries select query would be following:
`fetching series: command: "show series on benchmark from cpu where hostname='host_1703'"; database: "benchmark"; retention: "autogen"`
The second step of filtering is a time filter and it applies when fetching the datapoints from Influx.
Time filtering may be configured with two flags:
* --influx-filter-time-start
* --influx-filter-time-end
Here's an example of importing timeseries for one day only:
`./vmctl influx --influx-database benchmark --influx-filter-series "where hostname='host_1703'" --influx-filter-time-start "2020-01-01T10:07:00Z" --influx-filter-time-end "2020-01-01T15:07:00Z"`
Please see more about time filtering [here](https://docs.influxdata.com/influxdb/v1.7/query_language/schema_exploration#filter-meta-queries-by-time).
## Migrating data from InfluxDB (2.x)
Migrating data from InfluxDB v2.x is not supported yet ([#32](https://github.com/VictoriaMetrics/vmctl/issues/32)).
You may find useful a 3rd party solution for this - https://github.com/jonppe/influx_to_victoriametrics.
## Migrating data from Prometheus
`vmctl` supports the `prometheus` mode for migrating data from Prometheus to VictoriaMetrics time-series database.
Migration is based on reading Prometheus snapshot, which is basically a hard-link to Prometheus data files.
See `./vmctl prometheus --help` for details and full list of flags. Also see Prometheus related articles [here](#articles).
To use migration tool please specify the file path to Prometheus snapshot `--prom-snapshot` (see how to make a snapshot [here](https://www.robustperception.io/taking-snapshots-of-prometheus-data)) and VictoriaMetrics address `--vm-addr`.
Please note, that `vmctl` *do not make a snapshot from Prometheus*, it uses an already prepared snapshot. More about Prometheus snapshots may be found [here](https://www.robustperception.io/taking-snapshots-of-prometheus-data) and [here](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043).
Flag `--vm-addr` for single-node VM is usually equal to `--httpListenAddr`, and for cluster version
is equal to `--httpListenAddr` flag of vminsert component. Please note, that vmctl performs initial readiness check for the given address
by checking `/health` endpoint. For cluster version it is additionally required to specify the `--vm-account-id` flag.
See more details for cluster version [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
As soon as required flags are provided and all endpoints are accessible, `vmctl` will start the Prometheus snapshot exploration.
Basically, it just fetches all available blocks in provided snapshot and read the metadata. It also does initial filtering by time
if flags `--prom-filter-time-start` or `--prom-filter-time-end` were set. The exploration procedure prints some stats from read blocks.
Please note that stats are not taking into account timeseries or samples filtering. This will be done during importing process.
The importing process takes the snapshot blocks revealed from Explore procedure and processes them one by one
accumulating timeseries and samples. Please note, that `vmctl` relies on responses from Influx on this stage,
so ensure that Explore queries are executed without errors or limits. Please see this
[issue](https://github.com/VictoriaMetrics/vmctl/issues/30) for details.
The data processed in chunks and then sent to VM.
The importing process example for local installation of Prometheus
and single-node VictoriaMetrics(`http://localhost:8428`):
```
./vmctl prometheus --prom-snapshot=/path/to/snapshot \
--vm-concurrency=1 \
--vm-batch-size=200000 \
--prom-concurrency=3
Prometheus import mode
Prometheus snapshot stats:
blocks found: 14;
blocks skipped: 0;
min time: 1581288163058 (2020-02-09T22:42:43Z);
max time: 1582409128139 (2020-02-22T22:05:28Z);
samples: 32549106;
series: 27289.
Found 14 blocks to import. Continue? [Y/n] y
14 / 14 [-------------------------------------------------------------------------------------------] 100.00% 0 p/s
2020/02/23 15:50:03 Import finished!
2020/02/23 15:50:03 VictoriaMetrics importer stats:
idle duration: 6.152953029s;
time spent while importing: 44.908522491s;
total samples: 32549106;
samples/s: 724786.84;
total bytes: 669.1 MB;
bytes/s: 14.9 MB;
import requests: 323;
import requests retries: 0;
2020/02/23 15:50:03 Total time: 51.077451066s
```
### Data mapping
VictoriaMetrics has very similar data model to Prometheus and supports [RemoteWrite integration](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage).
So no data changes will be applied.
### Configuration
The configuration flags should contain self-explanatory descriptions.
### Filtering
The filtering consists of three parts: by timeseries and time.
Filtering by time may be configured via flags `--prom-filter-time-start` and `--prom-filter-time-end`
in in RFC3339 format. This filter applied twice: to drop blocks out of range and to filter timeseries in blocks with
overlapping time range.
Example of applying time filter:
```
./vmctl prometheus --prom-snapshot=/path/to/snapshot \
--prom-filter-time-start=2020-02-07T00:07:01Z \
--prom-filter-time-end=2020-02-11T00:07:01Z
Prometheus import mode
Prometheus snapshot stats:
blocks found: 2;
blocks skipped: 12;
min time: 1581288163058 (2020-02-09T22:42:43Z);
max time: 1581328800000 (2020-02-10T10:00:00Z);
samples: 1657698;
series: 3930.
Found 2 blocks to import. Continue? [Y/n] y
```
Please notice, that total amount of blocks in provided snapshot is 14, but only 2 of them were in provided
time range. So other 12 blocks were marked as `skipped`. The amount of samples and series is not taken into account,
since this is heavy operation and will be done during import process.
Filtering by timeseries is configured with following flags:
* `--prom-filter-label` - the label name, e.g. `__name__` or `instance`;
* `--prom-filter-label-value` - the regular expression to filter the label value. By default matches all `.*`
For example:
```
./vmctl prometheus --prom-snapshot=/path/to/snapshot \
--prom-filter-label="__name__" \
--prom-filter-label-value="promhttp.*" \
--prom-filter-time-start=2020-02-07T00:07:01Z \
--prom-filter-time-end=2020-02-11T00:07:01Z
Prometheus import mode
Prometheus snapshot stats:
blocks found: 2;
blocks skipped: 12;
min time: 1581288163058 (2020-02-09T22:42:43Z);
max time: 1581328800000 (2020-02-10T10:00:00Z);
samples: 1657698;
series: 3930.
Found 2 blocks to import. Continue? [Y/n] y
14 / 14 [------------------------------------------------------------------------------------------------------------------------------------------------------] 100.00% ? p/s
2020/02/23 15:51:07 Import finished!
2020/02/23 15:51:07 VictoriaMetrics importer stats:
idle duration: 0s;
time spent while importing: 37.415461ms;
total samples: 10128;
samples/s: 270690.24;
total bytes: 195.2 kB;
bytes/s: 5.2 MB;
import requests: 2;
import requests retries: 0;
2020/02/23 15:51:07 Total time: 7.153158218s
```
## Migrating data from Thanos
Thanos uses the same storage engine as Prometheus and the data layout on-disk should be the same. That means
`vmctl` in mode `prometheus` may be used for Thanos historical data migration as well.
These instructions may vary based on the details of your Thanos configuration.
Please read carefully and verify as you go. We assume you're using Thanos Sidecar on your Prometheus pods,
and that you have a separate Thanos Store installation.
### Current data
1. For now, keep your Thanos Sidecar and Thanos-related Prometheus configuration, but add this to also stream
metrics to VictoriaMetrics:
```
remote_write:
- url: http://victoria-metrics:8428/api/v1/write
```
2. Make sure VM is running, of course. Now check the logs to make sure that Prometheus is sending and VM is receiving.
In Prometheus, make sure there are no errors. On the VM side, you should see messages like this:
```
2020-04-27T18:38:46.474Z info VictoriaMetrics/lib/storage/partition.go:207 creating a partition "2020_04" with smallPartsPath="/victoria-metrics-data/data/small/2020_04", bigPartsPath="/victoria-metrics-data/data/big/2020_04"
2020-04-27T18:38:46.506Z info VictoriaMetrics/lib/storage/partition.go:222 partition "2020_04" has been created
```
3. Now just wait. Within two hours, Prometheus should finish its current data file and hand it off to Thanos Store for long term
storage.
### Historical data
Let's assume your data is stored on S3 served by minio. You first need to copy that out to a local filesystem,
then import it into VM using `vmctl` in `prometheus` mode.
1. Copy data from minio.
1. Run the `minio/mc` Docker container.
1. `mc config host add minio http://minio:9000 accessKey secretKey`, substituting appropriate values for the last 3 items.
1. `mc cp -r minio/prometheus thanos-data`
1. Import using `vmctl`.
1. Follow the [instructions](#how-to-build) to compile `vmctl` on your machine.
1. Use [prometheus](#migrating-data-from-prometheus) mode to import data:
```
vmctl prometheus --prom-snapshot thanos-data --vm-addr http://victoria-metrics:8428
```
## Migrating data from VictoriaMetrics
### Native protocol
The [native binary protocol](https://docs.victoriametrics.com/#how-to-export-data-in-native-format)
was introduced in [1.42.0 release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.42.0)
and provides the most efficient way to migrate data between VM instances: single to single, cluster to cluster,
single to cluster and vice versa. Please note that both instances (source and destination) should be of v1.42.0
or higher.
See `./vmctl vm-native --help` for details and full list of flags.
In this mode `vmctl` acts as a proxy between two VM instances, where time series filtering is done by "source" (`src`)
and processing is done by "destination" (`dst`). Because of that, `vmctl` doesn't actually know how much data will be
processed and can't show the progress bar. It will show the current processing speed and total number of processed bytes:
```
./vmctl vm-native --vm-native-src-addr=http://localhost:8528 \
--vm-native-dst-addr=http://localhost:8428 \
--vm-native-filter-match='{job="vmagent"}' \
--vm-native-filter-time-start='2020-01-01T20:07:00Z'
VictoriaMetrics Native import mode
Initing export pipe from "http://localhost:8528" with filters:
filter: match[]={job="vmagent"}
Initing import process to "http://localhost:8428":
Total: 336.75 KiB ↖ Speed: 454.46 KiB p/s
2020/10/13 17:04:59 Total time: 952.143376ms
```
Importing tips:
1. Migrating all the metrics from one VM to another may collide with existing application metrics
(prefixed with `vm_`) at destination and lead to confusion when using
[official Grafana dashboards](https://grafana.com/orgs/victoriametrics/dashboards).
To avoid such situation try to filter out VM process metrics via `--vm-native-filter-match` flag.
2. Migration is a backfilling process, so it is recommended to read
[Backfilling tips](https://github.com/VictoriaMetrics/VictoriaMetrics#backfilling) section.
3. `vmctl` doesn't provide relabeling or other types of labels management in this mode.
Instead, use [relabeling in VictoriaMetrics](https://github.com/VictoriaMetrics/vmctl/issues/4#issuecomment-683424375).
## Tuning
### Influx mode
The flag `--influx-concurrency` controls how many concurrent requests may be sent to InfluxDB while fetching
timeseries. Please set it wisely to avoid InfluxDB overwhelming.
The flag `--influx-chunk-size` controls the max amount of datapoints to return in single chunk from fetch requests.
Please see more details [here](https://docs.influxdata.com/influxdb/v1.7/guides/querying_data/#chunking).
The chunk size is used to control InfluxDB memory usage, so it won't OOM on processing large timeseries with
billions of datapoints.
### Prometheus mode
The flag `--prom-concurrency` controls how many concurrent readers will be reading the blocks in snapshot.
Since snapshots are just files on disk it would be hard to overwhelm the system. Please go with value equal
to number of free CPU cores.
### VictoriaMetrics importer
The flag `--vm-concurrency` controls the number of concurrent workers that process the input from InfluxDB query results.
Please note that each import request can load up to a single vCPU core on VictoriaMetrics. So try to set it according
to allocated CPU resources of your VictoriMetrics installation.
The flag `--vm-batch-size` controls max amount of samples collected before sending the import request.
For example, if `--influx-chunk-size=500` and `--vm-batch-size=2000` then importer will process not more
than 4 chunks before sending the request.
### Importer stats
After successful import `vmctl` prints some statistics for details.
The important numbers to watch are following:
- `idle duration` - shows time that importer spent while waiting for data from InfluxDB/Prometheus
to fill up `--vm-batch-size` batch size. Value shows total duration across all workers configured
via `--vm-concurrency`. High value may be a sign of too slow InfluxDB/Prometheus fetches or too
high `--vm-concurrency` value. Try to improve it by increasing `--<mode>-concurrency` value or
decreasing `--vm-concurrency` value.
- `import requests` - shows how many import requests were issued to VM server.
The import request is issued once the batch size(`--vm-batch-size`) is full and ready to be sent.
Please prefer big batch sizes (50k-500k) to improve performance.
- `import requests retries` - shows number of unsuccessful import requests. Non-zero value may be
a sign of network issues or VM being overloaded. See the logs during import for error messages.
### Silent mode
By default `vmctl` waits confirmation from user before starting the import. If this is unwanted
behavior and no user interaction required - pass `-s` flag to enable "silence" mode:
```
-s Whether to run in silent mode. If set to true no confirmation prompts will appear. (default: false)
```
### Significant figures
`vmctl` allows to limit the number of [significant figures](https://en.wikipedia.org/wiki/Significant_figures)
before importing. For example, the average value for response size is `102.342305` bytes and it has 9 significant figures.
If you ask a human to pronounce this value then with high probability value will be rounded to first 4 or 5 figures
because the rest aren't really that important to mention. In most cases, such a high precision is too much.
Moreover, such values may be just a result of [floating point arithmetic](https://en.wikipedia.org/wiki/Floating-point_arithmetic),
create a [false precision](https://en.wikipedia.org/wiki/False_precision) and result into bad compression ratio
according to [information theory](https://en.wikipedia.org/wiki/Information_theory).
`vmctl` provides the following flags for improving data compression:
* `--vm-round-digits` flag for rounding processed values to the given number of decimal digits after the point.
For example, `--vm-round-digits=2` would round `1.2345` to `1.23`. By default the rounding is disabled.
* `--vm-significant-figures` flag for limiting the number of significant figures in processed values. It takes no effect if set
to 0 (by default), but set `--vm-significant-figures=5` and `102.342305` will be rounded to `102.34`.
The most common case for using these flags is to improve data compression for time series storing aggregation
results such as `average`, `rate`, etc.
### Adding extra labels
`vmctl` allows to add extra labels to all imported series. It can be achived with flag `--vm-extra-label label=value`.
If multiple labels needs to be added, set flag for each label, for example, `--vm-extra-label label1=value1 --vm-extra-label label2=value2`.
If timeseries already have label, that must be added with `--vm-extra-label` flag, flag has priority and will override label value from timeseries.

View File

@@ -0,0 +1,6 @@
ARG base_image
FROM $base_image
ENTRYPOINT ["/vmctl-prod"]
ARG src_binary
COPY $src_binary ./vmctl-prod

364
app/vmctl/flags.go Normal file
View File

@@ -0,0 +1,364 @@
package main
import (
"fmt"
"github.com/urfave/cli/v2"
)
const (
globalSilent = "s"
)
var (
globalFlags = []cli.Flag{
&cli.BoolFlag{
Name: globalSilent,
Value: false,
Usage: "Whether to run in silent mode. If set to true no confirmation prompts will appear.",
},
}
)
const (
vmAddr = "vm-addr"
vmUser = "vm-user"
vmPassword = "vm-password"
vmAccountID = "vm-account-id"
vmConcurrency = "vm-concurrency"
vmCompress = "vm-compress"
vmBatchSize = "vm-batch-size"
vmSignificantFigures = "vm-significant-figures"
vmRoundDigits = "vm-round-digits"
vmExtraLabel = "vm-extra-label"
)
var (
vmFlags = []cli.Flag{
&cli.StringFlag{
Name: vmAddr,
Value: "http://localhost:8428",
Usage: "VictoriaMetrics address to perform import requests. \n" +
"Should be the same as --httpListenAddr value for single-node version or vminsert component. \n" +
"Please note, that `vmctl` performs initial readiness check for the given address by checking `/health` endpoint.",
},
&cli.StringFlag{
Name: vmUser,
Usage: "VictoriaMetrics username for basic auth",
EnvVars: []string{"VM_USERNAME"},
},
&cli.StringFlag{
Name: vmPassword,
Usage: "VictoriaMetrics password for basic auth",
EnvVars: []string{"VM_PASSWORD"},
},
&cli.StringFlag{
Name: vmAccountID,
Usage: "AccountID is an arbitrary 32-bit integer identifying namespace for data ingestion (aka tenant). \n" +
"It is possible to set it as accountID:projectID, where projectID is also arbitrary 32-bit integer. \n" +
"If projectID isn't set, then it equals to 0",
},
&cli.UintFlag{
Name: vmConcurrency,
Usage: "Number of workers concurrently performing import requests to VM",
Value: 2,
},
&cli.BoolFlag{
Name: vmCompress,
Value: true,
Usage: "Whether to apply gzip compression to import requests",
},
&cli.IntFlag{
Name: vmBatchSize,
Value: 200e3,
Usage: "How many samples importer collects before sending the import request to VM",
},
&cli.IntFlag{
Name: vmSignificantFigures,
Value: 0,
Usage: "The number of significant figures to leave in metric values before importing. " +
"See https://en.wikipedia.org/wiki/Significant_figures. Zero value saves all the significant figures. " +
"This option may be used for increasing on-disk compression level for the stored metrics. " +
"See also --vm-round-digits option",
},
&cli.IntFlag{
Name: vmRoundDigits,
Value: 100,
Usage: "Round metric values to the given number of decimal digits after the point. " +
"This option may be used for increasing on-disk compression level for the stored metrics",
},
&cli.StringSliceFlag{
Name: vmExtraLabel,
Value: nil,
Usage: "Extra labels, that will be added to imported timeseries. In case of collision, label value defined by flag" +
"will have priority. Flag can be set multiple times, to add few additional labels.",
},
}
)
const (
otsdbAddr = "otsdb-addr"
otsdbConcurrency = "otsdb-concurrency"
otsdbQueryLimit = "otsdb-query-limit"
otsdbOffsetDays = "otsdb-offset-days"
otsdbHardTSStart = "otsdb-hard-ts-start"
otsdbRetentions = "otsdb-retentions"
otsdbFilters = "otsdb-filters"
otsdbNormalize = "otsdb-normalize"
otsdbMsecsTime = "otsdb-msecstime"
)
var (
otsdbFlags = []cli.Flag{
&cli.StringFlag{
Name: otsdbAddr,
Value: "http://localhost:4242",
Required: true,
Usage: "OpenTSDB server addr",
},
&cli.IntFlag{
Name: otsdbConcurrency,
Usage: "Number of concurrently running fetch queries to OpenTSDB per metric",
Value: 1,
},
&cli.StringSliceFlag{
Name: otsdbRetentions,
Value: nil,
Required: true,
Usage: "Retentions patterns to collect on. Each pattern should describe the aggregation performed " +
"for the query, the row size (in HBase) that will define how long each individual query is, " +
"and the time range to query for. e.g. sum-1m-avg:1h:3d. " +
"The first time range defined should be a multiple of the row size in HBase. " +
"e.g. if the row size is 2 hours, 4h is good, 5h less so. We want each query to land on unique rows.",
},
&cli.StringSliceFlag{
Name: otsdbFilters,
Value: cli.NewStringSlice("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"),
Usage: "Filters to process for discovering metrics in OpenTSDB",
},
&cli.Int64Flag{
Name: otsdbOffsetDays,
Usage: "Days to offset our 'starting' point for collecting data from OpenTSDB",
Value: 0,
},
&cli.Int64Flag{
Name: otsdbHardTSStart,
Usage: "A specific timestamp to start from, will override using an offset",
Value: 0,
},
/*
because the defaults are set *extremely* low in OpenTSDB (10-25 results), we will
set a larger default limit, but still allow a user to increase/decrease it
*/
&cli.IntFlag{
Name: otsdbQueryLimit,
Usage: "Result limit on meta queries to OpenTSDB (affects both metric name and tag value queries, recommended to use a value exceeding your largest series)",
Value: 100e3,
},
&cli.BoolFlag{
Name: otsdbMsecsTime,
Value: false,
Usage: "Whether OpenTSDB is writing values in milliseconds or seconds",
},
&cli.BoolFlag{
Name: otsdbNormalize,
Value: false,
Usage: "Whether to normalize all data received to lower case before forwarding to VictoriaMetrics",
},
}
)
const (
influxAddr = "influx-addr"
influxUser = "influx-user"
influxPassword = "influx-password"
influxDB = "influx-database"
influxRetention = "influx-retention-policy"
influxChunkSize = "influx-chunk-size"
influxConcurrency = "influx-concurrency"
influxFilterSeries = "influx-filter-series"
influxFilterTimeStart = "influx-filter-time-start"
influxFilterTimeEnd = "influx-filter-time-end"
influxMeasurementFieldSeparator = "influx-measurement-field-separator"
)
var (
influxFlags = []cli.Flag{
&cli.StringFlag{
Name: influxAddr,
Value: "http://localhost:8086",
Usage: "Influx server addr",
},
&cli.StringFlag{
Name: influxUser,
Usage: "Influx user",
EnvVars: []string{"INFLUX_USERNAME"},
},
&cli.StringFlag{
Name: influxPassword,
Usage: "Influx user password",
EnvVars: []string{"INFLUX_PASSWORD"},
},
&cli.StringFlag{
Name: influxDB,
Usage: "Influx database",
Required: true,
},
&cli.StringFlag{
Name: influxRetention,
Usage: "Influx retention policy",
Value: "autogen",
},
&cli.IntFlag{
Name: influxChunkSize,
Usage: "The chunkSize defines max amount of series to be returned in one chunk",
Value: 10e3,
},
&cli.IntFlag{
Name: influxConcurrency,
Usage: "Number of concurrently running fetch queries to InfluxDB",
Value: 1,
},
&cli.StringFlag{
Name: influxFilterSeries,
Usage: "Influx filter expression to select series. E.g. \"from cpu where arch='x86' AND hostname='host_2753'\".\n" +
"See for details https://docs.influxdata.com/influxdb/v1.7/query_language/schema_exploration#show-series",
},
&cli.StringFlag{
Name: influxFilterTimeStart,
Usage: "The time filter to select timeseries with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'",
},
&cli.StringFlag{
Name: influxFilterTimeEnd,
Usage: "The time filter to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'",
},
&cli.StringFlag{
Name: influxMeasurementFieldSeparator,
Usage: "The {separator} symbol used to concatenate {measurement} and {field} names into series name {measurement}{separator}{field}.",
Value: "_",
},
}
)
const (
promSnapshot = "prom-snapshot"
promConcurrency = "prom-concurrency"
promFilterTimeStart = "prom-filter-time-start"
promFilterTimeEnd = "prom-filter-time-end"
promFilterLabel = "prom-filter-label"
promFilterLabelValue = "prom-filter-label-value"
)
var (
promFlags = []cli.Flag{
&cli.StringFlag{
Name: promSnapshot,
Usage: "Path to Prometheus snapshot. Pls see for details https://www.robustperception.io/taking-snapshots-of-prometheus-data",
Required: true,
},
&cli.IntFlag{
Name: promConcurrency,
Usage: "Number of concurrently running snapshot readers",
Value: 1,
},
&cli.StringFlag{
Name: promFilterTimeStart,
Usage: "The time filter in RFC3339 format to select timeseries with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'",
},
&cli.StringFlag{
Name: promFilterTimeEnd,
Usage: "The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'",
},
&cli.StringFlag{
Name: promFilterLabel,
Usage: "Prometheus label name to filter timeseries by. E.g. '__name__' will filter timeseries by name.",
},
&cli.StringFlag{
Name: promFilterLabelValue,
Usage: fmt.Sprintf("Prometheus regular expression to filter label from %q flag.", promFilterLabel),
Value: ".*",
},
}
)
const (
vmNativeFilterMatch = "vm-native-filter-match"
vmNativeFilterTimeStart = "vm-native-filter-time-start"
vmNativeFilterTimeEnd = "vm-native-filter-time-end"
vmNativeSrcAddr = "vm-native-src-addr"
vmNativeSrcUser = "vm-native-src-user"
vmNativeSrcPassword = "vm-native-src-password"
vmNativeDstAddr = "vm-native-dst-addr"
vmNativeDstUser = "vm-native-dst-user"
vmNativeDstPassword = "vm-native-dst-password"
)
var (
vmNativeFlags = []cli.Flag{
&cli.StringFlag{
Name: vmNativeFilterMatch,
Usage: "Time series selector to match series for export. For example, select {instance!=\"localhost\"} will " +
"match all series with \"instance\" label different to \"localhost\".\n" +
" See more details here https://github.com/VictoriaMetrics/VictoriaMetrics#how-to-export-data-in-native-format",
Value: `{__name__!=""}`,
},
&cli.StringFlag{
Name: vmNativeFilterTimeStart,
Usage: "The time filter may contain either unix timestamp in seconds or RFC3339 values. E.g. '2020-01-01T20:07:00Z'",
},
&cli.StringFlag{
Name: vmNativeFilterTimeEnd,
Usage: "The time filter may contain either unix timestamp in seconds or RFC3339 values. E.g. '2020-01-01T20:07:00Z'",
},
&cli.StringFlag{
Name: vmNativeSrcAddr,
Usage: "VictoriaMetrics address to perform export from. \n" +
" Should be the same as --httpListenAddr value for single-node version or vmselect component." +
" If exporting from cluster version - include the tenet token in address.",
Required: true,
},
&cli.StringFlag{
Name: vmNativeSrcUser,
Usage: "VictoriaMetrics username for basic auth",
EnvVars: []string{"VM_NATIVE_SRC_USERNAME"},
},
&cli.StringFlag{
Name: vmNativeSrcPassword,
Usage: "VictoriaMetrics password for basic auth",
EnvVars: []string{"VM_NATIVE_SRC_PASSWORD"},
},
&cli.StringFlag{
Name: vmNativeDstAddr,
Usage: "VictoriaMetrics address to perform import to. \n" +
" Should be the same as --httpListenAddr value for single-node version or vminsert component." +
" If importing into cluster version - include the tenet token in address.",
Required: true,
},
&cli.StringFlag{
Name: vmNativeDstUser,
Usage: "VictoriaMetrics username for basic auth",
EnvVars: []string{"VM_NATIVE_DST_USERNAME"},
},
&cli.StringFlag{
Name: vmNativeDstPassword,
Usage: "VictoriaMetrics password for basic auth",
EnvVars: []string{"VM_NATIVE_DST_PASSWORD"},
},
&cli.StringSliceFlag{
Name: vmExtraLabel,
Value: nil,
Usage: "Extra labels, that will be added to imported timeseries. In case of collision, label value defined by flag" +
"will have priority. Flag can be set multiple times, to add few additional labels.",
},
}
)
func mergeFlags(flags ...[]cli.Flag) []cli.Flag {
var result []cli.Flag
for _, f := range flags {
result = append(result, f...)
}
return result
}

146
app/vmctl/influx.go Normal file
View File

@@ -0,0 +1,146 @@
package main
import (
"fmt"
"io"
"log"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/cheggaaa/pb/v3"
)
type influxProcessor struct {
ic *influx.Client
im *vm.Importer
cc int
separator string
}
func newInfluxProcessor(ic *influx.Client, im *vm.Importer, cc int, separator string) *influxProcessor {
if cc < 1 {
cc = 1
}
return &influxProcessor{
ic: ic,
im: im,
cc: cc,
separator: separator,
}
}
func (ip *influxProcessor) run(silent bool) error {
series, err := ip.ic.Explore()
if err != nil {
return fmt.Errorf("explore query failed: %s", err)
}
if len(series) < 1 {
return fmt.Errorf("found no timeseries to import")
}
question := fmt.Sprintf("Found %d timeseries to import. Continue?", len(series))
if !silent && !prompt(question) {
return nil
}
bar := pb.StartNew(len(series))
seriesCh := make(chan *influx.Series)
errCh := make(chan error)
ip.im.ResetStats()
var wg sync.WaitGroup
wg.Add(ip.cc)
for i := 0; i < ip.cc; i++ {
go func() {
defer wg.Done()
for s := range seriesCh {
if err := ip.do(s); err != nil {
errCh <- fmt.Errorf("request failed for %q.%q: %s", s.Measurement, s.Field, err)
return
}
bar.Increment()
}
}()
}
// any error breaks the import
for _, s := range series {
select {
case infErr := <-errCh:
return fmt.Errorf("influx error: %s", infErr)
case vmErr := <-ip.im.Errors():
return fmt.Errorf("Import process failed: \n%s", wrapErr(vmErr))
case seriesCh <- s:
}
}
close(seriesCh)
wg.Wait()
ip.im.Close()
// drain import errors channel
for vmErr := range ip.im.Errors() {
return fmt.Errorf("Import process failed: \n%s", wrapErr(vmErr))
}
bar.Finish()
log.Println("Import finished!")
log.Print(ip.im.Stats())
return nil
}
const dbLabel = "db"
func (ip *influxProcessor) do(s *influx.Series) error {
cr, err := ip.ic.FetchDataPoints(s)
if err != nil {
return fmt.Errorf("failed to fetch datapoints: %s", err)
}
defer func() {
_ = cr.Close()
}()
var name string
if s.Measurement != "" {
name = fmt.Sprintf("%s%s%s", s.Measurement, ip.separator, s.Field)
} else {
name = s.Field
}
labels := make([]vm.LabelPair, len(s.LabelPairs))
var containsDBLabel bool
for i, lp := range s.LabelPairs {
if lp.Name == dbLabel {
containsDBLabel = true
break
}
labels[i] = vm.LabelPair{
Name: lp.Name,
Value: lp.Value,
}
}
if !containsDBLabel {
labels = append(labels, vm.LabelPair{
Name: dbLabel,
Value: ip.ic.Database(),
})
}
for {
time, values, err := cr.Next()
if err != nil {
if err == io.EOF {
return nil
}
return err
}
// skip empty results
if len(time) < 1 {
continue
}
ip.im.Input() <- &vm.TimeSeries{
Name: name,
LabelPairs: labels,
Timestamps: time,
Values: values,
}
}
}

362
app/vmctl/influx/influx.go Normal file
View File

@@ -0,0 +1,362 @@
package influx
import (
"fmt"
"io"
"log"
"strings"
"time"
influx "github.com/influxdata/influxdb/client/v2"
)
// Client represents a wrapper over
// influx HTTP client
type Client struct {
influx.Client
database string
retention string
chunkSize int
filterSeries string
filterTime string
}
// Config contains fields required
// for Client configuration
type Config struct {
Addr string
Username string
Password string
Database string
Retention string
ChunkSize int
Filter Filter
}
// Filter contains configuration for filtering
// the timeseries
type Filter struct {
Series string
TimeStart string
TimeEnd string
}
// Series holds the time series
type Series struct {
Measurement string
Field string
LabelPairs []LabelPair
}
var valueEscaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)
func (s Series) fetchQuery(timeFilter string) string {
f := &strings.Builder{}
fmt.Fprintf(f, "select %q from %q", s.Field, s.Measurement)
if len(s.LabelPairs) > 0 || len(timeFilter) > 0 {
f.WriteString(" where")
}
for i, pair := range s.LabelPairs {
pairV := valueEscaper.Replace(pair.Value)
fmt.Fprintf(f, " %q::tag='%s'", pair.Name, pairV)
if i != len(s.LabelPairs)-1 {
f.WriteString(" and")
}
}
if len(timeFilter) > 0 {
if len(s.LabelPairs) > 0 {
f.WriteString(" and")
}
fmt.Fprintf(f, " %s", timeFilter)
}
return f.String()
}
// LabelPair is the key-value record
// of time series label
type LabelPair struct {
Name string
Value string
}
// NewClient creates and returns influx client
// configured with passed Config
func NewClient(cfg Config) (*Client, error) {
c := influx.HTTPConfig{
Addr: cfg.Addr,
Username: cfg.Username,
Password: cfg.Password,
InsecureSkipVerify: true,
}
hc, err := influx.NewHTTPClient(c)
if err != nil {
return nil, fmt.Errorf("failed to establish conn: %s", err)
}
if _, _, err := hc.Ping(time.Second); err != nil {
return nil, fmt.Errorf("ping failed: %s", err)
}
chunkSize := cfg.ChunkSize
if chunkSize < 1 {
chunkSize = 10e3
}
client := &Client{
Client: hc,
database: cfg.Database,
retention: cfg.Retention,
chunkSize: chunkSize,
filterTime: timeFilter(cfg.Filter.TimeStart, cfg.Filter.TimeEnd),
filterSeries: cfg.Filter.Series,
}
return client, nil
}
// Database returns database name
func (c Client) Database() string {
return c.database
}
func timeFilter(start, end string) string {
if start == "" && end == "" {
return ""
}
var tf string
if start != "" {
tf = fmt.Sprintf("time >= '%s'", start)
}
if end != "" {
if tf != "" {
tf += " and "
}
tf += fmt.Sprintf("time <= '%s'", end)
}
return tf
}
// Explore checks the existing data schema in influx
// by checking available fields and series,
// which unique combination represents all possible
// time series existing in database.
// The explore required to reduce the load on influx
// by querying field of the exact time series at once,
// instead of fetching all of the values over and over.
//
// May contain non-existing time series.
func (c *Client) Explore() ([]*Series, error) {
log.Printf("Exploring scheme for database %q", c.database)
mFields, err := c.fieldsByMeasurement()
if err != nil {
return nil, fmt.Errorf("failed to get field keys: %s", err)
}
series, err := c.getSeries()
if err != nil {
return nil, fmt.Errorf("failed to get series: %s", err)
}
var iSeries []*Series
for _, s := range series {
fields, ok := mFields[s.Measurement]
if !ok {
return nil, fmt.Errorf("can't find field keys for measurement %q", s.Measurement)
}
for _, field := range fields {
is := &Series{
Measurement: s.Measurement,
Field: field,
LabelPairs: s.LabelPairs,
}
iSeries = append(iSeries, is)
}
}
return iSeries, nil
}
// ChunkedResponse is a wrapper over influx.ChunkedResponse.
// Used for better memory usage control while iterating
// over huge time series.
type ChunkedResponse struct {
cr *influx.ChunkedResponse
iq influx.Query
field string
}
// Close closes cr.
func (cr *ChunkedResponse) Close() error {
return cr.cr.Close()
}
// Next reads the next part/chunk of time series.
// Returns io.EOF when time series was read entirely.
func (cr *ChunkedResponse) Next() ([]int64, []float64, error) {
resp, err := cr.cr.NextResponse()
if err != nil {
return nil, nil, err
}
if resp.Error() != nil {
return nil, nil, fmt.Errorf("response error for %s: %s", cr.iq.Command, resp.Error())
}
if len(resp.Results) != 1 {
return nil, nil, fmt.Errorf("unexpected number of results in response: %d", len(resp.Results))
}
results, err := parseResult(resp.Results[0])
if err != nil {
return nil, nil, err
}
if len(results) < 1 {
return nil, nil, nil
}
r := results[0]
const key = "time"
timestamps, ok := r.values[key]
if !ok {
return nil, nil, fmt.Errorf("response doesn't contain field %q", key)
}
fieldValues, ok := r.values[cr.field]
if !ok {
return nil, nil, fmt.Errorf("response doesn't contain filed %q", cr.field)
}
values := make([]float64, len(fieldValues))
for i, fv := range fieldValues {
v, err := toFloat64(fv)
if err != nil {
return nil, nil, fmt.Errorf("failed to convert value %q.%v to float64: %s",
cr.field, v, err)
}
values[i] = v
}
ts := make([]int64, len(results[0].values[key]))
for i, v := range timestamps {
t, err := parseDate(v.(string))
if err != nil {
return nil, nil, err
}
ts[i] = t
}
return ts, values, nil
}
// FetchDataPoints performs SELECT request to fetch
// datapoints for particular field.
func (c *Client) FetchDataPoints(s *Series) (*ChunkedResponse, error) {
iq := influx.Query{
Command: s.fetchQuery(c.filterTime),
Database: c.database,
RetentionPolicy: c.retention,
Chunked: true,
ChunkSize: 1e4,
}
cr, err := c.QueryAsChunk(iq)
if err != nil {
return nil, fmt.Errorf("query %q err: %s", iq.Command, err)
}
return &ChunkedResponse{cr, iq, s.Field}, nil
}
func (c *Client) fieldsByMeasurement() (map[string][]string, error) {
q := influx.Query{
Command: "show field keys",
Database: c.database,
RetentionPolicy: c.retention,
}
log.Printf("fetching fields: %s", stringify(q))
qValues, err := c.do(q)
if err != nil {
return nil, fmt.Errorf("error while executing query %q: %s", q.Command, err)
}
var total int
var skipped int
const fKey = "fieldKey"
const fType = "fieldType"
result := make(map[string][]string, len(qValues))
for _, qv := range qValues {
types := qv.values[fType]
fields := qv.values[fKey]
values := make([]string, 0)
for key, field := range fields {
if types[key].(string) == "string" {
skipped++
continue
}
values = append(values, field.(string))
total++
}
result[qv.name] = values
}
if skipped > 0 {
log.Printf("found %d fields; skipped %d non-numeric fields", total, skipped)
} else {
log.Printf("found %d fields", total)
}
return result, nil
}
func (c *Client) getSeries() ([]*Series, error) {
com := "show series"
if c.filterSeries != "" {
com = fmt.Sprintf("%s %s", com, c.filterSeries)
}
q := influx.Query{
Command: com,
Database: c.database,
RetentionPolicy: c.retention,
Chunked: true,
ChunkSize: c.chunkSize,
}
log.Printf("fetching series: %s", stringify(q))
cr, err := c.QueryAsChunk(q)
if err != nil {
return nil, fmt.Errorf("error while executing query %q: %s", q.Command, err)
}
const key = "key"
var result []*Series
for {
resp, err := cr.NextResponse()
if err != nil {
if err == io.EOF {
break
}
return nil, err
}
if resp.Error() != nil {
return nil, fmt.Errorf("response error for query %q: %s", q.Command, resp.Error())
}
qValues, err := parseResult(resp.Results[0])
if err != nil {
return nil, err
}
for _, qv := range qValues {
for _, v := range qv.values[key] {
s := &Series{}
if err := s.unmarshal(v.(string)); err != nil {
return nil, err
}
result = append(result, s)
}
}
}
log.Printf("found %d series", len(result))
return result, nil
}
func (c *Client) do(q influx.Query) ([]queryValues, error) {
res, err := c.Query(q)
if err != nil {
return nil, fmt.Errorf("query %q err: %s", q.Command, err)
}
if len(res.Results) < 1 {
return nil, fmt.Errorf("exploration query %q returned 0 results", q.Command)
}
return parseResult(res.Results[0])
}

View File

@@ -0,0 +1,127 @@
package influx
import "testing"
func TestFetchQuery(t *testing.T) {
testCases := []struct {
s Series
timeFilter string
expected string
}{
{
s: Series{
Measurement: "cpu",
Field: "value",
LabelPairs: []LabelPair{
{
Name: "foo",
Value: "bar",
},
},
},
expected: `select "value" from "cpu" where "foo"::tag='bar'`,
},
{
s: Series{
Measurement: "cpu",
Field: "value",
LabelPairs: []LabelPair{
{
Name: "foo",
Value: "bar",
},
{
Name: "baz",
Value: "qux",
},
},
},
expected: `select "value" from "cpu" where "foo"::tag='bar' and "baz"::tag='qux'`,
},
{
s: Series{
Measurement: "cpu",
Field: "value",
LabelPairs: []LabelPair{
{
Name: "foo",
Value: "b'ar",
},
},
},
timeFilter: "time >= now()",
expected: `select "value" from "cpu" where "foo"::tag='b\'ar' and time >= now()`,
},
{
s: Series{
Measurement: "cpu",
Field: "value",
LabelPairs: []LabelPair{
{
Name: "name",
Value: `dev-mapper-centos\x2dswap.swap`,
},
{
Name: "state",
Value: "dev-mapp'er-c'en'tos",
},
},
},
timeFilter: "time >= now()",
expected: `select "value" from "cpu" where "name"::tag='dev-mapper-centos\\x2dswap.swap' and "state"::tag='dev-mapp\'er-c\'en\'tos' and time >= now()`,
},
{
s: Series{
Measurement: "cpu",
Field: "value",
},
timeFilter: "time >= now()",
expected: `select "value" from "cpu" where time >= now()`,
},
{
s: Series{
Measurement: "cpu",
Field: "value",
},
expected: `select "value" from "cpu"`,
},
}
for _, tc := range testCases {
query := tc.s.fetchQuery(tc.timeFilter)
if query != tc.expected {
t.Fatalf("got: \n%s;\nexpected: \n%s", query, tc.expected)
}
}
}
func TestTimeFilter(t *testing.T) {
testCases := []struct {
start string
end string
expected string
}{
{
start: "2020-01-01T20:07:00Z",
end: "2020-01-01T21:07:00Z",
expected: "time >= '2020-01-01T20:07:00Z' and time <= '2020-01-01T21:07:00Z'",
},
{
expected: "",
},
{
start: "2020-01-01T20:07:00Z",
expected: "time >= '2020-01-01T20:07:00Z'",
},
{
end: "2020-01-01T21:07:00Z",
expected: "time <= '2020-01-01T21:07:00Z'",
},
}
for _, tc := range testCases {
f := timeFilter(tc.start, tc.end)
if f != tc.expected {
t.Fatalf("got: \n%q;\nexpected: \n%q", f, tc.expected)
}
}
}

191
app/vmctl/influx/parser.go Normal file
View File

@@ -0,0 +1,191 @@
package influx
import (
"encoding/json"
"fmt"
"strconv"
"strings"
"time"
influx "github.com/influxdata/influxdb/client/v2"
)
type queryValues struct {
name string
values map[string][]interface{}
}
func parseResult(r influx.Result) ([]queryValues, error) {
if len(r.Err) > 0 {
return nil, fmt.Errorf("result error: %s", r.Err)
}
qValues := make([]queryValues, len(r.Series))
for i, row := range r.Series {
values := make(map[string][]interface{}, len(row.Values))
for _, value := range row.Values {
for idx, v := range value {
key := row.Columns[idx]
values[key] = append(values[key], v)
}
}
qValues[i] = queryValues{
name: row.Name,
values: values,
}
}
return qValues, nil
}
func toFloat64(v interface{}) (float64, error) {
switch i := v.(type) {
case json.Number:
return i.Float64()
case float64:
return i, nil
case float32:
return float64(i), nil
case int64:
return float64(i), nil
case int32:
return float64(i), nil
case int:
return float64(i), nil
case uint64:
return float64(i), nil
case uint32:
return float64(i), nil
case uint:
return float64(i), nil
case string:
return strconv.ParseFloat(i, 64)
default:
return 0, fmt.Errorf("unexpected value type %v", i)
}
}
func parseDate(dateStr string) (int64, error) {
startTime, err := time.Parse(time.RFC3339, dateStr)
if err != nil {
return 0, fmt.Errorf("cannot parse %q: %s", dateStr, err)
}
return startTime.UnixNano() / 1e6, nil
}
func stringify(q influx.Query) string {
return fmt.Sprintf("command: %q; database: %q; retention: %q",
q.Command, q.Database, q.RetentionPolicy)
}
func (s *Series) unmarshal(v string) error {
noEscapeChars := strings.IndexByte(v, '\\') < 0
n := nextUnescapedChar(v, ',', noEscapeChars)
if n < 0 {
s.Measurement = unescapeTagValue(v, noEscapeChars)
return nil
}
s.Measurement = unescapeTagValue(v[:n], noEscapeChars)
var err error
s.LabelPairs, err = unmarshalTags(v[n+1:], noEscapeChars)
if err != nil {
return fmt.Errorf("failed to unmarhsal tags: %s", err)
}
return nil
}
func unmarshalTags(s string, noEscapeChars bool) ([]LabelPair, error) {
var result []LabelPair
for {
lp := LabelPair{}
n := nextUnescapedChar(s, ',', noEscapeChars)
if n < 0 {
if err := lp.unmarshal(s, noEscapeChars); err != nil {
return nil, err
}
if len(lp.Name) == 0 || len(lp.Value) == 0 {
return nil, nil
}
result = append(result, lp)
return result, nil
}
if err := lp.unmarshal(s[:n], noEscapeChars); err != nil {
return nil, err
}
s = s[n+1:]
if len(lp.Name) == 0 || len(lp.Value) == 0 {
continue
}
result = append(result, lp)
}
}
func (lp *LabelPair) unmarshal(s string, noEscapeChars bool) error {
n := nextUnescapedChar(s, '=', noEscapeChars)
if n < 0 {
return fmt.Errorf("missing tag value for %q", s)
}
lp.Name = unescapeTagValue(s[:n], noEscapeChars)
lp.Value = unescapeTagValue(s[n+1:], noEscapeChars)
return nil
}
func unescapeTagValue(s string, noEscapeChars bool) string {
if noEscapeChars {
// Fast path - no escape chars.
return s
}
n := strings.IndexByte(s, '\\')
if n < 0 {
return s
}
// Slow path. Remove escape chars.
dst := make([]byte, 0, len(s))
for {
dst = append(dst, s[:n]...)
s = s[n+1:]
if len(s) == 0 {
return string(append(dst, '\\'))
}
ch := s[0]
if ch != ' ' && ch != ',' && ch != '=' && ch != '\\' {
dst = append(dst, '\\')
}
dst = append(dst, ch)
s = s[1:]
n = strings.IndexByte(s, '\\')
if n < 0 {
return string(append(dst, s...))
}
}
}
func nextUnescapedChar(s string, ch byte, noEscapeChars bool) int {
if noEscapeChars {
// Fast path: just search for ch in s, since s has no escape chars.
return strings.IndexByte(s, ch)
}
sOrig := s
again:
n := strings.IndexByte(s, ch)
if n < 0 {
return -1
}
if n == 0 {
return len(sOrig) - len(s) + n
}
if s[n-1] != '\\' {
return len(sOrig) - len(s) + n
}
nOrig := n
slashes := 0
for n > 0 && s[n-1] == '\\' {
slashes++
n--
}
if slashes&1 == 0 {
return len(sOrig) - len(s) + nOrig
}
s = s[nOrig+1:]
goto again
}

View File

@@ -0,0 +1,60 @@
package influx
import (
"reflect"
"testing"
)
func TestSeries_Unmarshal(t *testing.T) {
tag := func(name, value string) LabelPair {
return LabelPair{
Name: name,
Value: value,
}
}
series := func(measurement string, lp ...LabelPair) Series {
return Series{
Measurement: measurement,
LabelPairs: lp,
}
}
testCases := []struct {
got string
want Series
}{
{
got: "cpu",
want: series("cpu"),
},
{
got: "cpu,host=localhost",
want: series("cpu", tag("host", "localhost")),
},
{
got: "cpu,host=localhost,instance=instance",
want: series("cpu", tag("host", "localhost"), tag("instance", "instance")),
},
{
got: `fo\,bar\=baz,x\=\b=\\a\,\=\q\ `,
want: series("fo,bar=baz", tag(`x=\b`, `\a,=\q `)),
},
{
got: "cpu,host=192.168.0.1,instance=fe80::fdc8:5e36:c2c6:baac%utun1",
want: series("cpu", tag("host", "192.168.0.1"), tag("instance", "fe80::fdc8:5e36:c2c6:baac%utun1")),
},
{
got: `cpu,db=db1,host=localhost,server=host\=localhost\ user\=user\ `,
want: series("cpu", tag("db", "db1"),
tag("host", "localhost"), tag("server", "host=localhost user=user ")),
},
}
for _, tc := range testCases {
s := Series{}
if err := s.unmarshal(tc.got); err != nil {
t.Fatalf("%q: unmarshal err: %s", tc.got, err)
}
if !reflect.DeepEqual(s, tc.want) {
t.Fatalf("%q: expected\n%#v\nto be equal\n%#v", tc.got, s, tc.want)
}
}
}

192
app/vmctl/main.go Normal file
View File

@@ -0,0 +1,192 @@
package main
import (
"fmt"
"log"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/urfave/cli/v2"
)
func main() {
start := time.Now()
app := &cli.App{
Name: "vmctl",
Usage: "VictoriaMetrics command-line tool",
Version: buildinfo.Version,
Commands: []*cli.Command{
{
Name: "opentsdb",
Usage: "Migrate timeseries from OpenTSDB",
Flags: mergeFlags(globalFlags, otsdbFlags, vmFlags),
Action: func(c *cli.Context) error {
fmt.Println("OpenTSDB import mode")
oCfg := opentsdb.Config{
Addr: c.String(otsdbAddr),
Limit: c.Int(otsdbQueryLimit),
Offset: c.Int64(otsdbOffsetDays),
HardTS: c.Int64(otsdbHardTSStart),
Retentions: c.StringSlice(otsdbRetentions),
Filters: c.StringSlice(otsdbFilters),
Normalize: c.Bool(otsdbNormalize),
MsecsTime: c.Bool(otsdbMsecsTime),
}
otsdbClient, err := opentsdb.NewClient(oCfg)
if err != nil {
return fmt.Errorf("failed to create opentsdb client: %s", err)
}
vmCfg := initConfigVM(c)
importer, err := vm.NewImporter(vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
otsdbProcessor := newOtsdbProcessor(otsdbClient, importer, c.Int(otsdbConcurrency))
return otsdbProcessor.run(c.Bool(globalSilent))
},
},
{
Name: "influx",
Usage: "Migrate timeseries from InfluxDB",
Flags: mergeFlags(globalFlags, influxFlags, vmFlags),
Action: func(c *cli.Context) error {
fmt.Println("InfluxDB import mode")
iCfg := influx.Config{
Addr: c.String(influxAddr),
Username: c.String(influxUser),
Password: c.String(influxPassword),
Database: c.String(influxDB),
Retention: c.String(influxRetention),
Filter: influx.Filter{
Series: c.String(influxFilterSeries),
TimeStart: c.String(influxFilterTimeStart),
TimeEnd: c.String(influxFilterTimeEnd),
},
ChunkSize: c.Int(influxChunkSize),
}
influxClient, err := influx.NewClient(iCfg)
if err != nil {
return fmt.Errorf("failed to create influx client: %s", err)
}
vmCfg := initConfigVM(c)
importer, err := vm.NewImporter(vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
processor := newInfluxProcessor(influxClient, importer,
c.Int(influxConcurrency), c.String(influxMeasurementFieldSeparator))
return processor.run(c.Bool(globalSilent))
},
},
{
Name: "prometheus",
Usage: "Migrate timeseries from Prometheus",
Flags: mergeFlags(globalFlags, promFlags, vmFlags),
Action: func(c *cli.Context) error {
fmt.Println("Prometheus import mode")
vmCfg := initConfigVM(c)
importer, err := vm.NewImporter(vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
promCfg := prometheus.Config{
Snapshot: c.String(promSnapshot),
Filter: prometheus.Filter{
TimeMin: c.String(promFilterTimeStart),
TimeMax: c.String(promFilterTimeEnd),
Label: c.String(promFilterLabel),
LabelValue: c.String(promFilterLabelValue),
},
}
cl, err := prometheus.NewClient(promCfg)
if err != nil {
return fmt.Errorf("failed to create prometheus client: %s", err)
}
pp := prometheusProcessor{
cl: cl,
im: importer,
cc: c.Int(promConcurrency),
}
return pp.run(c.Bool(globalSilent))
},
},
{
Name: "vm-native",
Usage: "Migrate time series between VictoriaMetrics installations via native binary format",
Flags: vmNativeFlags,
Action: func(c *cli.Context) error {
fmt.Println("VictoriaMetrics Native import mode")
if c.String(vmNativeFilterMatch) == "" {
return fmt.Errorf("flag %q can't be empty", vmNativeFilterMatch)
}
p := vmNativeProcessor{
filter: filter{
match: c.String(vmNativeFilterMatch),
timeStart: c.String(vmNativeFilterTimeStart),
timeEnd: c.String(vmNativeFilterTimeEnd),
},
src: &vmNativeClient{
addr: strings.Trim(c.String(vmNativeSrcAddr), "/"),
user: c.String(vmNativeSrcUser),
password: c.String(vmNativeSrcPassword),
},
dst: &vmNativeClient{
addr: strings.Trim(c.String(vmNativeDstAddr), "/"),
user: c.String(vmNativeDstUser),
password: c.String(vmNativeDstPassword),
extraLabels: c.StringSlice(vmExtraLabel),
},
}
return p.run()
},
},
},
}
c := make(chan os.Signal, 1)
signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-c
fmt.Println("\r- Execution cancelled")
os.Exit(0)
}()
err := app.Run(os.Args)
if err != nil {
log.Fatal(err)
}
log.Printf("Total time: %v", time.Since(start))
}
func initConfigVM(c *cli.Context) vm.Config {
return vm.Config{
Addr: c.String(vmAddr),
User: c.String(vmUser),
Password: c.String(vmPassword),
Concurrency: uint8(c.Int(vmConcurrency)),
Compress: c.Bool(vmCompress),
AccountID: c.String(vmAccountID),
BatchSize: c.Int(vmBatchSize),
SignificantFigures: c.Int(vmSignificantFigures),
RoundDigits: c.Int(vmRoundDigits),
ExtraLabels: c.StringSlice(vmExtraLabel),
}
}

View File

@@ -0,0 +1,11 @@
# See https://medium.com/on-docker/use-multi-stage-builds-to-inject-ca-certs-ad1e8f01de1b
ARG certs_image
ARG root_image
FROM $certs_image as certs
RUN apk --update --no-cache add ca-certificates
FROM $root_image
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
ENTRYPOINT ["/vmctl-prod"]
ARG TARGETARCH
COPY vmctl-${TARGETARCH}-prod ./vmctl-prod

164
app/vmctl/opentsdb.go Normal file
View File

@@ -0,0 +1,164 @@
package main
import (
"fmt"
"log"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/cheggaaa/pb/v3"
)
type otsdbProcessor struct {
oc *opentsdb.Client
im *vm.Importer
otsdbcc int
}
type queryObj struct {
Series opentsdb.Meta
Rt opentsdb.RetentionMeta
Tr opentsdb.TimeRange
StartTime int64
}
func newOtsdbProcessor(oc *opentsdb.Client, im *vm.Importer, otsdbcc int) *otsdbProcessor {
if otsdbcc < 1 {
otsdbcc = 1
}
return &otsdbProcessor{
oc: oc,
im: im,
otsdbcc: otsdbcc,
}
}
func (op *otsdbProcessor) run(silent bool) error {
log.Println("Loading all metrics from OpenTSDB for filters: ", op.oc.Filters)
var metrics []string
for _, filter := range op.oc.Filters {
q := fmt.Sprintf("%s/api/suggest?type=metrics&q=%s&max=%d", op.oc.Addr, filter, op.oc.Limit)
m, err := op.oc.FindMetrics(q)
if err != nil {
return fmt.Errorf("metric discovery failed for %q: %s", q, err)
}
metrics = append(metrics, m...)
}
if len(metrics) < 1 {
return fmt.Errorf("found no timeseries to import with filters %q", op.oc.Filters)
}
question := fmt.Sprintf("Found %d metrics to import. Continue?", len(metrics))
if !silent && !prompt(question) {
return nil
}
op.im.ResetStats()
var startTime int64
if op.oc.HardTS != 0 {
startTime = op.oc.HardTS
} else {
startTime = time.Now().Unix()
}
queryRanges := 0
// pre-calculate the number of query ranges we'll be processing
for _, rt := range op.oc.Retentions {
queryRanges += len(rt.QueryRanges)
}
for _, metric := range metrics {
log.Println(fmt.Sprintf("Starting work on %s", metric))
serieslist, err := op.oc.FindSeries(metric)
if err != nil {
return fmt.Errorf("couldn't retrieve series list for %s : %s", metric, err)
}
/*
Create channels for collecting/processing series and errors
We'll create them per metric to reduce pressure against OpenTSDB
Limit the size of seriesCh so we can't get too far ahead of actual processing
*/
seriesCh := make(chan queryObj, op.otsdbcc)
errCh := make(chan error)
// we're going to make serieslist * queryRanges queries, so we should represent that in the progress bar
bar := pb.StartNew(len(serieslist) * queryRanges)
var wg sync.WaitGroup
wg.Add(op.otsdbcc)
for i := 0; i < op.otsdbcc; i++ {
go func() {
defer wg.Done()
for s := range seriesCh {
if err := op.do(s); err != nil {
errCh <- fmt.Errorf("couldn't retrieve series for %s : %s", metric, err)
return
}
bar.Increment()
}
}()
}
/*
Loop through all series for this metric, processing all retentions and time ranges
requested. This loop is our primary "collect data from OpenTSDB loop" and should
be async, sending data to VictoriaMetrics over time.
The idea with having the select at the inner-most loop is to ensure quick
short-circuiting on error.
*/
for _, series := range serieslist {
for _, rt := range op.oc.Retentions {
for _, tr := range rt.QueryRanges {
select {
case otsdbErr := <-errCh:
return fmt.Errorf("opentsdb error: %s", otsdbErr)
case vmErr := <-op.im.Errors():
return fmt.Errorf("Import process failed: \n%s", wrapErr(vmErr))
case seriesCh <- queryObj{
Tr: tr, StartTime: startTime,
Series: series, Rt: opentsdb.RetentionMeta{
FirstOrder: rt.FirstOrder, SecondOrder: rt.SecondOrder, AggTime: rt.AggTime}}:
}
}
}
}
// Drain channels per metric
close(seriesCh)
wg.Wait()
close(errCh)
// check for any lingering errors on the query side
for otsdbErr := range errCh {
return fmt.Errorf("Import process failed: \n%s", otsdbErr)
}
bar.Finish()
log.Print(op.im.Stats())
}
op.im.Close()
for vmErr := range op.im.Errors() {
return fmt.Errorf("Import process failed: \n%s", wrapErr(vmErr))
}
log.Println("Import finished!")
log.Print(op.im.Stats())
return nil
}
func (op *otsdbProcessor) do(s queryObj) error {
start := s.StartTime - s.Tr.Start
end := s.StartTime - s.Tr.End
data, err := op.oc.GetData(s.Series, s.Rt, start, end)
if err != nil {
return fmt.Errorf("failed to collect data for %v in %v:%v :: %v", s.Series, s.Rt, s.Tr, err)
}
if len(data.Timestamps) < 1 || len(data.Values) < 1 {
return nil
}
labels := make([]vm.LabelPair, len(data.Tags))
for k, v := range data.Tags {
labels = append(labels, vm.LabelPair{Name: k, Value: v})
}
op.im.Input() <- &vm.TimeSeries{
Name: data.Metric,
LabelPairs: labels,
Timestamps: data.Timestamps,
Values: data.Values,
}
return nil
}

View File

@@ -0,0 +1,349 @@
package opentsdb
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"log"
"net/http"
"strings"
"time"
)
// Retention objects contain meta data about what to query for our run
type Retention struct {
/*
OpenTSDB has two levels of aggregation,
First, we aggregate any un-mentioned tags into the last result
Second, we aggregate into buckets over time
To simulate this with config, we have
FirstOrder (e.g. sum/avg/max/etc.)
SecondOrder (e.g. sum/avg/max/etc.)
AggTime (e.g. 1m/10m/1d/etc.)
This will build into m=<FirstOrder>:<AggTime>-<SecondOrder>-none:
Or an example: m=sum:1m-avg-none
*/
FirstOrder string
SecondOrder string
AggTime string
// The actual ranges will will attempt to query (as offsets from now)
QueryRanges []TimeRange
}
// RetentionMeta objects exist to pass smaller subsets (only one retention range) of a full Retention object around
type RetentionMeta struct {
FirstOrder string
SecondOrder string
AggTime string
}
// Client object holds general config about how queries should be performed
type Client struct {
Addr string
// The meta query limit for series returned
Limit int
Retentions []Retention
Filters []string
Normalize bool
HardTS int64
}
// Config contains fields required
// for Client configuration
type Config struct {
Addr string
Limit int
Offset int64
HardTS int64
Retentions []string
Filters []string
Normalize bool
MsecsTime bool
}
// TimeRange contains data about time ranges to query
type TimeRange struct {
Start int64
End int64
}
// MetaResults contains return data from search series lookup queries
type MetaResults struct {
Type string `json:"type"`
Results []Meta `json:"results"`
//metric string
//tags interface{}
//limit int
//time int
//startIndex int
//totalResults int
}
// Meta A meta object about a metric
// only contain the tags/etc. and no data
type Meta struct {
//tsuid string
Metric string `json:"metric"`
Tags map[string]string `json:"tags"`
}
// Metric holds the time series data
type Metric struct {
Metric string
Tags map[string]string
Timestamps []int64
Values []float64
}
// ExpressionOutput contains results from actual data queries
type ExpressionOutput struct {
Outputs []qoObj `json:"outputs"`
Query interface{} `json:"query"`
}
// QoObj contains actual timeseries data from the returned data query
type qoObj struct {
ID string `json:"id"`
Alias string `json:"alias"`
Dps [][]float64 `json:"dps"`
//dpsMeta interface{}
//meta interface{}
}
// Expression objects format our data queries
/*
All of the following structs are to build a OpenTSDB expression object
*/
type Expression struct {
Time timeObj `json:"time"`
Filters []filterObj `json:"filters"`
Metrics []metricObj `json:"metrics"`
// this just needs to be an empty object, so the value doesn't matter
Expressions []int `json:"expressions"`
Outputs []outputObj `json:"outputs"`
}
type timeObj struct {
Start int64 `json:"start"`
End int64 `json:"end"`
Aggregator string `json:"aggregator"`
Downsampler dSObj `json:"downsampler"`
}
type dSObj struct {
Interval string `json:"interval"`
Aggregator string `json:"aggregator"`
FillPolicy fillObj `json:"fillPolicy"`
}
type fillObj struct {
// we'll always hard-code to NaN here, so we don't need value
Policy string `json:"policy"`
}
type filterObj struct {
Tags []tagObj `json:"tags"`
ID string `json:"id"`
}
type tagObj struct {
Type string `json:"type"`
Tagk string `json:"tagk"`
Filter string `json:"filter"`
GroupBy bool `json:"groupBy"`
}
type metricObj struct {
ID string `json:"id"`
Metric string `json:"metric"`
Filter string `json:"filter"`
FillPolicy fillObj `json:"fillPolicy"`
}
type outputObj struct {
ID string `json:"id"`
Alias string `json:"alias"`
}
/* End expression object structs */
var (
exprOutput = outputObj{ID: "a", Alias: "query"}
exprFillPolicy = fillObj{Policy: "nan"}
)
// FindMetrics discovers all metrics that OpenTSDB knows about (given a filter)
// e.g. /api/suggest?type=metrics&q=system&max=100000
func (c Client) FindMetrics(q string) ([]string, error) {
resp, err := http.Get(q)
if err != nil {
return nil, fmt.Errorf("failed to send GET request to %q: %s", q, err)
}
if resp.StatusCode != 200 {
return nil, fmt.Errorf("Bad return from OpenTSDB: %q: %v", resp.StatusCode, resp)
}
defer func() { _ = resp.Body.Close() }()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("could not retrieve metric data from %q: %s", q, err)
}
var metriclist []string
err = json.Unmarshal(body, &metriclist)
if err != nil {
return nil, fmt.Errorf("failed to read response from %q: %s", q, err)
}
return metriclist, nil
}
// FindSeries discovers all series associated with a metric
// e.g. /api/search/lookup?m=system.load5&limit=1000000
func (c Client) FindSeries(metric string) ([]Meta, error) {
q := fmt.Sprintf("%s/api/search/lookup?m=%s&limit=%d", c.Addr, metric, c.Limit)
resp, err := http.Get(q)
if err != nil {
return nil, fmt.Errorf("failed to set GET request to %q: %s", q, err)
}
if resp.StatusCode != 200 {
return nil, fmt.Errorf("Bad return from OpenTSDB: %q: %v", resp.StatusCode, resp)
}
defer func() { _ = resp.Body.Close() }()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("could not retrieve series data from %q: %s", q, err)
}
var results MetaResults
err = json.Unmarshal(body, &results)
if err != nil {
return nil, fmt.Errorf("failed to read response from %q: %s", q, err)
}
return results.Results, nil
}
// GetData actually retrieves data for a series at a specified time range
func (c Client) GetData(series Meta, rt RetentionMeta, start int64, end int64) (Metric, error) {
/*
Here we build the actual exp query we'll send to OpenTSDB
This is comprised of a number of different settings. We hard-code
a few to simplify the JSON object creation.
There are examples queries available, so not too much detail here...
*/
expr := Expression{}
expr.Outputs = []outputObj{exprOutput}
expr.Metrics = append(expr.Metrics, metricObj{ID: "a", Metric: series.Metric,
Filter: "f1", FillPolicy: exprFillPolicy})
expr.Time = timeObj{Start: start, End: end, Aggregator: rt.FirstOrder,
Downsampler: dSObj{Interval: rt.AggTime,
Aggregator: rt.SecondOrder,
FillPolicy: exprFillPolicy}}
var TagList []tagObj
for k, v := range series.Tags {
/*
every tag should be a literal_or because that's the closest to a full "==" that
this endpoint allows for
*/
TagList = append(TagList, tagObj{Type: "literal_or", Tagk: k,
Filter: v, GroupBy: true})
}
expr.Filters = append(expr.Filters, filterObj{ID: "f1", Tags: TagList})
// "expressions" is required in the query object or we get a 5xx, so force it to exist
expr.Expressions = make([]int, 0)
inputData, err := json.Marshal(expr)
if err != nil {
return Metric{}, fmt.Errorf("failed to marshal query JSON %s", err)
}
q := fmt.Sprintf("%s/api/query/exp", c.Addr)
resp, err := http.Post(q, "application/json", bytes.NewBuffer(inputData))
if err != nil {
return Metric{}, fmt.Errorf("failed to send GET request to %q: %s", q, err)
}
if resp.StatusCode != 200 {
return Metric{}, fmt.Errorf("Bad return from OpenTSDB: %q: %v", resp.StatusCode, resp)
}
defer func() { _ = resp.Body.Close() }()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
return Metric{}, fmt.Errorf("could not retrieve series data from %q: %s", q, err)
}
var output ExpressionOutput
err = json.Unmarshal(body, &output)
if err != nil {
return Metric{}, fmt.Errorf("failed to unmarshal response from %q: %s", q, err)
}
if len(output.Outputs) < 1 {
// no results returned...return an empty object without error
return Metric{}, nil
}
data := Metric{}
data.Metric = series.Metric
data.Tags = series.Tags
/*
We evaluate data for correctness before formatting the actual values
to skip a little bit of time if the series has invalid formatting
First step is to enforce Prometheus' data model
*/
data, err = modifyData(data, c.Normalize)
if err != nil {
return Metric{}, fmt.Errorf("invalid series data from %q: %s", q, err)
}
/*
Convert data from OpenTSDB's output format ([[ts,val],[ts,val]...])
to VictoriaMetrics format: {"timestamps": [ts,ts,ts...], "values": [val,val,val...]}
The nasty part here is that because an object in each array
can be a float64, we have to initially cast _all_ objects that way
then convert the timestamp back to something reasonable.
*/
for _, tsobj := range output.Outputs[0].Dps {
data.Timestamps = append(data.Timestamps, int64(tsobj[0]))
data.Values = append(data.Values, tsobj[1])
}
return data, nil
}
// NewClient creates and returns OpenTSDB client
// configured with passed Config
func NewClient(cfg Config) (*Client, error) {
var retentions []Retention
offsetPrint := int64(time.Now().Unix())
if cfg.MsecsTime {
// 1000000 == Nanoseconds -> Milliseconds difference
offsetPrint = int64(time.Now().UnixNano() / 1000000)
}
if cfg.HardTS > 0 {
/*
HardTS is a specific timestamp we'll be starting at.
Just present that if it is defined
*/
offsetPrint = cfg.HardTS
} else if cfg.Offset > 0 {
/*
Our "offset" is the number of days we should step
back before starting to scan for data
*/
if cfg.MsecsTime {
offsetPrint = offsetPrint - (cfg.Offset * 24 * 60 * 60 * 1000)
} else {
offsetPrint = offsetPrint - (cfg.Offset * 24 * 60 * 60)
}
}
log.Println(fmt.Sprintf("Will collect data starting at TS %v", offsetPrint))
for _, r := range cfg.Retentions {
ret, err := convertRetention(r, cfg.Offset, cfg.MsecsTime)
if err != nil {
return &Client{}, fmt.Errorf("Couldn't parse retention %q :: %v", r, err)
}
retentions = append(retentions, ret)
}
client := &Client{
Addr: strings.Trim(cfg.Addr, "/"),
Retentions: retentions,
Limit: cfg.Limit,
Filters: cfg.Filters,
Normalize: cfg.Normalize,
HardTS: cfg.HardTS,
}
return client, nil
}

Some files were not shown because too many files have changed in this diff Show More