Compare commits

...

700 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
9f19649672 docs/CHANGELOG.md: cut v1.66.2 2021-09-23 22:53:36 +03:00
Aliaksandr Valialkin
718eca33ab lib/storage: properly handle {__name__=~"prefix(suffix1|suffix2)",other_label="..."} queries
They were broken in the commit 00cbb099b6

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1644
2021-09-23 21:48:51 +03:00
Aliaksandr Valialkin
c5bb95a417 docs: make docs-sync 2021-09-23 20:51:35 +03:00
Aliaksandr Valialkin
f5896b7420 docs/CHANGELOG.md: document 0e35fc9538
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1641
2021-09-23 20:46:23 +03:00
Roman Khavronenko
9dc4d16664 app/vmctl: fix misleading comment about cluster version for native mode (#1648)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1637
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-09-23 17:57:25 +03:00
Roman Khavronenko
0e35fc9538 app/vmalert: remove unnecessary omitempty tag for interval param (#1649)
`omitempty` tag resulted into skipping this param on marshaling,
which was used as a checksum for groups configuration. Since on
config reload checksums are compared before applying changes,
any change to `interval` only didn't trigger config reload.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1641
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2021-09-23 17:55:59 +03:00
Aaron France
6f061dab19 fix: typo in vmagent.md (#1642) 2021-09-23 16:51:02 +03:00
Aliaksandr Valialkin
176348cbcc vendor: make vendor-update 2021-09-23 15:05:27 +03:00
Aliaksandr Valialkin
a0313c046b lib/promscrape: add vm_promscrape_max_scrape_size_exceeded_errors_total metric for counting of the failed scrapes due to the exceeded response size
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1639
2021-09-23 14:47:54 +03:00
Aliaksandr Valialkin
d5c2741e8f docs/CaseStudies.md: add a case study for Grammarly 2021-09-23 13:11:48 +03:00
Aliaksandr Valialkin
73cd74075d docs/Articles.md: add https://cer6erus.medium.com/superset-bi-with-victoria-metrics-a109d3e91bc6 2021-09-23 10:43:29 +03:00
Aliaksandr Valialkin
00277583f9 vendor: update github.com/valyala/gozstd from v1.12.0 to v1.13.0 2021-09-22 20:06:44 +03:00
Aliaksandr Valialkin
99a6c212e8 docs/CaseStudies.md: fix a link to third-party articles 2021-09-22 03:52:35 +03:00
Aliaksandr Valialkin
9b3d1a1996 docs/CHANGELOG.md: cut v1.66.1 2021-09-22 01:47:05 +03:00
Aliaksandr Valialkin
a13c3de36f docs/CHANGELOG.md: document 9ca1cbced1
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1635
2021-09-21 23:17:08 +03:00
Aliaksandr Valialkin
9ca1cbced1 lib/httpserver: add -enterprise and/or -cluster suffixes to short_version label of vm_app_version metric
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1635
2021-09-21 23:12:42 +03:00
Aliaksandr Valialkin
207c5760ce lib/promrelabel: fix parsing regex: true in relabeling rules 2021-09-21 23:00:53 +03:00
Aliaksandr Valialkin
9884a55f3c vendor: temporarily stick to v0.93.3 for cloud.google.com/go until the binary size bloat issue is resolved
This returns back VictoriaMetrics binary size from 24Mb to 18Mb.

See https://github.com/googleapis/google-cloud-go/issues/4783
2021-09-21 18:13:47 +03:00
Roman Khavronenko
ac1abe2faf app/vmalert: support http.pathPrefix flag in UI (#1636)
The change makes UI to respect `http.pathPrefix` flag
for API or navigation items links.
2021-09-21 14:41:01 +03:00
Aliaksandr Valialkin
a22aa0608b app/vmselect: fix accessing /graphite/* endpoints 2021-09-21 13:56:35 +03:00
Aliaksandr Valialkin
94148d5ad7 docs/vmagent.md: typo fix 2021-09-20 16:49:20 +03:00
Aliaksandr Valialkin
51657b1e04 docs/vmagent.md: typo fixes in Prometheus staleness markers docs 2021-09-20 16:44:09 +03:00
Aliaksandr Valialkin
76811c2f60 docs/CHANGELOG.md: cut v1.66.0 2021-09-20 15:20:25 +03:00
Nikolay
ad08d9dfc0 changes protoparser apis for accepting reading from io.Reader (#1624)
adds InsertHandlerForReader apis to vmagent
2021-09-20 14:49:28 +03:00
Aliaksandr Valialkin
15ea4c6dae vendor: make vendor-update 2021-09-20 14:38:55 +03:00
n4mine
1ac8d55147 fix: typo, dddresses -> addresses (#1630) 2021-09-20 14:28:59 +03:00
Aliaksandr Valialkin
a06ff456f8 docs/Articles.md: add a link to Open-source strategy at VictoriaMetrics 2021-09-20 11:46:10 +03:00
Aliaksandr Valialkin
9a3d0c43b5 app/vmselect/promql: add quantiles_over_time("phiLabel", phi1, ..., phiN, m[d]) function for calculating multiple quantiles at once 2021-09-17 23:35:10 +03:00
Aliaksandr Valialkin
e1e5a20b36 docs/CHANGELOG.md: document 0e09fdb8b0 2021-09-17 18:47:06 +03:00
Nikolay
0e09fdb8b0 makes filters optional for ec2 api requests (#1627)
filters can be applied only for DescribeInstances requests, like prometheus does.
related issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1626
2021-09-17 18:00:37 +03:00
Aliaksandr Valialkin
2951dd0a57 app/vmselect/promql: add histogram_quantiles("phiLabel", phi1, ..., phiN, buckets) function
This function calculates multiple quantiles over the given buckets at once

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1573
2021-09-17 13:32:39 +03:00
Aliaksandr Valialkin
8c504d6efa docs/CHANGELOG.md: document the change in enterprise apps, which allows passing -version without -eula flag
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1621
2021-09-17 12:37:45 +03:00
Aliaksandr Valialkin
5a44be0e52 app/vmselect/promql: optimize quantiles() calculation
Calculate quantiles in one go instead of calculating each quantile individually
2021-09-17 12:33:42 +03:00
Aliaksandr Valialkin
948fb638f5 docs/FAQ.md: extend VictoriaMetrics vs TimescaleDB section with real user experience
See also https://github.com/timescale/promscale/issues/427 , which is mentioned in the https://abiosgaming.com/press/high-cardinality-aggregations/
2021-09-16 20:46:25 +03:00
Roman Khavronenko
b75455c650 vmalert: add new metric vmalert_remotewrite_flush_duration_seconds (#1622) 2021-09-16 14:00:16 +03:00
Roman Khavronenko
f83fa31985 docs: fix indentation for FAQ document (#1620) 2021-09-16 13:59:22 +03:00
f41gh7
9375b60c5f adds stub for functions api 2021-09-16 13:49:52 +03:00
Aliaksandr Valialkin
e60dfc96ff app/vmselect/promql: add mad(q) and outliers_mad(tolerance, q) functions to MetricsQL 2021-09-16 13:33:53 +03:00
Aliaksandr Valialkin
eca75cc650 app/vmselect/prometheus: make more clear log messages for errors during sending data to remote clients 2021-09-16 12:56:58 +03:00
Aliaksandr Valialkin
26cd0d36b4 vendor: make vendor-update 2021-09-15 18:22:59 +03:00
Aliaksandr Valialkin
44b01fff13 app/{vminsert,vmselect}: automatically add missing port in -storageNode lists passed to vminsert and vmselect
This should simplify manual setup of the cluster according to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#cluster-setup
2021-09-15 18:08:30 +03:00
Aliaksandr Valialkin
06ed694ad9 docs/CHANGELOG.md: document 777ff75874 2021-09-15 17:45:08 +03:00
Aliaksandr Valialkin
2f86d4cf38 app/vmui: follow-up after 777ff75874
The commit contains the following changes:

- Show vmui when requesting /graph page in order to be compatible with Prometheus datasource in Grafana.
- Properly encode query args at vmui url.
- Set the number of points on the graph to the number of horizontal pixels divided by 2. Previously it was hardcoded to 30.
- Do not save server url to persistent storage at browser, since it should be always obtained from the url.
- Run `make vmui-update` for updating vmui embedded into VictoriaMetrics.
2021-09-15 17:40:48 +03:00
Yury Molodov
777ff75874 vmui: change query params compatible with prometheus (#1619)
* feat: change url params for compatible prometheus

* style: add comment for TimeParams

* fix: change get default server for single version

* fix: change function for get query string value
2021-09-15 09:42:49 +03:00
Aliaksandr Valialkin
cf9efde50c vendor: update github.com/valyala/quicktemplate from v1.6.3. to v1.7.0 2021-09-15 09:34:07 +03:00
Aliaksandr Valialkin
3cba77765a vendor: update github.com/VictoriaMetrics/fastcache from v1.6.0 to v1.7.0 2021-09-15 09:34:07 +03:00
Aliaksandr Valialkin
77682f516a vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.16 to v1.1.0 2021-09-15 09:34:07 +03:00
Aliaksandr Valialkin
68ea3d18f7 vendor: update github.com/valyala/histogram from v1.1.2 to v1.2.0
This fixes the non-repeatable quantile_over_time() results when the number of input samples exceeds 1000.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1612
2021-09-15 09:34:07 +03:00
Roman Khavronenko
ecd3069b6c vmalert: create basic auth config only if args aren't empty (#1618)
* vmalert: create basic auth config only if args aren't empty

follow-up after 68721f6

* vmalert: make lint happy
2021-09-15 01:53:31 +03:00
Roman Khavronenko
84b41e498f docs: add "Choosing a Time Series Database for High Cardinality Aggregations" article (#1617) 2021-09-15 01:51:56 +03:00
Aliaksandr Valialkin
3e1683756b docs/vmalert.md: follow-up after 68721f6e7d
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:47:47 +03:00
Roman Khavronenko
68721f6e7d vmalert: support bearer token for datasource, remotewrite and remoteread (#1614)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608
2021-09-14 14:32:06 +03:00
Dima Lazerka
56069a3022 Remove port 3000 and add https to play-grafana link (#1616)
* Remove port 3000 and add https to play-grafana link

* Fix typo

Co-authored-by: Dzmitry Lazerka <dlazerka@gmail.com>
2021-09-14 14:24:30 +03:00
Aliaksandr Valialkin
8f685d81c6 lib/storage: follow up after 00cbb099b6 2021-09-14 14:16:25 +03:00
faceair
00cbb099b6 lib/storage: optimize convert multiple values regexp filter to composite tag filter (#1610)
* lib/storage: optimize convert multiple values regexp filter to composite tag filter

* Apply suggestions from code review

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-09-14 12:47:07 +03:00
dependabot[bot]
bc2d05be8e build(deps): bump codecov/codecov-action from 2.0.3 to 2.1.0 (#1615)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2.0.3 to 2.1.0.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v2.0.3...v2.1.0)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-14 12:23:01 +03:00
Aliaksandr Valialkin
adedc83b3b app/vmauth: do not log invalid auth tokens by default for security reasons
The logging can be enabled by passing `-logInvalidAuthTokens` command-line flag to vmauth
2021-09-14 12:20:03 +03:00
Aliaksandr Valialkin
e46bd9e47f docs/Single-server-VictoriaMetrics.md: link to cardinality limiter docs in vmagent 2021-09-13 21:26:59 +03:00
Aliaksandr Valialkin
07b9c7994f docs/vmagent.md: mention out of order sample errors, which are typically emitted by Thanos, Cortex or Prometheus 2021-09-13 19:36:31 +03:00
Aliaksandr Valialkin
8a6a36429a app/vminsert/netstorage: disable rerouting by default
Production clusters work more stable with the disabled rerouting during rolling restarts and/or
during spikes in time series churn rate. So it would be better disabling the rerouting by default.

The re-routing can be enabled by passing `-disableRerouting=false` command-line flag to `vminsert` nodes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1165
2021-09-13 18:51:56 +03:00
Aliaksandr Valialkin
c4f11a49f8 docs/CHANGELOG.md: document 5494bc02a6 2021-09-13 17:11:23 +03:00
Aliaksandr Valialkin
7f0a8d4bdb docs: consistency renaming: Influx -> InfluxDB 2021-09-13 17:05:16 +03:00
Aliaksandr Valialkin
143a3b34ee app/vmui/Dockerfile-web: update Go builder from 1.16.7 to 1.17.1 and Alpine base image from 3.14.1 to 3.14.2 2021-09-13 17:05:16 +03:00
Roman Khavronenko
5494bc02a6 vmalert: add flag to limit the max value for auto-resovle duration for alerts (#1609)
* vmalert: add flag to limit the max value for auto-resovle duration for alerts

The new flag `rule.maxResolveDuration` suppose to limit max value for
alert.End param, which is used by notifiers like Alertmanager for alerts auto resolve.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1586
2021-09-13 15:48:18 +03:00
Aliaksandr Valialkin
b9727a36dc docs/vmbackup.md: update the outdated link to vmbackupmanager 2021-09-13 14:16:53 +03:00
Roman Khavronenko
75f35c3b11 vmalert: display extra filter labels in UI (#1613) 2021-09-13 14:11:38 +03:00
Aliaksandr Valialkin
d1a16e0891 app/vmselect/promql: use Prometheus-compatible label value formatting for count_values function 2021-09-13 13:48:06 +03:00
Aliaksandr Valialkin
fb6ed0ce19 lib/promscrape/discovery/docker: support host networking mode
See https://github.com/prometheus/prometheus/issues/9116
2021-09-13 13:30:16 +03:00
Aliaksandr Valialkin
6295861acd lib/promscrape/discovery/kubernetes: properly use https scheme for wildcard TLS certificates in ingress target discovery 2021-09-13 13:03:42 +03:00
Aliaksandr Valialkin
2814388891 vendor: make vendor-update 2021-09-12 15:26:44 +03:00
Aliaksandr Valialkin
2394b5018b deployment/docker: update Go builder from v1.17.0 to v1.17.1
See https://github.com/golang/go/issues?q=milestone%3AGo1.17.1+label%3ACherryPickApproved
2021-09-12 15:23:53 +03:00
Aliaksandr Valialkin
728c4c3841 lib/promscrape: generate scrape_timeout_seconds metric per each scrape target in the same way as Prometheus 2.30 does
See https://github.com/prometheus/prometheus/pull/9247
2021-09-12 15:20:44 +03:00
Aliaksandr Valialkin
0b4eb0fa7d lib/promscrape: make fmt 2021-09-12 13:34:15 +03:00
Aliaksandr Valialkin
48e3e6c8df lib/promscrape: add ability to configure scrape_timeout and scrape_interval via relabeling
See https://github.com/prometheus/prometheus/pull/8911
2021-09-12 13:33:41 +03:00
Aliaksandr Valialkin
f3e89754a9 lib/promscrape: reduce CPU usage for common case when calculating scrape_series_added metric
Also reduce CPU usage when applying `series_limit` to scrape targets with constant set of metrics.

The main idea is to perform the calculations on scrape_series_added and series_limit
only if the set of metrics exposed by the target has been changed.
Scrape targets rarely change the set of exposed metrics,
so this optimization should reduce CPU usage in general case.
2021-09-12 12:53:14 +03:00
Aliaksandr Valialkin
674a6eee6c docs/Single-server-VictoriaMetrics.md: refer to relabeling section for vmagent
This removes duplicate docs about additional relabeling actions supported by VictoriaMetrics components
2021-09-12 11:39:19 +03:00
Aliaksandr Valialkin
77168e3e94 docs/vmagent.md: sync with app/vmagent/README.md by running make docs-sync 2021-09-11 11:04:42 +03:00
Aliaksandr Valialkin
cebcb15ba4 lib/storage: verify that the tsidsFound contain the needed tsids in tests added at f4dead529f 2021-09-11 10:57:13 +03:00
Aliaksandr Valialkin
9286107e82 lib/promscrape: send stale markers for disappeared metrics like Prometheus does 2021-09-11 10:51:04 +03:00
Aliaksandr Valialkin
cfed015bb6 docs/vmalert.md: typo fix in Multitenancy chapter 2021-09-10 17:57:14 +03:00
Aliaksandr Valialkin
f4dead529f lib/storage: properly search series by multiple tag filters matching empty labels such as foo{bar=~"baz|",x=~"y|"}
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1601
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/395
2021-09-09 21:09:21 +03:00
Aliaksandr Valialkin
ea943911bc app/vmselect/promql: keep metric name in rollup_candlestick results, since they don't change the original series meaning
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1600
2021-09-09 19:21:18 +03:00
Aliaksandr Valialkin
f27980dcb3 docs/CHANGELOG.md: typo fix 2021-09-09 18:57:30 +03:00
Aliaksandr Valialkin
4aeb8db83f lib/promscrape: add ability to set series_limit and stream_parse options via relabeling
This allows managing these options on a per-target basis.

Typical use case: to manage these options for pods via Kubernetes annotations.
2021-09-09 18:49:39 +03:00
Aliaksandr Valialkin
468f941f7e lib/promscrape: add the actual job name to the labels of promscrape_series_limit_rows_dropped_total metric 2021-09-09 17:37:37 +03:00
Aliaksandr Valialkin
086b5d0cf1 lib/promscrape: add scrape_ prefix to job and target labels exported by promscrape_series_limit_rows_dropped_total metric
This is needed in order to prevent from possible clash with the corresponding (job, target) labels for the job, which scrapes this metric.
2021-09-09 17:29:21 +03:00
Aliaksandr Valialkin
d2708a1fb7 docs/vmagent.md: typo fix in Relabeling chapter 2021-09-09 16:39:40 +03:00
Denys Holius
abba6e8370 Bump alpine linux to latest (#1607) 2021-09-09 16:29:15 +03:00
Aliaksandr Valialkin
d6bd956930 lib/promrelabel: add keep_metrics and drop_metrics actions to relabeling rules
These actions simlify metrics filtering. For example,

- action: keep_metrics
  regex: 'foo|bar|baz'

would leave only metrics with `foo`, `bar` and `baz` names, while the rest of metrics will be deleted.

The commit also makes possible to split long regexps into multiple lines. For example, the following config is equivalent to the config above:

- action: keep_metrics
  regex:
  - foo
  - bar
  - baz
2021-09-09 16:18:21 +03:00
Aliaksandr Valialkin
3a827b98cd docs/vmalert.md: make docs-sync after 21f022e5f0 2021-09-09 16:16:25 +03:00
Aliaksandr Valialkin
a8053d9fc6 docs/MetricsQL.md: add a link to VictoriaMetrics github 2021-09-08 00:14:59 +03:00
Aliaksandr Valialkin
e84fa9eb38 app/vmalert: document GroupAlerts
This makes golint happy
2021-09-07 22:50:08 +03:00
Aliaksandr Valialkin
e6c9869d86 app/vmalert: follow-up after 21f022e5f0 2021-09-07 22:43:37 +03:00
Roman Khavronenko
21f022e5f0 vmalert: add initial UI implementation (#1602)
New UI pages:
/ - welcome page with API handlers list;
/groups - list of all rules per group;
/alerts - list of all active alerts;
/groupID/alertID/status - status of the active alert;
2021-09-07 22:39:22 +03:00
Aliaksandr Valialkin
6fbaf8f978 docs/CHANGELOG.md: document 42e07cfaea 2021-09-07 22:34:39 +03:00
dependabot[bot]
0c7110d1a5 build(deps): bump github.com/aws/aws-sdk-go from 1.40.34 to 1.40.37 (#1598)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.34 to 1.40.37.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.34...v1.40.37)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-07 20:55:49 +03:00
Aliaksandr Valialkin
e34f7081d0 docs/vmagent.md: add API path for Prometheus text exposition format 2021-09-07 16:14:51 +03:00
Aliaksandr Valialkin
0166eae7c4 docs/Articles.md: add a link to case studies 2021-09-06 10:57:59 +03:00
Aliaksandr Valialkin
5e3ef376b5 docs/Articles.md: add an url to https://www.vultr.com/docs/install-and-configure-victoriametrics-on-debian 2021-09-03 11:49:37 +03:00
Aliaksandr Valialkin
ef27786e37 docs/Single-server-VictoriaMetrics.md: fix a link multitenancy docs for cluster version in VictoriaMetrics 2021-09-02 17:43:35 +03:00
Aliaksandr Valialkin
f529058d3a docs/CHANGELOG.md: cut v1.65.0 2021-09-01 17:12:32 +03:00
Aliaksandr Valialkin
bddd1c35e2 docs/FAQ.md: add questions on how to migrate data from various systems (Prometheus, InfluxDB, OpenTSDB, Graphite) to VictoriaMetrics 2021-09-01 16:47:30 +03:00
Aliaksandr Valialkin
ed818fceef docs: update -help output for victoria-metrics and vmagent after f77dde837a 2021-09-01 16:34:32 +03:00
Aliaksandr Valialkin
ae90225b46 .github/dependabot.yml: increase check intervals for gomod and docker ecosystems from daily to weekly
Daily checks are too verbose and result into too many automatic pull requests and commits
2021-09-01 16:07:00 +03:00
Aliaksandr Valialkin
f77dde837a lib/promscrape: add the ability to limit the number of unique series per each scrape target
The number of series per target can be limited with the following options:

* Global limit with `-promscrape.maxSeriesPerTarget` command-line option.
* Per-target limit with `max_series: N` option in `scrape_config` section.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561
2021-09-01 16:03:59 +03:00
Roman Khavronenko
867e426070 dashboards: bump vmagent version requirement 2021-09-01 14:20:50 +03:00
dependabot[bot]
c2d17ec655 build(deps): bump github.com/aws/aws-sdk-go from 1.40.33 to 1.40.34 (#1591)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.33 to 1.40.34.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.33...v1.40.34)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-01 12:53:10 +03:00
dependabot[bot]
4bebafd885 build(deps): bump google.golang.org/api from 0.55.0 to 0.56.0 (#1590)
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.55.0 to 0.56.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/master/CHANGES.md)
- [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.55.0...v0.56.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-01 12:52:17 +03:00
dependabot[bot]
1a6b9157e2 build(deps): bump cloud.google.com/go/storage from 1.16.0 to 1.16.1 (#1589)
Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.16.0 to 1.16.1.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/master/CHANGES.md)
- [Commits](https://github.com/googleapis/google-cloud-go/compare/pubsub/v1.16.0...storage/v1.16.1)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-01 12:51:42 +03:00
Aliaksandr Valialkin
111ea89a7d docs: make docs-sync 2021-09-01 12:02:34 +03:00
Aliaksandr Valialkin
9e41b05401 docs/CHANGELOG.md: document eff940aa76 2021-09-01 12:00:02 +03:00
Aliaksandr Valialkin
fce87bfe8d docs/CHANGELOG.md: document 7c70dcbe3b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1471
2021-09-01 11:56:23 +03:00
Roman Khavronenko
0f4bcc00b2 Single dashboards upd (#1593)
* dasbhoard: replace `null` datasources

null datasource value may confuse Grafana and make it drop panel query in some
versions.

* docker: bump grafana image version

* dashboards: add URL variable selector to vmagent dashboard

* dashboards: add new panel `Remote write connection saturation` to vmagent dashboard

* alerts: add new alert for `Remote write connection saturation` panel of vmagent dashboard

* dashboards: add "Logging rate" panel to vmagent dashboard
2021-09-01 11:46:22 +03:00
Roman Khavronenko
de26b1d4a2 vmctl: update README and flags description (#1588)
The purpose of update is to make README and flags description more
clear to the reader. Especially, show that vm-account-id flag is required
for clustered version of VM.
2021-09-01 09:31:44 +03:00
Roman Khavronenko
2ed2878a57 docs: fix the link for cluster docker compose 2021-09-01 09:21:45 +03:00
Roman Khavronenko
0d6735106b docs: update docker env description 2021-09-01 09:18:56 +03:00
Roman Khavronenko
cfb6436be5 Vmalert extra params (#1587)
* vmalert: allow extra GET params in datasource package

ExtraParams will be added as GET params to every HTTP request made by datasource.
The `roundDigits` param, for example, was substituted by corresponding extra param.

* vmalert: add nocache=1 param for replay process

The `nocache=1` param is VictoriaMetrics specific parameter which prevents it
from caching and boundaries aligning for queries. We set it to avoid cache
pollution in `replay` mode and also to avoid unnecessary time range boundaries
alignment.

* vmalert: mention nocache=1 in replay description

* vmalert: fix bug with unused param
2021-08-31 14:57:47 +03:00
Nikolay
7c70dcbe3b adds external_labels per group for vmalert (#1485)
* adds external_label per group for vmalert
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1471
2021-08-31 14:52:34 +03:00
Roman Khavronenko
eff940aa76 Vmalert metrics update (#1580)
* vmalert: remove `vmalert_execution_duration_seconds` metric

The summary for `vmalert_execution_duration_seconds` metric gives no additional
value comparing to `vmalert_iteration_duration_seconds` metric.

* vmalert: update config reload success metric properly

Previously, if there was unsuccessfull attempt to reload config and then
rollback to previous version - the metric remained set to 0.

* vmalert: add Grafana dashboard to overview application metrics

* docker: include vmalert target into list for scraping

* vmalert: extend notifier metrics with addr label

The change adds an `addr` label to metrics for alerts_sent and alerts_send_errors
to identify which exact address is having issues.
The according change was made to vmalert dashboard.

* vmalert: update documentation and docker environment for vmalert's dashboard

Mention Grafana's dashboard in vmalert's README in a new section #Monitoring.

Update docker-compose env to automatically add vmalert's dashboard.
Update docker-compose README with additional info about services.
2021-08-31 12:28:02 +03:00
Aliaksandr Valialkin
f41b3d6118 vendor: make vendor-update 2021-08-31 12:03:21 +03:00
Aliaksandr Valialkin
6e085e6dac docs/Single-server-VictoriaMetrics.md: remove outdated link to VictoriaMetrics wiki
VictoriaMetrics wiki became outdated after publishing all the docs at https://docs.victoriametrics.com
2021-08-31 11:48:32 +03:00
Aliaksandr Valialkin
8b228a5873 docs/CHANGELOG.md: add a link to Prometheus staleness tracking 2021-08-31 11:48:32 +03:00
dependabot[bot]
6c388f63b3 build(deps): bump github.com/aws/aws-sdk-go from 1.40.30 to 1.40.33 (#1582)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.30 to 1.40.33.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.30...v1.40.33)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-31 11:10:31 +03:00
dependabot[bot]
525e6ae1b8 build(deps): bump google.golang.org/api from 0.54.0 to 0.55.0 (#1583)
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.54.0 to 0.55.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/master/CHANGES.md)
- [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.54.0...v0.55.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-31 11:09:55 +03:00
dependabot[bot]
462fa70967 build(deps): bump github.com/klauspost/compress from 1.13.4 to 1.13.5 (#1584)
Bumps [github.com/klauspost/compress](https://github.com/klauspost/compress) from 1.13.4 to 1.13.5.
- [Release notes](https://github.com/klauspost/compress/releases)
- [Changelog](https://github.com/klauspost/compress/blob/master/.goreleaser.yml)
- [Commits](https://github.com/klauspost/compress/compare/v1.13.4...v1.13.5)

---
updated-dependencies:
- dependency-name: github.com/klauspost/compress
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-31 11:01:58 +03:00
Aliaksandr Valialkin
5c63d69454 lib/promscrape/discovery/kubernetes: return back support role: endpointslices, since it is used by VictoriaMetrics operator
This is a follow up commit after 31b42b30b6
2021-08-29 12:37:03 +03:00
Aliaksandr Valialkin
db330232ac lib/protoparser/opentsdb: follow-up after 8ee75ca45a 2021-08-29 11:49:21 +03:00
envzhu
8ee75ca45a lib/protoparser/opentsdb: accept multiple spaces between fields in a row as a deliminator. (#1575) 2021-08-29 11:38:32 +03:00
Aliaksandr Valialkin
31b42b30b6 lib/promscrape/discovery/kubernetes: rename role: endpointslices to role: endpointslice to be consistent with Prometheus
See 2ec6c7dbb8/discovery/kubernetes/kubernetes.go (L99)
2021-08-29 11:23:08 +03:00
Aliaksandr Valialkin
2e001db4de lib/promscrape/discovery/kubernetes: use v1 API instead of v1beta1 API for role: ingress and role: endpointslices
This should fix service discovery for these roles in Kubernetes v1.22 and newer versions.
See https://kubernetes.io/docs/reference/using-api/deprecation-guide/#ingress-v122

The corresponding change in Prometheus - https://github.com/prometheus/prometheus/pull/9205
2021-08-29 11:16:59 +03:00
Aliaksandr Valialkin
189507d9d0 docs/Single-server-VictoriaMetrics.md: mention that downsampling doesnt improve query performance on high churn rate 2021-08-27 18:50:26 +03:00
Aliaksandr Valialkin
5ea689d61b app/vmselect/promql: add quantile("phiLabel", phi1, ..., phiN, q) aggregate function to MetricsQL
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1573
2021-08-27 18:37:20 +03:00
Aliaksandr Valialkin
bec18e4fe9 app/vmselect: add -search.disableAutoCacheReset command-line option for disabling automatic cache reset when a sample with old timestamp outside -search.cacheTimestampOffset is inserted
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1570
2021-08-27 17:15:31 +03:00
Aliaksandr Valialkin
67189be1cb docs/{vmgateway,vmbackupmanager}: mention that enterprise binaries are free for download and evaluation 2021-08-27 14:54:09 +03:00
Aliaksandr Valialkin
321da535fa docs: link to active time series, churn rate and high cardinality questions 2021-08-27 14:44:53 +03:00
Aliaksandr Valialkin
c8c153fb91 docs/CHANGELOG.md: document the bugfix for possible timeout error in vmbackupmanager when making snapshots
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1571
2021-08-27 13:04:06 +03:00
Aliaksandr Valialkin
2bc79042f6 docs: mention that enterprise binaries can be downloaded and evaluated for free 2021-08-27 12:48:14 +03:00
Aliaksandr Valialkin
4b3877b798 vendor: make vendor-update 2021-08-26 09:42:23 +03:00
Aliaksandr Valialkin
2bef940add docs/vmagent.md: document the ability to load scrape configs from multiple files
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1559
2021-08-26 09:13:14 +03:00
Aliaksandr Valialkin
10f960fa0c lib/promscrape: add ability to load scrape configs from multiple files
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1559
2021-08-26 08:51:16 +03:00
Aliaksandr Valialkin
25ee4a3644 vendor: make vendor-update 2021-08-25 13:41:02 +03:00
dependabot[bot]
66626db92f build(deps): bump codecov/codecov-action from 2.0.2 to 2.0.3 (#1563)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2.0.2 to 2.0.3.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v2.0.2...v2.0.3)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-25 13:40:21 +03:00
Aliaksandr Valialkin
e24203cde8 docs/CHANGELOG.md: document 48f33d098b 2021-08-25 13:31:35 +03:00
benclive
48f33d098b Remove trailing slash for URLPrefixes with specific path (#1554) 2021-08-25 13:28:50 +03:00
Aliaksandr Valialkin
9fc9d76a7f docs/Cluster-VictoriaMetrics.md: mention that the -replicationFactor at vmselect is an optional parameter 2021-08-25 13:10:57 +03:00
Aliaksandr Valialkin
c27ee35c5c lib/promscrape: expose promscrape_discovery_http_errors_total metric for tracking errors per each http_sd config 2021-08-25 13:05:49 +03:00
Aliaksandr Valialkin
ffc0ab1774 lib/{mergeset,storage}: improve the detection of the needed free space for background merge
This should prevent from possible out of disk space crashes during big merges.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1560
2021-08-25 09:35:44 +03:00
Aliaksandr Valialkin
a287a48634 docs/FAQ.md: add more entries for frequently asked questions
The following topics are covered:

* Active time series
* High cardinality
* High churn rate
* Slow inserts
2021-08-24 11:34:44 +03:00
Aliaksandr Valialkin
8358890e33 docs/MetricsQL.md: typo fix: histogram_qunatile -> histogram_quantile 2021-08-23 23:08:16 +03:00
Aliaksandr Valialkin
6c5760db9c app/vmselect/promql: make fmt after 0078486ea7 2021-08-23 23:06:00 +03:00
Aliaksandr Valialkin
7c57745f40 docs/MetricsQL.md: fix the indentation for median function 2021-08-23 12:04:31 +03:00
Aliaksandr Valialkin
a78672f95a docs/MetricsQL.md: typo fix: convesions->conversions 2021-08-23 12:02:03 +03:00
Aliaksandr Valialkin
17a1241022 docs/MetricsQL.md: typo fixes 2021-08-23 11:59:12 +03:00
Aliaksandr Valialkin
60ac3e1e46 docs/MetricsQL.md: rehaul the documentation on MetricsQL
* Document all the functions supported by MetricsQL, including PromQL functions
* Group functions by their type: rollup functions, transform functions, label manipulation functions and aggregate functions.
* Document implicit query transformations.
2021-08-23 11:45:52 +03:00
Aliaksandr Valialkin
0078486ea7 app/vmselect/promql: rename sign() function to sgn() in order to be consistent with Prometheus
See https://github.com/prometheus/prometheus/pull/8457 for details.
2021-08-23 11:45:51 +03:00
Aliaksandr Valialkin
69c291353b deployment/docker: update Go builder from Go1.16.0 to Go1.17.0
This improves data ingestion and query performance by up to 5% according to benchmarks.

See https://go.dev/blog/go1.17
2021-08-21 22:20:49 +03:00
Aliaksandr Valialkin
d5622b32e2 lib/promscrape: reduce memory and CPU usage when Prometheus staleness tracking is enabled for metrics from deleted / disappeared scrape targets
Store the scraped response body instead of storing the parsed and relabeld metrics.
This should reduce memory usage, since the response body takes less memory than the parsed and relabeled metrics.
This is especially true for Kubernetes service discovery, which adds many long labels for all the scraped metrics.

This should also reduce CPU usage, since the marshaling of the parsed
and relabeld metrics has been substituted by response body copying.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-21 21:17:26 +03:00
Aliaksandr Valialkin
2288e75f03 docs/vmalert.md: run make docs-sync after 9ee3d0378f 2021-08-21 20:24:56 +03:00
Aliaksandr Valialkin
6ffbb46aef docs/CHANGELOG.md: document 9ee3d0378f 2021-08-21 20:20:08 +03:00
Aliaksandr Valialkin
89a4e8fd9b vendor: make vendor-update 2021-08-21 20:16:19 +03:00
Roman Khavronenko
9ee3d0378f vmalert: add flag disableAlertgroupLabel for disabling extra label added to series (#1534)
The new label added in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611
may negatively impact deduplication in Alertmanager. The new flag supposed to give
an option to disable adding this label.

To enable flag just add `-disableAlertgroupLabel` to binary execution command.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532
2021-08-21 20:08:55 +03:00
Aliaksandr Valialkin
4f3a5742eb app/vmselect/prometheus: do not extend [d] to the detected interval between samples for first_over_time(m[d])
This is for the sake of consistency with similar change for the last_over_time(m[d]) at a724229b5d
2021-08-21 19:56:14 +03:00
dependabot[bot]
41fdfdb895 build(deps): bump github.com/aws/aws-sdk-go from 1.40.25 to 1.40.26 (#1551)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.25 to 1.40.26.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.25...v1.40.26)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-21 19:52:46 +03:00
Alexander Rickardsson
f4cecaf296 vmalert: accept http.StatusOK for remotewrite (#1550) 2021-08-20 11:58:32 +03:00
Aliaksandr Valialkin
f46a73dcdd lib/promscrape: use scrapeTimestamp when storing stale markers for failed scrape
This will make timestamps for stale markers more consistent for timestamps for other samples
2021-08-19 14:18:05 +03:00
Aliaksandr Valialkin
c14edc860b docs/CHANGELOG.md: document b5d6a0e499 2021-08-19 14:03:20 +03:00
Roman Khavronenko
b5d6a0e499 vmselect: update vm_request_duration_seconds value when request fails (#1537)
Before, metric `vm_request_duration_seconds` was update only on successful
attempts which could be misleading. For example, timeout errors on netstorage
request may be not accounted in the metric and won't be visible on dashboards.
Using `defer` statement to update the metric after query arguments validation
may improve the situation.
2021-08-19 13:58:54 +03:00
dependabot[bot]
cbab5f3b42 build(deps): bump github.com/aws/aws-sdk-go from 1.40.22 to 1.40.25 (#1548)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.22 to 1.40.25.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.22...v1.40.25)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-19 13:56:42 +03:00
Aliaksandr Valialkin
80ddade4ed docs/CHANGELOG.md: clarify the change, which adds -search.noStaleMarkers command-line flag 2021-08-19 13:56:04 +03:00
Aliaksandr Valialkin
a724229b5d app/vmselect/promql: do not override [d] at last_over_time(m[d]) if [d] is smaller than scrape_interval
Since most users do not expect the overriding of explicitly set `[d]`.
2021-08-19 10:31:48 +03:00
Aliaksandr Valialkin
ce0c270e75 docs/CHANGELOG.md: cut v1.64.1
This is mostly bugfix release, which includes fixes for staleness handling and a security update for Alpine base image
2021-08-18 22:06:05 +03:00
Aliaksandr Valialkin
c09446a9aa lib/promscrape: send stale markers for the previously scraped metrics on failed scrapes like Prometheus does 2021-08-18 21:59:03 +03:00
Aliaksandr Valialkin
f6e6056c17 vendor: update github.com/valyala/gozstd from v1.11.0 to v1.12.0
This should improve query scalability on systems with big number of CPU cores
2021-08-18 14:57:19 +03:00
Aliaksandr Valialkin
04c3e9916d docs/CHANGELOG.md: document 06bf21c21b 2021-08-18 14:01:04 +03:00
Aliaksandr Valialkin
cdc372bb98 app/vmselect: add -search.noStaleMarkers command-line flag for disabling stale markers handling in queries
This option allows reducing CPU usage a bit when VictoriaMetrics is used
for collecting and processing non-Prometheus data. For example, InfluxDB line protocol, Graphite, OpenTSDB, CSV, etc.
2021-08-18 13:59:02 +03:00
Aliaksandr Valialkin
226143f31b lib/promscrape: add ability to disable sending Prometheus staleness markers with -promscrape.disableStaleMarkers command-line flag
This option can be useful when vmagent consumes too much additional memory
for staleness markers functionality and when staleness markers aren't needed.
2021-08-18 13:43:21 +03:00
Aliaksandr Valialkin
06bf21c21b deployment/docker: upgrade Alpine base docker image from v3.14.0 to v3.14.1
See https://www.alpinelinux.org/posts/Alpine-3.14.1-released.html

This fixes https://vuldb.com/?source_cve.180051
See also https://vuldb.com/?id.180051 and https://snyk.io/vuln/SNYK-ALPINE314-APKTOOLS-1533752
2021-08-18 11:04:11 +03:00
Aliaksandr Valialkin
db1e62495b app/vmselect/promql: add bitmap_and(), bitmap_or() and bitmap_xor() functions to MetricsQL
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1541
2021-08-17 13:21:21 +03:00
Aliaksandr Valialkin
277538d655 docs/Single-server-VictoriaMetrics.md: mention that vmctl can migrate data from OpenTSDB to VictoriaMetrics 2021-08-17 11:12:16 +03:00
Aliaksandr Valialkin
bd14b0887e app/vmselect/promql: move common condition to dropStaleNaNs in order to improve code maintainability 2021-08-17 11:01:16 +03:00
Aliaksandr Valialkin
03c959f1df lib/promscrape: stop scrapers for the removed targets before starting scrapers for the added targets
This should prevent from possible time series overlap when old target is substituted by new target (for example, during Kubernetes deployments).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
2021-08-17 00:55:51 +03:00
Aliaksandr Valialkin
90434ba25b app/vmalert: mention -remoteWrite.disablePathAppend in the description for -remoteWrite.url 2021-08-16 15:22:47 +03:00
Aliaksandr Valialkin
f37b963619 app/vmalert: follow-up for 2400f85761 2021-08-16 15:20:22 +03:00
Aliaksandr Valialkin
4547d4f692 docs/CHANGELOG.md: update urls to Prometheus 2.29 release
Previously these urls were pointing to rc0 release
2021-08-16 14:53:38 +03:00
Aliaksandr Valialkin
ae9f923449 docs/CHANGELOG.md: typo fix: satureated -> saturated 2021-08-16 14:53:38 +03:00
Alexander Rickardsson
2400f85761 vmalert: enable configuring explicit path (#1536)
* vmalert: allow to disable automatically added path to remote write address via disablePathAppend flag
* docs: update docs to include remoteWrite.disablePathAppend
2021-08-16 14:20:57 +03:00
dependabot[bot]
9af8c71975 build(deps): bump github.com/aws/aws-sdk-go from 1.40.21 to 1.40.22 (#1539)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.21 to 1.40.22.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.21...v1.40.22)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-16 11:31:22 +03:00
dependabot[bot]
8297ad8f03 build(deps): bump google.golang.org/api from 0.53.0 to 0.54.0 (#1538)
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.53.0 to 0.54.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/master/CHANGES.md)
- [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.53.0...v0.54.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-16 11:30:08 +03:00
Aliaksandr Valialkin
c518858145 docs/CHANGELOG.md: cut v1.64.0 2021-08-15 23:52:03 +03:00
Aliaksandr Valialkin
a0e18f06eb lib/promscrape: restore red highlighting for DOWN targets at /targets page
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1461
2021-08-15 16:03:57 +03:00
Aliaksandr Valialkin
40a859a760 docs/CHANGELOG.md: mention the bugfix when more than 27 time series are selected at /vmui 2021-08-15 15:10:41 +03:00
Aliaksandr Valialkin
386ee5b82c docs/CHANGELOG.md: mention that VMUI automatically fills Server URL field
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1506
2021-08-15 14:45:09 +03:00
Artem Navoiev
64d64976e4 add dependency chekcs for (#1535)
- ruby (for docs)
- gomod for monorepo
- npm for vmui
- gomod go small webserver in  vmui
2021-08-15 14:09:34 +03:00
Aliaksandr Valialkin
aefba16d5e app/vmagent/remotewrite: expose vmagent_remotewrite_send_duration_seconds_total metric
This metric can be used for determining high saturation of every connection to remote storage with
an alerting query `rate(vmagent_remotewrite_send_duration_seconds_total) > 0.9s`.
This query triggers when a connection is satureated by more than 90%
2021-08-15 13:34:12 +03:00
Aliaksandr Valialkin
113f0a8a07 app/vmselect/promql: drop staleness marks before calling rollupConfig.Do
This allows dropping staleness marks only once and then calculate multiple rollup functions on the result.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-15 13:21:10 +03:00
Aliaksandr Valialkin
25997a70f1 Revert "app/vmselect/promql: properly handle Prometheus staleness marks in removeCounterResets functions"
This reverts commit 94dfcb6747a3b29a11d14e71bea21a2312bb6346.

It is better to remove staleness marks (decimal.StaleNaN) before calling rollupConfig.Do, e.g. in preFunc
2021-08-15 13:19:16 +03:00
Aliaksandr Valialkin
73d7b568da app/vmselect/promql: properly handle Prometheus staleness marks in removeCounterResets functions
Prometheus stalenss marks shouldn't be changed in removeCounterResets. Otherwise they will be converted to an ordinary NaN values,
which couldn't be removed in dropStaleNaNs() function later. This may result in incorrect calculations for rollup functions.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2021-08-14 12:45:57 +03:00
Aliaksandr Valialkin
6d53620adf vendor: make vendor-update 2021-08-13 13:15:27 +03:00
Aliaksandr Valialkin
2ae2c1dd09 app/victoria-metrics/testdata: fix tests after 4401464c22 2021-08-13 12:21:54 +03:00
Aliaksandr Valialkin
4401464c22 all: add support for Prometheus staleness markers
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1530
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
2021-08-13 12:10:17 +03:00
Aliaksandr Valialkin
9a8d1bcec5 docs/Cluster-VictoriaMetrics.md: meniton that vmagent can be used for replicating the data among multiple clusters
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491
2021-08-12 12:47:32 +03:00
Aliaksandr Valialkin
556c1b36e5 vendor: update github.com/klauspost/compress from v1.13.1 to v1.13.4 2021-08-12 12:40:13 +03:00
Aliaksandr Valialkin
95dd5a48bb app/vmselect: make vmui-update after the commit 4ae14df864a7e327955f44941295a286175423b3 2021-08-11 13:41:41 +03:00
Aliaksandr Valialkin
860b272a95 app/vmui: actualize Dockerfiles 2021-08-11 13:41:41 +03:00
Denys Holius
81e4d644dd added guide for HA monitoring setup in K8s via VM Cluster (#1523)
* added guide for HA monitoring setup in K8s via VM Cluster

* fixed missed divs

* fixed different typos

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

* Update docs/guides/k8s-ha-monitoring-via-vm-cluster.md

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>

Co-authored-by: Roman Khavronenko <hagen1778@gmail.com>
2021-08-11 11:45:42 +03:00
Aliaksandr Valialkin
755f65f4bc app/vminsert: add vm_rpc_send_duration_seconds_total metric per each vminsert->vmstorage link
This metric is useful for determining high link saturation with the following alerting rule:

rate(vm_rpc_send_duration_seconds_total) > 0.9s
2021-08-11 11:44:40 +03:00
Aliaksandr Valialkin
869ff25392 docs/Cluster-VictoriaMetrics.md: update -help output for cluster components after the d375d9b878 2021-08-11 11:44:39 +03:00
Aliaksandr Valialkin
bcffd04e3a docs: make docs-sync after e0ee69797d 2021-08-11 10:53:49 +03:00
Roman Khavronenko
e0ee69797d docs: update "number of open files" tuning recommendation (#1527)
* docs: update "number of open files" tuning recomendation

Make "number of open files" recomendation not only Prometheus specific to avoid
confusion for users who does not use Prometheus.

* docs: mention fstrim in Tuning section
2021-08-11 10:51:02 +03:00
Aliaksandr Valialkin
d375d9b878 lib/envflag: add a link to docs for -envflag.enable 2021-08-11 10:29:33 +03:00
Aliaksandr Valialkin
5716af4636 deployment/dm: update Go builder from Go1.16.6 to Go1.16.7
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.7+label%3ACherryPickApproved
2021-08-06 12:12:03 +03:00
Yury Molodov
236fc7d739 vmui: fix layout and add server url by default (#1519)
* fix: change layout for correctly display big query

* fix: set default server from url

* fix: change get default server url
2021-08-06 12:06:08 +03:00
Aliaksandr Valialkin
5ce531027f docs/CHANGELOG.md: document new metrics added to vmalert at 7416fdaa8b 2021-08-05 10:13:08 +03:00
Aliaksandr Valialkin
c1185363ca app/vmagent: typo fix in the description for -remoteWrite.queues 2021-08-05 10:01:35 +03:00
Roman Khavronenko
7416fdaa8b vmalert: expose new metrics for tracking number of produced samples during last evaluation (#1518)
* vmalert: expose new metrics for tracking number of produced samples during last evaluation

Two new metrics were added to track the number of samples produced during the last evaluation:
* vmalert_recording_rules_last_evaluation_samples
* vmalert_alerting_rules_last_evaluation_samples

The gauge type is used to remain consistent with Prometheus metric
`prometheus_rule_group_last_evaluation_samples` which is on the group level.
However, the counter type was considered as well.

Two metrics instead of one are used to make it easier to separate recording and
alerting rules. It is likely, number of samples produced by recording rules is
more important so people will refer to it more frequently.

The expected usage of the new metric is the following:
```
   - alert: RecordingRuleReturnsEmptyResults
        expr: sum(vmalert_recording_rules_last_evaluation_samples) by(recording) < 1
        annotations:
          summary: Recording rule {{$labels.recording}} returns empty results.
            Please verify expression correctness.
```

Addresses https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494

* vmalert: rename `vmalert_alerts_error` to `vmalert_alerting_rules_error` to remain consistent with recording rules metrics
2021-08-05 09:59:46 +03:00
Aliaksandr Valialkin
d826352688 app/vmagent: follow-up after fe445f753b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491
2021-08-05 09:52:32 +03:00
Omar Ghader
46e27d60a6 feature: Add multitenant for vmagent (#1505)
* feature: Add multitenant for vmagent

* Minor fix

* Fix rcs index out of range

* Minor fix

* Fix multi Init

* Fix multi Init

* Fix multi Init

* Add default multi

* Adjust naming

* Add TenantInserted metrics

* Add TenantInserted metrics

* fix: remove unused metrics for vmagent

* fix: remove unused metrics for vmagent

Co-authored-by: mghader <marc.ghader@ubisoft.com>
Co-authored-by: Sebastian YEPES <syepes@gmail.com>
2021-08-05 09:52:31 +03:00
Aliaksandr Valialkin
f07165977a docs/Articles.md: actualize links and re-order some links 2021-08-03 16:11:49 +03:00
Aliaksandr Valialkin
50663ba41f lib/promscrape/discovery/gce: add __meta_gce_interface_ipv4_<name> labels as in Prometheus 2.29
See https://github.com/prometheus/prometheus/pull/8978
2021-08-03 16:11:49 +03:00
Aliaksandr Valialkin
3cad8b4564 lib/promscrape/discovery/ec2: add __meta_ec2_availability_zone_id label as Prometheus 2.29 does 2021-08-03 16:11:49 +03:00
Aliaksandr Valialkin
e92fde7945 app/vmselect/promql: add present_over_time(m[d]) function, which will be available starting from Prometheus 2.29.0
See https://github.com/prometheus/prometheus/releases/tag/v2.29.0-rc.0 and https://github.com/prometheus/prometheus/pull/9097
2021-08-03 16:11:49 +03:00
Qifei Wan
fa9c5c5940 app/vmalert: update config state metrics if config parsed failed (#1507) 2021-08-03 12:55:29 +03:00
Roman Khavronenko
370fe9fa2a docs: add "Scaling to trillions of metric data points" to articles (#1517) 2021-08-03 11:07:45 +03:00
wusphinx
c1ed7b77aa Update TimeSelector.tsx (#1515)
delete garbled code
2021-08-03 10:01:01 +03:00
Nikolay
7bbff7fb86 adds /rules and /alerts api for grafana (#1504)
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-08-02 17:28:09 +03:00
Roman Khavronenko
b385fa622b docs: mention "Push Prometheus metrics to VictoriaMetrics or other exporters" article (#1511) 2021-08-02 17:23:10 +03:00
Roman Khavronenko
a641102ec2 docs: fix indentation for guide articles (#1512) 2021-08-02 17:16:58 +03:00
Roman Khavronenko
408ba43092 Alerts single update (#1510)
* alerts: move `ProcessNearFDLimits` to `vm-health` group since it is relevant for all services

* alerts: add new `TooHighMemoryUsage` alerting rule
2021-08-02 15:51:24 +03:00
Aliaksandr Valialkin
66eb60f20d docs/CaseStudies.md: typo fix: hed->had 2021-07-30 18:32:12 +03:00
Aliaksandr Valialkin
8a3c13fd53 docs/CHANGELOG.md: typo fix 2021-07-30 12:35:57 +03:00
Aliaksandr Valialkin
a3b4fc0474 docs/CHANGELOG.md: document d05cac6c98
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486
2021-07-30 12:19:53 +03:00
Aliaksandr Valialkin
a1911e1330 app/vmselect/netstorage: unpack time series data in mostly local big chunks
This should improve performance on multi-CPU systems for queries selecting time series with big number of raw samples
2021-07-30 12:03:17 +03:00
Aliaksandr Valialkin
d05cac6c98 li/storage: re-use the per-day inverted index search code for searching in global index
This allows removing a big pile of outdated code for global index search.

This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486
2021-07-30 10:31:37 +03:00
Aliaksandr Valialkin
74ffaa45d9 app/vmselect/netstorage: do not query Go maps with unsafe string keys, since this breaks in Go 1.17 2021-07-30 09:57:53 +03:00
Aliaksandr Valialkin
192dfbfd90 app/vmselect: follow-up for ed95bc9531
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1493
2021-07-29 09:53:28 +03:00
arnoldyahad
00af4ff5a4 Add case prometheus/rules for grafana 8 (#1502) 2021-07-29 09:53:27 +03:00
assassins
a483044557 Performance optimization (#1481)
There are redundant steps
2021-07-28 19:26:20 +03:00
Aliaksandr Valialkin
e20ec090b2 docs: remove SampleSizeCalculations.md, since it is outdated and no longer used
There was a reference to this doc from the old victoriametrics.com site
2021-07-28 19:25:16 +03:00
Aliaksandr Valialkin
8ee8660ac4 app/vmselect: follow-up for 626073bca8
* Rename -search.maxMetricsPointSearch to -search.maxSamplesPerQuery, so it is more consistent with the existing -search.maxSamplesPerSeries
* Move the -search.maxSamplesPerQuery from vmstorage to vmselect, so it could effectively limit the number of raw samples obtained from all the vmstorage nodes
* Document the -search.maxSamplesPerQuery in docs/CHANGELOG.md
2021-07-28 18:00:23 +03:00
Denys Holius
9ffd70a921 Added new guide for monitoring k8s via VictoriaMetrics cluster (#1476)
* renamed and moved screenshots

* fixed cluster guide, updated helm chart versions, added values.yaml for vm single

* renamed guide files

* fixed typo

* add some fixes

* fixed typos,added guide k8s-monitoring-via-vm-cluster

* added fixes for yamls
2021-07-27 18:01:12 +03:00
Aliaksandr Valialkin
8481f4f004 docs/CHANGELOG.md: document 9d45b46f4c 2021-07-27 12:38:31 +03:00
Nikolay
9d45b46f4c adds check for region with custom s3 endpoint (#1465) 2021-07-27 12:35:38 +03:00
Aliaksandr Valialkin
c2deee9911 lib/storage: yet another attempt to properly determine disk space shortage, which prevents from optimal merges
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-27 12:04:50 +03:00
Aliaksandr Valialkin
bb31117555 lib/promrelabel: add tests for verifying that regex works as expected in single quotes and double quotes 2021-07-27 10:50:55 +03:00
Aaron France
fec509fe2d fix: typo in metrics.md docs 2021-07-26 21:53:26 +03:00
dependabot[bot]
0ef150c14b build(deps): bump codecov/codecov-action from 2.0.1 to 2.0.2
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2.0.1 to 2.0.2.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v2.0.1...v2.0.2)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-26 20:42:21 +03:00
Aliaksandr Valialkin
ef781cefa7 vendor: make vendor-update 2021-07-26 16:02:46 +03:00
Aliaksandr Valialkin
8b7917cd81 all: add go:build lines for Go1.17
See https://tip.golang.org/doc/go1.17#gofmt for more details
2021-07-26 15:48:21 +03:00
Aliaksandr Valialkin
95aff47330 app/vmselect: prevent from possible deadlock when f callback blocks inside RunParallel 2021-07-26 15:47:30 +03:00
Aliaksandr Valialkin
1318736ad1 lib/promscrape: add missing whitespace at /targets page before up word 2021-07-26 12:22:59 +03:00
Aliaksandr Valialkin
fcaf152480 app/vmselect: make vmui-update after a91d41f12a 2021-07-26 10:31:11 +03:00
Aliaksandr Valialkin
bfb18438ec docs/Articles.md: add links to new articles 2021-07-23 21:06:58 +03:00
dependabot[bot]
bf25a256c5 build(deps): bump codecov/codecov-action from 1.5.2 to 2.0.1 (#1468)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 1.5.2 to 2.0.1.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v1.5.2...v2.0.1)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-23 12:01:52 +03:00
Yury Molodov
a91d41f12a Vmui/query editor (#1472)
* fix: move request button to server input

* feat: add switch for query autocomplete

* refactor: rename state for popover open

* feat: add detect os by userAgent

* fix: change hotkey to run query for mac

* fix: change detect mac os

* fix: change div to span inside Typography

Co-authored-by: yury <yurymolodov@victoriametrics.com>
2021-07-23 12:00:44 +03:00
Aliaksandr Valialkin
05672ffc32 app/vmselect/promql: properly handle (a op b) default N if (a op b) returns NaN series
The result should be a series with `N` values and `a op b` labels. Previously such series has been removed from the result.
2021-07-16 01:44:58 +03:00
Aliaksandr Valialkin
ed10141ff8 app/vmselect/netstorage: use more scalable algorithm for ditributing the work among among multiple channels on systems with big number of CPU cores 2021-07-16 00:35:23 +03:00
Aliaksandr Valialkin
ca75432e66 app/vmselect: do not track queries with less than 1ms execution time at /api/v1/status/top_queries
This should improve the readability and usefullness of the /api/v1/status/top_queries when debugging slow queries
or queries that take too much cpu time.
2021-07-15 16:44:28 +03:00
Aliaksandr Valialkin
4ba3fd9e6d lib/workingsetcache: switch from split cache to full cache after the cache size exceeds 95% of split capacity
Previously the switch occurred when the cache size becomes 100% of its capacity. The cache size could never reach 100% capacity.
This could prevent from switching from the split cache to full cache, thus reducing the cache effectiveness.
2021-07-15 16:12:04 +03:00
Aliaksandr Valialkin
f4e81aef7e app/vmselect/netstorage: add -search.maxSamplesPerSeries command-line option for limiting the number of samples a query can process per each series
This should prevent from out of memory crashes like in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1067
2021-07-15 16:03:28 +03:00
Aliaksandr Valialkin
e6ef97a5ee app/vmselect/netstorage: improve scalability of series unpacking on multi-CPU systems 2021-07-15 15:41:58 +03:00
Aliaksandr Valialkin
171d44acd8 docs/CHANGELOG.md: typo fix: suffxies->suffixes 2021-07-15 15:02:20 +03:00
Aliaksandr Valialkin
f81d972581 app/vmui/README.md: typo fix: naviate->navigate 2021-07-15 15:02:04 +03:00
Aliaksandr Valialkin
61cc13c16f docs/CHANGELOG.md: cut v1.63.0 2021-07-15 14:02:13 +03:00
Aliaksandr Valialkin
b060d8bf53 vendor: make vendor-update 2021-07-15 12:55:40 +03:00
Aliaksandr Valialkin
d472b03e34 lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesnt change samples 2021-07-15 12:17:45 +03:00
Aliaksandr Valialkin
682662b2ae lib/storage: remove cache directory if it contains reset_cache_on_startup file
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447
2021-07-13 17:58:51 +03:00
Aliaksandr Valialkin
2b0e3efa5c .github/workflows/wiki.yml: properly copy subdirectories 2021-07-13 17:35:02 +03:00
Aliaksandr Valialkin
8d99e94a52 docs: clarify why spare CPU and RAM resources are needed in capacity planning 2021-07-13 15:48:25 +03:00
Aliaksandr Valialkin
8d0ec47be9 docs/Cluster-VictoriaMetrics.md: clarify the docs about the needed values for -dedup.minScrapeInterval at vmselect during replication when the data is pushed from HA pair 2021-07-13 15:43:02 +03:00
Aliaksandr Valialkin
2df66dad7b lib/httpserver: add is_set label to flag metrics
This label allows determining the set flags with the query `flag{is_set="true"}`
2021-07-13 15:10:13 +03:00
Aliaksandr Valialkin
5bc240bffe vendor: make vendor-update 2021-07-13 14:30:19 +03:00
Aliaksandr Valialkin
244d0fe5d7 deployment/docker: update Go builder from v1.16.5 to v1.16.6
Ths Go release has the following bugfixes: https://github.com/golang/go/issues?q=milestone%3AGo1.16.6+label%3ACherryPickApproved
2021-07-13 14:25:41 +03:00
Aliaksandr Valialkin
a925d5a3e1 app/vmselect/promql: duration handling improvements in MetricsQL queries
- Support durations anywhere in MetricsQL queries. E.g. sum_over_time(m[1h])/1h is equivalent to sum_over_time(m[1h])/3600
- Support durations without suffix. E.g. rate(m[300]) is equivalent to rate(m[5m])
2021-07-12 17:16:41 +03:00
Aliaksandr Valialkin
f9de546139 lib/storage: reset perKeyMisses stats less frequently
This should reduce CPU usage for queries executed with intervals higher than 30 seconds
2021-07-12 14:33:42 +03:00
Aliaksandr Valialkin
4f80b2f230 lib/storage: properly limit the size of storage/date_metricID cache 2021-07-12 14:25:44 +03:00
Aliaksandr Valialkin
f3a5465ece docs/CHANGELOG.md: document the change from bfba4c28a4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1444
2021-07-12 12:44:06 +03:00
Aliaksandr Valialkin
bfba4c28a4 app/vmalert: accept Prometheus-like durations in interval config option inside group section 2021-07-12 12:35:17 +03:00
Aliaksandr Valialkin
a684408e27 docs: update http://slack.victoriametrics.com to https://slack.victoriametrics.com 2021-07-12 10:57:16 +03:00
Aliaksandr Valialkin
8ca2799478 lib/storage: properly determine when the deduplication is needed in needsDedup
Previously needsDedup() could return true if the de-duplication wasn't needed for the following case:

         d < interval
           /     \
   |        v | v        |
     interval   interval

Now it properly returns false for this case
2021-07-12 10:53:30 +03:00
Aliaksandr Valialkin
f539772ca6 docs: sync with the cluster branch 2021-07-10 12:45:38 +03:00
Aliaksandr Valialkin
8c764e88f0 app/vmui: move source code from https://github.com/VictoriaMetrics/vmui to app/vmui
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
2021-07-09 17:15:23 +03:00
Aliaksandr Valialkin
2be340a4c9 docs: clarify what does "workload" mean in capacity planning docs 2021-07-09 12:49:53 +03:00
Aliaksandr Valialkin
c5f0b454f0 app/vmselect: follow-up after aa11ef6d3b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
2021-07-07 17:43:35 +03:00
tony
e9e35a7d6a add vmui for vmselect component (#1431)
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-07-07 17:33:02 +03:00
Aliaksandr Valialkin
92b8380fdf docs/Cluster-VictoriaMetrics.md: improve capacity planning recommendations 2021-07-07 16:22:51 +03:00
Aliaksandr Valialkin
1152de4321 vendor: make vendor-update 2021-07-07 16:05:04 +03:00
Aliaksandr Valialkin
7e77980608 vendor: update github.com/VictoriaMetrics/metrics from v1.17.2 to v1.17.3 2021-07-07 16:00:25 +03:00
Aliaksandr Valialkin
6e0553c92e lib/mergeset: cache indexBlock items only on the second request
This should reduce the indexdb/indexBlocks cache size, since it won't contain one-time-wonders items.
2021-07-07 15:23:06 +03:00
Aliaksandr Valialkin
ed944313b0 app/{vminsert,vmselect}: export vminsert_request_duration_seconds and vmselect_request_duration_seconds histograms 2021-07-07 13:25:21 +03:00
Aliaksandr Valialkin
766edbc421 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:09:40 +03:00
Aliaksandr Valialkin
c55a64ba08 docs: clarify capacity planning docs 2021-07-07 12:46:55 +03:00
Aliaksandr Valialkin
e843bd7bd7 lib/storage: do not cache inmemoryBlock entries requested only once (aka one-time-wonder items)
This should reduce the cache size and memory usage for the indexdb/dataBlocks cache
2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin
8b262d4ba7 lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate 2021-07-07 10:58:51 +03:00
Roman Khavronenko
2f54559c89 alerts: sync alert expression for DiskRunsOutOfSpaceIn3Days with dashboard (#1436) 2021-07-07 10:31:09 +03:00
Aliaksandr Valialkin
a7694092b8 Revert "lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method"
This reverts commit 7c6d3981bf.

Reason for revert: high contention at bucket16Pool on systems with big number of CPU cores.
This slows down query processing significantly.
2021-07-06 18:21:35 +03:00
Aliaksandr Valialkin
8aa9bba9bd lib/{mergeset,storage}: switch from sync.Pool to chan-based pool for inmemoryPart objects
This should reduce memory usage on systems with big number of CPU cores,
since every inmemoryPart object occupies at least 64KB of memory and sync.Pool maintains
a separate pool inmemoryPart objects per each CPU core.

Though the new scheme for the pool worsens per-cpu cache locality, this should be amortized
by big sizes of inmemoryPart objects.
2021-07-06 16:28:41 +03:00
Aliaksandr Valialkin
7c6d3981bf lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method
This reduces the load on memory allocator in Go runtime in production workload.
2021-07-06 15:35:03 +03:00
Aliaksandr Valialkin
78c9174682 lib/mergeset: increase pool capacity for inmemoryBlock according to collected profiles from production workload
CPU and memory profiles show that the pool capacity for inmemoryBlock objects is too small.
This results in the increased load on memory allocation code in Go runtime.
Increase the pool capacity in order to reduce the load on Go runtime.
2021-07-06 13:41:34 +03:00
Aliaksandr Valialkin
f71e4d1853 lib/mergeset: limit the frequency for flushCallback calls to once per 10 seconds
This should improve hit ratio for tagFiltersCache when big number of new time series are constantly registered
(aka high churn rate). This, in turn, should reduce CPU usage for queries over such time series.
2021-07-06 12:17:17 +03:00
Aliaksandr Valialkin
f3acf065c9 lib/storage: consistency renaming: tagCache -> tagFiltersCache
This improves code readability
2021-07-06 11:03:51 +03:00
Aliaksandr Valialkin
0020b9f904 lib/workingsetcache: properly update stats for requests and cache misses
Previously the stats for cache misses could be improperly counted, because it had inflated cache misses
if the entry was missing in the curr cache, but was existing in the prev cache.

The same applies to cache requests - they were inflated if the entry was missing in the curr cache.
2021-07-06 10:53:32 +03:00
Roman Khavronenko
75f12bfe78 add option to add Copy button for code snippets (#1433)
To add a Copy button wrap code snippet with the following element:
```
<div class="with-copy" markdown="1">

<your-code-snippet>

</div>
```

See the changes to `Kubernetes monitoring with VictoriaMetrics Single` for details.
2021-07-06 08:23:39 +03:00
Aliaksandr Valialkin
4cf47163c1 lib/workingsetcache: fix cache capacity calculations after 4f0003f182 2021-07-05 17:11:57 +03:00
Aliaksandr Valialkin
4f0003f182 lib/workingsetcache: typo fixes after d0c830039d 2021-07-05 15:35:37 +03:00
Aliaksandr Valialkin
d0c830039d lib/storage: tune cache sizes according to production workload 2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
8f973e34fb lib/workingsetcache: properly switch to whole mode
Previously the switch from `split` to `whole` mode had been performed too early,
e.g. when the current cache size became bigger than 1/4 of the allowed cache size.

Now it is performed when the current cache size becomes bigger than 1/2 of the allowed cache size.

This change can reduce memory usage for data ingestion path when big number of active time series are ingested.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
43103be011 lib/{storage,mergeset}: increase cache timeout for data and index blocks from a minute to two minutes
One minute cache timeout result in slower queries in some production workloads where the interval
between query execution is in the range 1 minute - 2 minutes.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
54b9e1d3cb lib/cgroup: set GOGC to 50 by default if it isn't set
This should reduce memory usage for typical VictoriaMetrics workloads by up to 50%
2021-07-05 15:16:11 +03:00
Roman Khavronenko
bd6b8f7e31 move github-pages docs to the main repo (#1432)
* move github-pages docs to the main repo

* rm github actions for copying docs to VictoriaMetrics/VictoriaMetrics.github.io
2021-07-05 14:34:10 +03:00
Aliaksandr Valialkin
888d62e40c docs/CHANGELOG.md: document the bugfix for vm_merge_need_free_disk_space metric at 9a83e9018d 2021-07-05 12:01:08 +03:00
Aliaksandr Valialkin
6bef227388 docs/Articles.md: add an url to https://medium.com/ibm-garage/monitoring-of-multiple-openshift-clusters-with-victoriametrics-d4f0979e2544 2021-07-05 11:51:37 +03:00
Aliaksandr Valialkin
9a83e9018d lib/storage: properly detect free disk space shortage during data merge
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-02 17:40:54 +03:00
Aliaksandr Valialkin
f6bb130898 app/{vmselect,vmstorage}: clarify the description for -dedup.minScrapeInterval command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1426
2021-07-02 15:05:57 +03:00
Aliaksandr Valialkin
7088f17494 lib/promscrape/discovery/consul: use case-insensitive comparison for service names
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1424
2021-07-02 14:49:27 +03:00
Aliaksandr Valialkin
6e406083f2 lib/promauth: cache the client TLS certificate for up to a second
This should reduce CPU usage when TLS connections are established at a high rate.
2021-07-02 13:21:51 +03:00
Aliaksandr Valialkin
a93da746c0 vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.15 to v1.0.16
This should help with proper copying of tls.Config struct after it has been set up in lib/promauth.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2021-07-02 12:03:32 +03:00
Aliaksandr Valialkin
1e52f11e87 docs/Cluster-VictoriaMetrics.md: typo fix: siplify -> simplify 2021-07-02 10:50:55 +03:00
Aliaksandr Valialkin
5beb099ff9 docs/Cluster-VictoriaMetrics.md: add a chapter describing a toy cluster setup on a single host
While at it, refer to available tools, which can simplify cluster setup
2021-07-02 10:48:56 +03:00
Aliaksandr Valialkin
158c50c0ee lib/promauth: reload TLS certificates from disk on every mTLS connection as Prometheus does
This allows updating client certificates without the need to restart vmagent and/or single-node VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2021-07-01 15:27:47 +03:00
Aliaksandr Valialkin
02ed4b8b46 docs/CHANGELOG.md: document ae485c2bfd 2021-07-01 11:51:19 +03:00
Aliaksandr Valialkin
c25b839078 lib/workingsetcache: reset the cache mode when the cache is reset
This should reduce memory usage if the working set is reduced after the cache reset.
2021-07-01 11:50:11 +03:00
Nikolay
ae485c2bfd fixes /targets button style (#1423)
* fixes /targets button style
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1422

* updates boostrap version
2021-07-01 11:48:07 +03:00
Aliaksandr Valialkin
00bbe1608b deployment/docker: upgrade alpine image from v3.13.5 to v3.14.0 2021-07-01 10:56:56 +03:00
Roman Khavronenko
a38a6fe8ad dashboard: move panel Disk writes/reads to Resource usage row (#1417)
* dashboard: move panel `Disk writes/reads` to `Resource usage` row

* dashboard: make Stats panel consistent with Cluster dashboard
2021-07-01 05:46:26 +03:00
Aliaksandr Valialkin
c93cee8de8 lib/{mergeset,storage}: reduce the maximum lifetime for cached indexdb and data blocks from 2 minutes to a minute
This should reduce memory usage on a system with high number of active time series and a high churn rate.
One minute is enough for caching the blocks needed for repeated queries (e.g. alerting rules, recording rules and dashboard refreshes).
2021-06-29 19:57:07 +03:00
Aliaksandr Valialkin
fc12484734 lib/mergeset: switch from sync.Pool to a channel for a pool for inmemoryBlock structs
This should reduce memory usage for the pool on systems with big number of CPU cores.

The sync.Pool maintains per-CPU pools, so the total number of objects in the pool
is proportional to the number of available CPU cores. The channel limits the number
of pooled objects by its own capacity. This means smaller number of pooled objects on average.
2021-06-29 19:56:59 +03:00
Aliaksandr Valialkin
9ce211a514 lib/promscrape/discovery/docker: fix golint warning: struct field Id should be ID 2021-06-29 13:12:28 +03:00
Aliaksandr Valialkin
5506cff76e lib/storage: put indexDBName into the key for dateTagFilter cache and for uselessTagFilters cache
This should prevent from stats overwriting when the previous indexdb is queried.
2021-06-29 12:40:05 +03:00
Aliaksandr Valialkin
1b0501a09e lib/promscrape: typo fix in /targets output
The typo has been introduced in fb72a2133f

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1408
2021-06-28 21:26:37 +03:00
Aliaksandr Valialkin
3af2162085 docs/vmagent.md: mention about docker_sd_config support 2021-06-25 20:52:15 +03:00
Aliaksandr Valialkin
008033a374 docs/CHANGELOG.md: cut v1.62.0 2021-06-25 13:29:38 +03:00
Aliaksandr Valialkin
cb5453953f lib/promscrape: split docker and dockerswarm service discovery code bases, since they have very little in common
This is a follow up after c85a5b7fcb
2021-06-25 13:20:20 +03:00
Aliaksandr Valialkin
a69045e440 lib/promscrape: consistently sort service discovery routines
This should simplify further maintenance of the code
2021-06-25 12:10:46 +03:00
Lu Jiajing
c85a5b7fcb Support Docker ServiceDiscovery (#1402)
* add docker discovery

* add test

* add labels test and add scrape work

* remove TODO

* refactor to merge apiConfig and sdConfig

* apply suggestion
2021-06-25 11:42:47 +03:00
Nikolay
434e33da9b adds missing MustStop call to do and http sd (#1404) 2021-06-25 11:39:18 +03:00
Aliaksandr Valialkin
b50e2ec88c vendor: make vendor-update 2021-06-24 17:33:42 +03:00
Aliaksandr Valialkin
4345c07777 docs: consistently put the link to articles and slides about VictoriaMetrics after the links to case studies 2021-06-24 15:37:38 +03:00
Aliaksandr Valialkin
b2fca1ab22 docs/CaseStudies.md: add a case study for DFKI 2021-06-24 15:24:39 +03:00
Aliaksandr Valialkin
906fca9e88 docs/CaseStudies.md: add Groove X case study 2021-06-24 15:04:49 +03:00
Aliaksandr Valialkin
3c3694f72a README.md: add missing linke to Sensedia case study 2021-06-24 14:37:49 +03:00
Aliaksandr Valialkin
d25a161579 docs/CaseStudies.md: add Sensedia case study 2021-06-24 14:36:13 +03:00
Aliaksandr Valialkin
ca42410afd docs/CHANGELOG.md: document the bugfix in increase_pure() function from the commit fb4f758715 2021-06-24 12:05:39 +03:00
Aliaksandr Valialkin
5d64ed73c5 lib/protoparser/clusternative: do not pool unmarshalWork structs, since they can occupy big amounts of memory (more than 100MB per each struct)
This should reduce memory usage for vmstorage under high ingestion rate when the vmstorage runs on a system with big number of CPU cores
2021-06-23 15:46:50 +03:00
Aliaksandr Valialkin
cdfae0117a app/vmselect/promql: return the last timestamp for the max / min value from tmax_over_time() and tmin_over_time() function as most users expect 2021-06-23 14:19:00 +03:00
Aliaksandr Valialkin
70e2852376 docs/CHANGELOG.md: document the bugfix for incorrect stats collection for concurrently executed tag filter
Follow up for c22114c6f0
2021-06-23 14:05:28 +03:00
Aliaksandr Valialkin
94f3e40ab3 app/vminsert/netstorage: sort the -storageNode list passed to vminsert nodes
This should reduce resource usage (CPU, RAM, disk IO) at vmstorage nodes
if the addresses of vmstorage nodes are passed in random order to vminsert nodes.
2021-06-23 14:01:41 +03:00
Aliaksandr Valialkin
c22114c6f0 lib/storage: tune tag filters search logic
Tune the logic according to the logs provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-864293624

The previous logic had a race when multiple concurrent queries execute the same tag filter without prior stats.
This could result in incorrectly stored stats for such tag filter, which then could result in non-optimal sorting of tag filters
for further queries.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-23 13:29:39 +03:00
Aliaksandr Valialkin
e8a5bb92b7 lib/promscrape/discovery/consul: properly pass namespace to Consul watcher
Follow-up for 58a2989fe7
2021-06-22 17:42:41 +03:00
Aliaksandr Valialkin
ac54f34f9e lib/promscrape/discovery/http: follow up after e307bbb29a 2021-06-22 13:40:33 +03:00
Aliaksandr Valialkin
755040a171 lib/promscrape/discovery: support generic auth configs in Consul service discovery in the same way as Prometheus 2.28 does 2021-06-22 13:34:02 +03:00
Aliaksandr Valialkin
59e7755df9 docs/CHANGELOG.md: document the support for Consul namepsace
See 58a2989fe7
2021-06-22 13:34:02 +03:00
Nikolay
e307bbb29a adds http_sd (#1399)
* adds http_sd

* adds X-Prometheus-Refresh-Interval-Seconds header

* Update lib/promscrape/discovery/http/api.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 13:33:37 +03:00
Nikolay
58a2989fe7 adds consul enterprise namespace support (#1400)
* adds consul enterprise namespace support

* Update lib/promscrape/discovery/consul/consul.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 12:49:44 +03:00
Aliaksandr Valialkin
3bc83d3b17 docs/PerTenantStatistic.md: document that the per-tenant statistic is a part of cluster version of VictoriaMetrics 2021-06-22 12:43:01 +03:00
Roman Khavronenko
da8c901fab vmctl: add more context to flags description in vm-native mode (#1395) 2021-06-18 19:20:01 +03:00
Aliaksandr Valialkin
a3262daac0 docs/CHANGELOG.md: typo fixes 2021-06-18 19:14:53 +03:00
Aliaksandr Valialkin
83a4db813e app/vmselect: log slow requests to all the /api/v1/* handlers if their execution time exceeds -search.logSlowQueryDuration 2021-06-18 19:04:42 +03:00
Aliaksandr Valialkin
4cfedc5931 docs/Single-server-VictoriaMetrics.md: mention that it is recommended to use a single scrape_interval across all the scrape targets 2021-06-18 15:39:33 +03:00
Aliaksandr Valialkin
570f36b344 app/vmctl: limit JSON line size by 10K samples (#1394)
This should reduce the maximum memory usage at VictoriaMetrics when importing time series with big number of samples.
2021-06-18 15:26:47 +03:00
Aliaksandr Valialkin
eb1af09a04 docs/Cluster-VictoriaMetrics.md: clarify docs about VictoriaMetrics cluster architecture 2021-06-18 14:36:39 +03:00
Aliaksandr Valialkin
cbd9159a22 docs/CHANGELOG.md: document the reduced disk write IO usage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-18 14:02:59 +03:00
Aliaksandr Valialkin
fb72a2133f lib/promscrape: show jobs with empty scrape targets on /targets page 2021-06-18 10:53:52 +03:00
Aliaksandr Valialkin
d8ab409418 docs/{vmgateway,vmbackupmanager}: explicitly mention that these components are a part of an enterprise package 2021-06-17 17:19:49 +03:00
Nikolay
6c434b260e fixes DO service discovery labels (#1389)
adds test for digitalocean sd
2021-06-17 15:12:20 +03:00
Aliaksandr Valialkin
dcbc22552f lib/storage: fix infinite loop introduced in aa9b56a046 2021-06-17 14:28:10 +03:00
Aliaksandr Valialkin
aa9b56a046 lib/{mergeset,storage}: reduce the number of fsync calls on data ingestion path on systems with many cpu cores
VictoriaMetrics maintains a buffer per CPU core for the ingested data. These buffers are flushed to disk every second.
These buffers are flushed to disk in parallel starting from the commit 56b6b893ce .
This resulted in increased write disk IO usage on systems with many cpu cores
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-863046999 .

This commit merges the per-CPU buffers into bigger in-memory buffers before flushing them to disk.
This should reduce the rate of fsync syscalls and, consequently, the write disk IO on systems with many CPU cores.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
2021-06-17 13:52:08 +03:00
Aliaksandr Valialkin
12a83d25bf app/vmagent/remotewrite: go fmt after 0a796f7c3a 2021-06-17 13:52:06 +03:00
Aliaksandr Valialkin
9eb3fc346f docs/vmagent.md: sync with app/vmagent/README.md via make docs-sync 2021-06-16 12:36:49 +03:00
Aliaksandr Valialkin
6d17a4e12d docs/CHANGELOG.md: document the changed -remoteWrite.queues value
This is a follow-up for 0a796f7c3a

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1385
2021-06-16 12:35:46 +03:00
Zongyang
0a796f7c3a Change default value of '-remoteWrite.queues' to cgroup.AvailableCPUs * 2 (#1385)
* Change default value of '-remoteWrite.queues' to cgroup.AvailableCPUS() * 2 to reduce scrape interval

Default value of vmagent option '-remotewrite.queues' is 4 and default
size of vmagent ScheudleUnmarshalWorkers is number of CPUs, when available
CPUs is much greater than 4, e.g 32, worker are competing push queues
which will increase scrape interval and may cause scrape timeout.

* Update README and flag description

Co-authored-by: xiaozy <xiaozy01@fenbi.com>
2021-06-16 12:16:44 +03:00
Roman Khavronenko
fb4f758715 promql: fix increase_pure calculation for cases with stale series (#1381)
Due to staleness handling, increase_pure were using incorrect previous value
during calculation in cases where series disappears for period longer
than staleness period and then returns back. The fix suppose to account
for a real datapoint value before staleness takes place. The fix should
remove unexpected spikes while using `increase_pure` for staled series.
2021-06-15 17:37:19 +03:00
Aliaksandr Valialkin
6a8369f0fc docs/Single-server-VictoriaMetrics.md: mention that VictoriaMetrics works great with APM workloads (aka Application Performance Monitoring) 2021-06-15 17:32:52 +03:00
Aliaksandr Valialkin
84fb59b0ba lib/storage: move deletedMetricIDs set from indexDB to Storage
This makes consitent the list of deleted metricIDs when it is used from both the current indexDB and the previous indexDB (aka extDB).
This should fix the issue, which could lead to storing new samples under deleted metricIDs after indexDB rotation.
See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 .

Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

This commit resolves the issue in more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

The downside of the commit is the deletedMetricIDs set isn't cleaned from the metricIDs outside the retention. It needs app restart.
This should be OK in most cases.
2021-06-15 15:04:30 +03:00
Aliaksandr Valialkin
e028ad241a lib/protoparser: stop reading the input stream as soon as the callback provided by the caller returns error
This is a follow-up for af90c3c43b
2021-06-14 15:18:49 +03:00
faceair
af90c3c43b lib/protoparser: stop read when callback error (#1380) 2021-06-14 15:10:58 +03:00
Aliaksandr Valialkin
36d55bff66 lib/promscrape: show the number of samples collected during the last scrape at /targets and /api/v1/targets pages
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1377
2021-06-14 14:04:00 +03:00
Aliaksandr Valialkin
7b283ee91c vendor: update github.com/klauspost/compress from v1.13.0 to v1.13.1 2021-06-14 13:42:46 +03:00
Roman Khavronenko
a90012ef26 dashboard: bump version requirements (#1378) 2021-06-14 13:31:59 +03:00
Aliaksandr Valialkin
05bc9667c1 docs/CHANGELOG.md: document the addition of DigitalOcean service discovery
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367
2021-06-14 13:18:19 +03:00
Nikolay
729c4eeb9c adds digital ocean sd (#1376)
* adds digital ocean sd config

* adds digital ocean sd
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367

* typo fix
2021-06-14 13:15:04 +03:00
Roman Khavronenko
b8526e88d3 Dashboard single (#1374)
* dashboard: update single version dash

The update contains the following changes:
* display anonymous memory usage metric. This metric suppose to reflect
memory usage of the process which can't be freed by OS;
* add legends to all panels. This is important for cases when users share
the screenshots;
* modify panels for Grafana v8.0.0

* dashboard: update single version dash tags

* dashboard: update vmagent dash

The update contains the following changes:
* display anonymous memory usage metric. This metric suppose to reflect
memory usage of the process which can't be freed by OS;
* add legends to all panels. This is important for cases when users share
the screenshots;
* modify panels for Grafana v8.0.0
2021-06-14 13:03:23 +03:00
Aliaksandr Valialkin
06b8e7d148 lib/promscrape: increase the duration for reading the full response in stream parsing mode
Increase the duration from 10x to 30x of the configured `scrape_interval'.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:28:09 +03:00
Aliaksandr Valialkin
48210130ac lib/protoparser: measure the duration for reading the whole block of data instead of a single read operation
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:25:52 +03:00
Aliaksandr Valialkin
3e5b6bae66 docs/Cluster-VictoriaMetrics.md: add lists for command-line flags for cluster components 2021-06-14 12:22:05 +03:00
Aliaksandr Valialkin
3c4366806c lib/protoparser/common: log the duration for reading a block of data in ReadLinesBlockExt on error
This may help debugging issues like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:22:04 +03:00
Aliaksandr Valialkin
8a519f1518 docs/vmalert.md: follow-up after 6d5a8c28cd 2021-06-14 11:37:26 +03:00
Roman Khavronenko
6d5a8c28cd Vmalert docs (#1372)
* vmalert: mention what happens if `for` is set to 0 or omitted

* vmalert: add more context to docs
2021-06-11 13:25:53 +03:00
Aliaksandr Valialkin
2ef2e3d6f8 docs/CHANGELOG.md: cut v1.61.1 2021-06-11 13:01:31 +03:00
Aliaksandr Valialkin
ed83558646 app/vmauth: properly handle http.ErrAbortHandler panic
This panic can be raised by the reverseProxy on aborted request to the backend.
So handle it (e.g. suppress) at reverseProxy.ServeHTTP call.

Do not suppress the panic at lib/httpserver generic HTTP handler,
since it may result in an inconsistent state left after the panicking handler.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
2021-06-11 12:50:25 +03:00
Aliaksandr Valialkin
c4f3fbfa5d lib/storage: reset cache on disk during series deletion and during indexdb rotation
This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin
d979e14da2 docs/CHANGELOG.md: document the bugfix from 7adfe878e1 2021-06-11 11:28:10 +03:00
Roman Khavronenko
7adfe878e1 vmalert: fix mistake with object reuse while parsing response (#1370)
* vmalert: fix mistake with object reuse while parsing response

During the refactoring, the wrong optimisations was applied in
parse function which caused metric fields reset. The change removes
optimisation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1369

* vmalert: add test to cover multiple metrics in one response
2021-06-11 11:22:05 +03:00
Aliaksandr Valialkin
69b1482bdb lib/storage: consistency renaming: getMaxRawRowsPerPartition -> getMaxRawRowsPerShard 2021-06-11 10:57:23 +03:00
Aliaksandr Valialkin
044ab46824 lib/storage: reduce the amounts of memory which can be occupied by rawRow items during data ingestion on a system with many CPU cores 2021-06-11 10:57:23 +03:00
John Belmonte
67b17cdd68 spelling fix: synonym (#1363) 2021-06-10 08:32:52 +03:00
Aliaksandr Valialkin
b4008c1e65 vendor: make vendor-update 2021-06-09 20:40:50 +03:00
Aliaksandr Valialkin
b83a51366e docs/CHANGELOG.md: cut v1.61.0 2021-06-09 19:04:44 +03:00
Aliaksandr Valialkin
10d105ee25 docs/FAQ.md: add a chapter comparing VictoriaMetrics to QuestDB 2021-06-09 19:03:23 +03:00
Aliaksandr Valialkin
d0dca62026 app/vmselect/promql: typo fix in the comment 2021-06-09 18:34:31 +03:00
Aliaksandr Valialkin
46d2c7d640 docs/Articles.md: update the broken link to https://nordicapis.com/api-monitoring-with-prometheus-grafana-alertmanager-and-victoriametrics/ 2021-06-09 16:39:56 +03:00
Aliaksandr Valialkin
2422b5091f app/vmauth: improve readability for a config with multiple src_paths 2021-06-09 15:39:32 +03:00
Aliaksandr Valialkin
329b6cd146 docs/CHANGELOG.md: document the enterprise bugfix for the target property in Graphite Render API 2021-06-09 13:50:52 +03:00
Aliaksandr Valialkin
db1c548cb4 docs/CHANGELOG.md: document improvements in re-routing handling in vminsert
See the following commits:

* 1c09e71f5b
* 0d067eb112
* 2c6b917749
2021-06-09 13:40:37 +03:00
Aliaksandr Valialkin
f93a31b490 docs/vmagent.md: mention that vmagent supports scrape targets sharding 2021-06-09 12:29:34 +03:00
Aliaksandr Valialkin
ab15bf8c90 docs: document rules replay feature for vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

This is a follow-up for 2a259ef5e7
2021-06-09 12:27:34 +03:00
Roman Khavronenko
2a259ef5e7 vmalert: support rules backfilling (aka replay) (#1358)
* vmalert: support rules backfilling (aka `replay`)

vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.

Supports recording and alerting rules `replay`. See more
details in README.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

* vmalert: review fixes

* vmalert: readme fixes
2021-06-09 12:20:38 +03:00
dependabot[bot]
408fc90b40 build(deps): bump codecov/codecov-action from 1.5.0 to 1.5.2 (#1362)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 1.5.0 to 1.5.2.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v1.5.0...v1.5.2)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-06-09 12:16:46 +03:00
Roman Khavronenko
5e9f3777bf alerts: add new alert LabelsLimitExceededOnIngestion (#1359) 2021-06-09 12:15:36 +03:00
Aliaksandr Valialkin
c16edf8287 docs/CHANGELOG.md: document the bugfix, which prevents panics for aborted http requests in vmauth
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353

This is a follow-up for 6b29b955c0
2021-06-09 12:11:53 +03:00
Nikolay
6b29b955c0 disables panic for net/httpAbortHandler (#1355) 2021-06-09 12:08:58 +03:00
Aliaksandr Valialkin
28c44ef065 deployment/docker/docker-compose.yml: update Grafana from v7.5.2 to v8.0.0
See https://github.com/grafana/grafana/releases/tag/v8.0.0
2021-06-09 02:25:24 +03:00
k1rk
668165f53d rename serviceHealth group name to vm-health (#1360)
this causes conflicts in `victoria-metrics-k8s-stack` chart =)
2021-06-08 23:34:38 +03:00
Aliaksandr Valialkin
c0feafefc8 vendor: make vendor-update 2021-06-08 15:49:30 +03:00
Aliaksandr Valialkin
8a7e6ad5cc deployment/docker: update Go builder from v1.16.4 to v1.16.5
See the fixed isses at https://github.com/golang/go/issues?q=milestone%3AGo1.16.5+label%3ACherryPickApproved
2021-06-08 15:45:44 +03:00
Aliaksandr Valialkin
645e18dd88 vendor: update github.com/klauspost/compress from v1.12.3 to v1.13.0 2021-06-08 15:45:43 +03:00
Aliaksandr Valialkin
96b691a0ab lib/storage: properly account the number of loops spent when matching for or suffixes
This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-08 13:06:12 +03:00
Aliaksandr Valialkin
661d2668f8 lib/promrelabel: add tests for labelsToString() function 2021-06-04 20:42:46 +03:00
Aliaksandr Valialkin
78f83dc5ad app/{vmagent,vminsert}: follow-up after 2fe045e2a4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1343
2021-06-04 20:27:58 +03:00
jelmd
2fe045e2a4 new feature: debug relabeling (#1344)
* new feature: relabel logging

Use scrape_configs[x].relabel_debug = true to log metric names inkl.
labels before and after relabeling. After relabeling related metrics
get dropped, i.e. not submitted to servers.

* vminsert wants relabel logging, too.
2021-06-04 17:50:23 +03:00
Aliaksandr Valialkin
5ac25d2585 docs/CHANGELOG.md: document the bugfix from 6f19bb23a1 2021-06-04 11:53:00 +03:00
Hason Chan
6f19bb23a1 fix eureka_sd_configs HTTPClientConfig incorrect parsing (#1350) 2021-06-04 11:47:17 +03:00
Aliaksandr Valialkin
f963f04d3d app/vminsert: add -disableRerouting command-line flag for disabling re-routing if some vmstorage nodes have lower performance than the others 2021-06-04 04:42:01 +03:00
Aliaksandr Valialkin
2d8bd41f8a lib/storage: reduce memory allocations when syncing dateMetricIDCache 2021-06-03 16:20:42 +03:00
Aliaksandr Valialkin
d2d746c4fc docs/CHANGELOG.md: document that it is possible to build VictoriaMetrics components for Solaris
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1322

This is a follow-up for ddc8022702
2021-05-31 09:33:29 +03:00
Nikolay
ddc8022702 fixes solaris build (#1345) 2021-05-31 09:21:23 +03:00
Aliaksandr Valialkin
de8e4b6223 vendor: update github.com/VictoriaMetrics/fastcache from v1.5.8 to v1.6.0 2021-05-31 09:14:25 +03:00
Aliaksandr Valialkin
b22e380a34 app/vmauth: allow balancing the load among multiple backend nodes by specifying multiple urls in url_prefix config 2021-05-29 01:03:37 +03:00
Aliaksandr Valialkin
ffa4cd6fa5 docs/Articles.md: add a link to https://www.percona.com/blog/2021/05/26/compiling-a-percona-monitoring-and-management-v2-client-in-arm-raspberry-pi-3/ 2021-05-28 14:34:31 +03:00
Aliaksandr Valialkin
9315e3ded3 vendor: make vendor-update 2021-05-28 13:18:37 +03:00
Aliaksandr Valialkin
cecbdee00d docs/MetricsQL.md: add a link to technical details about rate() and increase() calculations in Prometheus and VictoriaMetrics 2021-05-28 13:13:50 +03:00
Aliaksandr Valialkin
634eeac804 docs/Single-server-VictoriaMetrics.md: remove misleading wording about querying Graphite metrics with MetricsQL 2021-05-28 02:39:14 +03:00
Aliaksandr Valialkin
a52a20659a lib/promscrape: fix tests after f0c21b6300 2021-05-28 01:32:50 +03:00
Aliaksandr Valialkin
d088923aef Revert "lib/mergeset: remove a pool for inmemoryBlock structs"
This reverts commit 793fe39921.

Reason to revert: production testing revealed possible slowdown when registering big number of new time series
2021-05-28 01:09:32 +03:00
Aliaksandr Valialkin
793fe39921 lib/mergeset: remove a pool for inmemoryBlock structs
The pool for inmemoryBlock struct doesn't give any performance gains in production workloads,
while it may result in excess memory usage for inmemoryBlock structs inside the pool during
background merge of indexdb.
2021-05-27 21:57:33 +03:00
Aliaksandr Valialkin
60341722d5 docs: document f0c21b6300 2021-05-27 15:03:30 +03:00
faceair
f0c21b6300 lib/promscrape: apply body size & sample limit to stream parse (#1331)
* lib/promscrape: apply body size limit to stream parse

Signed-off-by: faceair <git@faceair.me>

* lib/promscrape: apply sample limit to stream parse

Signed-off-by: faceair <git@faceair.me>
2021-05-27 14:52:44 +03:00
Aliaksandr Valialkin
eda6ee40b6 docs: make docs-sync after 2bbb1cc7c1 2021-05-26 12:32:16 +03:00
Roman Khavronenko
2bbb1cc7c1 Docs review (#1330)
* re-order components by prioritizing Cluster-VictoriaMetrics.md

* drop Home.md since it just duplicates other links
2021-05-26 12:28:58 +03:00
Roman Khavronenko
7ecaa2fe2c update the issue template (#1329)
The main changes are:
* ask for Grafana's dashboard screenshots;
* ask only for non-default cmd-line flags;
* explicitly ask about logs;
2021-05-26 12:26:45 +03:00
Aliaksandr Valialkin
7f531e3a60 docs/CHANGELOG.md: document changes from 2233d6ed8a and d210958fd0 2021-05-26 12:23:22 +03:00
Roman Khavronenko
d210958fd0 vmalert: automatically reload configuration on file change (#1326)
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
2021-05-25 14:27:22 +01:00
Aliaksandr Valialkin
2233d6ed8a lib/uint64set: store pointers to bucket16 instead of bucket16 objects in bucket32
This speeds up bucket32.addBucketAtPos() when bucket32.buckets contains big number of items,
since the copying of bucket16 pointers is much faster than the copying of bucket16 objects.

This is a cpu profile for copying bucket16 objects:

      10ms     13.43s (flat, cum) 32.01% of Total
      10ms      120ms    650:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    651:	b.b16his[pos] = hi
         .     13.31s    652:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .          .    653:	b16 := &b.buckets[pos]
         .          .    654:	*b16 = bucket16{}
         .          .    655:	return b16
         .          .    656:}

This is a cpu profile for copying pointers to bucket16:

      10ms      1.14s (flat, cum)  2.19% of Total
         .      100ms    647:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    648:	b.b16his[pos] = hi
      10ms      700ms    649:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .      330ms    650:	b16 := &bucket16{}
         .          .    651:	b.buckets[pos] = b16
         .          .    652:	return b16
         .          .    653:}
2021-05-25 14:20:52 +03:00
Aliaksandr Valialkin
fd264477bf snap: update Go builder from Go 1.15 to Go 1.16 2021-05-25 12:12:37 +03:00
Dan Fredell
c78d7dde92 Fix quote difference on label_move example (#1321)
Fix quote difference on label_move example
2021-05-24 23:20:38 +03:00
Aliaksandr Valialkin
08234aa7a0 docs/CHANGELOG.md: cut v1.60.0 2021-05-24 15:55:08 +03:00
Aliaksandr Valialkin
a47d4927d2 docs/Single-server-VictoriaMetrics.md: clarify that the storage size depends on the number of samples per series 2021-05-24 15:47:44 +03:00
Aliaksandr Valialkin
b1e8d92577 docs/vmalert.md: sync with app/vmalert/README.md via make docs-sync 2021-05-24 15:47:20 +03:00
Aliaksandr Valialkin
8e7d1f8824 app/vmagent/remotewrite: use WARN level instead of ERROR level for couldnt send a block with size ... bytes to ... log message
This is really warning, since vmagent re-tries sending the data block until success.
2021-05-24 15:43:59 +03:00
Aliaksandr Valialkin
39ef1e7a51 lib/storage: do not stop data ingestion on the first error in Storage.AddRows
Continue data ingestion for the rest of blocks.
2021-05-24 15:32:47 +03:00
Aliaksandr Valialkin
4b01c9fb2e lib/storage: limit the number of rows per each block in Storage.AddRows()
This should reduce memory usage when ingesting big blocks or rows.
2021-05-24 15:24:07 +03:00
Aliaksandr Valialkin
a4ff4b8e65 lib/storage: allow filling all the rows up to their capacity in rawRowsShard.addRows
This should reduce memory usage a bit on data ingestion path
2021-05-24 15:22:59 +03:00
Aliaksandr Valialkin
a46551245c lib/bloomfilter: fix TestLimiterConcurrent 2021-05-24 05:17:36 +03:00
Aliaksandr Valialkin
eafed5335d vendor: update github.com/valyala/gozstd from v1.10.0 to v1.11.0 2021-05-24 04:59:55 +03:00
Aliaksandr Valialkin
93d81b486d lib/fs: do not pass done callback to tryRemoveAll() func
This improves code readability a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-24 04:51:57 +03:00
Aliaksandr Valialkin
f54133b200 lib/storage: do not populate MetricID->MetricName cache during data ingestion
This cache isn't needed during data ingestion, so there is no need in spending RAM on it.

This reduces RAM usage on data ingestion path by 30%
2021-05-24 03:02:46 +03:00
Aliaksandr Valialkin
24858820b5 docs/CHANGELOG.md: small typo fix 2021-05-23 14:15:01 +03:00
Aliaksandr Valialkin
04eb37a590 docs/CHANGELOG.md: document the addition of extra_filter_labels at 84cc0513e1 2021-05-23 14:10:33 +03:00
Aliaksandr Valialkin
ec79abc382 lib/{mergeset,storage}: reduce the number of IFNO log messages like merged ... items across ... blocks in ... seconds
Log these messages if the merge takes more than 30 seconds instead of 10 seconds.
2021-05-23 14:03:21 +03:00
Roman Khavronenko
84cc0513e1 vmalert: support extra_filter_labels setting per-group (#1319)
The new setting `extra_filter_labels` may be assigned to group.
If it is, then all rules within a group will automatically filter
for configured labels. The feature is well-described here
https://docs.victoriametrics.com#prometheus-querying-api-enhancements

New setting is compatible only with VM datasource.
2021-05-23 00:26:01 +03:00
Aliaksandr Valialkin
78dddfb98f lib/promauth: follow-up after 5b8176c68e 2021-05-22 18:01:11 +03:00
Nikolay
5b8176c68e basic OAuth2 support for remoteWrite and scrape targets (#1316)
* adds OAuth2 support for remoteWrite and scrapping

* adds tests
changes init
2021-05-22 16:20:18 +03:00
Aliaksandr Valialkin
e05dd475f0 lib/fs: concurrently remove up to 1024 blocked NFS directories
Previously the blocked directories were removed sequentially by a single goroutine.
This can be not enough for highly loaded VictoriaMetrics that accepts millions of sample per second,
when big number of LSM parts are created and removed at high rate.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:57:46 +03:00
Aliaksandr Valialkin
8e2985b53d lib/fs: wait for a while before giving up on NFS file removal if the removal queue is full
This should reduce the probability of the panic on a highly loaded VictoriaMetrics
accepting millions of samples per second.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:21:00 +03:00
Aliaksandr Valialkin
d173a9348c docs/MetricsQL.md: add a link to a list of supported timezones that can be passed to timezone_offset() function
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1306
2021-05-21 16:55:24 +03:00
Aliaksandr Valialkin
14ec2b9f26 docs/CHANGELOG.md: mention the bugfix from d626c5c2a9
Updates https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 16:37:14 +03:00
Aliaksandr Valialkin
c54bb73867 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:34:06 +03:00
Nikolay
d626c5c2a9 changes vmalert query function (#1307)
* changes vmalert query function
for prometheus rules compatibility its better to use labels as map.
it simplifies template evaluation and allow to ignore can't evaluate field error
because map will return default value.
fixes https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 13:55:43 +03:00
Aliaksandr Valialkin
060664e221 docs/FAQ.md: re-order questions to be more attractive to visitors 2021-05-20 19:49:34 +03:00
Aliaksandr Valialkin
49ecbc765d app/vmauth: add ability to protect /-/reload endpoint with authKey 2021-05-20 18:47:01 +03:00
Aliaksandr Valialkin
362a49bdd1 vendor: make vendor-update 2021-05-20 18:46:26 +03:00
Aliaksandr Valialkin
4c7bb75fa2 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:27:10 +03:00
Aliaksandr Valialkin
0d3e78b9ee docs/CHANGELOG.md: move tip to proper place 2021-05-20 17:56:03 +03:00
Aliaksandr Valialkin
480087944a docs/FAQ.md: add a question on how to run VictoriaMetrics on FreeBSD
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:16:35 +03:00
Aliaksandr Valialkin
49c39ab388 docs/FAQ.md: add can I use VictoriaMetrics instead of Prometheus?
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:10:36 +03:00
Aliaksandr Valialkin
6cc6a032cd docs/FAQ.md: add a question about memory limits for VictoriaMetrics components
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:09:41 +03:00
Aliaksandr Valialkin
d06aae9454 docs/FAQ.md: add a question about multi-tenancy
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 15:52:40 +03:00
Aliaksandr Valialkin
e394ff6466 app/vmagent/remotewrite: expose metrics with the current number of active series per day and per hour
These numbers are exposed via the following metrics:

- vmagent_hourly_series_limit_current_series
- vmagent_daily_series_limit_current_series

Expose also the limits via the following metrics:

- vmagent_hourly_series_limit_max_series
- vmagent_daily_series_limit_max_series
2021-05-20 15:28:09 +03:00
Aliaksandr Valialkin
ad73f226ff app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin
7e526effaa app/vmagent: add ability to limit series cardinality on a per-hour and per-day basis 2021-05-20 13:13:40 +03:00
Aliaksandr Valialkin
3cd8606abd docs/CHANGELOG.md: document the bugfix in vmctl import for InfluxDB lines with identical names for field and tag
See dcf8803bbd

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299
2021-05-20 12:06:16 +03:00
Aliaksandr Valialkin
98e425ee09 docs/CHANGELOG.md: refer to the issue related to timezone_offset() function 2021-05-20 12:03:39 +03:00
Roman Khavronenko
dcf8803bbd vmctl: explicitly set ::tag type for labels selector in influx mode (#1310)
The `::tag` type is needed in cases when field and tag names are equal, which
results into unexpected results in InfluxQL. Setting the type explicitly helps
InfluxDB to understand which exact column we apply filter to.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299
2021-05-20 12:03:16 +03:00
Aliaksandr Valialkin
22585531ad lib/promscrape/discovery/kubernetes: make golangci-lint happy by removing empty branches 2021-05-20 12:00:29 +03:00
Aliaksandr Valialkin
0842bb9294 app/vmselect/promql: add timezone_offset(tz) function
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1306
2021-05-20 11:53:09 +03:00
Aliaksandr Valialkin
009e136d88 lib/storage: remove possible data race when logging dropped labels 2021-05-20 02:47:22 +03:00
Aliaksandr Valialkin
f7e73fbe8b app/vmagent/remotewrite: sort labels before sending the series to per-remoteWrite.url queues 2021-05-20 02:12:36 +03:00
Aliaksandr Valialkin
eb8093ca6b lib/promscrape/discovery/kubernetes: reload objects on object parse error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 23:25:48 +03:00
Neo He
e8a6c6927d app/{vmbackup,vmrestore},docs/vmrestore.md: typo fix: vbackup -> vmbackup (#1305) 2021-05-18 16:37:28 +03:00
Aliaksandr Valialkin
f4719889da lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:26:16 +03:00
Aliaksandr Valialkin
b30925738b docs/vmalert.md: document multitenant support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740
2021-05-18 16:26:14 +03:00
Aliaksandr Valialkin
0cf1fe19e6 lib/promscrape/discovery/kubernetes: simplify the reload logic for urlWatcher.objectsByKey 2021-05-18 15:40:46 +03:00
Aliaksandr Valialkin
bfb61de606 lib/promscrape/discovery/kubernetes: properly update vm_promscrape_discovery_kubernetes_scrape_works metric
Previously it wasn't descreased during config update.
2021-05-18 15:33:45 +03:00
Aliaksandr Valialkin
3166994244 lib/promscrape/discovery/kubernetes: log errors and stop service discovery when unexpected updates are received from Kubernetes API server
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 15:10:48 +03:00
Aliaksandr Valialkin
66aba00549 app/vmauth: reload -auth.config on the request to /-/reload
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1194
2021-05-18 02:23:55 +03:00
Aliaksandr Valialkin
3339ea41e7 docs/vmbackup.md: typo fix: snaphosts -> snapshots
Thanks to @jelmd - see 1ab27582a3 (r50884395)
2021-05-18 01:12:36 +03:00
Aliaksandr Valialkin
6c944b86d8 docs: dealay -> delay
Thanks to @jelmd . See 0b7e3510c8 (r50884991)
2021-05-18 01:07:52 +03:00
Aliaksandr Valialkin
ec6a284978 lib/promrelabel: add tests for conditional removal of label on another label match
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1294
2021-05-18 00:23:03 +03:00
Aliaksandr Valialkin
2eed410466 lib/promscrape/discovery/kubernetes: key ScrapeWork objects by urlWatcher instead of namespace
This makes the code less fragile if urlWatcher would depend on additional to namepsace properties.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-17 23:47:08 +03:00
Aliaksandr Valialkin
ede2ba5a45 docs/CHANGELOG.md: document b38edec7ee
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-17 01:57:34 +03:00
Aliaksandr Valialkin
55ed77760b vendor: make vendor-update 2021-05-17 01:47:33 +03:00
Aliaksandr Valialkin
2ec6f81b10 vendor: update github.com/valyala/gozstd from v1.9.0 to v1.10.0 2021-05-17 01:47:33 +03:00
Roman Khavronenko
b96b19f040 Docs update from victoriaMetrics.github.io (#1302)
* port change from 11ca65677b

* port change from afb41dfa43

* port change from f82e3733c9

* port change from d499ab0502
2021-05-16 18:43:09 +01:00
Roman Khavronenko
b38edec7ee vmalert: use stringified label keys for duplicates map in recroding rules (#1301)
duplicates map helps to determine wheter extra labels has overriden
labels which make time series unique. It was using a sorted hashed
labels sequence as a key. But hashing algorithm could have collisions,
so it is more convenient to not use hashing at all.

Log message for recording rules duplicates was improved as well.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-15 13:25:57 +03:00
Aliaksandr Valialkin
733706e6c6 lib/promscrape: reload auth tokens from files every second
Previously auth tokens were loaded at startup and couldn't be updated without vmagent restart.
Now there is no need in vmagent restart.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1297
2021-05-14 20:00:08 +03:00
Aliaksandr Valialkin
a15145e597 docs/Articles.md: add a link to https://fly.io/blog/measuring-fly/ 2021-05-14 19:47:43 +03:00
Aliaksandr Valialkin
24cbf8b66c docs/vmauth.md: sync with app/vmauth/README.md after 10a47af631 2021-05-14 18:13:10 +03:00
Aliaksandr Valialkin
10a47af631 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:12:24 +03:00
Aliaksandr Valialkin
e6ec442e96 docs/Single-server-VictoriaMetrics.md: document how to reduce memory usage when importing too long JSON lines into VictoriaMetrics
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1295
2021-05-14 17:22:08 +03:00
Denys Holius
70165a6758 Fix Cortex typo 2021-05-13 16:26:05 +03:00
Aliaksandr Valialkin
a422165dc6 app/vmagent/remotewrite: clarify the comment explaining why vmagent drops blocks if remote storage returns 400 or 409 status code 2021-05-13 16:16:16 +03:00
Aliaksandr Valialkin
fc3519fa26 lib/promscrape: limit scrape_timeout by scrape_interval like Prometheus does 2021-05-13 16:09:45 +03:00
Aliaksandr Valialkin
1f75ae6006 docs/CHANGELOG.md: document the bugfix from b4f5be8bd8 2021-05-13 11:18:39 +03:00
匠心零度
b4f5be8bd8 fix vagent imbalance problem (#1292)
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2021-05-13 11:14:51 +03:00
Aliaksandr Valialkin
e6fda03e8f vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.14 to v1.0.15
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:44:14 +03:00
Aliaksandr Valialkin
f6a641de62 lib/promscrape: exponentially increase retry interval on unsuccesful requests to scrape targets or to service discovery services
This should reduce CPU load at vmagent and at remote side when the remote side doesn't accept HTTP requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:38:50 +03:00
Aliaksandr Valialkin
c0ec541559 lib/cgroup: document the ability to detect cgroup v2 memory and cpu limits. This is follow-up for b50024812e 2021-05-13 09:26:20 +03:00
Aliaksandr Valialkin
d7be2753c0 lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss 2021-05-13 09:02:33 +03:00
Nikolay
b50024812e adds cgroupsv2 support (#1283)
* adds cgroupv2 limits support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1269

* small fix

* changes Atoi to ParseUint
2021-05-13 09:02:13 +03:00
Aliaksandr Valialkin
a22a17dc66 lib/storage: merge getTSDBStatusForDate with getTSDBStatusWithFiltersForDate
These functions are non-trivial, while their code has minimal differences.
It is better from maintainability PoV to merge these functions into a single function.
2021-05-12 17:56:53 +03:00
Aliaksandr Valialkin
beddc0c0d5 docs/Single-server-VictoriaMetrics.md: typo fix: retuns->returns 2021-05-12 17:22:34 +03:00
Aliaksandr Valialkin
832651c6c2 app/vmselect: follow up after 8a0678678b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168
2021-05-12 17:18:30 +03:00
Nikolay
8a0678678b Adds tsdb match filters (#1282)
* init work on filters

* init propose for status filters

* fixes tsdb status
adds test

* fix bug

* removes checks from test
2021-05-12 15:18:45 +03:00
Aliaksandr Valialkin
cfd6aa28e1 lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices scrape targets every 5 seconds, since they may depend on changed service and pod objects
This should make endpoints and endpointslices scrape targets eventually consistent with the maximum delay of 5 seconds after the related service or pod object changes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-12 14:10:34 +03:00
Aliaksandr Valialkin
afec68ad13 lib/httpserver: add new X-Server-Hostname header instead of overwriting already exsiting header
This makes possible tracking origins of chained requests over multiple hops.
2021-05-11 23:48:59 +03:00
Aliaksandr Valialkin
f2d5c4e2d0 lib/httpserver: return X-Server-Hostname http header in all the responses for better debuggability 2021-05-11 22:03:48 +03:00
Aliaksandr Valialkin
33f7bacb01 lib/storage: properly apply time range when matching an empty filter
It must match all the time series on the given time range.
Previously it was matched to all the time series without the restriction on the given time range.
2021-05-11 01:12:05 +03:00
Aliaksandr Valialkin
3fb3ce2a6d Revert ".github/dependabot.yml: remove automated dependency version checks"
This reverts commit 5b986c95dd.

This check verifies only dependencies needed for github-actions. This is OK.
2021-05-10 12:05:09 +03:00
Aliaksandr Valialkin
8b65920a8b Add make check-licenses rule for the ability to manually check licenses in vendored dependencies
This is a follow-up for c687536956
2021-05-10 11:54:43 +03:00
Aliaksandr Valialkin
5b986c95dd .github/dependabot.yml: remove automated dependency version checks
Dependency updates must be under manual control, since the resulting code diffs must be reviewed manually for the sake of security.
It is done with `make vendor-update` now.
2021-05-10 11:41:23 +03:00
Artem Navoiev
c687536956 Add vendor license checker, update codecov action, add dependbot for … (#1280)
* Add vendor license checker, update codecov action, add dependbot for github actions

* update gitingore, temprorary turn on check

* fix action name

* change action rules to trigger only when vendor changes

* remove obsolete line from main action
2021-05-10 11:38:56 +03:00
Aliaksandr Valialkin
229d9d6dd7 docs/CHANGELOG.md: document -datasource.roundDigits added at 5c448126dc 2021-05-10 11:18:26 +03:00
Roman Khavronenko
5c448126dc vmalert: add support for round_digits param in datasource package (#1278)
Starting from v1.56.0 VM supports `round_digits` which allows to limit
the number of digits after the decimal point in response value. The feature
can be used to reduce entropy of produced by recording rules values
and significantly improve the compression. See more details in link below.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525
2021-05-10 11:11:45 +03:00
Aliaksandr Valialkin
3b0966c00c docs/CHANGELOG.md: document vmalert fix for state restoration on startup 2021-05-10 11:10:04 +03:00
Roman Khavronenko
4247168a2d vmalert: fix error when rule didn't start if restore failed (#1279)
Previously, `startGroup` could exit on restore errors despite the
`remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the
flag value before deciding whether to return error or just log it.
2021-05-10 11:06:31 +03:00
Aliaksandr Valialkin
6ff19096be docs/vmagent.md: add stream parsing mode chapter 2021-05-08 23:14:07 +03:00
Aliaksandr Valialkin
cbd0569ce2 docs/CHANGELOG.md: mention the comment, which gives an example of multi-level vminsert setup 2021-05-08 22:57:04 +03:00
Aliaksandr Valialkin
120927ab65 vendor: make vendor-update 2021-05-08 20:21:57 +03:00
Aliaksandr Valialkin
75c2c813fc lib/storage: remove dead code after the commit 3ccf7ea20c 2021-05-08 20:14:11 +03:00
Aliaksandr Valialkin
f8d50e9641 deployment/dm: update Go builder from v1.16.3 to v1.16.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.4+label%3ACherryPickApproved for details
2021-05-08 20:04:05 +03:00
Aliaksandr Valialkin
7aea5f58c4 lib/ingestserver: properly close incoming connections during graceful shutdown 2021-05-08 19:52:58 +03:00
Aliaksandr Valialkin
12d733dd5d app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:52:57 +03:00
Aliaksandr Valialkin
237e9f9fd7 app/vmalert: add missing comment for ErrStateRestore 2021-05-08 15:59:35 +03:00
Aliaksandr Valialkin
d6439490f5 docs/Single-server-VictoriaMetrics.md: add links to vmauth and vmgateway as auth proxy examples 2021-05-07 10:46:05 +03:00
Aliaksandr Valialkin
79ec35cef9 app/vmbackup: make sure that -snapshotName isnt set if -snapshot.createURL is set 2021-05-07 08:44:25 +03:00
Aliaksandr Valialkin
d9e3872b1c docs/CHANGELOG.md: document 904bbffc7f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 20:33:05 +03:00
Aliaksandr Valialkin
4bea1afc6d docs/CHANGELOG.md: document 9cdd4696fe
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 20:28:58 +03:00
Aliaksandr Valialkin
904bbffc7f lib/promscrape/discovery/kubernetes: start watchers for pods and services before starting watchers for endpoints
This should eliminate possible race when an update on endpoints depends on pods and/or services, which are missing in the cache yet.
This could result in missing targets based on endpoints or endpointslices.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 12:23:50 +03:00
Roman Khavronenko
9cdd4696fe vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 10:07:19 +03:00
Denis Fondras
cdd1b06473 Update to OpenBSD 6.9 and VictoriaMetrics 1.59 (#1263)
* Update to OpenBSD 6.9 and VictoriaMetrics 1.58

While at it, also build vmagent

* Bump to v1.59 and remove zstd dependency
2021-05-03 14:14:13 +03:00
Aliaksandr Valialkin
76027093ca lib/storage: use WARNING instead of INFO level for logging dropped labels 2021-05-03 13:56:43 +03:00
Aliaksandr Valialkin
ce9e163e94 lib/httpserver: stop the process on panics in request handlers
Panics may leave the process in inconsistent state. That's why it is better to stop the process after the panic
instead of recovering from the panic. Unfortunately, the standard net/http.Server recovers panics in request handlers.
See https://github.com/golang/go/issues/16542 . That's lib/httpserver must stop the process on itself after the panic.
2021-05-03 11:59:40 +03:00
Aliaksandr Valialkin
988c3b386f docs/CHANGELOG.md: document the bugfix for proper removal of stale parts (477369b62f) 2021-05-03 11:38:15 +03:00
Nikolay
477369b62f adds stalePartsRemover (#1261)
for new created partitions
2021-05-03 11:34:00 +03:00
Aliaksandr Valialkin
9a930720b3 lib/promrelabel: add tests for removing the specified {label="value"} pair 2021-05-03 11:26:53 +03:00
Aliaksandr Valialkin
0969b446b3 deployment/docker: update base docker image from alpine:3.13.2 to alpine:3.13.5 2021-05-01 10:50:09 +03:00
Aliaksandr Valialkin
fb097ff774 docs/CHANGELOG.md: cut v1.59.0 2021-05-01 09:39:48 +03:00
Aliaksandr Valialkin
4e2ea844ca vendor: make vendor-update 2021-05-01 09:39:26 +03:00
Aliaksandr Valialkin
58d1e6eeea lib/storage: log dropped labels if the number of labels in a metric exceeds -maxLabelsPerTimeseries command-line flag value
This should improve debuggability for this case.
2021-05-01 09:27:55 +03:00
Aliaksandr Valialkin
508f500477 docs/vmalert.md: update docs after afca7b430c 2021-04-30 11:49:01 +03:00
Aliaksandr Valialkin
86bdb5ea95 docs/Cluster-VictoriaMetrics.md: document api/v1/series/count endpoint 2021-04-30 11:46:14 +03:00
Roman Khavronenko
0f988e5a31 vmalert: fix the typo in ApplyParams func (#1259) 2021-04-30 10:31:45 +03:00
Roman Khavronenko
afca7b430c vmalert: use rule's evaluationInterval as step param by default (#1258)
User still can override param by specifying `datasource.queryStep` flag.
2021-04-30 10:01:05 +03:00
Aliaksandr Valialkin
4394dc6cbb docs/CHANGELOG.md: document the change from f3a048288e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
2021-04-30 09:54:41 +03:00
Roman Khavronenko
f3a048288e Vmalert: adjust time param for datasource queries according to evaluationInterval (#1257)
* Simplify arguments list for fn `queryDataSource` to improve readbility

* vmalert: adjust `time` param according to rule evaluation interval

With this change, vmalert will start to use rule's evaluation interval
for truncating the `time` param. This is mostly needed to produce consistent
time series with timestamps unaffected by vmalert start time. Now, timestamp
becomes predictable.
Additionally, adjustment is similar to what Grafana does for plotting range graphs.
Hence, recording rule series and recording rule expression plotted in grafana
suppose to become similar in most of cases.
2021-04-30 09:46:03 +03:00
Aliaksandr Valialkin
6fa5981e68 app/vmagent: list user-visible endpoints at http://vmagent:8429/
While at it, use common WriteAPIHelp function for the listing in vmagent, vmalert and victoria-metrics
2021-04-30 09:36:43 +03:00
Aliaksandr Valialkin
2ab1266593 lib/promscrape/discovery/kubernetes: remove a mutex at urlWatcher - use groupWatcher mutex for accessing all the urlWatcher children
This simplifies the code a bit and reduces the probability of improper mutex handling and deadlocks.
2021-04-29 10:14:26 +03:00
Aliaksandr Valialkin
25b8d71df5 vendor: update github.com/klauspost/compress from v1.12.1 to v1.12.2 2021-04-29 10:12:50 +03:00
Nikolay
4e5a88114a vmagent kubernetes_sd tests (#1253)
* first part of tests for kubernetes sd

* makes linter happy

* added more test cases

* adds pub/sub for tests
2021-04-29 10:10:24 +03:00
Nikolay
15609ee447 changes vmalert Querier with per rule querier (#1249)
* changes vmalert Querier with per rule querier
it allows to changes some parametrs based on rule setting
for instance - alert type, tenant for cluster version or event endpoint url.
2021-04-28 21:41:15 +01:00
Aliaksandr Valialkin
87179c6839 lib/{storage,mergeset}: fix unaligned 64-bit atomic operation panic for 32-bit architectures
The panic has been introduced in 56b6b893ce
2021-04-27 16:41:32 +03:00
Aliaksandr Valialkin
56b6b893ce lib/mergeset: split rows ingestion among multiple shards
This improves rows ingestion on systems with many CPU cores by reducing lock contention.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244

Thanks to @waldoweng for the original idea and draft implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1243
2021-04-27 15:36:34 +03:00
Aliaksandr Valialkin
2ec7d8b384 lib/promscrape/discovery/kubernetes: fix a deadlock introduced in eddba29664
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240

Thanks to @f41gh7 for providing the initial idea for deadlock fix at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1248
2021-04-27 14:57:51 +03:00
Aliaksandr Valialkin
f89c1f7f49 lib/storage: typo fix in info message when deleting the part outside the configured retention
Previously the message was displaying incorrect retention time
2021-04-27 13:32:46 +03:00
Aliaksandr Valialkin
60947fb2d5 lib/persistentqueue: eliminate possible data race when obtaining vm_persistentqueue_bytes_pending metric value 2021-04-27 00:25:52 +03:00
Roman Khavronenko
87018650dd vmalert: keep the returned timestamp when persisting recording rule (#1245)
Previously, vmalert used `lastExecTime` timestamp when writing recording rules
to the remote storage. This may be incorrect, if vmalert uses `datasource.lookback` flag,
which means rule's expression will be executed at some moment in the past.
To avoid such situations, vmalert now will use returned timestamp instead of `lastExecTime`.
2021-04-26 01:03:22 +03:00
Roman Khavronenko
d6f44977a7 docs: update per tenant stats page (#1246) 2021-04-25 09:34:28 +01:00
Aliaksandr Valialkin
00deb69e28 docs: ordering fix 2021-04-24 02:28:53 +03:00
Aliaksandr Valialkin
bf2439f962 vendor: make vendor-update 2021-04-24 01:34:07 +03:00
Aliaksandr Valialkin
20b77abc17 docs: update docs order 2021-04-24 01:27:13 +03:00
Aliaksandr Valialkin
e9898e1772 docs/Single-server-VictoriaMetrics.md: mention that the native export format can change between releases
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1203
2021-04-24 01:22:54 +03:00
Aliaksandr Valialkin
52b4b1605e app/vmagent/remotewrite: increase the maximum possible number of inmemory blocks for systems with high amounts of RAM
This should reduce the probability of using much slower file-based persistent queue
when vmagent processes metrics at high rate (millions of metrics per second).

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1235
2021-04-23 22:03:51 +03:00
Aliaksandr Valialkin
fb37b853e9 app/vmagent/remotewrite: count maxLabelsPerBlock as 10x of maxRowsPerBlock
This should increase block sizes and subsequently increase the maximum possible bandwidth per each connection to remote storage.
This, in turn, should reduce the probability of storing the data in local buffers.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1235
2021-04-23 21:55:47 +03:00
Aliaksandr Valialkin
908e35affd lib/promscrape: apply scrape_timeout on receiving the first response byte for stream_parse: true scrape targets
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017#issuecomment-767235047
2021-04-23 21:53:35 +03:00
Aliaksandr Valialkin
eddba29664 lib/promscrape/discovery/kubernetes: refresh role: endpoints targets on service object removal as Prometheus does
This is a follow-up for ae37cfd528

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:26:57 +03:00
Aliaksandr Valialkin
ae37cfd528 lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices targets on service object update like Prometheus does
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-04-23 20:11:40 +03:00
Aliaksandr Valialkin
f7d7236b44 Makefile: remove trailing whitespace 2021-04-23 11:05:57 +03:00
Aliaksandr Valialkin
478c56f281 docs/Cluster-VictoriaMetrics.md: sync with cluster branch 2021-04-22 14:49:16 +03:00
Aliaksandr Valialkin
758de04440 docs: add missing images for vmbackupmanager.md 2021-04-22 14:49:14 +03:00
Aliaksandr Valialkin
4c4c383f30 docs/vmbackupmanager.md: sync with app/vmbackupmanager/README.md 2021-04-22 14:44:07 +03:00
Aliaksandr Valialkin
bbebdf9ba1 lib/{storage,mergeset}: remove empty directories on startup. Such directories can be left after unclean shutdown on NFS storage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1142
2021-04-22 13:02:44 +03:00
Aliaksandr Valialkin
1ab27582a3 docs/vmbackup.md: sync with app/vmbackup/README.md 2021-04-22 11:18:51 +03:00
Aliaksandr Valialkin
1fbc04facd app/vmbackup: typo fix: snaphsot -> snapshot
Follow-up for 9de0fa3649
2021-04-22 11:16:56 +03:00
Denys Holius
9de0fa3649 fixed typo; added example of usage vmbackup in docker-compose (#1237) 2021-04-22 11:14:13 +03:00
Aliaksandr Valialkin
7c4e460513 app/vmauth: parse url_prefix only once during config load 2021-04-21 10:55:29 +03:00
Aliaksandr Valialkin
6bc52fe41a all: rename https://victoriametrics.github.io to https://docs.victoriametrics.com 2021-04-20 20:16:17 +03:00
Aliaksandr Valialkin
5720b283fa all: consistency renaming Victoria Metrics -> VictoriaMetrics
VMInsert -> vminsert
VMSelect -> vmselect
VMStorage -> vmstorage
2021-04-20 11:45:48 +03:00
Denys Holius
9f9c99c8c0 removed DS_Store files from VM_logo.zip (#1233) 2021-04-20 11:41:07 +03:00
Aliaksandr Valialkin
187e3ec909 app/vmauth: follow-up for 6a81a89b3d 2021-04-20 10:58:29 +03:00
Nikolay
6a81a89b3d adds query params support for vmauth urlPrefix (#1226)
* adds query params support for vmauth urlPrefix

* Update app/vmauth/example_config.yml

* Update app/vmauth/example_config.yml

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-20 10:51:03 +03:00
Roman Khavronenko
b955fe0038 dashboard: use unit short for Labels limit exceeded panel (#1227) 2021-04-19 13:33:21 +03:00
Aliaksandr Valialkin
cc94afeacc docs/Cluster-VictoriaMetrics.md: sync with cluster README.md 2021-04-19 13:31:55 +03:00
Roman Khavronenko
f80156d9df dashboard: fix avg GC duration expression (#1228)
Previous expression was not correct.
2021-04-19 13:28:41 +03:00
Aliaksandr Valialkin
90bba22c25 .github/workflows: remove CODECOV_TOKEN 2021-04-19 11:11:14 +03:00
Aliaksandr Valialkin
b3d88610fd app/vmctl: update README.md according to bfecd0fd55 2021-04-16 12:09:45 +03:00
Denys Holius
650c143f6d removed MACOSX dirrectory from VM_logo.zip (#1224) 2021-04-16 07:30:15 +01:00
Denys Holius
bfecd0fd55 Added some explanation to vmctl docs (#1217)
Added explanation that vmctl do not make snapshot from Prometheus when run migration data
2021-04-15 18:40:48 +01:00
dereksfoster99
cf726448f2 Update PerTenantStatistic.md 2021-04-14 20:40:53 +03:00
Aliaksandr Valialkin
8d244c5f7f vendor: update github.com/klauspost/compress from v1.11.13 to v1.12.1 2021-04-14 14:20:56 +03:00
Aliaksandr Valialkin
574b80f274 docs/Cluster-VictoriaMetrics.md: mention that sometimes -replicationFactor shouldnt be set at vmselect nodes
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207
2021-04-14 13:07:59 +03:00
ArtemVoitsekhovskyi
403a7b4a1f Grammar-correction (#1210) 2021-04-14 12:45:22 +03:00
ArtemVoitsekhovskyi
294c8b747b Grammar_correction (#1211) 2021-04-14 12:44:26 +03:00
Aliaksandr Valialkin
9b1999a5ff lib/uint64set: remove memory allocation in getOrCreateSmallPool()
This partially reverts fb82c4b9fa

It has been appeared that the additional memory allocation may result in higher GC pauses.
It is better to spend CPU time on copying bigger bucket16 structs instead of increasing query latencies due to higher GC pauses
2021-04-14 12:30:19 +03:00
f41gh7
e5db518c0f updates per-tenant stats docs
changes docs order
changes per-tenant-stats pic
2021-04-14 12:14:51 +03:00
Artem Navoiev
e9ee2122df [draft] per tenant statistic (#121)
* [draft] per tenant statistic

* updates metric name
update graph
adds link and example config

* quick fix

* adds grafana dashboard
adds example alert

Co-authored-by: f41gh7 <nik@victoriametrics.com>
2021-04-14 11:23:07 +03:00
Aliaksandr Valialkin
251747e253 lib/storage: code clarification: remove caching the found metricName in searchMetricName 2021-04-13 10:22:21 +03:00
Aliaksandr Valialkin
663a91bb82 docs: update -help output after the commit 77be3e3a82 2021-04-12 12:34:59 +03:00
Artem Navoiev
77be3e3a82 improve docs for cli flags (#1202)
* improve docs for cli flags

* improve docs for cli flags.2
2021-04-12 12:28:04 +03:00
Aliaksandr Valialkin
0b7e3510c8 docs: make docs-sync 2021-04-10 19:54:18 +03:00
Artem Navoiev
3ed113b390 update vmgateway.md (#1201) 2021-04-10 19:52:33 +03:00
Lapo Luchini
3d20fd20d0 Allow specifying build date from a variable (#1200)
Just like the existing infrastructure for `BUILDINFO_TAG`, this can ease the production of [reproducible builds](https://wiki.freebsd.org/ReproducibleBuilds).
(e.g. in FreeBSD the date the port was committed is used at build time, not the actual build time, so that an identical port produced at different times produces an identical executable)
2021-04-10 15:46:53 +03:00
Roman Khavronenko
7644efae01 Docs update (#1199)
* docs: drop table of contents for `vmctl`

We already have it autogenerated on .github.io, so no need to keep it.

* docs: mention OpenTSDB migration feature for vmctl

* docs: sync docs for `vmalert`
2021-04-10 15:46:08 +03:00
John Seekins
a9d76b06a7 Improve documentation on OpenTSDB migration tool and fix a bug with hard offsets (#1198)
* add more documentation on OpenTSDB migration explaining what chunking means
* more clarification of OpenTSDB aggregations
* break out what a retention string becomes
* add more docs around retention strings
* add example of running program and fix mistake in how hard offsets are handled
* fix formatting
2021-04-10 13:24:18 +01:00
John Seekins
cd9786c2a7 OpenTSDB migration to VictoriaMetrics (#1089) 2021-04-08 20:58:06 +01:00
Roman Khavronenko
162681e60d add new alerts (#1195)
* alerts: backport `DiskRunsOutOfSpace` alert and some other tweaks from cluster branch

* alerts: add `ServiceDown` alert to detect "dead" services
2021-04-08 18:24:25 +03:00
Roman Khavronenko
4ed8de62ac vmalert: document template functions and mention them in README (#1197) 2021-04-08 18:19:08 +03:00
Aliaksandr Valialkin
06e962f141 docs/Cluster-VictoriaMetrics.md: recommend setting up official alerts for VictoriaMetrics components 2021-04-08 12:15:24 +03:00
Aliaksandr Valialkin
fac1e3810a docs/Single-server-VictoriaMetrics.md: recommend using the official alerts for VictoriaMetrics in Monitoring section 2021-04-08 12:12:45 +03:00
Aliaksandr Valialkin
edd1590ac7 dashboards/victoriametrics.json: typo fix: chur rate -> churn rate 2021-04-08 09:35:50 +03:00
Aliaksandr Valialkin
3f0bcbe067 lib/promscrape: create a single swosFunc per scrape_config 2021-04-08 09:31:48 +03:00
Aliaksandr Valialkin
3eadee6cb7 vendor: make vendor-update 2021-04-08 00:58:22 +03:00
Aliaksandr Valialkin
f1a22b097a docs/CHANGELOG.md: cut v1.58.0 2021-04-08 00:48:16 +03:00
Aliaksandr Valialkin
544821b719 app/vmselect/promql: fix tests after d3fa0ccabd 2021-04-08 00:18:01 +03:00
Aliaksandr Valialkin
d3fa0ccabd app/vmselect/promql: properly detect aggregate topk* and bottomk* aggregate functions in order to disable duplicate sorting
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1189
2021-04-08 00:09:40 +03:00
Aliaksandr Valialkin
cb12a8f0a8 app/vmselect: return data:null instead of data:[] from /api/v1/query_exemplars, since Grafana throws an error otherwise
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1186
2021-04-07 23:34:06 +03:00
Aliaksandr Valialkin
5a0938d807 lib/promscrape: do not spend CPU time on constructing scrapeWork key if clustering is disabled 2021-04-07 21:54:22 +03:00
Aliaksandr Valialkin
1177dca3da app/vmselect: do not sort series returned from topk* and bottomk* functions, since these series are already sorted in user-expected order
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1189
2021-04-07 14:16:08 +03:00
Aliaksandr Valialkin
75f309fbbf docs/Cluster-VictoriaMetrics.md: follow-up after c6a8ebb11f 2021-04-07 13:47:46 +03:00
Roman Khavronenko
c6a8ebb11f docs: update docs ordering and formatting (#1192)
The major change is adding `sort` directive to docs. For those docs which are copied
from internal packages `sort` is added via makefile command. For the rest it is added
manually since they're updated manually as well.

The rest of changes is connected with markdown formatting. For example, changing headers
in some files (`##` => `#`) makes navigation on .github.io to look better. This especially
useful for `changelog` docs.

Table of contents for `vmctl` is dropped, since we already have it autogenerated on .github.io.

No link changes expected. The corresponding PR to `cluster` branch will be made in follow-up PR.
2021-04-07 13:39:16 +03:00
Aliaksandr Valialkin
df32d2836c lib/storage: properly handle big time ranges passed to /api/v1/labels and /api/v1/label/<labelName>/values
It should be faster querying all the labels and/or all the values instead of querying per-day labels/values on time ranges exceeding maxDaysForPerDaySearch
2021-04-07 13:33:46 +03:00
Aliaksandr Valialkin
59f9960992 lib/promscrape/discovery: remove superflouos check in registerPendingAPIWatchers
The check `_, ok := uw.aws[aw]; !ok` isn't needed, since aw cannot exist in uw.aws
because of the check inside subscribeAPIWatcher
2021-04-07 13:07:39 +03:00
Aliaksandr Valialkin
3ec6639bbb lib/promscrape/discovery/kubernetes: register pending apiWatchers in uw.aws 2021-04-06 11:12:13 +03:00
Aliaksandr Valialkin
7d23598b33 app/vmselect: return dumb response on /api/v1/query_exemplars request
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1186
2021-04-05 23:25:08 +03:00
Aliaksandr Valialkin
78d35d4f46 Makefile: prepare arm64 and amd64 release archives for cluster version on make release command 2021-04-05 23:03:19 +03:00
Aliaksandr Valialkin
5f593b0ed3 lib/promscrape/discovery/kubernetes: remove superflouos mustStart and mustStop functions 2021-04-05 22:44:12 +03:00
Aliaksandr Valialkin
4a0d06d1db deployment/docker/docker-compose.yml: update Grafana from v7.5.1 to v7.5.2 2021-04-05 22:30:58 +03:00
Lu Jiajing
b59164cf33 fix access to nil *url.URL (#1180)
* fix access to nil *url.URL

Signed-off-by: Megrez Lu <lujiajing1126@gmail.com>

* Update lib/promscrape/discovery/kubernetes/api_watcher.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-05 22:25:31 +03:00
Aliaksandr Valialkin
b46194472f lib/promscrape/discovery/kubernetes: reduce CPU time spent on registering big number of Kubernetes objects shared among big number of scrape jobs
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 22:04:30 +03:00
Aliaksandr Valialkin
a51d0ec6ec lib/promscrape/discovery/kubernetes: load objects missing in local cache from api seriver in getObjectByRole()
This should fix possible race for `role: endpoints` and `role: endpointslices` service discovery,
when the referred `pod` and `service` objects aren't propagated to urlWatcher cache yet.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182#issuecomment-813353359 for details.
2021-04-05 20:31:17 +03:00
Aliaksandr Valialkin
95dbebf512 lib/persistentqueue: delete corrupted persistent queue instead of throwing a fatal error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1030
2021-04-05 19:26:11 +03:00
Aliaksandr Valialkin
f010d773d6 lib/promscrape/discovery/kubernetes: synchronously load Kubernetes objects on first access
Remove async registration of apiWatchers, since it breaks discovering `role: endpoints` and `role: endpointslices` targets,
which depend on pod and service objects.

There is no need in reloading `endpoints` and `endpointslices` targets if the referenced `pod` or `service` objects change,
since in this case the corresponding `endpoints` and `endpointslices` objects should also change because they contain
ResourceVersion of the referenced `pod` or `service` objects, which is modified on object update.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182
2021-04-05 14:20:12 +03:00
Aliaksandr Valialkin
2c25b0322b lib/proxy: typo fix after a5c5b54c22 2021-04-05 13:45:24 +03:00
Aliaksandr Valialkin
a5c5b54c22 lib/proxy: add support for socks5 over tls proxy 2021-04-05 13:00:05 +03:00
Aliaksandr Valialkin
6742839fd6 lib/promscrape: pass X-Prometheus-Scrape-Timeout-Seconds header to scrape targets as Prometheus does 2021-04-05 12:15:24 +03:00
Aliaksandr Valialkin
9ff3ecb991 docs/CHANGELOG.md: explain why -sortLabels is set to false by default 2021-04-04 01:59:25 +03:00
Aliaksandr Valialkin
9a97941e2a docs/vmagent.md: mention that vmagent supports scraping via socks5 proxy 2021-04-04 01:45:52 +03:00
Aliaksandr Valialkin
fc2240fb22 docs/CHANGELOG.md: document the ability to use socks5 proxy
Follow-up for a4c6a3b3e1

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1177
2021-04-04 01:42:00 +03:00
Nikolay
a4c6a3b3e1 adds socks5 support for fasthttp client (#1178)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1177

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-04-04 01:29:54 +03:00
Aliaksandr Valialkin
500e625e8c lib/promscrape: properly send full url in GET request via simple HTTP proxy
This is a follow-up for a0ae0f86666a75ec57b45eab2429da7ab4a7b250

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 01:20:06 +03:00
Aliaksandr Valialkin
5153410ced lib/promscrape: support for simple HTTP proxies without CONNECT method support such as https://github.com/prometheus-community/PushProx
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-04 00:40:40 +03:00
Aliaksandr Valialkin
4c56b1a6dd lib/promscrape: add tests for authorization config, which has been added in df148f48b7 2021-04-03 22:13:22 +03:00
Aliaksandr Valialkin
7a0b964e8d app/vmselect/promql: do not delete dst_label if src_label is empty in label_copy(q, src_label, dst_label) and label_move(q, src_label, dst_label) 2021-04-03 22:05:06 +03:00
Aliaksandr Valialkin
6f3080f9fb docs/CHANGELOG.md: document the change from 3055ab0115 (add ability to pass "label=value" to the third argument to topk_* and bottomk_* functions 2021-04-03 21:42:08 +03:00
Aliaksandr Valialkin
e1d1708fa2 docs/Single-server-VictoriaMetrics.md: add missing link in heading to Graphite Render API usage chapter 2021-04-03 21:34:13 +03:00
Aliaksandr Valialkin
43f9842b6f lib/proxy: log response body on non-200 response code
This should improve debuggability for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179
2021-04-03 03:03:55 +03:00
Aliaksandr Valialkin
2256b79a89 docs/vmgateway.md: update docs 2021-04-03 00:29:19 +03:00
Aliaksandr Valialkin
b8d9d6c326 docs/CHANGELOG.md: yet another typo fix 2021-04-03 00:25:39 +03:00
Aliaksandr Valialkin
22949911e9 docs/CHANGELOG.md: typo fix 2021-04-03 00:23:50 +03:00
Aliaksandr Valialkin
3055ab0115 app/vmselect/promql: add ability to set label value additionally to label name for the remaining sum of time series returned from topk_* and bottomk_* functions in the form: topk_min(N, m, "label=value") 2021-04-02 23:55:54 +03:00
Aliaksandr Valialkin
25e19c75c7 docs/{vmauth,vmgateway}.md: small fixes 2021-04-02 23:15:00 +03:00
Aliaksandr Valialkin
3c1b39c978 app/vmgateway: publish docs 2021-04-02 23:08:41 +03:00
Aliaksandr Valialkin
0db901617d app: do not process non-GET requests on at / handler 2021-04-02 22:54:06 +03:00
Aliaksandr Valialkin
569e58dcdf vendor: make vendor-update 2021-04-02 22:20:17 +03:00
Aliaksandr Valialkin
b1d0028e79 app/vmauth: add support for authorization via Authorization: Bearer <token> 2021-04-02 22:14:53 +03:00
Aliaksandr Valialkin
b88feb631e docs/vmagent.md: mention about proxy_authorization section 2021-04-02 21:24:11 +03:00
Aliaksandr Valialkin
df148f48b7 lib/promscrape: add support for authorization config in -promscrape.config as Prometheus 2.26 does
See https://github.com/prometheus/prometheus/pull/8512
2021-04-02 21:17:45 +03:00
Aliaksandr Valialkin
7f9c68cdcb lib/promscrape: add follow_redirect option to scrape_configs section like Prometheus does
See https://github.com/prometheus/prometheus/pull/8546
2021-04-02 19:56:40 +03:00
Aliaksandr Valialkin
d1dcbfd0f9 deployment/docker: upgrade Go builder from v1.16.2 to v1.16.3
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.3+label%3ACherryPickApproved
2021-04-02 19:22:32 +03:00
Aliaksandr Valialkin
c79e4a2f90 app/vmselect/promql: remove the limit on the number of time series that can be sorted, since it may confuse users
Always sort time series returned from `/api/v1/query` and `/api/v1/query_range` unless `sort_*` function is used at top level of the query.
2021-04-02 15:02:08 +03:00
Aliaksandr Valialkin
5b08e6fb16 lib/promscrape/discovery/kubernetes: properly track objects with the same names in multiple namespaces
This is a follow-up for 12e4785fe8

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:45:32 +03:00
Aliaksandr Valialkin
12e4785fe8 lib/promscrape/discovery/kubernetes: properly discover targets in multiple namespaces
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
2021-04-02 14:28:30 +03:00
Aliaksandr Valialkin
3967bd705a docs/CHANGELOG.md: mention about AWS IAM roles for tasks support for EC2 service discovery
Follow-up for fdb8995642
2021-04-02 13:10:03 +03:00
Nikolay
fdb8995642 Adds aws ECS credentials support (#1175) 2021-04-02 11:56:40 +03:00
Aliaksandr Valialkin
9d237408c6 Makefile: add missing -prod suffix to binary names in *_checksums.txt file
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1171
2021-04-02 11:55:31 +03:00
Aliaksandr Valialkin
759c938870 Makefile: properly generate checksums for *.tar.gz files
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1171
2021-04-02 11:50:04 +03:00
Aliaksandr Valialkin
dc9eafcd02 app/{vminsert,vmagent}: add -sortLabels command-line option for sorting time series labels before ingesting them in the storage
This option can be useful when samples for the same time series are ingested with distinct order of labels.
For example, metric{k1="v1",k2="v2"} and metric{k2="v2",k1="v1"}.
2021-03-31 23:27:58 +03:00
Aliaksandr Valialkin
e1f699bb6c lib/storage: reduce memory usage when ingesting samples for the same time series with distinct order of labels 2021-03-31 21:24:46 +03:00
Lapo Luchini
db963205cc Add vmutils-pure target (#1163) 2021-03-31 17:42:08 +03:00
Aliaksandr Valialkin
48275d8c12 app/vmagent/remotewrite: reduce memory usage when -remoteWrite.queues is set to a big value
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1167
2021-03-31 16:16:33 +03:00
Aliaksandr Valialkin
f888d194da docs/Articles.md: add a link to https://blog.kintone.io/entry/2021/03/31/175256 2021-03-31 15:14:35 +03:00
Aliaksandr Valialkin
33622c409c app/vmagent/remotewrite: reduce memory usage when samples with big number of labels are sent to remote storage 2021-03-31 00:44:54 +03:00
Aliaksandr Valialkin
3be0e6b087 docs/Single-server-VictoriaMetrics.md: update victoria-metrics -help output after e7fdea5953 2021-03-30 21:44:54 +03:00
Aliaksandr Valialkin
e7fdea5953 app/vmselect: add -search.maxStatusRequestDuration command-line flag for limiting the duration of requests to /api/v1/status/* and /api/v1/series/count 2021-03-30 21:41:35 +03:00
Denys Holius
2af39a96c5 deployment: Grafana version updated to 7.5.1 (#1161) 2021-03-30 20:43:54 +03:00
Aliaksandr Valialkin
2d3082bb55 docs/CHANGELOG.md: cut v1.57.1 2021-03-30 15:40:06 +03:00
Aliaksandr Valialkin
0fe8f11090 docs/CHANGELOG.md: mention about returned back type label for vm_tenant_inserted_rows_total metric
See 9b4e608199
2021-03-30 15:16:57 +03:00
Aliaksandr Valialkin
d58d5562f1 app/vmselect: remove -search.storageTimeout command-line flag, since it has the same meaning as -search.maxQueryDuration
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
2021-03-30 15:04:54 +03:00
Aliaksandr Valialkin
7962cf1af8 app/vmselect: prevent from possible incomplete query results after timed out query
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
2021-03-30 13:35:45 +03:00
Aliaksandr Valialkin
65c60cf413 Makefile: build arm binaries on make release-victoria-metrics
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:44:39 +03:00
Aliaksandr Valialkin
75991277fa Makefile: build vmutils for arm on make release-vmutils
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:15:29 +03:00
Aliaksandr Valialkin
1db1a29ffa all: increase minimum supported Go version for building VictoriaMetrics components from v1.14 to v1.15
This is needed after the commit c0ac740f93, which uses URL.Redacted() method,
which has been added in v1.15.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1147
2021-03-29 23:04:53 +03:00
1230 changed files with 134736 additions and 66301 deletions

View File

@@ -4,14 +4,14 @@ about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
It would be great [upgrading](https://victoriametrics.github.io/#how-to-upgrade) to [the latest avaialble release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
It would be a great [upgrading](https://docs.victoriametrics.com/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
and verifying whether the bug is reproducible there.
It is also recommended reading [troubleshooting docs](https://victoriametrics.github.io/#troubleshooting).
It is also recommended reading [troubleshooting docs](https://docs.victoriametrics.com/#troubleshooting).
**To Reproduce**
Steps to reproduce the behavior.
@@ -19,9 +19,22 @@ Steps to reproduce the behavior.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Logs**
Check if any warnings or errors were logged by VictoriaMetrics components
or components in communication with VictoriaMetrics (e.g. Prometheus, Grafana).
**Screenshots**
If applicable, add screenshots to help explain your problem.
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)
See how to setup monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [montioring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
**Version**
The line returned when passing `--version` command line flag to binary. For example:
```
@@ -30,15 +43,5 @@ victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
```
**Used command-line flags**
Command-line flags are listed as `flag{name="httpListenAddr", value=":443"} 1` lines at `/metrics` page.
See the following docs for details:
Please provide applied command-line flags used for running VictoriaMetrics and its components.
* [monitoring for single-node VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/README.md#monitoring)
* [montioring for VictoriaMetrics cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/cluster/README.md#monitoring)
**Additional context**
Add any other context about the problem here such as error logs from VictoriaMetrics and Prometheus,
`/metrics` output, screenshots from the official Grafana dashboards for VictoriaMetrics:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)

26
.github/dependabot.yml vendored Normal file
View File

@@ -0,0 +1,26 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"
- package-ecosystem: "gomod"
directory: "/"
schedule:
interval: "weekly"
- package-ecosystem: "bundler"
directory: "/docs"
schedule:
interval: "daily"
- package-ecosystem: "gomod"
directory: "/app/vmui/packages/vmui/web"
schedule:
interval: "weekly"
- package-ecosystem: "docker"
directory: "/"
schedule:
interval: "daily"
- package-ecosystem: "npm"
directory: "/app/vmui"
schedule:
interval: "weekly"

23
.github/workflows/check-licenses.yml vendored Normal file
View File

@@ -0,0 +1,23 @@
name: license-check
on:
push:
paths:
- 'vendor'
pull_request:
paths:
- 'vendor'
jobs:
build:
name: Build
runs-on: ubuntu-latest
steps:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.16
id: go
- name: Code checkout
uses: actions/checkout@master
- name: Check License
run: |
make check-licenses

View File

@@ -1,30 +0,0 @@
name: github-pages
on:
push:
paths:
- 'docs/*'
- 'README.md'
branches:
- master
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
- name: publish
shell: bash
env:
TOKEN: ${{secrets.CI_TOKEN}}
run: |
git clone https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.github.io.git gpages
cp docs/* gpages
cp README.md gpages
cd gpages
git config --local user.email "info@victoriametrics.com"
git config --local user.name "Vika"
git add .
git commit -m "update github pages"
remote_repo="https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.github.io.git"
git push "${remote_repo}"
cd ..
rm -rf gpages

View File

@@ -60,8 +60,7 @@ jobs:
GOOS=darwin go build -mod=vendor ./app/vmctl
CGO_ENABLED=0 GOOS=windows go build -mod=vendor ./app/vmagent
- name: Publish coverage
uses: codecov/codecov-action@v1.0.6
uses: codecov/codecov-action@v2.1.0
with:
token: ${{secrets.CODECOV_TOKEN}}
file: ./coverage.txt

View File

@@ -16,7 +16,7 @@ jobs:
TOKEN: ${{secrets.CI_TOKEN}}
run: |
git clone https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.wiki.git wiki
cp docs/* wiki
cp -r docs/* wiki
cd wiki
git config --local user.email "info@victoriametrics.com"
git config --local user.name "Vika"

4
.gitignore vendored
View File

@@ -15,3 +15,7 @@
/package/temp-rpm-*
/package/*.deb
/package/*.rpm
.DS_store
Gemfile.lock
/_site
_site

5
.wwhrd.yml Normal file
View File

@@ -0,0 +1,5 @@
allowlist:
- Apache-2.0
- MIT
- BSD-3-Clause
- BSD-2-Clause

View File

@@ -7,7 +7,7 @@ contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
appearance, race, religion or sexual identity and orientation.
## Our Standards
@@ -24,9 +24,9 @@ Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Trolling, insulting/derogatory comments and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
* Publishing others' private information, such as physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
@@ -38,26 +38,26 @@ behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
reject comments, commits, code, wiki edits, issues and other contributions
that are not aligned to this Code of Conduct or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
threatening, offensive or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
address, posting via an official social media account or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
Instances of abusive, harassing or otherwise unacceptable behavior may be
reported by contacting the project team at info@victoriametrics.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
is deemed necessary and appropriate for the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

View File

@@ -1,5 +1,6 @@
PKG_PREFIX := github.com/VictoriaMetrics/VictoriaMetrics
DATEINFO_TAG ?= $(shell date -u +'%Y%m%d-%H%M%S')
BUILDINFO_TAG ?= $(shell echo $$(git describe --long --all | tr '/' '-')$$( \
git diff-index --quiet HEAD -- || echo '-dirty-'$$(git diff-index -u HEAD | openssl sha1 | cut -c 10-17)))
@@ -8,7 +9,7 @@ ifeq ($(PKG_TAG),)
PKG_TAG := $(BUILDINFO_TAG)
endif
GO_BUILDINFO = -X '$(PKG_PREFIX)/lib/buildinfo.Version=$(APP_NAME)-$(shell date -u +'%Y%m%d-%H%M%S')-$(BUILDINFO_TAG)'
GO_BUILDINFO = -X '$(PKG_PREFIX)/lib/buildinfo.Version=$(APP_NAME)-$(DATEINFO_TAG)-$(BUILDINFO_TAG)'
.PHONY: $(MAKECMDGOALS)
@@ -53,6 +54,14 @@ vmutils: \
vmrestore \
vmctl
vmutils-pure: \
vmagent-pure \
vmalert-pure \
vmauth-pure \
vmbackup-pure \
vmrestore-pure \
vmctl-pure
vmutils-arm64: \
vmagent-arm64 \
vmalert-arm64 \
@@ -61,6 +70,14 @@ vmutils-arm64: \
vmrestore-arm64 \
vmctl-arm64
vmutils-arm: \
vmagent-arm \
vmalert-arm \
vmauth-arm \
vmbackup-arm \
vmrestore-arm \
vmctl-arm
vmutils-windows-amd64: \
vmagent-windows-amd64 \
vmalert-windows-amd64 \
@@ -77,11 +94,15 @@ release: \
release-victoria-metrics: \
release-victoria-metrics-amd64 \
release-victoria-metrics-arm \
release-victoria-metrics-arm64
release-victoria-metrics-amd64:
GOARCH=amd64 $(MAKE) release-victoria-metrics-generic
release-victoria-metrics-arm:
GOARCH=arm $(MAKE) release-victoria-metrics-generic
release-victoria-metrics-arm64:
GOARCH=arm64 $(MAKE) release-victoria-metrics-generic
@@ -91,11 +112,12 @@ release-victoria-metrics-generic: victoria-metrics-$(GOARCH)-prod
victoria-metrics-$(GOARCH)-prod \
&& sha256sum victoria-metrics-$(GOARCH)-$(PKG_TAG).tar.gz \
victoria-metrics-$(GOARCH)-prod \
| sed s/-$(GOARCH)// > victoria-metrics-$(GOARCH)-$(PKG_TAG)_checksums.txt
| sed s/-$(GOARCH)-prod/-prod/ > victoria-metrics-$(GOARCH)-$(PKG_TAG)_checksums.txt
release-vmutils: \
release-vmutils-amd64 \
release-vmutils-arm64 \
release-vmutils-arm \
release-vmutils-windows-amd64
release-vmutils-amd64:
@@ -104,6 +126,9 @@ release-vmutils-amd64:
release-vmutils-arm64:
GOARCH=arm64 $(MAKE) release-vmutils-generic
release-vmutils-arm:
GOARCH=arm $(MAKE) release-vmutils-generic
release-vmutils-windows-amd64:
GOARCH=amd64 $(MAKE) release-vmutils-windows-generic
@@ -129,7 +154,7 @@ release-vmutils-generic: \
vmbackup-$(GOARCH)-prod \
vmrestore-$(GOARCH)-prod \
vmctl-$(GOARCH)-prod \
| sed s/-$(GOARCH)// > vmutils-$(GOARCH)-$(PKG_TAG)_checksums.txt
| sed s/-$(GOARCH)-prod/-prod/ > vmutils-$(GOARCH)-$(PKG_TAG)_checksums.txt
release-vmutils-windows-generic: \
vmagent-windows-$(GOARCH)-prod \
@@ -165,7 +190,7 @@ lint: install-golint
golint app/...
install-golint:
which golint || go install golang.org/x/lint/golint
which golint || GO111MODULE=off go get golang.org/x/lint/golint
errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./lib/...
@@ -180,7 +205,7 @@ errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./app/vmctl/...
install-errcheck:
which errcheck || go install github.com/kisielk/errcheck
which errcheck || GO111MODULE=off go get github.com/kisielk/errcheck
check-all: fmt vet lint errcheck golangci-lint
@@ -229,20 +254,36 @@ quicktemplate-gen: install-qtc
qtc
install-qtc:
which qtc || go install github.com/valyala/quicktemplate/qtc
which qtc || GO111MODULE=off go get github.com/valyala/quicktemplate/qtc
golangci-lint: install-golangci-lint
golangci-lint run --exclude '(SA4003|SA1019|SA5011):' -D errcheck -D structcheck --timeout 2m
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.29.0
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.40.1
install-wwhrd:
which wwhrd || GO111MODULE=off go get github.com/frapposelli/wwhrd
check-licenses: install-wwhrd
wwhrd check -f .wwhrd.yml
copy-docs:
echo "---\nsort: ${ORDER}\n---\n" > ${DST}
cat ${SRC} >> ${DST}
# Copies docs for all components and adds the order tag.
# Cluster docs are supposed to be ordered as 9th.
# For The rest of docs is ordered manually.t
docs-sync:
cp app/vmagent/README.md docs/vmagent.md
cp app/vmalert/README.md docs/vmalert.md
cp app/vmauth/README.md docs/vmauth.md
cp app/vmbackup/README.md docs/vmbackup.md
cp app/vmrestore/README.md docs/vmrestore.md
cp app/vmctl/README.md docs/vmctl.md
cp README.md docs/Single-server-VictoriaMetrics.md
cp README.md docs/README.md
SRC=README.md DST=docs/Single-server-VictoriaMetrics.md ORDER=1 $(MAKE) copy-docs
SRC=app/vmagent/README.md DST=docs/vmagent.md ORDER=3 $(MAKE) copy-docs
SRC=app/vmalert/README.md DST=docs/vmalert.md ORDER=4 $(MAKE) copy-docs
SRC=app/vmauth/README.md DST=docs/vmauth.md ORDER=5 $(MAKE) copy-docs
SRC=app/vmbackup/README.md DST=docs/vmbackup.md ORDER=6 $(MAKE) copy-docs
SRC=app/vmrestore/README.md DST=docs/vmrestore.md ORDER=7 $(MAKE) copy-docs
SRC=app/vmctl/README.md DST=docs/vmctl.md ORDER=8 $(MAKE) copy-docs
SRC=app/vmgateway/README.md DST=docs/vmgateway.md ORDER=9 $(MAKE) copy-docs
SRC=app/vmbackupmanager/README.md DST=docs/vmbackupmanager.md ORDER=10 $(MAKE) copy-docs

603
README.md

File diff suppressed because it is too large Load Diff

Binary file not shown.

View File

@@ -3,10 +3,8 @@ package main
import (
"flag"
"fmt"
"io"
"net/http"
"os"
"path"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert"
@@ -26,9 +24,8 @@ import (
var (
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Remove superflouos samples from time series if they are located closer to each other than this duration. "+
"This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. "+
"Deduplication is disabled if the -dedup.minScrapeInterval is 0")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Leave only the first sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication for details")
dryRun = flag.Bool("dryRun", false, "Whether to check only -promscrape.config and then exit. "+
"Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse")
)
@@ -86,14 +83,20 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
fmt.Fprintf(w, "<h2>Single-node VictoriaMetrics.</h2></br>")
fmt.Fprintf(w, "See docs at <a href='https://victoriametrics.github.io/'>https://victoriametrics.github.io/</a></br>")
fmt.Fprintf(w, "Useful endpoints: </br>")
writeAPIHelp(w, [][]string{
if r.Method != "GET" {
return false
}
fmt.Fprintf(w, "<h2>Single-node VictoriaMetrics</h2></br>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/'>https://docs.victoriametrics.com/</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"/vmui", "Web UI"},
{"/targets", "discovered targets list"},
{"/api/v1/targets", "advanced information about discovered targets in JSON format"},
{"/metrics", "available service metrics"},
{"/api/v1/status/tsdb", "tsdb status page"},
{"/api/v1/status/top_queries", "top queries"},
{"/api/v1/status/active_queries", "active queries"},
})
return true
}
@@ -109,20 +112,11 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return false
}
func writeAPIHelp(w io.Writer, pathList [][]string) {
pathPrefix := httpserver.GetPathPrefix()
for _, p := range pathList {
p, doc := p[0], p[1]
p = path.Join(pathPrefix, p)
fmt.Fprintf(w, "<a href='%s'>%q</a> - %s<br/>", p, p, doc)
}
}
func usage() {
const s = `
victoria-metrics is a time series database and monitoring solution.
See the docs at https://victoriametrics.github.io/
See the docs at https://docs.victoriametrics.com/
`
flagutil.Usage(s)
}

View File

@@ -11,6 +11,6 @@
"status":"success",
"data":{"resultType":"matrix",
"result":[
{"metric":{"item":"y"},"values":[["{TIME_S-1m}","0.5"]]}
{"metric":{"item":"y"},"values":[["{TIME_S-1m}","0.5"], ["{TIME_S}","0.5"]]}
]}}
}

View File

@@ -13,6 +13,6 @@
"data":{"resultType":"matrix",
"result":[
{"metric":{"__name__":"not_nan_as_missing_data","item":"x"},"values":[["{TIME_S-2m}","2"]]},
{"metric":{"__name__":"not_nan_as_missing_data","item":"y"},"values":[["{TIME_S-2m}","4"],["{TIME_S-1m}","3"]]}
{"metric":{"__name__":"not_nan_as_missing_data","item":"y"},"values":[["{TIME_S-2m}","4"],["{TIME_S-1m}","3"],["{TIME_S}", "3"]]}
]}}
}

View File

@@ -1,4 +1,4 @@
## vmagent
# vmagent
`vmagent` is a tiny but mighty agent which helps you collect metrics from various sources
and store them in [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)
@@ -21,21 +21,23 @@ to `vmagent` such as the ability to push metrics instead of pulling them. We did
See [Quick Start](#quick-start) for details.
* Can add, remove and modify labels (aka tags) via Prometheus relabeling. Can filter data before sending it to remote storage. See [these docs](#relabeling) for details.
* Accepts data via all ingestion protocols supported by VictoriaMetrics:
* Influx line protocol via `http://<vmagent>:8429/write`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf).
* Graphite plaintext protocol if `-graphiteListenAddr` command-line flag is set. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-graphite-compatible-agents-such-as-statsd).
* OpenTSDB telnet and http protocols if `-opentsdbListenAddr` command-line flag is set. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-opentsdb-compatible-agents).
* InfluxDB line protocol via `http://<vmagent>:8429/write`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf).
* Graphite plaintext protocol if `-graphiteListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-graphite-compatible-agents-such-as-statsd).
* OpenTSDB telnet and http protocols if `-opentsdbListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-opentsdb-compatible-agents).
* Prometheus remote write protocol via `http://<vmagent>:8429/api/v1/write`.
* JSON lines import protocol via `http://<vmagent>:8429/api/v1/import`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-data-in-json-line-format).
* Native data import protocol via `http://<vmagent>:8429/api/v1/import/native`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-data-in-native-format).
* Data in Prometheus exposition format. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-data-in-prometheus-exposition-format) for details.
* Arbitrary CSV data via `http://<vmagent>:8429/api/v1/import/csv`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-import-csv-data).
* JSON lines import protocol via `http://<vmagent>:8429/api/v1/import`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-json-line-format).
* Native data import protocol via `http://<vmagent>:8429/api/v1/import/native`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-native-format).
* Prometheus exposition format via `http://<vmagent>:8429/api/v1/import/prometheus`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-prometheus-exposition-format) for details.
* Arbitrary CSV data via `http://<vmagent>:8429/api/v1/import/csv`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-csv-data).
* Can replicate collected metrics simultaneously to multiple remote storage systems.
* Works smoothly in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
* Uses lower amounts of RAM, CPU, disk IO and network bandwidth compared with Prometheus.
* Scrape targets can be spread among multiple `vmagent` instances when big number of targets must be scraped. See [these docs](#scraping-big-number-of-targets) for details.
* Scrape targets can be spread among multiple `vmagent` instances when big number of targets must be scraped. See [these docs](#scraping-big-number-of-targets).
* Can efficiently scrape targets that expose millions of time series such as [/federate endpoint in Prometheus](https://prometheus.io/docs/prometheus/latest/federation/). See [these docs](#stream-parsing-mode).
* Can deal with [high cardinality](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality) and [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate) issues by limiting the number of unique time series at scrape time and before sending them to remote storage systems. See [these docs](#cardinality-limiter).
* Can load scrape configs from multiple files. See [these docs](#loading-scrape-configs-from-multiple-files).
## Quick Start
@@ -51,13 +53,13 @@ Example command line:
/path/to/vmagent -promscrape.config=/path/to/prometheus.yml -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
If you only need to collect Influx data, then the following command is sufficient:
If you only need to collect InfluxDB data, then the following command is sufficient:
```
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
```
Then send Influx data to `http://vmagent-host:8429`. See [these docs](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf) for more details.
Then send InfluxDB data to `http://vmagent-host:8429`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf) for more details.
`vmagent` is also available in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags).
@@ -133,7 +135,11 @@ Also, Basic Auth can be enabled for the incoming `remote_write` requests with `-
### remote_write for clustered version
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always peformed in Promethes remote_write protocol. Therefore for the [clustered version](https://victoriametrics.github.io/Cluster-VictoriaMetrics.html), `-remoteWrite.url` the command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<customer-id>/prometheus/api/v1/write`
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always peformed in Promethes remote_write protocol. Therefore for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), `-remoteWrite.url` the command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<accountID>/prometheus/api/v1/write` according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format). There is also support for multitenant writes. See [these docs](#multitenancy).
## Multitenancy
By default `vmagent` collects the data without tenant identifiers and routes it to the configured `-remoteWrite.url`. But it can accept multitenant data if `-remoteWrite.multitenantURL` is set. In this case it accepts multitenant data at `http://vmagent:8429/insert/<accountID>/...` in the same way as cluster version of VictoriaMetrics does according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format) and routes it to `<-remoteWrite.multitenantURL>/insert/<accountID>/prometheus/api/v1/write`. If multiple `-remoteWrite.multitenantURL` command-line options are set, then `vmagent` replicates the collected data across all the configured urls. This allows using a single `vmagent` instance in front of VictoriaMetrics clusters for processing the data from all the tenants.
## How to collect metrics in Prometheus format
@@ -171,20 +177,31 @@ The following scrape types in [scrape_config](https://prometheus.io/docs/prometh
* `openstack_sd_configs` - is for scraping OpenStack targets.
See [openstack_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config) for details.
[OpenStack identity API v3](https://docs.openstack.org/api-ref/identity/v3/) is supported only.
* `docker_sd_configs` - is for scraping Docker targets.
See [docker_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config) for details.
* `dockerswarm_sd_configs` - is for scraping Docker Swarm targets.
See [dockerswarm_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config) for details.
* `eureka_sd_configs` - is for scraping targets registered in [Netflix Eureka](https://github.com/Netflix/eureka).
See [eureka_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config) for details.
* `digitalocean_sd_configs` is for scraping targerts registered in [DigitalOcean](https://www.digitalocean.com/)
See [digitalocean_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config) for details.
* `http_sd_configs` is for scraping targerts registered in http service discovery.
See [http_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config) for details.
Please file feature requests to [our issue tracker](https://github.com/VictoriaMetrics/VictoriaMetrics/issues) if you need other service discovery mechanisms to be supported by `vmagent`.
`vmagent` also support the following additional options in `scrape_config` section:
`vmagent` also support the following additional options in `scrape_configs` section:
* `disable_compression: true` - to disable response compression on a per-job basis. By default `vmagent` requests compressed responses from scrape targets
to save network bandwidth.
* `disable_keepalive: true` - to disable [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection) on a per-job basis.
By default, `vmagent` uses keep-alive connections to scrape targets to reduce overhead on connection re-establishing.
* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics.
* `series_limit: N` - for limiting the number of unique time series a single scrape target can expose. See [these docs](#cardinality-limiter).
* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics. See [these docs](#stream-parsing-mode).
* `scrape_align_interval: duration` - for aligning scrapes to the given interval instead of using random offset in the range `[0 ... scrape_interval]` for scraping each target. The random offset helps spreading scrapes evenly in time.
* `scrape_offset: duration` - for specifying the exact offset for scraping instead of using random offset in the range `[0 ... scrape_interval]`.
* `relabel_debug: true` - for enabling debug logging during relabeling of the discovered targets. See [these docs](#relabeling).
* `metric_relabel_debug: true` - for enabling debug logging during relabeling of the scraped metrics. See [these docs](#relabeling).
Note that `vmagent` doesn't support `refresh_interval` option for these scrape configs. Use the corresponding `-promscrape.*CheckInterval`
command-line flag instead. For example, `-promscrape.consulSDCheckInterval=60s` sets `refresh_interval` for all the `consul_sd_configs`
@@ -193,11 +210,35 @@ entries to 60s. Run `vmagent -help` in order to see default values for the `-pro
The file pointed by `-promscrape.config` may contain `%{ENV_VAR}` placeholders which are substituted by the corresponding `ENV_VAR` environment variable values.
## Loading scrape configs from multiple files
`vmagent` supports loading scrape configs from multiple files specified in the `scrape_config_files` section of `-promscrape.config` file. For example, the following `-promscrape.config` instructs `vmagent` loading scrape configs from all the `*.yml` files under `configs` directory plus a `single_scrape_config.yml` file:
```yml
scrape_config_files:
- configs/*.yml
- single_scrape_config.yml
```
Every referred file can contain arbitrary number of any [supported scrape configs](#how-to-collect-metrics-in-prometheus-format). There is no need in specifying top-level `scrape_configs` section in these files. For example:
```yml
- job_name: foo
static_configs:
- targets: ["vmagent:8429"]
- job_name: bar
kubernetes_sd_configs:
- role: pod
```
`vmagent` dynamically reloads these files on `SIGHUP` signal or on the request to `http://vmagent:8429/-/reload`.
## Adding labels to metrics
Labels can be added to metrics by the following mechanisms:
* The `global -> external_labels` section in `-promscrape.config` file. These labels are added only to metrics scraped from targets configured in the `-promscrape.config` file. They aren't added to metrics collected via other [data ingestion protocols](https://victoriametrics.github.io/#how-to-import-time-series-data).
* The `global -> external_labels` section in `-promscrape.config` file. These labels are added only to metrics scraped from targets configured in the `-promscrape.config` file. They aren't added to metrics collected via other [data ingestion protocols](https://docs.victoriametrics.com/#how-to-import-time-series-data).
* The `-remoteWrite.label` command-line flag. These labels are added to all the collected metrics before sending them to `-remoteWrite.url`. For example, the following command will start `vmagent`, which will add `{datacenter="foobar"}` label to all the metrics pushed to all the configured remote storage systems (all the `-remoteWrite.url` flag values):
```
@@ -207,20 +248,37 @@ Labels can be added to metrics by the following mechanisms:
## Relabeling
`vmagent` supports [Prometheus relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config).
and also provides the following actions:
`vmagent` and VictoriaMetrics support Prometheus-compatible relabeling.
They provide the following additional actions on top of actions from the [Prometheus relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config):
* `replace_all`: replaces all of the occurences of `regex` in the values of `source_labels` with the `replacement` and stores the results in the `target_label`.
* `labelmap_all`: replaces all of the occurences of `regex` in all the label names with the `replacement`.
* `keep_if_equal`: keeps the entry if all the label values from `source_labels` are equal.
* `drop_if_equal`: drops the entry if all the label values from `source_labels` are equal.
* `keep_metrics`: keeps all the metrics with names matching the given `regex`.
* `drop_metrics`: drops all the metrics with names matching the given `regex`.
The `regex` value can be split into multiple lines for improved readability and maintainability. These lines are automatically joined with `|` char when parsed. For example, the following configs are equivalent:
```yaml
- action: keep_metrics
regex: "metric_a|metric_b|foo_.+"
```
```yaml
- action: keep_metrics
regex:
- "metric_a"
- "metric_b"
- "foo_.+"
```
The relabeling can be defined in the following places:
* At the `scrape_config -> relabel_configs` section in `-promscrape.config` file. This relabeling is applied to target labels.
* At the `scrape_config -> metric_relabel_configs` section in `-promscrape.config` file. This relabeling is applied to all the scraped metrics in the given `scrape_config`.
* At the `-remoteWrite.relabelConfig` file. This relabeling is aplied to all the collected metrics before sending them to remote storage.
* At the `-remoteWrite.urlRelabelConfig` files. This relabeling is applied to metrics before sending them to the corresponding `-remoteWrite.url`.
* At the `scrape_config -> relabel_configs` section in `-promscrape.config` file. This relabeling is applied to target labels. This relabeling can be debugged by passing `relabel_debug: true` option to the corresponding `scrape_config` section. In this case `vmagent` logs target labels before and after the relabeling and then drops the logged target.
* At the `scrape_config -> metric_relabel_configs` section in `-promscrape.config` file. This relabeling is applied to all the scraped metrics in the given `scrape_config`. This relabeling can be debugged by passing `metric_relabel_debug: true` option to the corresponding `scrape_config` section. In this case `vmagent` logs metrics before and after the relabeling and then drops the logged metrics.
* At the `-remoteWrite.relabelConfig` file. This relabeling is aplied to all the collected metrics before sending them to remote storage. This relabeling can be debugged by passing `-remoteWrite.relabelDebug` command-line option to `vmagent`. In this case `vmagent` logs metrics before and after the relabeling and then drops all the logged metrics instead of sending them to remote storage.
* At the `-remoteWrite.urlRelabelConfig` files. This relabeling is applied to metrics before sending them to the corresponding `-remoteWrite.url`. This relabeling can be debugged by passing `-remoteWrite.urlRelabelDebug` command-line options to `vmagent`. In this case `vmagent` logs metrics before and after the relabeling and then drops all the logged metrics instead of sending them to the corresponding `-remoteWrite.url`.
You can read more about relabeling in the following articles:
@@ -232,10 +290,50 @@ You can read more about relabeling in the following articles:
* [relabel_configs vs metric_relabel_configs](https://www.robustperception.io/relabel_configs-vs-metric_relabel_configs)
## Prometheus staleness markers
`vmagent` sends [Prometheus staleness markers](https://www.robustperception.io/staleness-and-promql) to `-remoteWrite.url` in the following cases:
* If they are passed to `vmagent` via [Prometheus remote_write protocol](#prometheus-remote_write-proxy).
* If the metric disappears from the list of scraped metrics, then stale marker is sent to this particular metric.
* If the scrape target becomes temporarily unavailable, then stale markers are sent for all the metrics scraped from this target.
* If the scrape target is removed from the list of targets, then stale markers are sent for all the metrics scraped from this target.
* Stale markers are sent for all the scraped metrics on graceful shutdown of `vmagent`.
Prometheus staleness markers aren't sent to `-remoteWrite.url` in [stream parsing mode](#stream-parsing-mode) or if `-promscrape.noStaleMarkers` command-line is set.
## Stream parsing mode
By default `vmagent` reads the full response from scrape target into memory, then parses it, applies [relabeling](#relabeling) and then pushes the resulting metrics to the configured `-remoteWrite.url`. This mode works good for the majority of cases when the scrape target exposes small number of metrics (e.g. less than 10 thousand). But this mode may take big amounts of memory when the scrape target exposes big number of metrics. In this case it is recommended enabling stream parsing mode. When this mode is enabled, then `vmagent` reads response from scrape target in chunks, then immediately processes every chunk and pushes the processed metrics to remote storage. This allows saving memory when scraping targets that expose millions of metrics. Stream parsing mode may be enabled in the following places:
- Via `-promscrape.streamParse` command-line flag. In this case all the scrape targets defined in the file pointed by `-promscrape.config` are scraped in stream parsing mode.
- Via `stream_parse: true` option at `scrape_configs` section. In this case all the scrape targets defined in this section are scraped in stream parsing mode.
- Via `__stream_parse__=true` label, which can be set via [relabeling](#relabeling) at `relabel_configs` section. In this case stream parsing mode is enabled for the corresponding scrape targets. Typical use case: to set the label via [Kubernetes annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) for targets exposing big number of metrics.
Examples:
```yml
scrape_configs:
- job_name: 'big-federate'
stream_parse: true
static_configs:
- targets:
- big-prometeus1
- big-prometeus2
honor_labels: true
metrics_path: /federate
params:
'match[]': ['{__name__!=""}']
```
Note that `sample_limit` option doesn't prevent from data push to remote storage if stream parsing is enabled because the parsed data is pushed to remote storage as soon as it is parsed.
## Scraping big number of targets
A single `vmagent` instance can scrape tens of thousands of scrape targets. Sometimes this isn't enough due to limitations on CPU, network, RAM, etc.
In this case scrape targets can be split among multiple `vmagent` instances (aka `vmagent` horizontal scaling and clustering).
In this case scrape targets can be split among multiple `vmagent` instances (aka `vmagent` horizontal scaling, sharding and clustering).
Each `vmagent` instance in the cluster must use identical `-promscrape.config` files with distinct `-promscrape.cluster.memberNum` values.
The flag value must be in the range `0 ... N-1`, where `N` is the number of `vmagent` instances in the cluster.
The number of `vmagent` instances in the cluster must be passed to `-promscrape.cluster.membersCount` command-line flag. For example, the following commands
@@ -257,12 +355,12 @@ start a cluster of three `vmagent` instances, where each target is scraped by tw
```
If each target is scraped by multiple `vmagent` instances, then data deduplication must be enabled at remote storage pointed by `-remoteWrite.url`.
See [these docs](https://victoriametrics.github.io/#deduplication) for details.
See [these docs](https://docs.victoriametrics.com/#deduplication) for details.
## Scraping targets via a proxy
`vmagent` supports scraping targets via http and https proxies. Proxy address must be specified in `proxy_url` option. For example, the following scrape config instructs
`vmagent` supports scraping targets via http, https and socks5 proxies. Proxy address must be specified in `proxy_url` option. For example, the following scrape config instructs
target scraping via https proxy at `https://proxy-addr:1234`:
```yml
@@ -273,6 +371,7 @@ scrape_configs:
Proxy can be configured with the following optional settings:
* `proxy_authorization` for generic token authorization. See [Prometheus docs for details on authorization section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config)
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
@@ -294,6 +393,32 @@ scrape_configs:
server_name: real-server-name
```
## Cardinality limiter
By default `vmagent` doesn't limit the number of time series each scrape target can expose. The limit can be enforced in the following places:
- Via `-promscrape.seriesLimitPerTarget` command-line option. This limit is applied individually to all the scrape targets defined in the file pointed by `-promscrape.config`.
- Via `series_limit` config option at `scrape_config` section. This limit is applied individually to all the scrape targets defined in the given `scrape_config`.
- Via `__series_limit__` label, which can be set with [relabeling](#relabeling) at `relabel_configs` section. This limit is applied to the corresponding scrape targets. Typical use case: to set the limit via [Kubernetes annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) for targets, which may expose too high number of time series.
All the scraped metrics are dropped for time series exceeding the given limit. The exceeded limit can be [monitored](#monitoring) via `promscrape_series_limit_rows_dropped_total` metric.
See also `sample_limit` option at [scrape_config section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
By default `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`. The limit can be enforced by setting the following command-line flags:
* `-remoteWrite.maxHourlySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last hour. Useful for limiting the number of active time series.
* `-remoteWrite.maxDailySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last day. Useful for limiting daily churn rate.
Both limits can be set simultaneously. If any of these limits is reached, then samples for new time series are dropped instead of sending them to remote storage systems. A sample of dropped series is put in the log with `WARNING` level.
The exceeded limits can be [monitored](#monitoring) with the following metrics:
* `vmagent_hourly_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded hourly limit on the number of unique time series.
* `vmagent_daily_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded daily limit on the number of unique time series.
These limits are approximate, so `vmagent` can underflow/overflow the limit by a small percentage (usually less than 1%).
## Monitoring
@@ -321,6 +446,12 @@ It may be useful to perform `vmagent` rolling update without any scrape loss.
* We recommend you increase the maximum number of open files in the system (`ulimit -n`) when scraping a big number of targets,
as `vmagent` establishes at least a single TCP connection per target.
* If `vmagent` uses too big amounts of memory, then the following options can help:
* Enabling stream parsing. See [these docs](#stream-parsing-mode).
* Reducing the number of output queues with `-remoteWrite.queues` command-line option.
* Reducing the amounts of RAM vmagent can use for in-memory buffering with `-memory.allowedPercent` or `-memory.allowedBytes` command-line option. Another option is to reduce memory limits in Docker and/or Kuberntes if `vmagent` runs under these systems.
* Reducing the number of CPU cores vmagent can use by passing `GOMAXPROCS=N` environment variable to `vmagent`, where `N` is the desired limit on CPU cores. Another option is to reduce CPU limits in Docker or Kubernetes if `vmagent` runs under these systems.
* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
and `http://vmagent-host:8429/api/v1/targets`.
@@ -332,25 +463,7 @@ It may be useful to perform `vmagent` rolling update without any scrape loss.
This option drops `"discoveredLabels"` and `"droppedTargets"` lists at `/api/v1/targets` page, which may result in reduced debuggability for improperly configured per-target relabeling.
* If `vmagent` scrapes targets with millions of metrics per target (for example, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
we recommend enabling `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all of the scrape targets
by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with `stream_parse: true` option. For example:
```yml
scrape_configs:
- job_name: 'big-federate'
stream_parse: true
static_configs:
- targets:
- big-prometeus1
- big-prometeus2
honor_labels: true
metrics_path: /federate
params:
'match[]': ['{__name__!=""}']
```
Note that `sample_limit` option doesn't work if stream parsing is enabled because the parsed data is pushed to remote storage as soon as it is parsed. Therefore the `sample_limit` option
doesn't make sense during stream parsing.
we recommend enabling [stream parsing mode](#stream-parsing-mode) in order to reduce memory usage during scraping.
* We recommend you increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page grows constantly.
@@ -360,7 +473,7 @@ It may be useful to perform `vmagent` rolling update without any scrape loss.
* `vmagent` drops data blocks if remote storage replies with `400 Bad Request` and `409 Conflict` HTTP responses. The number of dropped blocks can be monitored via `vmagent_remotewrite_packets_dropped_total` metric exported at [/metrics page](#monitoring).
* Use `-remoteWrite.queues=1` when `-remoteWrite.url` points to remote storage, which doesn't accept out-of-order samples (aka data backfilling). Such storage systems include Prometheus, Cortex and Thanos.
* Use `-remoteWrite.queues=1` when `-remoteWrite.url` points to remote storage, which doesn't accept out-of-order samples (aka data backfilling). Such storage systems include Prometheus, Cortex and Thanos, which typically emit `out of order sample` errors. The best solution is to use remote storage with [backfilling support](https://docs.victoriametrics.com/#backfilling).
* `vmagent` buffers scraped data at the `-remoteWrite.tmpDataPath` directory until it is sent to `-remoteWrite.url`.
The directory can grow large when remote storage is unavailable for extended periods of time and if `-remoteWrite.maxDiskUsagePerURL` isn't set.
@@ -415,7 +528,7 @@ We recommend using [binary releases](https://github.com/VictoriaMetrics/Victoria
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmagent` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds the `vmagent` binary and puts it into the `bin` folder.
@@ -444,7 +557,7 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmagent-arm` or `make vmagent-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics)
It builds `vmagent-arm` or `vmagent-arm64` binary respectively and puts it into the `bin` folder.
@@ -485,7 +598,7 @@ The collected profiles may be analyzed with [go tool pprof](https://github.com/g
vmagent collects metrics data via popular data ingestion protocols and routes them to VictoriaMetrics.
See the docs at https://victoriametrics.github.io/vmagent.html .
See the docs at https://docs.victoriametrics.com/vmagent.html .
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
@@ -494,7 +607,7 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap
@@ -504,17 +617,17 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-graphiteTrimTimestamp duration
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealy the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
@@ -526,26 +639,26 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
-influx.databaseNames array
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-influx.maxLineSize size
The maximum size in bytes for a single Influx line during parsing
The maximum size in bytes for a single InfluxDB line during parsing
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
-influxListenAddr string
TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
-influxMeasurementFieldSeparator string
Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol (default "_")
Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol (default "_")
-influxSkipMeasurement
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
-influxSkipSingleField
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if Influx line contains only a single field
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field
-influxTrimTimestamp duration
Trim timestamps for Influx line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
Trim timestamps for InfluxDB line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-insert.maxQueueDuration duration
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
@@ -555,17 +668,17 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxConcurrentInserts int
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
-maxInsertRequestSize size
The maximum size in bytes of a single Prometheus remote_write API request
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
@@ -586,9 +699,9 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-promscrape.cluster.membersCount int
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
-promscrape.cluster.replicationFactor int
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 2, then the deduplication must be enabled at remote storage side. See https://victoriametrics.github.io/#deduplication (default 1)
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 2, then the deduplication must be enabled at remote storage side. See https://docs.victoriametrics.com/#deduplication (default 1)
-promscrape.config string
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://victoriametrics.github.io/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. See https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
-promscrape.config.dryRun
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
-promscrape.config.strictParse
@@ -599,6 +712,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.digitaloceanSDCheckInterval duration
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
@@ -609,6 +724,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerSDCheckInterval duration
Interval for checking for changes in docker. This works only if docker_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
@@ -621,6 +738,8 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.httpSDCheckInterval duration
Interval for checking for changes in http endpoint service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kuberntes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
@@ -630,44 +749,78 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
-promscrape.maxScrapeSize size
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
-promscrape.noStaleMarkers
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. See also https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode
-promscrape.openstackSDCheckInterval duration
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
-promscrape.seriesLimitPerTarget int
Optional limit on the number of unique time series a single scrape target can expose. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter for more info
-promscrape.streamParse
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.suppressDuplicateScrapeTargetErrors
Whether to suppress 'duplicate scrape target' errors; see https://victoriametrics.github.io/vmagent.html#troubleshooting for details
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
-promscrape.suppressScrapeErrors
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-remoteWrite.basicAuth.password array
Optional basic auth password to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.basicAuth.passwordFile array
Optional path to basic auth password to use for -remoteWrite.url. The file is re-read every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.basicAuth.username array
Optional basic auth username to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.bearerToken array
Optional bearer auth token to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.bearerTokenFile array
Optional path to bearer token file to use for -remoteWrite.url. The token is re-read from the file every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.flushInterval duration
Interval for flushing the data to remote storage. This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url (default 1s)
-remoteWrite.label array
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.maxBlockSize size
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDailySeries int
The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter
-remoteWrite.maxDiskUsagePerURL size
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-remoteWrite.maxHourlySeries int
The maximum number of unique series vmagent can send to remote storage systems during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter
-remoteWrite.multitenantURL array
Base path for multitenant remote storage URL to write data to. See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.clientID array
Optional OAuth2 clientID to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.clientSecret array
Optional OAuth2 clientSecret to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.clientSecretFile array
Optional OAuth2 clientSecretFile to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.scopes array
Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.tokenUrl array
Optional OAuth2 tokenURL to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.proxyURL array
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.queues int
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage (default 4)
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs (default 8)
-remoteWrite.rateLimit array
Optional rate limit in bytes per second for data sent to -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.relabelConfig string
Optional path to file with relabel_config entries. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://victoriametrics.github.io/vmagent.html#relabeling for details
Optional path to file with relabel_config entries. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details
-remoteWrite.relabelDebug
Whether to log metrics before and after relabeling with -remoteWrite.relabelConfig. If the -remoteWrite.relabelDebug is enabled, then the metrics aren't sent to remote storage. This is useful for debugging the relabeling configs
-remoteWrite.roundDigits array
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
Supports array of values separated by comma or specified via multiple flags.
@@ -681,31 +834,36 @@ See the docs at https://victoriametrics.github.io/vmagent.html .
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsInsecureSkipVerify array
Whether to skip tls verification when connecting to -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsKeyFile array
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tlsServerName array
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.tmpDataPath string
Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
-remoteWrite.url array
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems
Supports array of values separated by comma or specified via multiple flags.
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelConfig array
Optional path to relabel config for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelDebug array
Whether to log metrics before and after relabeling with -remoteWrite.urlRelabelConfig. If the -remoteWrite.urlRelabelDebug is enabled, then the metrics aren't sent to the corresponding -remoteWrite.url. This is useful for debugging the relabeling configs
Supports array of values separated by comma or specified via multiple flags.
-sortLabels
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}Enabled sorting for labels can slow down ingestion performance a bit
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version

View File

@@ -5,32 +5,35 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="csvimport"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="csvimport"}`)
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="csvimport"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="csvimport"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="csvimport"}`)
)
// InsertHandler processes csv data from req.
func InsertHandler(req *http.Request) error {
func InsertHandler(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
return insertRows(at, rows, extraLabels)
})
})
}
func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -64,8 +67,11 @@ func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
remotewrite.Push(&ctx.WriteRequest)
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
rowsInserted.Add(len(rows))
if at != nil {
rowsTenantInserted.Get(at).Add(len(rows))
}
rowsPerInsert.Update(float64(len(rows)))
return nil
}

View File

@@ -8,34 +8,37 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
var (
measurementFieldSeparator = flag.String("influxMeasurementFieldSeparator", "_", "Separator for '{measurement}{separator}{field_name}' metric name when inserted via Influx line protocol")
skipSingleField = flag.Bool("influxSkipSingleField", false, "Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if Influx line contains only a single field")
measurementFieldSeparator = flag.String("influxMeasurementFieldSeparator", "_", "Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol")
skipSingleField = flag.Bool("influxSkipSingleField", false, "Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field")
skipMeasurement = flag.Bool("influxSkipMeasurement", false, "Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'")
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="influx"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="influx"}`)
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="influx"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="influx"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="influx"}`)
)
// InsertHandlerForReader processes remote write for influx line protocol.
//
// See https://github.com/influxdata/telegraf/tree/master/plugins/inputs/socket_listener/
func InsertHandlerForReader(r io.Reader) error {
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, false, "", "", func(db string, rows []parser.Row) error {
return insertRows(db, rows, nil)
return parser.ParseStream(r, isGzipped, "", "", func(db string, rows []parser.Row) error {
return insertRows(nil, db, rows, nil)
})
})
}
@@ -43,7 +46,7 @@ func InsertHandlerForReader(r io.Reader) error {
// InsertHandlerForHTTP processes remote write for influx line protocol.
//
// See https://github.com/influxdata/influxdb/blob/4cbdc197b8117fee648d62e2e5be75c6575352f0/tsdb/README.md
func InsertHandlerForHTTP(req *http.Request) error {
func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
@@ -55,12 +58,12 @@ func InsertHandlerForHTTP(req *http.Request) error {
// Read db tag from https://docs.influxdata.com/influxdb/v1.7/tools/api/#write-http-endpoint
db := q.Get("db")
return parser.ParseStream(req.Body, isGzipped, precision, db, func(db string, rows []parser.Row) error {
return insertRows(db, rows, extraLabels)
return insertRows(at, db, rows, extraLabels)
})
})
}
func insertRows(db string, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
func insertRows(at *auth.Token, db string, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx := getPushCtx()
defer putPushCtx(ctx)
@@ -130,8 +133,11 @@ func insertRows(db string, rows []parser.Row, extraLabels []prompbmarshal.Label)
ctx.ctx.Labels = labels
ctx.ctx.Samples = samples
ctx.commonLabels = commonLabels
remotewrite.Push(&ctx.ctx.WriteRequest)
remotewrite.PushWithAuthToken(at, &ctx.ctx.WriteRequest)
rowsInserted.Add(rowsTotal)
if at != nil {
rowsTenantInserted.Get(at).Add(rowsTotal)
}
rowsPerInsert.Update(float64(rowsTotal))
return nil

View File

@@ -3,6 +3,7 @@ package main
import (
"flag"
"fmt"
"io"
"net/http"
"os"
"strings"
@@ -19,6 +20,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
@@ -40,7 +42,7 @@ var (
httpListenAddr = flag.String("httpListenAddr", ":8429", "TCP address to listen for http connections. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for Influx line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for InfluxDB line protocol data. Usually :8189 must be set. Doesn't work if empty. "+
"This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write")
graphiteListenAddr = flag.String("graphiteListenAddr", "", "TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty")
opentsdbListenAddr = flag.String("opentsdbListenAddr", "", "TCP and UDP address to listen for OpentTSDB metrics. "+
@@ -92,7 +94,9 @@ func main() {
common.StartUnmarshalWorkers()
writeconcurrencylimiter.Init()
if len(*influxListenAddr) > 0 {
influxServer = influxserver.MustStart(*influxListenAddr, influx.InsertHandlerForReader)
influxServer = influxserver.MustStart(*influxListenAddr, func(r io.Reader) error {
return influx.InsertHandlerForReader(r, false)
})
}
if len(*graphiteListenAddr) > 0 {
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, graphite.InsertHandler)
@@ -145,61 +149,73 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
if r.URL.Path == "/" {
fmt.Fprintf(w, "vmagent - see docs at https://victoriametrics.github.io/vmagent.html")
if r.Method != "GET" {
return false
}
fmt.Fprintf(w, "<h2>vmagent</h2>")
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/vmagent.html'>https://docs.victoriametrics.com/vmagent.html</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"/targets", "discovered targets list"},
{"/api/v1/targets", "advanced information about discovered targets in JSON format"},
{"/metrics", "available service metrics"},
{"/-/reload", "reload configuration"},
})
return true
}
path := strings.Replace(r.URL.Path, "//", "/", -1)
switch path {
case "/api/v1/write":
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(r); err != nil {
if err := promremotewrite.InsertHandler(nil, r); err != nil {
prometheusWriteErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "/api/v1/import":
vmimportRequests.Inc()
if err := vmimport.InsertHandler(r); err != nil {
if err := vmimport.InsertHandler(nil, r); err != nil {
vmimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "/api/v1/import/csv":
csvimportRequests.Inc()
if err := csvimport.InsertHandler(r); err != nil {
if err := csvimport.InsertHandler(nil, r); err != nil {
csvimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "/api/v1/import/prometheus":
prometheusimportRequests.Inc()
if err := prometheusimport.InsertHandler(r); err != nil {
if err := prometheusimport.InsertHandler(nil, r); err != nil {
prometheusimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "/api/v1/import/native":
nativeimportRequests.Inc()
if err := native.InsertHandler(r); err != nil {
if err := native.InsertHandler(nil, r); err != nil {
nativeimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "/write", "/api/v2/write":
influxWriteRequests.Inc()
if err := influx.InsertHandlerForHTTP(r); err != nil {
if err := influx.InsertHandlerForHTTP(nil, r); err != nil {
influxWriteErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -234,9 +250,92 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
}
if remotewrite.MultitenancyEnabled() {
return processMultitenantRequest(w, r, path)
}
return false
}
func processMultitenantRequest(w http.ResponseWriter, r *http.Request, path string) bool {
p, err := httpserver.ParsePath(path)
if err != nil {
// Cannot parse multitenant path. Skip it - probably it will be parsed later.
return false
}
if p.Prefix != "insert" {
httpserver.Errorf(w, r, `unsupported multitenant prefix: %q; expected "insert"`, p.Prefix)
return true
}
at, err := auth.NewToken(p.AuthToken)
if err != nil {
httpserver.Errorf(w, r, "cannot obtain auth token: %s", err)
return true
}
switch p.Suffix {
case "prometheus/", "prometheus", "prometheus/api/v1/write":
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(at, r); err != nil {
prometheusWriteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "prometheus/api/v1/import":
vmimportRequests.Inc()
if err := vmimport.InsertHandler(at, r); err != nil {
vmimportErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "prometheus/api/v1/import/csv":
csvimportRequests.Inc()
if err := csvimport.InsertHandler(at, r); err != nil {
csvimportErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "prometheus/api/v1/import/prometheus":
prometheusimportRequests.Inc()
if err := prometheusimport.InsertHandler(at, r); err != nil {
prometheusimportErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "prometheus/api/v1/import/native":
nativeimportRequests.Inc()
if err := native.InsertHandler(at, r); err != nil {
nativeimportErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "influx/write", "influx/api/v2/write":
influxWriteRequests.Inc()
if err := influx.InsertHandlerForHTTP(at, r); err != nil {
influxWriteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
return true
case "influx/query":
influxQueryRequests.Inc()
influxutils.WriteDatabaseNames(w)
return true
default:
httpserver.Errorf(w, r, "unsupported multitenant path suffix: %q", p.Suffix)
return true
}
}
var (
prometheusWriteRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/api/v1/write", protocol="promremotewrite"}`)
prometheusWriteErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/api/v1/write", protocol="promremotewrite"}`)
@@ -268,7 +367,7 @@ func usage() {
const s = `
vmagent collects metrics data via popular data ingestion protocols and routes it to VictoriaMetrics.
See the docs at https://victoriametrics.github.io/vmagent.html .
See the docs at https://docs.victoriametrics.com/vmagent.html .
`
flagutil.Usage(s)
}

View File

@@ -5,36 +5,39 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="native"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="native"}`)
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="native"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="native"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="native"}`)
)
// InsertHandler processes `/api/v1/import` request.
//
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6
func InsertHandler(req *http.Request) error {
func InsertHandler(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(block *parser.Block) error {
return insertRows(block, extraLabels)
return insertRows(at, block, extraLabels)
})
})
}
func insertRows(block *parser.Block, extraLabels []prompbmarshal.Label) error {
func insertRows(at *auth.Token, block *parser.Block, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -42,6 +45,9 @@ func insertRows(block *parser.Block, extraLabels []prompbmarshal.Label) error {
// since relabeling can prevent from inserting the rows.
rowsLen := len(block.Values)
rowsInserted.Add(rowsLen)
if at != nil {
rowsTenantInserted.Get(at).Add(rowsLen)
}
rowsPerInsert.Update(float64(rowsLen))
tssDst := ctx.WriteRequest.Timeseries[:0]
@@ -80,6 +86,6 @@ func insertRows(block *parser.Block, extraLabels []prompbmarshal.Label) error {
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
remotewrite.Push(&ctx.WriteRequest)
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
return nil
}

View File

@@ -1,24 +1,28 @@
package prometheusimport
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="prometheus"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="prometheus"}`)
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="prometheus"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="prometheus"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="prometheus"}`)
)
// InsertHandler processes `/api/v1/import/prometheus` request.
func InsertHandler(req *http.Request) error {
func InsertHandler(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
@@ -30,12 +34,21 @@ func InsertHandler(req *http.Request) error {
return writeconcurrencylimiter.Do(func() error {
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, defaultTimestamp, isGzipped, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
return insertRows(at, rows, extraLabels)
}, nil)
})
}
func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
// InsertHandlerForReader processes metrics from given reader with optional gzip format
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, 0, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
}, nil)
})
}
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -69,8 +82,11 @@ func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
remotewrite.Push(&ctx.WriteRequest)
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
rowsInserted.Add(len(rows))
if at != nil {
rowsTenantInserted.Get(at).Add(len(rows))
}
rowsPerInsert.Update(float64(len(rows)))
return nil
}

View File

@@ -1,38 +1,51 @@
package promremotewrite
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/promremotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="promremotewrite"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="promremotewrite"}`)
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="promremotewrite"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="promremotewrite"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="promremotewrite"}`)
)
// InsertHandler processes remote write for prometheus.
func InsertHandler(req *http.Request) error {
func InsertHandler(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(tss []prompb.TimeSeries) error {
return insertRows(tss, extraLabels)
return parser.ParseStream(req.Body, func(tss []prompb.TimeSeries) error {
return insertRows(at, tss, extraLabels)
})
})
}
func insertRows(timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Label) error {
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(r io.Reader) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, func(tss []prompb.TimeSeries) error {
return insertRows(nil, tss, nil)
})
})
}
func insertRows(at *auth.Token, timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -68,8 +81,11 @@ func insertRows(timeseries []prompb.TimeSeries, extraLabels []prompbmarshal.Labe
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
remotewrite.Push(&ctx.WriteRequest)
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
rowsInserted.Add(rowsTotal)
if at != nil {
rowsTenantInserted.Get(at).Add(rowsTotal)
}
rowsPerInsert.Update(float64(rowsTotal))
return nil
}

View File

@@ -2,8 +2,6 @@ package remotewrite
import (
"bytes"
"crypto/tls"
"encoding/base64"
"fmt"
"io/ioutil"
"net/http"
@@ -42,17 +40,35 @@ var (
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPassword = flagutil.NewArray("remoteWrite.basicAuth.password", "Optional basic auth password to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPasswordFile = flagutil.NewArray("remoteWrite.basicAuth.passwordFile", "Optional path to basic auth password to use for -remoteWrite.url. "+
"The file is re-read every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerToken = flagutil.NewArray("remoteWrite.bearerToken", "Optional bearer auth token to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerTokenFile = flagutil.NewArray("remoteWrite.bearerTokenFile", "Optional path to bearer token file to use for -remoteWrite.url. "+
"The token is re-read from the file every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientID = flagutil.NewArray("remoteWrite.oauth2.clientID", "Optional OAuth2 clientID to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecret = flagutil.NewArray("remoteWrite.oauth2.clientSecret", "Optional OAuth2 clientSecret to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecretFile = flagutil.NewArray("remoteWrite.oauth2.clientSecretFile", "Optional OAuth2 clientSecretFile to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2TokenURL = flagutil.NewArray("remoteWrite.oauth2.tokenUrl", "Optional OAuth2 tokenURL to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2Scopes = flagutil.NewArray("remoteWrite.oauth2.scopes", "Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
)
type client struct {
sanitizedURL string
remoteWriteURL string
authHeader string
fq *persistentqueue.FastQueue
hc *http.Client
authCfg *promauth.Config
rl rateLimiter
bytesSent *metrics.Counter
@@ -62,16 +78,18 @@ type client struct {
errorsCount *metrics.Counter
packetsDropped *metrics.Counter
retriesCount *metrics.Counter
sendDuration *metrics.FloatCounter
wg sync.WaitGroup
stopCh chan struct{}
}
func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int) *client {
tlsCfg, err := getTLSConfig(argIdx)
authCfg, err := getAuthConfig(argIdx)
if err != nil {
logger.Panicf("FATAL: cannot initialize TLS config: %s", err)
logger.Panicf("FATAL: cannot initialize auth config: %s", err)
}
tlsCfg := authCfg.NewTLSConfig()
tr := &http.Transport{
Dial: statDial,
TLSClientConfig: tlsCfg,
@@ -92,26 +110,10 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
}
tr.Proxy = http.ProxyURL(urlProxy)
}
authHeader := ""
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
if len(username) > 0 || len(password) > 0 {
// See https://en.wikipedia.org/wiki/Basic_access_authentication
token := username + ":" + password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
authHeader = "Basic " + token64
}
token := bearerToken.GetOptionalArg(argIdx)
if len(token) > 0 {
if authHeader != "" {
logger.Fatalf("`-remoteWrite.bearerToken`=%q cannot be set when `-remoteWrite.basicAuth.*` flags are set", token)
}
authHeader = "Bearer " + token
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authHeader: authHeader,
authCfg: authCfg,
fq: fq,
hc: &http.Client{
Transport: tr,
@@ -132,6 +134,7 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
c.errorsCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_errors_total{url=%q}`, c.sanitizedURL))
c.packetsDropped = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_packets_dropped_total{url=%q}`, c.sanitizedURL))
c.retriesCount = metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_retries_count_total{url=%q}`, c.sanitizedURL))
c.sendDuration = metrics.GetOrCreateFloatCounter(fmt.Sprintf(`vmagent_remotewrite_send_duration_seconds_total{url=%q}`, c.sanitizedURL))
for i := 0; i < concurrency; i++ {
c.wg.Add(1)
go func() {
@@ -149,23 +152,48 @@ func (c *client) MustStop() {
logger.Infof("stopped client for -remoteWrite.url=%q", c.sanitizedURL)
}
func getTLSConfig(argIdx int) (*tls.Config, error) {
c := &promauth.TLSConfig{
func getAuthConfig(argIdx int) (*promauth.Config, error) {
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
passwordFile := basicAuthPasswordFile.GetOptionalArg(argIdx)
var basicAuthCfg *promauth.BasicAuthConfig
if username != "" || password != "" || passwordFile != "" {
basicAuthCfg = &promauth.BasicAuthConfig{
Username: username,
Password: password,
PasswordFile: passwordFile,
}
}
token := bearerToken.GetOptionalArg(argIdx)
tokenFile := bearerTokenFile.GetOptionalArg(argIdx)
var oauth2Cfg *promauth.OAuth2Config
clientSecret := oauth2ClientSecret.GetOptionalArg(argIdx)
clientSecretFile := oauth2ClientSecretFile.GetOptionalArg(argIdx)
if clientSecretFile != "" || clientSecret != "" {
oauth2Cfg = &promauth.OAuth2Config{
ClientID: oauth2ClientID.GetOptionalArg(argIdx),
ClientSecret: clientSecret,
ClientSecretFile: clientSecretFile,
TokenURL: oauth2TokenURL.GetOptionalArg(argIdx),
Scopes: strings.Split(oauth2Scopes.GetOptionalArg(argIdx), ";"),
}
}
tlsCfg := &promauth.TLSConfig{
CAFile: tlsCAFile.GetOptionalArg(argIdx),
CertFile: tlsCertFile.GetOptionalArg(argIdx),
KeyFile: tlsKeyFile.GetOptionalArg(argIdx),
ServerName: tlsServerName.GetOptionalArg(argIdx),
InsecureSkipVerify: tlsInsecureSkipVerify.GetOptionalArg(argIdx),
}
if c.CAFile == "" && c.CertFile == "" && c.KeyFile == "" && c.ServerName == "" && !c.InsecureSkipVerify {
return nil, nil
}
cfg, err := promauth.NewConfig(".", nil, "", "", c)
authCfg, err := promauth.NewConfig(".", nil, basicAuthCfg, token, tokenFile, oauth2Cfg, tlsCfg)
if err != nil {
return nil, fmt.Errorf("cannot populate TLS config: %w", err)
return nil, fmt.Errorf("cannot populate OAuth2 config for remoteWrite idx: %d, err: %w", argIdx, err)
}
tlsCfg := cfg.NewTLSConfig()
return tlsCfg, nil
return authCfg, nil
}
func (c *client) runWorker() {
@@ -178,7 +206,9 @@ func (c *client) runWorker() {
return
}
go func() {
startTime := time.Now()
ch <- c.sendBlock(block)
c.sendDuration.Add(time.Since(startTime).Seconds())
}()
select {
case ok := <-ch:
@@ -226,8 +256,8 @@ again:
h.Set("Content-Type", "application/x-protobuf")
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
if c.authHeader != "" {
req.Header.Set("Authorization", c.authHeader)
if ah := c.authCfg.GetAuthHeader(); ah != "" {
req.Header.Set("Authorization", ah)
}
startTime := time.Now()
@@ -239,7 +269,7 @@ again:
if retryDuration > time.Minute {
retryDuration = time.Minute
}
logger.Errorf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
logger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
t := timerpool.Get(retryDuration)
select {
@@ -260,11 +290,9 @@ again:
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
if statusCode == 409 || statusCode == 400 {
// Just drop block on 409 status code like Prometheus does.
// Just drop block on 409 and 400 status codes like Prometheus does.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
// drop block on 400 status code,
// not expected that remote server will be able to handle it on retry
// should fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
_ = resp.Body.Close()
c.packetsDropped.Inc()
return true

View File

@@ -27,6 +27,9 @@ var (
// the maximum number of rows to send per each block.
const maxRowsPerBlock = 10000
// the maximum number of labels to send per each block.
const maxLabelsPerBlock = 10 * maxRowsPerBlock
type pendingSeries struct {
mu sync.Mutex
wr writeRequest
@@ -153,7 +156,7 @@ func (wr *writeRequest) push(src []prompbmarshal.TimeSeries) {
for i := range src {
tssDst = append(tssDst, prompbmarshal.TimeSeries{})
wr.copyTimeSeries(&tssDst[len(tssDst)-1], &src[i])
if len(wr.samples) >= maxRowsPerBlock {
if len(wr.samples) >= maxRowsPerBlock || len(wr.labels) >= maxLabelsPerBlock {
wr.tss = tssDst
wr.flush()
tssDst = wr.tss

View File

@@ -16,8 +16,13 @@ var (
unparsedLabelsGlobal = flagutil.NewArray("remoteWrite.label", "Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. "+
"Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage")
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabel_config entries. These entries are applied to all the metrics "+
"before sending them to -remoteWrite.url. See https://victoriametrics.github.io/vmagent.html#relabeling for details")
"before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details")
relabelDebugGlobal = flag.Bool("remoteWrite.relabelDebug", false, "Whether to log metrics before and after relabeling with -remoteWrite.relabelConfig. "+
"If the -remoteWrite.relabelDebug is enabled, then the metrics aren't sent to remote storage. This is useful for debugging the relabeling configs")
relabelConfigPaths = flagutil.NewArray("remoteWrite.urlRelabelConfig", "Optional path to relabel config for the corresponding -remoteWrite.url")
relabelDebug = flagutil.NewArrayBool("remoteWrite.urlRelabelDebug", "Whether to log metrics before and after relabeling with -remoteWrite.urlRelabelConfig. "+
"If the -remoteWrite.urlRelabelDebug is enabled, then the metrics aren't sent to the corresponding -remoteWrite.url. "+
"This is useful for debugging the relabeling configs")
)
var labelsGlobal []prompbmarshal.Label
@@ -31,23 +36,23 @@ func CheckRelabelConfigs() error {
func loadRelabelConfigs() (*relabelConfigs, error) {
var rcs relabelConfigs
if *relabelConfigPathGlobal != "" {
global, err := promrelabel.LoadRelabelConfigs(*relabelConfigPathGlobal)
global, err := promrelabel.LoadRelabelConfigs(*relabelConfigPathGlobal, *relabelDebugGlobal)
if err != nil {
return nil, fmt.Errorf("cannot load -remoteWrite.relabelConfig=%q: %w", *relabelConfigPathGlobal, err)
}
rcs.global = global
}
if len(*relabelConfigPaths) > len(*remoteWriteURLs) {
return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url args: %d",
len(*relabelConfigPaths), len(*remoteWriteURLs))
if len(*relabelConfigPaths) > (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)) {
return nil, fmt.Errorf("too many -remoteWrite.urlRelabelConfig args: %d; it mustn't exceed the number of -remoteWrite.url or -remoteWrite.multitenantURL args: %d",
len(*relabelConfigPaths), (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)))
}
rcs.perURL = make([]*promrelabel.ParsedConfigs, len(*remoteWriteURLs))
rcs.perURL = make([]*promrelabel.ParsedConfigs, (len(*remoteWriteURLs) + len(*remoteWriteMultitenantURLs)))
for i, path := range *relabelConfigPaths {
if len(path) == 0 {
// Skip empty relabel config.
continue
}
prc, err := promrelabel.LoadRelabelConfigs(path)
prc, err := promrelabel.LoadRelabelConfigs(path, relabelDebug.GetOptionalArg(i))
if err != nil {
return nil, fmt.Errorf("cannot load relabel configs from -remoteWrite.urlRelabelConfig=%q: %w", path, err)
}

View File

@@ -3,9 +3,14 @@ package remotewrite
import (
"flag"
"fmt"
"strconv"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bloomfilter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -13,6 +18,8 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
xxhash "github.com/cespare/xxhash/v2"
)
@@ -20,11 +27,14 @@ import (
var (
remoteWriteURLs = flagutil.NewArray("remoteWrite.url", "Remote storage URL to write data to. It must support Prometheus remote_write API. "+
"It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . "+
"Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems")
"Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL")
remoteWriteMultitenantURLs = flagutil.NewArray("remoteWrite.multitenantURL", "Base path for multitenant remote storage URL to write data to. "+
"See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . "+
"Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored. "+
"See also -remoteWrite.maxDiskUsagePerURL")
queues = flag.Int("remoteWrite.queues", 4, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage")
queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
maxPendingBytesPerURL = flagutil.NewBytes("remoteWrite.maxDiskUsagePerURL", 0, "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
@@ -38,16 +48,39 @@ var (
"Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. "+
"By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. "+
"This option may be used for improving data compression for the stored metrics")
sortLabels = flag.Bool("sortLabels", false, `Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. `+
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+
`Enabled sorting for labels can slow down ingestion performance a bit`)
maxHourlySeries = flag.Int("remoteWrite.maxHourlySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
maxDailySeries = flag.Int("remoteWrite.maxDailySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter")
)
var rwctxs []*remoteWriteCtx
var (
// rwctxsDefault contains statically populated entries when -remoteWrite.url is specified.
rwctxsDefault []*remoteWriteCtx
// rwctxsMap contains dynamically populated entries when -remoteWrite.multitenantURL is specified.
rwctxsMap = make(map[tenantmetrics.TenantID][]*remoteWriteCtx)
rwctxsMapLock sync.Mutex
// Data without tenant id is written to defaultAuthToken if -remoteWrite.multitenantURL is specified.
defaultAuthToken = &auth.Token{}
)
// MultitenancyEnabled returns true if -remoteWrite.multitenantURL is specified.
func MultitenancyEnabled() bool {
return len(*remoteWriteMultitenantURLs) > 0
}
// Contains the current relabelConfigs.
var allRelabelConfigs atomic.Value
// maxQueues limits the maximum value for `-remoteWrite.queues`. There is no sense in setting too high value,
// since it may lead to high memory usage due to big number of buffers.
var maxQueues = cgroup.AvailableCPUs() * 4
var maxQueues = cgroup.AvailableCPUs() * 16
// InitSecretFlags must be called after flag.Parse and before any logging.
func InitSecretFlags() {
@@ -63,8 +96,29 @@ func InitSecretFlags() {
//
// Stop must be called for graceful shutdown.
func Init() {
if len(*remoteWriteURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
if len(*remoteWriteURLs) == 0 && len(*remoteWriteMultitenantURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` or `-remoteWrite.multitenantURL` command-line flag must be set")
}
if len(*remoteWriteURLs) > 0 && len(*remoteWriteMultitenantURLs) > 0 {
logger.Fatalf("cannot set both `-remoteWrite.url` and `-remoteWrite.multitenantURL` command-line flags")
}
if *maxHourlySeries > 0 {
hourlySeriesLimiter = bloomfilter.NewLimiter(*maxHourlySeries, time.Hour)
_ = metrics.NewGauge(`vmagent_hourly_series_limit_max_series`, func() float64 {
return float64(hourlySeriesLimiter.MaxItems())
})
_ = metrics.NewGauge(`vmagent_hourly_series_limit_current_series`, func() float64 {
return float64(hourlySeriesLimiter.CurrentItems())
})
}
if *maxDailySeries > 0 {
dailySeriesLimiter = bloomfilter.NewLimiter(*maxDailySeries, 24*time.Hour)
_ = metrics.NewGauge(`vmagent_daily_series_limit_max_series`, func() float64 {
return float64(dailySeriesLimiter.MaxItems())
})
_ = metrics.NewGauge(`vmagent_daily_series_limit_current_series`, func() float64 {
return float64(dailySeriesLimiter.CurrentItems())
})
}
if *queues > maxQueues {
*queues = maxQueues
@@ -73,33 +127,23 @@ func Init() {
*queues = 1
}
initLabelsGlobal()
// Register SIGHUP handler for config reload before loadRelabelConfigs.
// This guarantees that the config will be re-read if the signal arrives just after loadRelabelConfig.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
rcs, err := loadRelabelConfigs()
if err != nil {
logger.Fatalf("cannot load relabel configs: %s", err)
}
allRelabelConfigs.Store(rcs)
maxInmemoryBlocks := memory.Allowed() / len(*remoteWriteURLs) / maxRowsPerBlock / 100
if maxInmemoryBlocks > 200 {
// There is no much sense in keeping higher number of blocks in memory,
// since this means that the producer outperforms consumer and the queue
// will continue growing. It is better storing the queue to file.
maxInmemoryBlocks = 200
}
if maxInmemoryBlocks < 2 {
maxInmemoryBlocks = 2
}
for i, remoteWriteURL := range *remoteWriteURLs {
sanitizedURL := fmt.Sprintf("%d:secret-url", i+1)
if *showRemoteWriteURL {
sanitizedURL = fmt.Sprintf("%d:%s", i+1, remoteWriteURL)
}
rwctx := newRemoteWriteCtx(i, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
rwctxs = append(rwctxs, rwctx)
if len(*remoteWriteURLs) > 0 {
rwctxsDefault = newRemoteWriteCtxs(nil, *remoteWriteURLs)
}
// Start config reloader.
sighupCh := procutil.NewSighupChan()
configReloaderWG.Add(1)
go func() {
defer configReloaderWG.Done()
@@ -121,6 +165,37 @@ func Init() {
}()
}
func newRemoteWriteCtxs(at *auth.Token, urls []string) []*remoteWriteCtx {
if len(urls) == 0 {
logger.Panicf("BUG: urls must be non-empty")
}
maxInmemoryBlocks := memory.Allowed() / len(urls) / maxRowsPerBlock / 100
if maxInmemoryBlocks > 400 {
// There is no much sense in keeping higher number of blocks in memory,
// since this means that the producer outperforms consumer and the queue
// will continue growing. It is better storing the queue to file.
maxInmemoryBlocks = 400
}
if maxInmemoryBlocks < 2 {
maxInmemoryBlocks = 2
}
rwctxs := make([]*remoteWriteCtx, len(urls))
for i, remoteWriteURL := range urls {
sanitizedURL := fmt.Sprintf("%d:secret-url", i+1)
if at != nil {
// Construct full remote_write url for the given tenant according to https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
remoteWriteURL = fmt.Sprintf("%s/insert/%d:%d/prometheus/api/v1/write", remoteWriteURL, at.AccountID, at.ProjectID)
sanitizedURL = fmt.Sprintf("%s:%d:%d", sanitizedURL, at.AccountID, at.ProjectID)
}
if *showRemoteWriteURL {
sanitizedURL = fmt.Sprintf("%d:%s", i+1, remoteWriteURL)
}
rwctxs[i] = newRemoteWriteCtx(i, remoteWriteURL, maxInmemoryBlocks, sanitizedURL)
}
return rwctxs
}
var stopCh = make(chan struct{})
var configReloaderWG sync.WaitGroup
@@ -131,16 +206,62 @@ func Stop() {
close(stopCh)
configReloaderWG.Wait()
for _, rwctx := range rwctxs {
for _, rwctx := range rwctxsDefault {
rwctx.MustStop()
}
rwctxs = nil
rwctxsDefault = nil
// There is no need in locking rwctxsMapLock here, since nobody should call Push during the Stop call.
for _, rwctxs := range rwctxsMap {
for _, rwctx := range rwctxs {
rwctx.MustStop()
}
}
rwctxsMap = nil
if sl := hourlySeriesLimiter; sl != nil {
sl.MustStop()
}
if sl := dailySeriesLimiter; sl != nil {
sl.MustStop()
}
}
// Push sends wr to remote storage systems set via `-remoteWrite.url`.
//
// Note that wr may be modified by Push due to relabeling and rounding.
func Push(wr *prompbmarshal.WriteRequest) {
PushWithAuthToken(nil, wr)
}
// PushWithAuthToken sends wr to remote storage systems set via `-remoteWrite.multitenantURL`.
//
// Note that wr may be modified by Push due to relabeling and rounding.
func PushWithAuthToken(at *auth.Token, wr *prompbmarshal.WriteRequest) {
if at == nil && len(*remoteWriteMultitenantURLs) > 0 {
// Write data to default tenant if at isn't set while -remoteWrite.multitenantURL is set.
at = defaultAuthToken
}
var rwctxs []*remoteWriteCtx
if at == nil {
rwctxs = rwctxsDefault
} else {
if len(*remoteWriteMultitenantURLs) == 0 {
logger.Panicf("BUG: remoteWriteMultitenantURLs must be non-empty for non-nil at")
}
rwctxsMapLock.Lock()
tenantID := tenantmetrics.TenantID{
AccountID: at.AccountID,
ProjectID: at.ProjectID,
}
rwctxs = rwctxsMap[tenantID]
if rwctxs == nil {
rwctxs = newRemoteWriteCtxs(at, *remoteWriteMultitenantURLs)
rwctxsMap[tenantID] = rwctxs
}
rwctxsMapLock.Unlock()
}
var rctx *relabelCtx
rcs := allRelabelConfigs.Load().(*relabelConfigs)
pcsGlobal := rcs.global
@@ -151,11 +272,13 @@ func Push(wr *prompbmarshal.WriteRequest) {
for len(tss) > 0 {
// Process big tss in smaller blocks in order to reduce the maximum memory usage
samplesCount := 0
labelsCount := 0
i := 0
for i < len(tss) {
samplesCount += len(tss[i].Samples)
labelsCount += len(tss[i].Labels)
i++
if samplesCount > maxRowsPerBlock {
if samplesCount >= maxRowsPerBlock || labelsCount >= maxLabelsPerBlock {
break
}
}
@@ -171,8 +294,12 @@ func Push(wr *prompbmarshal.WriteRequest) {
tssBlock = rctx.applyRelabeling(tssBlock, labelsGlobal, pcsGlobal)
globalRelabelMetricsDropped.Add(tssBlockLen - len(tssBlock))
}
for _, rwctx := range rwctxs {
rwctx.Push(tssBlock)
sortLabelsIfNeeded(tssBlock)
tssBlock = limitSeriesCardinality(tssBlock)
if len(tssBlock) > 0 {
for _, rwctx := range rwctxs {
rwctx.Push(tssBlock)
}
}
if rctx != nil {
rctx.reset()
@@ -183,6 +310,87 @@ func Push(wr *prompbmarshal.WriteRequest) {
}
}
// sortLabelsIfNeeded sorts labels if -sortLabels command-line flag is set.
func sortLabelsIfNeeded(tss []prompbmarshal.TimeSeries) {
if !*sortLabels {
return
}
for i := range tss {
promrelabel.SortLabels(tss[i].Labels)
}
}
func limitSeriesCardinality(tss []prompbmarshal.TimeSeries) []prompbmarshal.TimeSeries {
if hourlySeriesLimiter == nil && dailySeriesLimiter == nil {
return tss
}
dst := make([]prompbmarshal.TimeSeries, 0, len(tss))
for i := range tss {
labels := tss[i].Labels
h := getLabelsHash(labels)
if hourlySeriesLimiter != nil && !hourlySeriesLimiter.Add(h) {
hourlySeriesLimitRowsDropped.Add(len(tss[i].Samples))
logSkippedSeries(labels, "-remoteWrite.maxHourlySeries", hourlySeriesLimiter.MaxItems())
continue
}
if dailySeriesLimiter != nil && !dailySeriesLimiter.Add(h) {
dailySeriesLimitRowsDropped.Add(len(tss[i].Samples))
logSkippedSeries(labels, "-remoteWrite.maxDailySeries", dailySeriesLimiter.MaxItems())
continue
}
dst = append(dst, tss[i])
}
return dst
}
var (
hourlySeriesLimiter *bloomfilter.Limiter
dailySeriesLimiter *bloomfilter.Limiter
hourlySeriesLimitRowsDropped = metrics.NewCounter(`vmagent_hourly_series_limit_rows_dropped_total`)
dailySeriesLimitRowsDropped = metrics.NewCounter(`vmagent_daily_series_limit_rows_dropped_total`)
)
func getLabelsHash(labels []prompbmarshal.Label) uint64 {
bb := labelsHashBufPool.Get()
b := bb.B[:0]
for _, label := range labels {
b = append(b, label.Name...)
b = append(b, label.Value...)
}
h := xxhash.Sum64(b)
bb.B = b
labelsHashBufPool.Put(bb)
return h
}
var labelsHashBufPool bytesutil.ByteBufferPool
func logSkippedSeries(labels []prompbmarshal.Label, flagName string, flagValue int) {
select {
case <-logSkippedSeriesTicker.C:
logger.Warnf("skip series %s because %s=%d reached", labelsToString(labels), flagName, flagValue)
default:
}
}
var logSkippedSeriesTicker = time.NewTicker(5 * time.Second)
func labelsToString(labels []prompbmarshal.Label) string {
var b []byte
b = append(b, '{')
for i, label := range labels {
b = append(b, label.Name...)
b = append(b, '=')
b = strconv.AppendQuote(b, label.Value)
if i+1 < len(labels) {
b = append(b, ',')
}
}
b = append(b, '}')
return string(b)
}
var globalRelabelMetricsDropped = metrics.NewCounter("vmagent_remotewrite_global_relabel_metrics_dropped_total")
type remoteWriteCtx struct {
@@ -208,7 +416,13 @@ func newRemoteWriteCtx(argIdx int, remoteWriteURL string, maxInmemoryBlocks int,
c := newClient(argIdx, remoteWriteURL, sanitizedURL, fq, *queues)
sf := significantFigures.GetOptionalArgOrDefault(argIdx, 0)
rd := roundDigits.GetOptionalArgOrDefault(argIdx, 100)
pss := make([]*pendingSeries, *queues)
pssLen := *queues
if n := cgroup.AvailableCPUs(); pssLen > n {
// There is no sense in running more than availableCPUs concurrent pendingSeries,
// since every pendingSeries can saturate up to a single CPU.
pssLen = n
}
pss := make([]*pendingSeries, pssLen)
for i := range pss {
pss[i] = newPendingSeries(fq.MustWriteBlock, sf, rd)
}

View File

@@ -1,40 +1,54 @@
package vmimport
import (
"io"
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
parser "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/writeconcurrencylimiter"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="vmimport"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="vmimport"}`)
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="vmimport"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="vmimport"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="vmimport"}`)
)
// InsertHandler processes `/api/v1/import` request.
//
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/6
func InsertHandler(req *http.Request) error {
func InsertHandler(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(req, func(rows []parser.Row) error {
return insertRows(rows, extraLabels)
isGzipped := req.Header.Get("Content-Encoding") == "gzip"
return parser.ParseStream(req.Body, isGzipped, func(rows []parser.Row) error {
return insertRows(at, rows, extraLabels)
})
})
}
func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
// InsertHandlerForReader processes metrics from given reader
func InsertHandlerForReader(r io.Reader, isGzipped bool) error {
return writeconcurrencylimiter.Do(func() error {
return parser.ParseStream(r, isGzipped, func(rows []parser.Row) error {
return insertRows(nil, rows, nil)
})
})
}
func insertRows(at *auth.Token, rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
@@ -74,8 +88,11 @@ func insertRows(rows []parser.Row, extraLabels []prompbmarshal.Label) error {
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
remotewrite.Push(&ctx.WriteRequest)
remotewrite.PushWithAuthToken(at, &ctx.WriteRequest)
rowsInserted.Add(rowsTotal)
if at != nil {
rowsTenantInserted.Get(at).Add(rowsTotal)
}
rowsPerInsert.Update(float64(rowsTotal))
return nil
}

View File

@@ -56,6 +56,7 @@ test-vmalert:
go test -v -race -cover ./app/vmalert/datasource
go test -v -race -cover ./app/vmalert/notifier
go test -v -race -cover ./app/vmalert/config
go test -v -race -cover ./app/vmalert/remotewrite
run-vmalert: vmalert
./bin/vmalert -rule=app/vmalert/config/testdata/rules2-good.rules \
@@ -66,7 +67,17 @@ run-vmalert: vmalert
-remoteRead.url=http://localhost:8428 \
-external.label=cluster=east-1 \
-external.label=replica=a \
-evaluationInterval=3s
-evaluationInterval=3s \
-rule.configCheckInterval=10s
replay-vmalert: vmalert
./bin/vmalert -rule=app/vmalert/config/testdata/rules-replay-good.rules \
-datasource.url=http://localhost:8428 \
-remoteWrite.url=http://localhost:8428 \
-external.label=cluster=east-1 \
-external.label=replica=a \
-replay.timeFrom=2021-05-11T07:21:43Z \
-replay.timeTo=2021-05-29T18:40:43Z
vmalert-amd64:
CGO_ENABLED=1 GOARCH=amd64 $(MAKE) vmalert-local-with-goarch

View File

@@ -1,30 +1,31 @@
## vmalert
# vmalert
`vmalert` executes a list of given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
`vmalert` executes a list of the given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
rules against configured address.
rules against configured address. It is heavily inspired by [Prometheus](https://prometheus.io/docs/alerting/latest/overview/)
implementation and aims to be compatible with its syntax.
### Features:
## Features
* Integration with [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) TSDB;
* VictoriaMetrics [MetricsQL](https://victoriametrics.github.io/MetricsQL.html)
* VictoriaMetrics [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html)
support and expressions validation;
* Prometheus [alerting rules definition format](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#defining-alerting-rules)
support;
* Integration with [Alertmanager](https://github.com/prometheus/alertmanager);
* Keeps the alerts [state on restarts](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmalert#alerts-state-on-restarts);
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite) for details.
* Keeps the alerts [state on restarts](#alerts-state-on-restarts);
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite);
* Recording and Alerting rules backfilling (aka `replay`). See [these docs](#rules-backfilling);
* Lightweight without extra dependencies.
### Limitations:
## Limitations
* `vmalert` execute queries against remote datasource which has reliability risks because of network.
It is recommended to configure alerts thresholds and rules expressions with understanding that network request
may fail;
* by default, rules execution is sequential within one group, but persisting of execution results to remote
storage is asynchronous. Hence, user shouldn't rely on recording rules chaining when result of previous
recording rule is reused in next one;
* `vmalert` has no UI, just an API for getting groups and rules statuses.
### QuickStart
## QuickStart
To build `vmalert` from sources:
```
@@ -39,21 +40,23 @@ To start using `vmalert` you will need the following things:
* datasource address - reachable VictoriaMetrics instance for rules execution;
* notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
aggregating alerts and sending notifications.
* remote write address - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage address for storing recording rules results and alerts state in for of timeseries. This is optional.
* remote write address [optional] - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage address for storing recording rules results and alerts state in for of timeseries.
Then configure `vmalert` accordingly:
```
./bin/vmalert -rule=alert.rules \
./bin/vmalert -rule=alert.rules \ # Path to the file with rules configuration. Supports wildcard
-datasource.url=http://localhost:8428 \ # PromQL compatible datasource
-notifier.url=http://localhost:9093 \ # AlertManager URL
-notifier.url=http://127.0.0.1:9093 \ # AlertManager replica URL
-remoteWrite.url=http://localhost:8428 \ # remote write compatible storage to persist rules
-remoteRead.url=http://localhost:8428 \ # PromQL compatible datasource to restore alerts state from
-remoteWrite.url=http://localhost:8428 \ # Remote write compatible storage to persist rules
-remoteRead.url=http://localhost:8428 \ # MetricsQL compatible datasource to restore alerts state from
-external.label=cluster=east-1 \ # External label to be applied for each rule
-external.label=replica=a # Multiple external labels may be set
```
See the fill list of configuration flags in [configuration](#configuration) section.
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
@@ -61,23 +64,23 @@ Configuration for [recording](https://prometheus.io/docs/prometheus/latest/confi
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
similar to Prometheus rules and configured using YAML. Configuration examples may be found
in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder.
Every `rule` belongs to `group` and every configuration file may contain arbitrary number of groups:
Every `rule` belongs to a `group` and every configuration file may contain arbitrary number of groups:
```yaml
groups:
[ - <rule_group> ]
```
#### Groups
### Groups
Each group has following attributes:
Each group has the following attributes:
```yaml
# The name of the group. Must be unique within a file.
name: <string>
# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]
[ interval: <duration> | default = -evaluationInterval flag ]
# How many rules execute at once. Increasing concurrency may speed
# How many rules execute at once within a group. Increasing concurrency may speed
# up round execution speed.
[ concurrency: <integer> | default = 1 ]
@@ -85,26 +88,44 @@ name: <string>
# By default "prometheus" rule type is used.
[ type: <string> ]
# Optional list of label filters applied to every rule's
# request withing a group. Is compatible only with VM datasource.
# See more details at https://docs.victoriametrics.com#prometheus-querying-api-enhancements
extra_filter_labels:
[ <labelname>: <labelvalue> ... ]
# Optional list of labels added to every rule within a group.
# It has priority over the external labels.
# Labels are commonly used for adding environment
# or tenant-specific tag.
labels:
[ <labelname>: <labelvalue> ... ]
rules:
[ - <rule> ... ]
```
#### Rules
### Rules
Every rule contains `expr` field for [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/)
or [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) expression. Vmalert will execute the configured
expression and then act according to the Rule type.
There are two types of Rules:
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules allows to define alert conditions via [MetricsQL](https://victoriametrics.github.io/MetricsQL.html)
and to send notifications about firing alerts to [Alertmanager](https://github.com/prometheus/alertmanager).
Alerting rules allows to define alert conditions via `expr` field and to send notifications
[Alertmanager](https://github.com/prometheus/alertmanager) if execution result is not empty.
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules allow you to precompute frequently needed or computationally expensive expressions
and save their result as a new set of time series.
Recording rules allows to define `expr` which result will be than backfilled to configured
`-remoteWrite.url`. Recording rules are used to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
`vmalert` forbids to define duplicates - rules with the same combination of name, expression and labels
within one group.
##### Alerting rules
#### Alerting rules
The syntax for alerting rule is following:
The syntax for alerting rule is the following:
```yaml
# The name of the alert. Must be a valid metric name.
alert: <string>
@@ -114,12 +135,14 @@ alert: <string>
[ type: <string> ]
# The expression to evaluate. The expression language depends on the type value.
# By default MetricsQL expression is used. If type="graphite", then the expression
# By default PromQL/MetricsQL expression is used. If type="graphite", then the expression
# must contain valid Graphite expression.
expr: <string>
# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
# If param is omitted or set to 0 then alerts will be immediately considered
# as firing once they return.
[ for: <duration> | default = 0s ]
# Labels to add or overwrite for each alert.
@@ -131,7 +154,12 @@ annotations:
[ <labelname>: <tmpl_string> ]
```
##### Recording rules
It is allowed to use [Go templating](https://golang.org/pkg/text/template/) in annotations
to format data, iterate over it or execute expressions.
Additionally, `vmalert` provides some extra templating functions
listed [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/notifier/template_func.go).
#### Recording rules
The syntax for recording rules is following:
```yaml
@@ -152,28 +180,69 @@ labels:
[ <labelname>: <labelvalue> ]
```
For recording rules to work `-remoteWrite.url` must specified.
For recording rules to work `-remoteWrite.url` must be specified.
#### Alerts state on restarts
### Alerts state on restarts
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after reloading of `vmalert`
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after restart of `vmalert`
the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags:
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or VMInsert (Cluster). `vmalert` will persist alerts state
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or vminsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
These are regular time series and may be queried from VM just as any other time series.
The state stored to the configured address on every rule evaluation.
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or VMSelect (Cluster). `vmalert` will try to restore alerts state
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or vmselect (Cluster). `vmalert` will try to restore alerts state
from configured address by querying time series with name `ALERTS_FOR_STATE`.
Both flags are required for the proper state restoring. Restore process may fail if time series are missing
in configured `-remoteRead.url`, weren't updated in the last `1h` or received state doesn't match current `vmalert`
rules configuration.
in configured `-remoteRead.url`, weren't updated in the last `1h` (controlled by `-remoteRead.lookback`)
or received state doesn't match current `vmalert` rules configuration.
#### WEB
### Multitenancy
There are the following approaches for alerting and recording rules across
[multiple tenants](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy):
* To run a separate `vmalert` instance per each tenant.
The corresponding tenant must be specified in `-datasource.url` command-line flag
according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format).
For example, `/path/to/vmalert -datasource.url=http://vmselect:8481/select/123/prometheus`
would run alerts against `AccountID=123`. For recording rules the `-remoteWrite.url` command-line
flag must contain the url for the specific tenant as well.
For example, `-remoteWrite.url=http://vminsert:8480/insert/123/prometheus` would write recording
rules to `AccountID=123`.
* To specify `tenant` parameter per each alerting and recording group if
[enterprise version of vmalert](https://victoriametrics.com/enterprise.html) is used
with `-clusterMode` command-line flag. For example:
```yaml
groups:
- name: rules_for_tenant_123
tenant: "123"
rules:
# Rules for accountID=123
- name: rules_for_tenant_456:789
tenant: "456:789"
rules:
# Rules for accountID=456, projectID=789
```
If `-clusterMode` is enabled, then `-datasource.url`, `-remoteRead.url` and `-remoteWrite.url` must
contain only the hostname without tenant id. For example: `-datasource.url=http://vmselect:8481`.
`vmalert` automatically adds the specified tenant to urls per each recording rule in this case.
The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz` files
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
### WEB
`vmalert` runs a web-server (`-httpListenAddr`) for serving metrics and alerts endpoints:
* `http://<vmalert-addr>` - UI;
* `http://<vmalert-addr>/api/v1/groups` - list of all loaded groups and rules;
* `http://<vmalert-addr>/api/v1/alerts` - list of all active alerts;
* `http://<vmalert-addr>/api/v1/<groupID>/<alertID>/status" ` - get alert status by ID.
@@ -182,7 +251,7 @@ Used as alert source in AlertManager.
* `http://<vmalert-addr>/-/reload` - hot configuration reload.
### Graphite
## Graphite
vmalert sends requests to `<-datasource.url>/render?format=json` during evaluation of alerting and recording rules
if the corresponding group or rule contains `type: "graphite"` config option. It is expected that the `<-datasource.url>/render`
@@ -190,25 +259,136 @@ implements [Graphite Render API](https://graphite.readthedocs.io/en/stable/rende
When using vmalert with both `graphite` and `prometheus` rules configured against cluster version of VM do not forget
to set `-datasource.appendTypePrefix` flag to `true`, so vmalert can adjust URL prefix automatically based on query type.
## Rules backfilling
### Configuration
vmalert supports alerting and recording rules backfilling (aka `replay`). In replay mode vmalert
can read the same rules configuration as normally, evaluate them on the given time range and backfill
results via remote write to the configured storage. vmalert supports any PromQL/MetricsQL compatible
data source for backfilling.
### How it works
In `replay` mode vmalert works as a cli-tool and exits immediately after work is done.
To run vmalert in `replay` mode:
```
./bin/vmalert -rule=path/to/your.rules \ # path to files with rules you usually use with vmalert
-datasource.url=http://localhost:8428 \ # PromQL/MetricsQL compatible datasource
-remoteWrite.url=http://localhost:8428 \ # remote write compatible storage to persist results
-replay.timeFrom=2021-05-11T07:21:43Z \ # time from begin replay
-replay.timeTo=2021-05-29T18:40:43Z # time to finish replay
```
The output of the command will look like the following:
```
Replay mode:
from: 2021-05-11 07:21:43 +0000 UTC # set by -replay.timeFrom
to: 2021-05-29 18:40:43 +0000 UTC # set by -replay.timeTo
max data points per request: 1000 # set by -replay.maxDatapointsPerQuery
Group "ReplayGroup"
interval: 1m0s
requests to make: 27
max range per request: 16h40m0s
> Rule "type:vm_cache_entries:rate5m" (ID: 1792509946081842725)
27 / 27 [----------------------------------------------------------------------------------------------------] 100.00% 78 p/s
> Rule "go_cgo_calls_count:rate5m" (ID: 17958425467471411582)
27 / 27 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
Group "vmsingleReplay"
interval: 30s
requests to make: 54
max range per request: 8h20m0s
> Rule "RequestErrorsToAPI" (ID: 17645863024999990222)
54 / 54 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
> Rule "TooManyLogs" (ID: 9042195394653477652)
54 / 54 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
2021-06-07T09:59:12.098Z info app/vmalert/replay.go:68 replay finished! Imported 511734 samples
```
In `replay` mode all groups are executed sequentially one-by-one. Rules within the group are
executed sequentially as well (`concurrency` setting is ignored). Vmalert sends rule's expression
to [/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) endpoint
of the configured `-datasource.url`. Returned data then processed according to the rule type and
backfilled to `-remoteWrite.url` via [Remote Write protocol](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations).
Vmalert respects `evaluationInterval` value set by flag or per-group during the replay.
Vmalert automatically disables caching on VictoriaMetrics side by sending `nocache=1` param. It allows
to prevent cache pollution and unwanted time range boundaries adjustment during backfilling.
#### Recording rules
Result of recording rules `replay` should match with results of normal rules evaluation.
#### Alerting rules
Result of alerting rules `replay` is time series reflecting [alert's state](#alerts-state-on-restarts).
To see if `replayed` alert has fired in the past use the following PromQL/MetricsQL expression:
```
ALERTS{alertname="your_alertname", alertstate="firing"}
```
Execute the query against storage which was used for `-remoteWrite.url` during the `replay`.
### Additional configuration
There are following non-required `replay` flags:
* `-replay.maxDatapointsPerQuery` - the max number of data points expected to receive in one request.
In two words, it affects the max time range for every `/query_range` request. The higher the value,
the less requests will be issued during `replay`.
* `-replay.ruleRetryAttempts` - when datasource fails to respond vmalert will make this number of retries
per rule before giving up.
* `-replay.rulesDelay` - delay between sequential rules execution. Important in cases if there are chaining
(rules which depend on each other) rules. It is expected, that remote storage will be able to persist
previously accepted data during the delay, so data will be available for the subsequent queries.
Keep it equal or bigger than `-remoteWrite.flushInterval`.
See full description for these flags in `./vmalert --help`.
### Limitations
* Graphite engine isn't supported yet;
* `query` template function is disabled for performance reasons (might be changed in future);
## Monitoring
`vmalert` exports various metrics in Prometheus exposition format at `http://vmalert-host:8880/metrics` page.
We recommend setting up regular scraping of this page either through `vmagent` or by Prometheus so that the exported
metrics may be analyzed later.
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950) for `vmalert` overview.
If you have suggestions for improvements or have found a bug - please open an issue on github or add
a review to the dashboard.
## Configuration
Pass `-help` to `vmalert` in order to see the full list of supported
command-line flags with their descriptions.
The shortlist of configuration flags is the following:
```
-datasource.appendTypePrefix
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to VMSelect URL.
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
-datasource.basicAuth.password string
Optional basic auth password for -datasource.url
-datasource.basicAuth.passwordFile string
Optional path to basic auth password to use for -datasource.url
-datasource.basicAuth.username string
Optional basic auth username for -datasource.url
-datasource.bearerToken string
Optional bearer auth token to use for -datasource.url.
-datasource.bearerTokenFile string
Optional path to bearer token file to use for -datasource.url.
-datasource.lookback duration
Lookback defines how far to look into past when evaluating queries. For example, if datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
-datasource.maxIdleConnections int
Defines the number of idle (keep-alive connections) to configured datasource.Consider to set this value equal to the value: groups_total * group.concurrency. Too low value may result into high number of sockets in TIME_WAIT state. (default 100)
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
-datasource.queryStep duration
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query.
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query.If queryStep isn't specified, rule's evaluationInterval will be used instead.
-datasource.roundDigits int
Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
-datasource.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default system CA is used
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
-datasource.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
-datasource.tlsInsecureSkipVerify
@@ -216,15 +396,17 @@ The shortlist of configuration flags is the following:
-datasource.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
-datasource.tlsServerName string
Optional TLS server name to use for connections to -datasource.url. By default the server name from -datasource.url is used
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
-datasource.url string
Victoria Metrics or VMSelect url. Required parameter. E.g. http://127.0.0.1:8428
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
-disableAlertgroupLabel
Whether to disable adding group's name as label to generated alerts and time series.
-dryRun -rule
Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-evaluationInterval duration
@@ -234,23 +416,23 @@ The shortlist of configuration flags is the following:
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
-external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-external.url string
External URL is used as alert's source for sent alerts to the notifier
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealy the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
@@ -260,7 +442,7 @@ The shortlist of configuration flags is the following:
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
@@ -270,44 +452,52 @@ The shortlist of configuration flags is the following:
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-notifier.basicAuth.password array
Optional basic auth password for -notifier.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.basicAuth.username array
Optional basic auth username for -notifier.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsInsecureSkipVerify array
Whether to skip tls verification when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
-notifier.tlsKeyFile array
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsServerName array
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-remoteRead.basicAuth.password string
Optional basic auth password for -remoteRead.url
-remoteRead.basicAuth.passwordFile string
Optional path to basic auth password to use for -remoteRead.url
-remoteRead.basicAuth.username string
Optional basic auth username for -remoteRead.url
-remoteRead.bearerToken string
Optional bearer auth token to use for -remoteRead.url.
-remoteRead.bearerTokenFile string
Optional path to bearer token file to use for -remoteRead.url.
-remoteRead.ignoreRestoreErrors
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
-remoteRead.lookback duration
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.tlsCAFile string
@@ -321,13 +511,21 @@ The shortlist of configuration flags is the following:
-remoteRead.tlsServerName string
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
-remoteRead.url vmalert
Optional URL to Victoria Metrics or VMSelect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428
Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428
-remoteWrite.basicAuth.password string
Optional basic auth password for -remoteWrite.url
-remoteWrite.basicAuth.passwordFile string
Optional path to basic auth password to use for -remoteWrite.url
-remoteWrite.basicAuth.username string
Optional basic auth username for -remoteWrite.url
-remoteWrite.bearerToken string
Optional bearer auth token to use for -remoteWrite.url.
-remoteWrite.bearerTokenFile string
Optional path to bearer token file to use for -remoteWrite.url.
-remoteWrite.concurrency int
Defines number of writers for concurrent writing into remote querier (default 1)
-remoteWrite.disablePathAppend
Whether to disable automatic appending of '/api/v1/write' path to the configured -remoteWrite.url.
-remoteWrite.flushInterval duration
Defines interval of flushes to remote write endpoint (default 5s)
-remoteWrite.maxBatchSize int
@@ -345,7 +543,17 @@ The shortlist of configuration flags is the following:
-remoteWrite.tlsServerName string
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
-remoteWrite.url string
Optional URL to Victoria Metrics or VMInsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. For example, if -remoteWrite.url=http://127.0.0.1:8428 is specified, then the alerts state will be written to http://127.0.0.1:8428/api/v1/write . See also -remoteWrite.disablePathAppend
-replay.maxDatapointsPerQuery int
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
-replay.ruleRetryAttempts int
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
-replay.rulesDelay duration
Delay between rules evaluation within the group. Could be important if there are chained rules inside of the groupand processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule.Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
-replay.timeFrom string
The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
-replay.timeTo string
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
-rule array
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
@@ -354,7 +562,11 @@ The shortlist of configuration flags is the following:
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Supports array of values separated by comma or specified via multiple flags.
Supports an array of values separated by comma or specified via multiple flags.
-rule.configCheckInterval duration
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-rule.maxResolveDuration duration
Limits the maximum duration for automatic alert expiration, which is by default equal to 3 evaluation intervals of the parent group.
-rule.validateExpressions
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
@@ -362,56 +574,56 @@ The shortlist of configuration flags is the following:
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
Pass `-help` to `vmalert` in order to see the full list of supported
command-line flags with their descriptions.
`vmalert` supports "hot" config reload via the following methods:
* send SIGHUP signal to `vmalert` process;
* send GET request to `/-/reload` endpoint;
* configure `-rule.configCheckInterval` flag for periodic reload
on config change.
To reload configuration without `vmalert` restart send SIGHUP signal
or send GET request to `/-/reload` endpoint.
### Contributing
## Contributing
`vmalert` is mostly designed and built by VictoriaMetrics community.
Feel free to share your experience and ideas for improving this
software. Please keep simplicity as the main priority.
### How to build from sources
## How to build from sources
It is recommended using
[binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
- `vmalert` is located in `vmutils-*` archives there.
#### Development build
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmalert` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert` binary and puts it into the `bin` folder.
#### Production build
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmalert-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert-prod` binary and puts it into the `bin` folder.
#### ARM build
### ARM build
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
#### Development ARM build
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmalert-arm` or `make vmalert-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert-arm` or `vmalert-arm64` binary respectively and puts it into the `bin` folder.
#### Production ARM build
### Production ARM build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmalert-arm-prod` or `make vmalert-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).

View File

@@ -19,15 +19,18 @@ import (
// AlertingRule is basic alert entity
type AlertingRule struct {
Type datasource.Type
RuleID uint64
Name string
Expr string
For time.Duration
Labels map[string]string
Annotations map[string]string
GroupID uint64
GroupName string
Type datasource.Type
RuleID uint64
Name string
Expr string
For time.Duration
Labels map[string]string
Annotations map[string]string
GroupID uint64
GroupName string
EvalInterval time.Duration
q datasource.Querier
// guard status fields
mu sync.RWMutex
@@ -39,6 +42,9 @@ type AlertingRule struct {
// resets on every successful Exec
// may be used as Health state
lastExecError error
// stores the number of samples returned during
// the last evaluation
lastExecSamples int
metrics *alertingRuleMetrics
}
@@ -47,28 +53,35 @@ type alertingRuleMetrics struct {
errors *gauge
pending *gauge
active *gauge
samples *gauge
}
func newAlertingRule(group *Group, cfg config.Rule) *AlertingRule {
func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule) *AlertingRule {
ar := &AlertingRule{
Type: cfg.Type,
RuleID: cfg.ID,
Name: cfg.Alert,
Expr: cfg.Expr,
For: cfg.For.Duration(),
Labels: cfg.Labels,
Annotations: cfg.Annotations,
GroupID: group.ID(),
GroupName: group.Name,
alerts: make(map[uint64]*notifier.Alert),
metrics: &alertingRuleMetrics{},
Type: cfg.Type,
RuleID: cfg.ID,
Name: cfg.Alert,
Expr: cfg.Expr,
For: cfg.For.Duration(),
Labels: cfg.Labels,
Annotations: cfg.Annotations,
GroupID: group.ID(),
GroupName: group.Name,
EvalInterval: group.Interval,
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: &cfg.Type,
EvaluationInterval: group.Interval,
ExtraLabels: group.ExtraFilterLabels,
}),
alerts: make(map[uint64]*notifier.Alert),
metrics: &alertingRuleMetrics{},
}
labels := fmt.Sprintf(`alertname=%q, group=%q, id="%d"`, ar.Name, group.Name, ar.ID())
ar.metrics.pending = getOrCreateGauge(fmt.Sprintf(`vmalert_alerts_pending{%s}`, labels),
func() float64 {
ar.mu.Lock()
defer ar.mu.Unlock()
ar.mu.RLock()
defer ar.mu.RUnlock()
var num int
for _, a := range ar.alerts {
if a.State == notifier.StatePending {
@@ -79,8 +92,8 @@ func newAlertingRule(group *Group, cfg config.Rule) *AlertingRule {
})
ar.metrics.active = getOrCreateGauge(fmt.Sprintf(`vmalert_alerts_firing{%s}`, labels),
func() float64 {
ar.mu.Lock()
defer ar.mu.Unlock()
ar.mu.RLock()
defer ar.mu.RUnlock()
var num int
for _, a := range ar.alerts {
if a.State == notifier.StateFiring {
@@ -89,15 +102,21 @@ func newAlertingRule(group *Group, cfg config.Rule) *AlertingRule {
}
return float64(num)
})
ar.metrics.errors = getOrCreateGauge(fmt.Sprintf(`vmalert_alerts_error{%s}`, labels),
ar.metrics.errors = getOrCreateGauge(fmt.Sprintf(`vmalert_alerting_rules_error{%s}`, labels),
func() float64 {
ar.mu.Lock()
defer ar.mu.Unlock()
ar.mu.RLock()
defer ar.mu.RUnlock()
if ar.lastExecError == nil {
return 0
}
return 1
})
ar.metrics.samples = getOrCreateGauge(fmt.Sprintf(`vmalert_alerting_rules_last_evaluation_samples{%s}`, labels),
func() float64 {
ar.mu.RLock()
defer ar.mu.RUnlock()
return float64(ar.lastExecSamples)
})
return ar
}
@@ -106,6 +125,7 @@ func (ar *AlertingRule) Close() {
metrics.UnregisterMetric(ar.metrics.active.name)
metrics.UnregisterMetric(ar.metrics.pending.name)
metrics.UnregisterMetric(ar.metrics.errors.name)
metrics.UnregisterMetric(ar.metrics.samples.name)
}
// String implements Stringer interface
@@ -119,15 +139,73 @@ func (ar *AlertingRule) ID() uint64 {
return ar.RuleID
}
// ExecRange executes alerting rule on the given time range similarly to Exec.
// It doesn't update internal states of the Rule and meant to be used just
// to get time series for backfilling.
// It returns ALERT and ALERT_FOR_STATE time series as result.
func (ar *AlertingRule) ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error) {
series, err := ar.q.QueryRange(ctx, ar.Expr, start, end)
if err != nil {
return nil, err
}
var result []prompbmarshal.TimeSeries
qFn := func(query string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported in replay mode")
}
for _, s := range series {
// extra labels could contain templates, so we expand them first
labels, err := expandLabels(s, qFn, ar)
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %s", err)
}
for k, v := range labels {
// apply extra labels to datasource
// so the hash key will be consistent on restore
s.SetLabel(k, v)
}
a, err := ar.newAlert(s, time.Time{}, qFn) // initial alert
if err != nil {
return nil, fmt.Errorf("failed to create alert: %s", err)
}
if ar.For == 0 { // if alert is instant
a.State = notifier.StateFiring
for i := range s.Values {
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
}
continue
}
// if alert with For > 0
prevT := time.Time{}
//activeAt := time.Time{}
for i := range s.Values {
at := time.Unix(s.Timestamps[i], 0)
if at.Sub(prevT) > ar.EvalInterval {
// reset to Pending if there are gaps > EvalInterval between DPs
a.State = notifier.StatePending
//activeAt = at
a.Start = at
} else if at.Sub(a.Start) >= ar.For {
a.State = notifier.StateFiring
}
prevT = at
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
}
}
return result, nil
}
// Exec executes AlertingRule expression via the given Querier.
// Based on the Querier results AlertingRule maintains notifier.Alerts
func (ar *AlertingRule) Exec(ctx context.Context, q datasource.Querier, series bool) ([]prompbmarshal.TimeSeries, error) {
qMetrics, err := q.Query(ctx, ar.Expr, ar.Type)
func (ar *AlertingRule) Exec(ctx context.Context) ([]prompbmarshal.TimeSeries, error) {
qMetrics, err := ar.q.Query(ctx, ar.Expr)
ar.mu.Lock()
defer ar.mu.Unlock()
ar.lastExecError = err
ar.lastExecTime = time.Now()
ar.lastExecSamples = len(qMetrics)
if err != nil {
return nil, fmt.Errorf("failed to execute query %q: %w", ar.Expr, err)
}
@@ -139,7 +217,7 @@ func (ar *AlertingRule) Exec(ctx context.Context, q datasource.Querier, series b
}
}
qFn := func(query string) ([]datasource.Metric, error) { return q.Query(ctx, query, ar.Type) }
qFn := func(query string) ([]datasource.Metric, error) { return ar.q.Query(ctx, query) }
updated := make(map[uint64]struct{})
// update list of active alerts
for _, m := range qMetrics {
@@ -161,9 +239,9 @@ func (ar *AlertingRule) Exec(ctx context.Context, q datasource.Querier, series b
}
updated[h] = struct{}{}
if a, ok := ar.alerts[h]; ok {
if a.Value != m.Value {
if a.Value != m.Values[0] {
// update Value field with latest value
a.Value = m.Value
a.Value = m.Values[0]
// and re-exec template since Value can be used
// in annotations
a.Annotations, err = a.ExecTemplate(qFn, ar.Annotations)
@@ -201,10 +279,7 @@ func (ar *AlertingRule) Exec(ctx context.Context, q datasource.Querier, series b
alertsFired.Inc()
}
}
if series {
return ar.toTimeSeries(ar.lastExecTime), nil
}
return nil, nil
return ar.toTimeSeries(ar.lastExecTime.Unix()), nil
}
func expandLabels(m datasource.Metric, q notifier.QueryFn, ar *AlertingRule) (map[string]string, error) {
@@ -214,13 +289,13 @@ func expandLabels(m datasource.Metric, q notifier.QueryFn, ar *AlertingRule) (ma
}
tpl := notifier.AlertTplData{
Labels: metricLabels,
Value: m.Value,
Value: m.Values[0],
Expr: ar.Expr,
}
return notifier.ExecTemplate(q, ar.Labels, tpl)
}
func (ar *AlertingRule) toTimeSeries(timestamp time.Time) []prompbmarshal.TimeSeries {
func (ar *AlertingRule) toTimeSeries(timestamp int64) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
for _, a := range ar.alerts {
if a.State == notifier.StateInactive {
@@ -244,6 +319,8 @@ func (ar *AlertingRule) UpdateWith(r Rule) error {
ar.For = nr.For
ar.Labels = nr.Labels
ar.Annotations = nr.Annotations
ar.EvalInterval = nr.EvalInterval
ar.q = nr.q
return nil
}
@@ -271,13 +348,15 @@ func (ar *AlertingRule) newAlert(m datasource.Metric, start time.Time, qFn notif
GroupID: ar.GroupID,
Name: ar.Name,
Labels: map[string]string{},
Value: m.Value,
Value: m.Values[0],
Start: start,
Expr: ar.Expr,
}
// label defined here to make override possible by
// time series labels.
a.Labels[alertGroupNameLabel] = ar.GroupName
if !*disableAlertGroupLabel && ar.GroupName != "" {
a.Labels[alertGroupNameLabel] = ar.GroupName
}
for _, l := range m.Labels {
// drop __name__ to be consistent with Prometheus alerting
if l.Name == "__name__" {
@@ -317,6 +396,7 @@ func (ar *AlertingRule) RuleAPI() APIAlertingRule {
Expression: ar.Expr,
For: ar.For.String(),
LastError: lastErr,
LastSamples: ar.lastExecSamples,
LastExec: ar.lastExecTime,
Labels: ar.Labels,
Annotations: ar.Annotations,
@@ -339,6 +419,7 @@ func (ar *AlertingRule) newAlertAPI(a notifier.Alert) *APIAlert {
// encode as strings to avoid rounding
ID: fmt.Sprintf("%d", a.ID),
GroupID: fmt.Sprintf("%d", a.GroupID),
RuleID: fmt.Sprintf("%d", ar.RuleID),
Name: a.Name,
Expression: ar.Expr,
@@ -346,7 +427,7 @@ func (ar *AlertingRule) newAlertAPI(a notifier.Alert) *APIAlert {
Annotations: a.Annotations,
State: a.State.String(),
ActiveAt: a.Start,
Value: strconv.FormatFloat(a.Value, 'e', -1, 64),
Value: strconv.FormatFloat(a.Value, 'f', -1, 32),
}
}
@@ -366,7 +447,7 @@ const (
)
// alertToTimeSeries converts the given alert with the given timestamp to timeseries
func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp time.Time) []prompbmarshal.TimeSeries {
func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp int64) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
tss = append(tss, alertToTimeSeries(ar.Name, a, timestamp))
if ar.For > 0 {
@@ -375,7 +456,7 @@ func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp time.Time
return tss
}
func alertToTimeSeries(name string, a *notifier.Alert, timestamp time.Time) prompbmarshal.TimeSeries {
func alertToTimeSeries(name string, a *notifier.Alert, timestamp int64) prompbmarshal.TimeSeries {
labels := make(map[string]string)
for k, v := range a.Labels {
labels[k] = v
@@ -383,19 +464,19 @@ func alertToTimeSeries(name string, a *notifier.Alert, timestamp time.Time) prom
labels["__name__"] = alertMetricName
labels[alertNameLabel] = name
labels[alertStateLabel] = a.State.String()
return newTimeSeries(1, labels, timestamp)
return newTimeSeries([]float64{1}, []int64{timestamp}, labels)
}
// alertForToTimeSeries returns a timeseries that represents
// state of active alerts, where value is time when alert become active
func alertForToTimeSeries(name string, a *notifier.Alert, timestamp time.Time) prompbmarshal.TimeSeries {
func alertForToTimeSeries(name string, a *notifier.Alert, timestamp int64) prompbmarshal.TimeSeries {
labels := make(map[string]string)
for k, v := range a.Labels {
labels[k] = v
}
labels["__name__"] = alertForStateMetricName
labels[alertNameLabel] = name
return newTimeSeries(float64(a.Start.Unix()), labels, timestamp)
return newTimeSeries([]float64{float64(a.Start.Unix())}, []int64{timestamp}, labels)
}
// Restore restores the state of active alerts basing on previously written timeseries.
@@ -407,7 +488,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
return fmt.Errorf("querier is nil")
}
qFn := func(query string) ([]datasource.Metric, error) { return q.Query(ctx, query, ar.Type) }
qFn := func(query string) ([]datasource.Metric, error) { return ar.q.Query(ctx, query) }
// account for external labels in filter
var labelsFilter string
@@ -420,7 +501,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
// remote write protocol which is used for state persistence in vmalert.
expr := fmt.Sprintf("last_over_time(%s{alertname=%q%s}[%ds])",
alertForStateMetricName, ar.Name, labelsFilter, int(lookback.Seconds()))
qMetrics, err := q.Query(ctx, expr, ar.Type)
qMetrics, err := q.Query(ctx, expr)
if err != nil {
return err
}
@@ -437,7 +518,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
m.Labels = append(m.Labels, l)
}
a, err := ar.newAlert(m, time.Unix(int64(m.Value), 0), qFn)
a, err := ar.newAlert(m, time.Unix(int64(m.Values[0]), 0), qFn)
if err != nil {
return fmt.Errorf("failed to create alert: %w", err)
}

View File

@@ -24,11 +24,11 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
newTestAlertingRule("instant", 0),
&notifier.Alert{State: notifier.StateFiring},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "instant",
}, timestamp),
}),
},
},
{
@@ -38,13 +38,13 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
"instance": "bar",
}},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "instant extra labels",
"job": "foo",
"instance": "bar",
}, timestamp),
}),
},
},
{
@@ -54,48 +54,52 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
"__name__": "bar",
}},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "instant labels override",
}, timestamp),
}),
},
},
{
newTestAlertingRule("for", time.Second),
&notifier.Alert{State: notifier.StateFiring, Start: timestamp.Add(time.Second)},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "for",
}, timestamp),
newTimeSeries(float64(timestamp.Add(time.Second).Unix()), map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for",
}, timestamp),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for",
}),
},
},
{
newTestAlertingRule("for pending", 10*time.Second),
&notifier.Alert{State: notifier.StatePending, Start: timestamp.Add(time.Second)},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StatePending.String(),
alertNameLabel: "for pending",
}, timestamp),
newTimeSeries(float64(timestamp.Add(time.Second).Unix()), map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for pending",
}, timestamp),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for pending",
}),
},
},
}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
tc.rule.alerts[tc.alert.ID] = tc.alert
tss := tc.rule.toTimeSeries(timestamp)
tss := tc.rule.toTimeSeries(timestamp.Unix())
if err := compareTimeSeries(t, tc.expTS, tss); err != nil {
t.Fatalf("timeseries missmatch: %s", err)
}
@@ -118,7 +122,7 @@ func TestAlertingRule_Exec(t *testing.T) {
{
newTestAlertingRule("empty labels", 0),
[][]datasource.Metric{
{datasource.Metric{}},
{datasource.Metric{Values: []float64{1}, Timestamps: []int64{1}}},
},
map[uint64]*notifier.Alert{
hash(datasource.Metric{}): {State: notifier.StateFiring},
@@ -294,11 +298,12 @@ func TestAlertingRule_Exec(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.q = fq
tc.rule.GroupID = fakeGroup.ID()
for _, step := range tc.steps {
fq.reset()
fq.add(step...)
if _, err := tc.rule.Exec(context.TODO(), fq, false); err != nil {
if _, err := tc.rule.Exec(context.TODO()); err != nil {
t.Fatalf("unexpected err: %s", err)
}
// artificial delay between applying steps
@@ -320,6 +325,166 @@ func TestAlertingRule_Exec(t *testing.T) {
}
}
func TestAlertingRule_ExecRange(t *testing.T) {
testCases := []struct {
rule *AlertingRule
data []datasource.Metric
expAlerts []*notifier.Alert
}{
{
newTestAlertingRule("empty", 0),
[]datasource.Metric{},
nil,
},
{
newTestAlertingRule("empty labels", 0),
[]datasource.Metric{
{Values: []float64{1}, Timestamps: []int64{1}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
},
},
{
newTestAlertingRule("single-firing", 0),
[]datasource.Metric{
metricWithLabels(t, "name", "foo"),
},
[]*notifier.Alert{
{
Labels: map[string]string{"name": "foo"},
State: notifier.StateFiring,
},
},
},
{
newTestAlertingRule("single-firing-on-range", 0),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1e3, 2e3, 3e3}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
{State: notifier.StateFiring},
{State: notifier.StateFiring},
},
},
{
newTestAlertingRule("for-pending", time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 3, 5}},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(3, 0)},
{State: notifier.StatePending, Start: time.Unix(5, 0)},
},
},
{
newTestAlertingRule("for-firing", 3*time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 3, 5}},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StateFiring, Start: time.Unix(1, 0)},
},
},
{
newTestAlertingRule("for=>pending=>firing=>pending=>firing=>pending", time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1, 1, 1}, Timestamps: []int64{1, 2, 5, 6, 20}},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StateFiring, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(5, 0)},
{State: notifier.StateFiring, Start: time.Unix(5, 0)},
{State: notifier.StatePending, Start: time.Unix(20, 0)},
},
},
{
newTestAlertingRule("multi-series-for=>pending=>pending=>firing", 3*time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 3, 5}},
{Values: []float64{1, 1}, Timestamps: []int64{1, 5},
Labels: []datasource.Label{{Name: "foo", Value: "bar"}},
},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StateFiring, Start: time.Unix(1, 0)},
//
{State: notifier.StatePending, Start: time.Unix(1, 0),
Labels: map[string]string{
"foo": "bar",
}},
{State: notifier.StatePending, Start: time.Unix(5, 0),
Labels: map[string]string{
"foo": "bar",
}},
},
},
{
newTestRuleWithLabels("multi-series-firing", "source", "vm"),
[]datasource.Metric{
{Values: []float64{1, 1}, Timestamps: []int64{1, 100}},
{Values: []float64{1, 1}, Timestamps: []int64{1, 5},
Labels: []datasource.Label{{Name: "foo", Value: "bar"}},
},
},
[]*notifier.Alert{
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
//
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
},
},
}
fakeGroup := Group{Name: "TestRule_ExecRange"}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.q = fq
tc.rule.GroupID = fakeGroup.ID()
fq.add(tc.data...)
gotTS, err := tc.rule.ExecRange(context.TODO(), time.Now(), time.Now())
if err != nil {
t.Fatalf("unexpected err: %s", err)
}
var expTS []prompbmarshal.TimeSeries
var j int
for _, series := range tc.data {
for _, timestamp := range series.Timestamps {
expTS = append(expTS, tc.rule.alertToTimeSeries(tc.expAlerts[j], timestamp)...)
j++
}
}
if len(gotTS) != len(expTS) {
t.Fatalf("expected %d time series; got %d", len(expTS), len(gotTS))
}
for i := range expTS {
got, exp := gotTS[i], expTS[i]
if !reflect.DeepEqual(got, exp) {
t.Fatalf("%d: expected \n%v but got \n%v", i, exp, got)
}
}
})
}
}
func TestAlertingRule_Restore(t *testing.T) {
testCases := []struct {
rule *AlertingRule
@@ -410,6 +575,7 @@ func TestAlertingRule_Restore(t *testing.T) {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if err := tc.rule.Restore(context.TODO(), fq, time.Hour, nil); err != nil {
t.Fatalf("unexpected err: %s", err)
@@ -437,17 +603,18 @@ func TestAlertingRule_Exec_Negative(t *testing.T) {
fq := &fakeQuerier{}
ar := newTestAlertingRule("test", 0)
ar.Labels = map[string]string{"job": "test"}
ar.q = fq
// successful attempt
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "bar"))
_, err := ar.Exec(context.TODO(), fq, false)
_, err := ar.Exec(context.TODO())
if err != nil {
t.Fatal(err)
}
// label `job` will collide with rule extra label and will make both time series equal
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "baz"))
_, err = ar.Exec(context.TODO(), fq, false)
_, err = ar.Exec(context.TODO())
if !errors.Is(err, errDuplicate) {
t.Fatalf("expected to have %s error; got %s", errDuplicate, err)
}
@@ -456,7 +623,7 @@ func TestAlertingRule_Exec_Negative(t *testing.T) {
expErr := "connection reset by peer"
fq.setErr(errors.New(expErr))
_, err = ar.Exec(context.TODO(), fq, false)
_, err = ar.Exec(context.TODO())
if err == nil {
t.Fatalf("expected to get err; got nil")
}
@@ -481,17 +648,15 @@ func TestAlertingRule_Template(t *testing.T) {
hash(metricWithLabels(t, "region", "east", "instance", "foo")): {
Annotations: map[string]string{},
Labels: map[string]string{
alertGroupNameLabel: "",
"region": "east",
"instance": "foo",
"region": "east",
"instance": "foo",
},
},
hash(metricWithLabels(t, "region", "east", "instance", "bar")): {
Annotations: map[string]string{},
Labels: map[string]string{
alertGroupNameLabel: "",
"region": "east",
"instance": "bar",
"region": "east",
"instance": "bar",
},
},
},
@@ -516,9 +681,8 @@ func TestAlertingRule_Template(t *testing.T) {
map[uint64]*notifier.Alert{
hash(metricWithLabels(t, "region", "east", "instance", "foo")): {
Labels: map[string]string{
alertGroupNameLabel: "",
"instance": "foo",
"region": "east",
"instance": "foo",
"region": "east",
},
Annotations: map[string]string{
"summary": `Too high connection number for "foo" for region east`,
@@ -527,9 +691,8 @@ func TestAlertingRule_Template(t *testing.T) {
},
hash(metricWithLabels(t, "region", "east", "instance", "bar")): {
Labels: map[string]string{
alertGroupNameLabel: "",
"instance": "bar",
"region": "east",
"instance": "bar",
"region": "east",
},
Annotations: map[string]string{
"summary": `Too high connection number for "bar" for region east`,
@@ -544,8 +707,9 @@ func TestAlertingRule_Template(t *testing.T) {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if _, err := tc.rule.Exec(context.TODO(), fq, false); err != nil {
if _, err := tc.rule.Exec(context.TODO()); err != nil {
t.Fatalf("unexpected err: %s", err)
}
for hash, expAlert := range tc.expAlerts {
@@ -575,5 +739,5 @@ func newTestRuleWithLabels(name string, labels ...string) *AlertingRule {
}
func newTestAlertingRule(name string, waitFor time.Duration) *AlertingRule {
return &AlertingRule{Name: name, alerts: make(map[uint64]*notifier.Alert), For: waitFor}
return &AlertingRule{Name: name, alerts: make(map[uint64]*notifier.Alert), For: waitFor, EvalInterval: waitFor}
}

View File

@@ -8,7 +8,6 @@ import (
"path/filepath"
"sort"
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
@@ -16,7 +15,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/metricsql"
"gopkg.in/yaml.v2"
)
@@ -25,10 +23,17 @@ import (
type Group struct {
Type datasource.Type `yaml:"type,omitempty"`
File string
Name string `yaml:"name"`
Interval time.Duration `yaml:"interval,omitempty"`
Rules []Rule `yaml:"rules"`
Concurrency int `yaml:"concurrency"`
Name string `yaml:"name"`
Interval utils.PromDuration `yaml:"interval"`
Rules []Rule `yaml:"rules"`
Concurrency int `yaml:"concurrency"`
// ExtraFilterLabels is a list label filters applied to every rule
// request withing a group. Is compatible only with VM datasources.
// See https://docs.victoriametrics.com#prometheus-querying-api-enhancements
ExtraFilterLabels map[string]string `yaml:"extra_filter_labels"`
// Labels is a set of label value pairs, that will be added to every rule.
// It has priority over the external labels.
Labels map[string]string `yaml:"labels"`
// Checksum stores the hash of yaml definition for this group.
// May be used to detect any changes like rules re-ordering etc.
Checksum string
@@ -115,54 +120,18 @@ func (g *Group) Validate(validateAnnotations, validateExpressions bool) error {
// recording rule or alerting rule.
type Rule struct {
ID uint64
Type datasource.Type `yaml:"type,omitempty"`
Record string `yaml:"record,omitempty"`
Alert string `yaml:"alert,omitempty"`
Expr string `yaml:"expr"`
For PromDuration `yaml:"for"`
Labels map[string]string `yaml:"labels,omitempty"`
Annotations map[string]string `yaml:"annotations,omitempty"`
Type datasource.Type `yaml:"type,omitempty"`
Record string `yaml:"record,omitempty"`
Alert string `yaml:"alert,omitempty"`
Expr string `yaml:"expr"`
For utils.PromDuration `yaml:"for"`
Labels map[string]string `yaml:"labels,omitempty"`
Annotations map[string]string `yaml:"annotations,omitempty"`
// Catches all undefined fields and must be empty after parsing.
XXX map[string]interface{} `yaml:",inline"`
}
// PromDuration is Prometheus duration.
type PromDuration struct {
milliseconds int64
}
// NewPromDuration returns PromDuration for given d.
func NewPromDuration(d time.Duration) PromDuration {
return PromDuration{
milliseconds: d.Milliseconds(),
}
}
// MarshalYAML implements yaml.Marshaler interface.
func (pd PromDuration) MarshalYAML() (interface{}, error) {
return pd.Duration().String(), nil
}
// UnmarshalYAML implements yaml.Unmarshaler interface.
func (pd *PromDuration) UnmarshalYAML(unmarshal func(interface{}) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
}
ms, err := metricsql.DurationValue(s, 0)
if err != nil {
return err
}
pd.milliseconds = ms
return nil
}
// Duration returns duration for pd.
func (pd *PromDuration) Duration() time.Duration {
return time.Duration(pd.milliseconds) * time.Millisecond
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (r *Rule) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rule Rule

View File

@@ -8,10 +8,9 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"gopkg.in/yaml.v2"
)
func TestMain(m *testing.M) {
@@ -264,7 +263,7 @@ func TestGroup_Validate(t *testing.T) {
Rules: []Rule{
{
Expr: "sumSeries(time('foo.bar',10))",
For: PromDuration{milliseconds: 10},
For: utils.NewPromDuration(10 * time.Millisecond),
},
{
Expr: "sum(up == 0 ) by (host)",
@@ -280,7 +279,7 @@ func TestGroup_Validate(t *testing.T) {
Rules: []Rule{
{
Expr: "sum(up == 0 ) by (host)",
For: PromDuration{milliseconds: 10},
For: utils.NewPromDuration(10 * time.Millisecond),
},
{
Expr: "sumSeries(time('foo.bar',10))",
@@ -348,7 +347,7 @@ func TestHashRule(t *testing.T) {
true,
},
{
Rule{Alert: "alert", Expr: "up == 1", For: NewPromDuration(time.Minute)},
Rule{Alert: "alert", Expr: "up == 1", For: utils.NewPromDuration(time.Minute)},
Rule{Alert: "alert", Expr: "up == 1"},
true,
},
@@ -437,7 +436,7 @@ rules:
`)
})
t.Run("Ok, `for` must change cs", func(t *testing.T) {
t.Run("`for` change", func(t *testing.T) {
f(t, `
name: TestGroup
rules:
@@ -451,5 +450,34 @@ rules:
expr: sum by(job) (up == 1)
`)
})
t.Run("`interval` change", func(t *testing.T) {
f(t, `
name: TestGroup
interval: 2s
rules:
- alert: ExampleAlertWithFor
expr: sum by(job) (up == 1)
`, `
name: TestGroup
interval: 4s
rules:
- alert: ExampleAlertWithFor
expr: sum by(job) (up == 1)
`)
})
t.Run("`concurrency` change", func(t *testing.T) {
f(t, `
name: TestGroup
concurrency: 2
rules:
- alert: ExampleAlertWithFor
expr: sum by(job) (up == 1)
`, `
name: TestGroup
concurrency: 16
rules:
- alert: ExampleAlertWithFor
expr: sum by(job) (up == 1)
`)
})
}

View File

@@ -3,6 +3,8 @@ groups:
interval: 2s
concurrency: 2
type: prometheus
labels:
cluster: main
rules:
- alert: up
expr: up == 0

View File

@@ -1,5 +1,7 @@
groups:
- name: duplicatedGroupDiffFiles
labels:
dc: gcp
rules:
- alert: VMRows
for: 5m

View File

@@ -0,0 +1,15 @@
groups:
- name: alertmanager.rules
rules:
- alert: AlertmanagerConfigInconsistent
annotations:
message: |
The configuration of the instances of the Alertmanager cluster `{{ $labels.namespace }}/{{ $labels.service }}` are out of sync.
{{ range printf "alertmanager_config_hash{namespace=\"%s\",service=\"%s\"}" $labels.namespace $labels.service | query }}
Configuration hash for pod {{ .Labels.pod }} is "{{ printf "%.f" .Value }}"
{{ end }}
expr: |
count by(namespace,service) (count_values by(namespace,service) ("config_hash", alertmanager_config_hash{job="alertmanager-main",namespace="openshift-monitoring"})) != 1
for: 5m
labels:
severity: critical

View File

@@ -0,0 +1,39 @@
groups:
- name: ReplayGroup
interval: 1m
concurrency: 1
rules:
- record: type:vm_cache_entries:rate5m
expr: sum(rate(vm_cache_entries[5m])) by (type)
labels:
recording: true
- record: go_cgo_calls_count:rate5m
expr: rate(go_cgo_calls_count{job="vmdb"}[5m])
labels:
recording: true
- name: vmsingleReplay
interval: 30s
concurrency: 2
rules:
- alert: RequestErrorsToAPI
expr: increase(vm_http_request_errors_total[5m]) > 0
for: 15m
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=35&var-instance={{ $labels.instance }}"
summary: "Too many errors served for path {{ $labels.path }} (instance {{ $labels.instance }})"
description: "Requests to path {{ $labels.path }} are receiving errors.
Please verify if clients are sending correct requests."
- alert: TooManyLogs
expr: sum(increase(vm_log_messages_total{level!="info"}[5m])) by (job, instance) > 0
for: 15m
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=67&var-instance={{ $labels.instance }}"
summary: "Too many logs printed for job \"{{ $labels.job }}\" ({{ $labels.instance }})"
description: "Logging rate for job \"{{ $labels.job }}\" ({{ $labels.instance }}) is {{ $value }} for last 15m.\n
Worth to check logs for specific error messages."

View File

@@ -2,6 +2,8 @@ groups:
- name: TestGroup
interval: 2s
concurrency: 2
extra_filter_labels:
job: victoriametrics
rules:
- alert: Conns
expr: sum(vm_tcplistener_conns) by(instance) > 1

View File

@@ -2,21 +2,33 @@ package datasource
import (
"context"
"time"
)
// Querier interface wraps Query method which
// executes given query and returns list of Metrics
// as result
// Querier interface wraps Query and QueryRange methods
type Querier interface {
Query(ctx context.Context, query string, engine Type) ([]Metric, error)
Query(ctx context.Context, query string) ([]Metric, error)
QueryRange(ctx context.Context, query string, from, to time.Time) ([]Metric, error)
}
// QuerierBuilder builds Querier with given params.
type QuerierBuilder interface {
BuildWithParams(params QuerierParams) Querier
}
// QuerierParams params for Querier.
type QuerierParams struct {
DataSourceType *Type
EvaluationInterval time.Duration
// see https://docs.victoriametrics.com/#prometheus-querying-api-enhancements
ExtraLabels map[string]string
}
// Metric is the basic entity which should be return by datasource
// It represents single data point with full list of labels
type Metric struct {
Labels []Label
Timestamp int64
Value float64
Labels []Label
Timestamps []int64
Values []float64
}
// SetLabel adds or updates existing one label

View File

@@ -0,0 +1,18 @@
package datasource
import "testing"
func TestMetric_Label(t *testing.T) {
m := &Metric{}
m.AddLabel("foo", "bar")
checkEqualString(t, "bar", m.Label("foo"))
m.SetLabel("foo", "baz")
checkEqualString(t, "baz", m.Label("foo"))
m.SetLabel("qux", "quux")
checkEqualString(t, "quux", m.Label("qux"))
checkEqualString(t, "", m.Label("non-existing"))
}

View File

@@ -4,43 +4,75 @@ import (
"flag"
"fmt"
"net/http"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
)
var (
addr = flag.String("datasource.url", "", "Victoria Metrics or VMSelect url. Required parameter."+
" E.g. http://127.0.0.1:8428")
appendTypePrefix = flag.Bool("datasource.appendTypePrefix", false, "Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to VMSelect URL.")
basicAuthUsername = flag.String("datasource.basicAuth.username", "", "Optional basic auth username for -datasource.url")
basicAuthPassword = flag.String("datasource.basicAuth.password", "", "Optional basic auth password for -datasource.url")
addr = flag.String("datasource.url", "", "VictoriaMetrics or vmselect url. Required parameter. "+
"E.g. http://127.0.0.1:8428")
appendTypePrefix = flag.Bool("datasource.appendTypePrefix", false, "Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.")
basicAuthUsername = flag.String("datasource.basicAuth.username", "", "Optional basic auth username for -datasource.url")
basicAuthPassword = flag.String("datasource.basicAuth.password", "", "Optional basic auth password for -datasource.url")
basicAuthPasswordFile = flag.String("datasource.basicAuth.passwordFile", "", "Optional path to basic auth password to use for -datasource.url")
bearerToken = flag.String("datasource.bearerToken", "", "Optional bearer auth token to use for -datasource.url.")
bearerTokenFile = flag.String("datasource.bearerTokenFile", "", "Optional path to bearer token file to use for -datasource.url.")
tlsInsecureSkipVerify = flag.Bool("datasource.tlsInsecureSkipVerify", false, "Whether to skip tls verification when connecting to -datasource.url")
tlsCertFile = flag.String("datasource.tlsCertFile", "", "Optional path to client-side TLS certificate file to use when connecting to -datasource.url")
tlsKeyFile = flag.String("datasource.tlsKeyFile", "", "Optional path to client-side TLS certificate key to use when connecting to -datasource.url")
tlsCAFile = flag.String("datasource.tlsCAFile", "", "Optional path to TLS CA file to use for verifying connections to -datasource.url. "+
"By default system CA is used")
tlsServerName = flag.String("datasource.tlsServerName", "", "Optional TLS server name to use for connections to -datasource.url. "+
"By default the server name from -datasource.url is used")
tlsCAFile = flag.String("datasource.tlsCAFile", "", `Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used`)
tlsServerName = flag.String("datasource.tlsServerName", "", `Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used`)
lookBack = flag.Duration("datasource.lookback", 0, "Lookback defines how far to look into past when evaluating queries. "+
"For example, if datasource.lookback=5m then param \"time\" with value now()-5m will be added to every query.")
lookBack = flag.Duration("datasource.lookback", 0, `Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.`)
queryStep = flag.Duration("datasource.queryStep", 0, "queryStep defines how far a value can fallback to when evaluating queries. "+
"For example, if datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query.")
maxIdleConnections = flag.Int("datasource.maxIdleConnections", 100, "Defines the number of idle (keep-alive connections) to configured datasource."+
"Consider to set this value equal to the value: groups_total * group.concurrency. Too low value may result into high number of sockets in TIME_WAIT state.")
"For example, if datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query."+
"If queryStep isn't specified, rule's evaluationInterval will be used instead.")
maxIdleConnections = flag.Int("datasource.maxIdleConnections", 100, `Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state.`)
roundDigits = flag.Int("datasource.roundDigits", 0, `Adds "round_digits" GET param to datasource requests. `+
`In VM "round_digits" limits the number of digits after the decimal point in response values.`)
)
// Param represents an HTTP GET param
type Param struct {
Key, Value string
}
// Init creates a Querier from provided flag values.
func Init() (Querier, error) {
// Provided extraParams will be added as GET params to
// each request.
func Init(extraParams []Param) (QuerierBuilder, error) {
if *addr == "" {
return nil, fmt.Errorf("datasource.url is empty")
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
tr.MaxIdleConns = *maxIdleConnections
c := &http.Client{Transport: tr}
return NewVMStorage(*addr, *basicAuthUsername, *basicAuthPassword, *lookBack, *queryStep, *appendTypePrefix, c), nil
tr.MaxIdleConnsPerHost = *maxIdleConnections
if *roundDigits > 0 {
extraParams = append(extraParams, Param{
Key: "round_digits",
Value: fmt.Sprintf("%d", *roundDigits),
})
}
authCfg, err := utils.AuthConfig(*basicAuthUsername, *basicAuthPassword, *basicAuthPasswordFile, *bearerToken, *bearerTokenFile)
if err != nil {
return nil, fmt.Errorf("failed to configure auth: %w", err)
}
return &VMStorage{
c: &http.Client{Transport: tr},
authCfg: authCfg,
datasourceURL: strings.TrimSuffix(*addr, "/"),
appendTypePrefix: *appendTypePrefix,
lookBack: *lookBack,
queryStep: *queryStep,
dataSourceType: NewPrometheusType(),
extraParams: extraParams,
}, nil
}

View File

@@ -2,205 +2,156 @@ package datasource
import (
"context"
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
"strconv"
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
)
type response struct {
Status string `json:"status"`
Data struct {
ResultType string `json:"resultType"`
Result []struct {
Labels map[string]string `json:"metric"`
TV [2]interface{} `json:"value"`
} `json:"result"`
} `json:"data"`
ErrorType string `json:"errorType"`
Error string `json:"error"`
}
func (r response) metrics() ([]Metric, error) {
var ms []Metric
var m Metric
var f float64
var err error
for i, res := range r.Data.Result {
f, err = strconv.ParseFloat(res.TV[1].(string), 64)
if err != nil {
return nil, fmt.Errorf("metric %v, unable to parse float64 from %s: %w", res, res.TV[1], err)
}
m.Labels = nil
for k, v := range r.Data.Result[i].Labels {
m.AddLabel(k, v)
}
m.Timestamp = int64(res.TV[0].(float64))
m.Value = f
ms = append(ms, m)
}
return ms, nil
}
type graphiteResponse []graphiteResponseTarget
type graphiteResponseTarget struct {
Target string `json:"target"`
Tags map[string]string `json:"tags"`
DataPoints [][2]float64 `json:"datapoints"`
}
func (r graphiteResponse) metrics() []Metric {
var ms []Metric
for _, res := range r {
if len(res.DataPoints) < 1 {
continue
}
var m Metric
// add only last value to the result.
last := res.DataPoints[len(res.DataPoints)-1]
m.Value = last[0]
m.Timestamp = int64(last[1])
for k, v := range res.Tags {
m.AddLabel(k, v)
}
ms = append(ms, m)
}
return ms
}
// VMStorage represents vmstorage entity with ability to read and write metrics
type VMStorage struct {
c *http.Client
authCfg *promauth.Config
datasourceURL string
basicAuthUser string
basicAuthPass string
appendTypePrefix bool
lookBack time.Duration
queryStep time.Duration
dataSourceType Type
evaluationInterval time.Duration
extraLabels []string
extraParams []Param
}
const queryPath = "/api/v1/query"
const graphitePath = "/render"
// Clone makes clone of VMStorage, shares http client.
func (s *VMStorage) Clone() *VMStorage {
return &VMStorage{
c: s.c,
authCfg: s.authCfg,
datasourceURL: s.datasourceURL,
lookBack: s.lookBack,
queryStep: s.queryStep,
appendTypePrefix: s.appendTypePrefix,
dataSourceType: s.dataSourceType,
}
}
const prometheusPrefix = "/prometheus"
const graphitePrefix = "/graphite"
// ApplyParams - changes given querier params.
func (s *VMStorage) ApplyParams(params QuerierParams) *VMStorage {
if params.DataSourceType != nil {
s.dataSourceType = *params.DataSourceType
}
s.evaluationInterval = params.EvaluationInterval
for k, v := range params.ExtraLabels {
s.extraLabels = append(s.extraLabels, fmt.Sprintf("%s=%s", k, v))
}
return s
}
// BuildWithParams - implements interface.
func (s *VMStorage) BuildWithParams(params QuerierParams) Querier {
return s.Clone().ApplyParams(params)
}
// NewVMStorage is a constructor for VMStorage
func NewVMStorage(baseURL, basicAuthUser, basicAuthPass string, lookBack time.Duration, queryStep time.Duration, appendTypePrefix bool, c *http.Client) *VMStorage {
func NewVMStorage(baseURL string, authCfg *promauth.Config, lookBack time.Duration, queryStep time.Duration, appendTypePrefix bool, c *http.Client) *VMStorage {
return &VMStorage{
c: c,
basicAuthUser: basicAuthUser,
basicAuthPass: basicAuthPass,
authCfg: authCfg,
datasourceURL: strings.TrimSuffix(baseURL, "/"),
appendTypePrefix: appendTypePrefix,
lookBack: lookBack,
queryStep: queryStep,
dataSourceType: NewPrometheusType(),
}
}
// Query reads metrics from datasource by given query and type
func (s *VMStorage) Query(ctx context.Context, query string, dataSourceType Type) ([]Metric, error) {
switch dataSourceType.name {
// Query executes the given query and returns parsed response
func (s *VMStorage) Query(ctx context.Context, query string) ([]Metric, error) {
req, err := s.newRequestPOST()
if err != nil {
return nil, err
}
ts := time.Now()
switch s.dataSourceType.name {
case "", prometheusType:
return s.queryDataSource(ctx, query, s.setPrometheusReqParams, parsePrometheusResponse)
s.setPrometheusInstantReqParams(req, query, ts)
case graphiteType:
return s.queryDataSource(ctx, query, s.setGraphiteReqParams, parseGraphiteResponse)
s.setGraphiteReqParams(req, query, ts)
default:
return nil, fmt.Errorf("engine not found: %q", dataSourceType)
return nil, fmt.Errorf("engine not found: %q", s.dataSourceType.name)
}
resp, err := s.do(ctx, req)
if err != nil {
return nil, err
}
defer func() {
_ = resp.Body.Close()
}()
parseFn := parsePrometheusResponse
if s.dataSourceType.name != prometheusType {
parseFn = parseGraphiteResponse
}
return parseFn(req, resp)
}
func (s *VMStorage) queryDataSource(
ctx context.Context,
query string,
setReqParams func(r *http.Request, query string),
processResponse func(r *http.Request, resp *http.Response,
) ([]Metric, error)) ([]Metric, error) {
// QueryRange executes the given query on the given time range.
// For Prometheus type see https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries
// Graphite type isn't supported.
func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end time.Time) ([]Metric, error) {
if s.dataSourceType.name != prometheusType {
return nil, fmt.Errorf("%q is not supported for QueryRange", s.dataSourceType.name)
}
req, err := s.newRequestPOST()
if err != nil {
return nil, err
}
if start.IsZero() {
return nil, fmt.Errorf("start param is missing")
}
if end.IsZero() {
return nil, fmt.Errorf("end param is missing")
}
s.setPrometheusRangeReqParams(req, query, start, end)
resp, err := s.do(ctx, req)
if err != nil {
return nil, err
}
defer func() {
_ = resp.Body.Close()
}()
return parsePrometheusResponse(req, resp)
}
func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response, error) {
resp, err := s.c.Do(req.WithContext(ctx))
if err != nil {
return nil, fmt.Errorf("error getting response from %s: %w", req.URL, err)
}
if resp.StatusCode != http.StatusOK {
body, _ := ioutil.ReadAll(resp.Body)
_ = resp.Body.Close()
return nil, fmt.Errorf("unexpected response code %d for %s. Response body %s", resp.StatusCode, req.URL, body)
}
return resp, nil
}
func (s *VMStorage) newRequestPOST() (*http.Request, error) {
req, err := http.NewRequest("POST", s.datasourceURL, nil)
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json; charset=utf-8")
if s.basicAuthPass != "" {
req.SetBasicAuth(s.basicAuthUser, s.basicAuthPass)
if s.authCfg != nil {
if auth := s.authCfg.GetAuthHeader(); auth != "" {
req.Header.Set("Authorization", auth)
}
}
setReqParams(req, query)
resp, err := s.c.Do(req.WithContext(ctx))
if err != nil {
return nil, fmt.Errorf("error getting response from %s: %w", req.URL, err)
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
body, _ := ioutil.ReadAll(resp.Body)
return nil, fmt.Errorf("datasource returns unexpected response code %d for %s. Response body %s", resp.StatusCode, req.URL, body)
}
return processResponse(req, resp)
}
func (s *VMStorage) setPrometheusReqParams(r *http.Request, query string) {
if s.appendTypePrefix {
r.URL.Path += prometheusPrefix
}
r.URL.Path += queryPath
q := r.URL.Query()
q.Set("query", query)
if s.lookBack > 0 {
lookBack := time.Now().Add(-s.lookBack)
q.Set("time", fmt.Sprintf("%d", lookBack.Unix()))
}
if s.queryStep > 0 {
q.Set("step", s.queryStep.String())
}
r.URL.RawQuery = q.Encode()
}
func (s *VMStorage) setGraphiteReqParams(r *http.Request, query string) {
if s.appendTypePrefix {
r.URL.Path += graphitePrefix
}
r.URL.Path += graphitePath
q := r.URL.Query()
q.Set("format", "json")
q.Set("target", query)
from := "-5min"
if s.lookBack > 0 {
lookBack := time.Now().Add(-s.lookBack)
from = strconv.FormatInt(lookBack.Unix(), 10)
}
q.Set("from", from)
q.Set("until", "now")
r.URL.RawQuery = q.Encode()
}
const (
statusSuccess, statusError, rtVector = "success", "error", "vector"
)
func parsePrometheusResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &response{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing prometheus metrics for %s: %w", req.URL, err)
}
if r.Status == statusError {
return nil, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL, r.ErrorType, r.Error)
}
if r.Status != statusSuccess {
return nil, fmt.Errorf("unknown status: %s, Expected success or error ", r.Status)
}
if r.Data.ResultType != rtVector {
return nil, fmt.Errorf("unknown result type:%s. Expected vector", r.Data.ResultType)
}
return r.metrics()
}
func parseGraphiteResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &graphiteResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing graphite metrics for %s: %w", req.URL, err)
}
return r.metrics(), nil
return req, nil
}

View File

@@ -0,0 +1,67 @@
package datasource
import (
"encoding/json"
"fmt"
"net/http"
"strconv"
"time"
)
type graphiteResponse []graphiteResponseTarget
type graphiteResponseTarget struct {
Target string `json:"target"`
Tags map[string]string `json:"tags"`
DataPoints [][2]float64 `json:"datapoints"`
}
func (r graphiteResponse) metrics() []Metric {
var ms []Metric
for _, res := range r {
if len(res.DataPoints) < 1 {
continue
}
var m Metric
// add only last value to the result.
last := res.DataPoints[len(res.DataPoints)-1]
m.Values = append(m.Values, last[0])
m.Timestamps = append(m.Timestamps, int64(last[1]))
for k, v := range res.Tags {
m.AddLabel(k, v)
}
ms = append(ms, m)
}
return ms
}
func parseGraphiteResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &graphiteResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing graphite metrics for %s: %w", req.URL, err)
}
return r.metrics(), nil
}
const (
graphitePath = "/render"
graphitePrefix = "/graphite"
)
func (s *VMStorage) setGraphiteReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += graphitePrefix
}
r.URL.Path += graphitePath
q := r.URL.Query()
q.Set("format", "json")
q.Set("target", query)
from := "-5min"
if s.lookBack > 0 {
lookBack := timestamp.Add(-s.lookBack)
from = strconv.FormatInt(lookBack.Unix(), 10)
}
q.Set("from", from)
q.Set("until", "now")
r.URL.RawQuery = q.Encode()
}

View File

@@ -0,0 +1,165 @@
package datasource
import (
"encoding/json"
"fmt"
"net/http"
"strconv"
"time"
)
type promResponse struct {
Status string `json:"status"`
ErrorType string `json:"errorType"`
Error string `json:"error"`
Data struct {
ResultType string `json:"resultType"`
Result json.RawMessage `json:"result"`
} `json:"data"`
}
type promInstant struct {
Result []struct {
Labels map[string]string `json:"metric"`
TV [2]interface{} `json:"value"`
} `json:"result"`
}
type promRange struct {
Result []struct {
Labels map[string]string `json:"metric"`
TVs [][2]interface{} `json:"values"`
} `json:"result"`
}
func (r promInstant) metrics() ([]Metric, error) {
var result []Metric
for i, res := range r.Result {
f, err := strconv.ParseFloat(res.TV[1].(string), 64)
if err != nil {
return nil, fmt.Errorf("metric %v, unable to parse float64 from %s: %w", res, res.TV[1], err)
}
var m Metric
for k, v := range r.Result[i].Labels {
m.AddLabel(k, v)
}
m.Timestamps = append(m.Timestamps, int64(res.TV[0].(float64)))
m.Values = append(m.Values, f)
result = append(result, m)
}
return result, nil
}
func (r promRange) metrics() ([]Metric, error) {
var result []Metric
for i, res := range r.Result {
var m Metric
for _, tv := range res.TVs {
f, err := strconv.ParseFloat(tv[1].(string), 64)
if err != nil {
return nil, fmt.Errorf("metric %v, unable to parse float64 from %s: %w", res, tv[1], err)
}
m.Values = append(m.Values, f)
m.Timestamps = append(m.Timestamps, int64(tv[0].(float64)))
}
if len(m.Values) < 1 || len(m.Timestamps) < 1 {
return nil, fmt.Errorf("metric %v contains no values", res)
}
m.Labels = nil
for k, v := range r.Result[i].Labels {
m.AddLabel(k, v)
}
result = append(result, m)
}
return result, nil
}
const (
statusSuccess, statusError = "success", "error"
rtVector, rtMatrix = "vector", "matrix"
)
func parsePrometheusResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &promResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing prometheus metrics for %s: %w", req.URL, err)
}
if r.Status == statusError {
return nil, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL, r.ErrorType, r.Error)
}
if r.Status != statusSuccess {
return nil, fmt.Errorf("unknown status: %s, Expected success or error ", r.Status)
}
switch r.Data.ResultType {
case rtVector:
var pi promInstant
if err := json.Unmarshal(r.Data.Result, &pi.Result); err != nil {
return nil, fmt.Errorf("umarshal err %s; \n %#v", err, string(r.Data.Result))
}
return pi.metrics()
case rtMatrix:
var pr promRange
if err := json.Unmarshal(r.Data.Result, &pr.Result); err != nil {
return nil, err
}
return pr.metrics()
default:
return nil, fmt.Errorf("unknown result type %q", r.Data.ResultType)
}
}
const (
prometheusInstantPath = "/api/v1/query"
prometheusRangePath = "/api/v1/query_range"
prometheusPrefix = "/prometheus"
)
func (s *VMStorage) setPrometheusInstantReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += prometheusPrefix
}
r.URL.Path += prometheusInstantPath
q := r.URL.Query()
if s.lookBack > 0 {
timestamp = timestamp.Add(-s.lookBack)
}
if s.evaluationInterval > 0 {
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
timestamp = timestamp.Truncate(s.evaluationInterval)
}
q.Set("time", fmt.Sprintf("%d", timestamp.Unix()))
r.URL.RawQuery = q.Encode()
s.setPrometheusReqParams(r, query)
}
func (s *VMStorage) setPrometheusRangeReqParams(r *http.Request, query string, start, end time.Time) {
if s.appendTypePrefix {
r.URL.Path += prometheusPrefix
}
r.URL.Path += prometheusRangePath
q := r.URL.Query()
q.Add("start", fmt.Sprintf("%d", start.Unix()))
q.Add("end", fmt.Sprintf("%d", end.Unix()))
r.URL.RawQuery = q.Encode()
s.setPrometheusReqParams(r, query)
}
func (s *VMStorage) setPrometheusReqParams(r *http.Request, query string) {
q := r.URL.Query()
q.Set("query", query)
if s.evaluationInterval > 0 {
// set step as evaluationInterval by default
q.Set("step", s.evaluationInterval.String())
}
if s.queryStep > 0 {
// override step with user-specified value
q.Set("step", s.queryStep.String())
}
for _, l := range s.extraLabels {
q.Add("extra_label", l)
}
for _, p := range s.extraParams {
q.Add(p.Key, p.Value)
}
r.URL.RawQuery = q.Encode()
}

View File

@@ -2,22 +2,31 @@ package datasource
import (
"context"
"fmt"
"net/http"
"net/http/httptest"
"reflect"
"strconv"
"strings"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
)
var (
ctx = context.Background()
basicAuthName = "foo"
basicAuthPass = "bar"
query = "vm_rows"
queryRender = "constantLine(10)"
baCfg = &promauth.BasicAuthConfig{
Username: basicAuthName,
Password: basicAuthPass,
}
query = "vm_rows"
queryRender = "constantLine(10)"
)
func TestVMSelectQuery(t *testing.T) {
func TestVMInstantQuery(t *testing.T) {
mux := http.NewServeMux()
mux.HandleFunc("/", func(_ http.ResponseWriter, _ *http.Request) {
t.Errorf("should not be called")
@@ -63,32 +72,141 @@ func TestVMSelectQuery(t *testing.T) {
case 5:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix"}}`))
case 6:
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"vm_rows"},"value":[1583786142,"13763"]}]}}`))
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"vm_rows"},"value":[1583786142,"13763"]},{"metric":{"__name__":"vm_requests"},"value":[1583786140,"2000"]}]}}`))
}
})
srv := httptest.NewServer(mux)
defer srv.Close()
am := NewVMStorage(srv.URL, basicAuthName, basicAuthPass, time.Minute, 0, false, srv.Client())
if _, err := am.Query(ctx, query, NewPrometheusType()); err == nil {
authCfg, err := promauth.NewConfig(".", nil, baCfg, "", "", nil, nil)
if err != nil {
t.Fatalf("unexpected: %s", err)
}
s := NewVMStorage(srv.URL, authCfg, time.Minute, 0, false, srv.Client())
p := NewPrometheusType()
pq := s.BuildWithParams(QuerierParams{DataSourceType: &p, EvaluationInterval: 15 * time.Second})
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected connection error got nil")
}
if _, err := am.Query(ctx, query, NewPrometheusType()); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected invalid response status error got nil")
}
if _, err := am.Query(ctx, query, NewPrometheusType()); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected response body error got nil")
}
if _, err := am.Query(ctx, query, NewPrometheusType()); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected error status got nil")
}
if _, err := am.Query(ctx, query, NewPrometheusType()); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected unknown status got nil")
}
if _, err := am.Query(ctx, query, NewPrometheusType()); err == nil {
if _, err := pq.Query(ctx, query); err == nil {
t.Fatalf("expected non-vector resultType error got nil")
}
m, err := am.Query(ctx, query, NewPrometheusType())
m, err := pq.Query(ctx, query)
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(m) != 2 {
t.Fatalf("expected 2 metrics got %d in %+v", len(m), m)
}
expected := []Metric{
{
Labels: []Label{{Value: "vm_rows", Name: "__name__"}},
Timestamps: []int64{1583786142},
Values: []float64{13763},
},
{
Labels: []Label{{Value: "vm_requests", Name: "__name__"}},
Timestamps: []int64{1583786140},
Values: []float64{2000},
},
}
if !reflect.DeepEqual(m, expected) {
t.Fatalf("unexpected metric %+v want %+v", m, expected)
}
g := NewGraphiteType()
gq := s.BuildWithParams(QuerierParams{DataSourceType: &g})
m, err = gq.Query(ctx, queryRender)
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
}
exp := Metric{
Labels: []Label{{Value: "constantLine(10)", Name: "name"}},
Timestamps: []int64{1611758403},
Values: []float64{10},
}
if !reflect.DeepEqual(m[0], exp) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
}
}
func TestVMRangeQuery(t *testing.T) {
mux := http.NewServeMux()
mux.HandleFunc("/", func(_ http.ResponseWriter, _ *http.Request) {
t.Errorf("should not be called")
})
c := -1
mux.HandleFunc("/api/v1/query_range", func(w http.ResponseWriter, r *http.Request) {
c++
if r.Method != http.MethodPost {
t.Errorf("expected POST method got %s", r.Method)
}
if name, pass, _ := r.BasicAuth(); name != basicAuthName || pass != basicAuthPass {
t.Errorf("expected %s:%s as basic auth got %s:%s", basicAuthName, basicAuthPass, name, pass)
}
if r.URL.Query().Get("query") != query {
t.Errorf("expected %s in query param, got %s", query, r.URL.Query().Get("query"))
}
startTS := r.URL.Query().Get("start")
if startTS == "" {
t.Errorf("expected 'start' in query param, got nil instead")
}
if _, err := strconv.ParseInt(startTS, 10, 64); err != nil {
t.Errorf("failed to parse 'start' query param: %s", err)
}
endTS := r.URL.Query().Get("end")
if endTS == "" {
t.Errorf("expected 'end' in query param, got nil instead")
}
if _, err := strconv.ParseInt(endTS, 10, 64); err != nil {
t.Errorf("failed to parse 'end' query param: %s", err)
}
switch c {
case 0:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"vm_rows"},"values":[[1583786142,"13763"]]}]}}`))
}
})
srv := httptest.NewServer(mux)
defer srv.Close()
authCfg, err := promauth.NewConfig(".", nil, baCfg, "", "", nil, nil)
if err != nil {
t.Fatalf("unexpected: %s", err)
}
s := NewVMStorage(srv.URL, authCfg, time.Minute, 0, false, srv.Client())
p := NewPrometheusType()
pq := s.BuildWithParams(QuerierParams{DataSourceType: &p, EvaluationInterval: 15 * time.Second})
_, err = pq.QueryRange(ctx, query, time.Now(), time.Time{})
expectError(t, err, "is missing")
_, err = pq.QueryRange(ctx, query, time.Time{}, time.Now())
expectError(t, err, "is missing")
start, end := time.Now().Add(-time.Minute), time.Now()
m, err := pq.QueryRange(ctx, query, start, end)
if err != nil {
t.Fatalf("unexpected %s", err)
}
@@ -96,32 +214,275 @@ func TestVMSelectQuery(t *testing.T) {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
}
expected := Metric{
Labels: []Label{{Value: "vm_rows", Name: "__name__"}},
Timestamp: 1583786142,
Value: 13763,
Labels: []Label{{Value: "vm_rows", Name: "__name__"}},
Timestamps: []int64{1583786142},
Values: []float64{13763},
}
if m[0].Timestamp != expected.Timestamp &&
m[0].Value != expected.Value &&
m[0].Labels[0].Value != expected.Labels[0].Value &&
m[0].Labels[0].Name != expected.Labels[0].Name {
if !reflect.DeepEqual(m[0], expected) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
}
m, err = am.Query(ctx, queryRender, NewGraphiteType())
g := NewGraphiteType()
gq := s.BuildWithParams(QuerierParams{DataSourceType: &g})
_, err = gq.QueryRange(ctx, queryRender, start, end)
expectError(t, err, "is not supported")
}
func TestRequestParams(t *testing.T) {
authCfg, err := promauth.NewConfig(".", nil, baCfg, "", "", nil, nil)
if err != nil {
t.Fatalf("unexpected %s", err)
t.Fatalf("unexpected: %s", err)
}
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
query := "up"
timestamp := time.Date(2001, 2, 3, 4, 5, 6, 0, time.UTC)
testCases := []struct {
name string
queryRange bool
vm *VMStorage
checkFn func(t *testing.T, r *http.Request)
}{
{
"prometheus path",
false,
&VMStorage{
dataSourceType: NewPrometheusType(),
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusInstantPath, r.URL.Path)
},
},
{
"prometheus prefix",
false,
&VMStorage{
dataSourceType: NewPrometheusType(),
appendTypePrefix: true,
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusPrefix+prometheusInstantPath, r.URL.Path)
},
},
{
"prometheus range path",
true,
&VMStorage{
dataSourceType: NewPrometheusType(),
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusRangePath, r.URL.Path)
},
},
{
"prometheus range prefix",
true,
&VMStorage{
dataSourceType: NewPrometheusType(),
appendTypePrefix: true,
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusPrefix+prometheusRangePath, r.URL.Path)
},
},
{
"graphite path",
false,
&VMStorage{
dataSourceType: NewGraphiteType(),
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, graphitePath, r.URL.Path)
},
},
{
"graphite prefix",
false,
&VMStorage{
dataSourceType: NewGraphiteType(),
appendTypePrefix: true,
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, graphitePrefix+graphitePath, r.URL.Path)
},
},
{
"default params",
false,
&VMStorage{},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"default range params",
true,
&VMStorage{},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("end=%d&query=%s&start=%d", timestamp.Unix(), query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"basic auth",
false,
&VMStorage{authCfg: authCfg},
func(t *testing.T, r *http.Request) {
u, p, _ := r.BasicAuth()
checkEqualString(t, "foo", u)
checkEqualString(t, "bar", p)
},
},
{
"basic auth range",
true,
&VMStorage{authCfg: authCfg},
func(t *testing.T, r *http.Request) {
u, p, _ := r.BasicAuth()
checkEqualString(t, "foo", u)
checkEqualString(t, "bar", p)
},
},
{
"lookback",
false,
&VMStorage{
lookBack: time.Minute,
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&time=%d", query, timestamp.Add(-time.Minute).Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"evaluation interval",
false,
&VMStorage{
evaluationInterval: 15 * time.Second,
},
func(t *testing.T, r *http.Request) {
evalInterval := 15 * time.Second
tt := timestamp.Truncate(evalInterval)
exp := fmt.Sprintf("query=%s&step=%v&time=%d", query, evalInterval, tt.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"lookback + evaluation interval",
false,
&VMStorage{
lookBack: time.Minute,
evaluationInterval: 15 * time.Second,
},
func(t *testing.T, r *http.Request) {
evalInterval := 15 * time.Second
tt := timestamp.Add(-time.Minute)
tt = tt.Truncate(evalInterval)
exp := fmt.Sprintf("query=%s&step=%v&time=%d", query, evalInterval, tt.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"step override",
false,
&VMStorage{
queryStep: time.Minute,
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&step=%v&time=%d", query, time.Minute, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"round digits",
false,
&VMStorage{
extraParams: []Param{{"round_digits", "10"}},
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&round_digits=10&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"extra labels",
false,
&VMStorage{
extraLabels: []string{
"env=prod",
"query=es=cape",
},
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("extra_label=env%%3Dprod&extra_label=query%%3Des%%3Dcape&query=%s&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"extra labels range",
true,
&VMStorage{
extraLabels: []string{
"env=prod",
"query=es=cape",
},
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("end=%d&extra_label=env%%3Dprod&extra_label=query%%3Des%%3Dcape&query=%s&start=%d",
timestamp.Unix(), query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"extra params",
false,
&VMStorage{
extraParams: []Param{
{Key: "nocache", Value: "1"},
{Key: "max_lookback", Value: "1h"},
},
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("max_lookback=1h&nocache=1&query=%s&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
}
expected = Metric{
Labels: []Label{{Value: "constantLine(10)", Name: "name"}},
Timestamp: 1611758403,
Value: 10,
}
if m[0].Timestamp != expected.Timestamp &&
m[0].Value != expected.Value &&
m[0].Labels[0].Value != expected.Labels[0].Value &&
m[0].Labels[0].Name != expected.Labels[0].Name {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
req, err := tc.vm.newRequestPOST()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
switch tc.vm.dataSourceType.name {
case "", prometheusType:
if tc.queryRange {
tc.vm.setPrometheusRangeReqParams(req, query, timestamp, timestamp)
} else {
tc.vm.setPrometheusInstantReqParams(req, query, timestamp)
}
case graphiteType:
tc.vm.setGraphiteReqParams(req, query, timestamp)
}
tc.checkFn(t, req)
})
}
}
func checkEqualString(t *testing.T, exp, got string) {
t.Helper()
if got != exp {
t.Errorf("expected to get %q; got %q", exp, got)
}
}
func expectError(t *testing.T, err error, exp string) {
t.Helper()
if err == nil {
t.Errorf("expected non-nil error")
}
if !strings.Contains(err.Error(), exp) {
t.Errorf("expected error %q to contain %q", err, exp)
}
}

View File

@@ -27,6 +27,9 @@ type Group struct {
Concurrency int
Checksum string
ExtraFilterLabels map[string]string
Labels map[string]string
doneCh chan struct{}
finishedCh chan struct{}
// channel accepts new Group obj
@@ -49,17 +52,37 @@ func newGroupMetrics(name, file string) *groupMetrics {
return m
}
func newGroup(cfg config.Group, defaultInterval time.Duration, labels map[string]string) *Group {
// merges group rule labels into result map
// set2 has priority over set1.
func mergeLabels(groupName, ruleName string, set1, set2 map[string]string) map[string]string {
r := map[string]string{}
for k, v := range set1 {
r[k] = v
}
for k, v := range set2 {
if prevV, ok := r[k]; ok {
logger.Infof("label %q=%q for rule %q.%q overwritten with external label %q=%q",
k, prevV, groupName, ruleName, k, v)
}
r[k] = v
}
return r
}
func newGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval time.Duration, labels map[string]string) *Group {
g := &Group{
Type: cfg.Type,
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval,
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
doneCh: make(chan struct{}),
finishedCh: make(chan struct{}),
updateCh: make(chan *Group),
Type: cfg.Type,
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval.Duration(),
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
ExtraFilterLabels: cfg.ExtraFilterLabels,
Labels: cfg.Labels,
doneCh: make(chan struct{}),
finishedCh: make(chan struct{}),
updateCh: make(chan *Group),
}
g.metrics = newGroupMetrics(g.Name, g.File)
if g.Interval == 0 {
@@ -70,28 +93,31 @@ func newGroup(cfg config.Group, defaultInterval time.Duration, labels map[string
}
rules := make([]Rule, len(cfg.Rules))
for i, r := range cfg.Rules {
// override rule labels with external labels
for k, v := range labels {
if prevV, ok := r.Labels[k]; ok {
logger.Infof("label %q=%q for rule %q.%q overwritten with external label %q=%q",
k, prevV, g.Name, r.Name(), k, v)
}
if r.Labels == nil {
r.Labels = map[string]string{}
}
r.Labels[k] = v
var extraLabels map[string]string
// apply external labels
if len(labels) > 0 {
extraLabels = labels
}
rules[i] = g.newRule(r)
// apply group labels, it has priority on external labels
if len(cfg.Labels) > 0 {
extraLabels = mergeLabels(g.Name, r.Name(), extraLabels, g.Labels)
}
// apply rules labels, it has priority on other labels
if len(extraLabels) > 0 {
r.Labels = mergeLabels(g.Name, r.Name(), extraLabels, r.Labels)
}
rules[i] = g.newRule(qb, r)
}
g.Rules = rules
return g
}
func (g *Group) newRule(rule config.Rule) Rule {
func (g *Group) newRule(qb datasource.QuerierBuilder, rule config.Rule) Rule {
if rule.Alert != "" {
return newAlertingRule(g, rule)
return newAlertingRule(qb, g, rule)
}
return newRecordingRule(g, rule)
return newRecordingRule(qb, g, rule)
}
// ID return unique group ID that consists of
@@ -106,7 +132,8 @@ func (g *Group) ID() uint64 {
}
// Restore restores alerts state for group rules
func (g *Group) Restore(ctx context.Context, q datasource.Querier, lookback time.Duration, labels map[string]string) error {
func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, lookback time.Duration, labels map[string]string) error {
labels = mergeLabels(g.Name, "", labels, g.Labels)
for _, rule := range g.Rules {
rr, ok := rule.(*AlertingRule)
if !ok {
@@ -115,6 +142,9 @@ func (g *Group) Restore(ctx context.Context, q datasource.Querier, lookback time
if rr.For < 1 {
continue
}
// ignore g.ExtraFilterLabels on purpose, so it
// won't affect the restore procedure.
q := qb.BuildWithParams(datasource.QuerierParams{})
if err := rr.Restore(ctx, q, lookback, labels); err != nil {
return fmt.Errorf("error while restoring rule %q: %w", rule, err)
}
@@ -160,19 +190,18 @@ func (g *Group) updateWith(newGroup *Group) error {
for _, nr := range rulesRegistry {
newRules = append(newRules, nr)
}
// note that g.Interval is not updated here
// so the value can be compared later in
// group.Start function
g.Type = newGroup.Type
g.Concurrency = newGroup.Concurrency
g.ExtraFilterLabels = newGroup.ExtraFilterLabels
g.Labels = newGroup.Labels
g.Checksum = newGroup.Checksum
g.Rules = newRules
return nil
}
var (
alertsFired = metrics.NewCounter(`vmalert_alerts_fired_total`)
alertsSent = metrics.NewCounter(`vmalert_alerts_sent_total`)
alertsSendErrors = metrics.NewCounter(`vmalert_alerts_send_errors_total`)
)
func (g *Group) close() {
if g.doneCh == nil {
return
@@ -189,7 +218,7 @@ func (g *Group) close() {
var skipRandSleepOnGroupStart bool
func (g *Group) start(ctx context.Context, querier datasource.Querier, nts []notifier.Notifier, rw *remotewrite.Client) {
func (g *Group) start(ctx context.Context, nts []notifier.Notifier, rw *remotewrite.Client) {
defer func() { close(g.finishedCh) }()
// Spread group rules evaluation over time in order to reduce load on VictoriaMetrics.
@@ -213,7 +242,16 @@ func (g *Group) start(ctx context.Context, querier datasource.Querier, nts []not
}
logger.Infof("group %q started; interval=%v; concurrency=%d", g.Name, g.Interval, g.Concurrency)
e := &executor{querier, nts, rw}
e := &executor{rw: rw}
for _, nt := range nts {
ent := eNotifier{
Notifier: nt,
alertsSent: getOrCreateCounter(fmt.Sprintf("vmalert_alerts_sent_total{addr=%q}", nt.Addr())),
alertsSendErrors: getOrCreateCounter(fmt.Sprintf("vmalert_alerts_send_errors_total{addr=%q}", nt.Addr())),
}
e.notifiers = append(e.notifiers, ent)
}
t := time.NewTicker(g.Interval)
defer t.Stop()
for {
@@ -242,8 +280,8 @@ func (g *Group) start(ctx context.Context, querier datasource.Querier, nts []not
case <-t.C:
g.metrics.iterationTotal.Inc()
iterationStart := time.Now()
errs := e.execConcurrently(ctx, g.Rules, g.Concurrency, g.Interval)
resolveDuration := getResolveDuration(g.Interval)
errs := e.execConcurrently(ctx, g.Rules, g.Concurrency, resolveDuration)
for err := range errs {
if err != nil {
logger.Errorf("group %q: %s", g.Name, err)
@@ -255,23 +293,34 @@ func (g *Group) start(ctx context.Context, querier datasource.Querier, nts []not
}
}
// resolveDuration for alerts is equal to 3 interval evaluations
// so in case if vmalert stops sending updates for some reason,
// notifier could automatically resolve the alert.
func getResolveDuration(groupInterval time.Duration) time.Duration {
resolveInterval := groupInterval * 3
if *maxResolveDuration > 0 && (resolveInterval > *maxResolveDuration) {
return *maxResolveDuration
}
return resolveInterval
}
type executor struct {
querier datasource.Querier
notifiers []notifier.Notifier
notifiers []eNotifier
rw *remotewrite.Client
}
func (e *executor) execConcurrently(ctx context.Context, rules []Rule, concurrency int, interval time.Duration) chan error {
res := make(chan error, len(rules))
var returnSeries bool
if e.rw != nil {
returnSeries = true
}
type eNotifier struct {
notifier.Notifier
alertsSent *counter
alertsSendErrors *counter
}
func (e *executor) execConcurrently(ctx context.Context, rules []Rule, concurrency int, resolveDuration time.Duration) chan error {
res := make(chan error, len(rules))
if concurrency == 1 {
// fast path
for _, rule := range rules {
res <- e.exec(ctx, rule, returnSeries, interval)
res <- e.exec(ctx, rule, resolveDuration)
}
close(res)
return res
@@ -284,7 +333,7 @@ func (e *executor) execConcurrently(ctx context.Context, rules []Rule, concurren
sem <- struct{}{}
wg.Add(1)
go func(r Rule) {
res <- e.exec(ctx, r, returnSeries, interval)
res <- e.exec(ctx, r, resolveDuration)
<-sem
wg.Done()
}(rule)
@@ -296,21 +345,18 @@ func (e *executor) execConcurrently(ctx context.Context, rules []Rule, concurren
}
var (
execTotal = metrics.NewCounter(`vmalert_execution_total`)
execErrors = metrics.NewCounter(`vmalert_execution_errors_total`)
execDuration = metrics.NewSummary(`vmalert_execution_duration_seconds`)
alertsFired = metrics.NewCounter(`vmalert_alerts_fired_total`)
execTotal = metrics.NewCounter(`vmalert_execution_total`)
execErrors = metrics.NewCounter(`vmalert_execution_errors_total`)
remoteWriteErrors = metrics.NewCounter(`vmalert_remotewrite_errors_total`)
)
func (e *executor) exec(ctx context.Context, rule Rule, returnSeries bool, interval time.Duration) error {
func (e *executor) exec(ctx context.Context, rule Rule, resolveDuration time.Duration) error {
execTotal.Inc()
execStart := time.Now()
defer func() {
execDuration.UpdateDuration(execStart)
}()
tss, err := rule.Exec(ctx, e.querier, returnSeries)
tss, err := rule.Exec(ctx)
if err != nil {
execErrors.Inc()
return fmt.Errorf("rule %q: failed to execute: %w", rule, err)
@@ -333,10 +379,7 @@ func (e *executor) exec(ctx context.Context, rule Rule, returnSeries bool, inter
for _, a := range ar.alerts {
switch a.State {
case notifier.StateFiring:
// set End to execStart + 3 intervals
// so notifier can resolve it automatically if `vmalert`
// won't be able to send resolve for some reason
a.End = time.Now().Add(3 * interval)
a.End = time.Now().Add(resolveDuration)
alerts = append(alerts, *a)
case notifier.StateInactive:
// set End to execStart to notify
@@ -349,11 +392,11 @@ func (e *executor) exec(ctx context.Context, rule Rule, returnSeries bool, inter
return nil
}
alertsSent.Add(len(alerts))
errGr := new(utils.ErrGroup)
for _, nt := range e.notifiers {
nt.alertsSent.Add(len(alerts))
if err := nt.Send(ctx, alerts); err != nil {
alertsSendErrors.Inc()
nt.alertsSendErrors.Inc()
errGr.Add(fmt.Errorf("rule %q: failed to send alerts: %w", rule, err))
}
}

View File

@@ -2,12 +2,15 @@ package main
import (
"context"
"fmt"
"sort"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
)
func init() {
@@ -32,7 +35,7 @@ func TestUpdateWith(t *testing.T) {
[]config.Rule{{
Alert: "foo",
Expr: "up > 0",
For: config.NewPromDuration(time.Second),
For: utils.NewPromDuration(time.Second),
Labels: map[string]string{
"bar": "baz",
},
@@ -44,7 +47,7 @@ func TestUpdateWith(t *testing.T) {
[]config.Rule{{
Alert: "foo",
Expr: "up > 10",
For: config.NewPromDuration(time.Second),
For: utils.NewPromDuration(time.Second),
Labels: map[string]string{
"baz": "bar",
},
@@ -105,20 +108,32 @@ func TestUpdateWith(t *testing.T) {
{Record: "foo5"},
},
},
{
"update datasource type",
[]config.Rule{
{Alert: "foo1", Type: datasource.NewPrometheusType()},
{Alert: "foo3", Type: datasource.NewGraphiteType()},
},
[]config.Rule{
{Alert: "foo1", Type: datasource.NewGraphiteType()},
{Alert: "foo10", Type: datasource.NewPrometheusType()},
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
g := &Group{Name: "test"}
qb := &fakeQuerier{}
for _, r := range tc.currentRules {
r.ID = config.HashRule(r)
g.Rules = append(g.Rules, g.newRule(r))
g.Rules = append(g.Rules, g.newRule(qb, r))
}
ng := &Group{Name: "test"}
for _, r := range tc.newRules {
r.ID = config.HashRule(r)
ng.Rules = append(ng.Rules, ng.newRule(r))
ng.Rules = append(ng.Rules, ng.newRule(qb, r))
}
err := g.updateWith(ng)
@@ -156,11 +171,11 @@ func TestGroupStart(t *testing.T) {
t.Fatalf("failed to parse rules: %s", err)
}
const evalInterval = time.Millisecond
g := newGroup(groups[0], evalInterval, map[string]string{"cluster": "east-1"})
g.Concurrency = 2
fn := &fakeNotifier{}
fs := &fakeQuerier{}
fn := &fakeNotifier{}
g := newGroup(groups[0], fs, evalInterval, map[string]string{"cluster": "east-1"})
g.Concurrency = 2
const inst1, inst2, job = "foo", "bar", "baz"
m1 := metricWithLabels(t, "instance", inst1, "job", job)
@@ -195,7 +210,7 @@ func TestGroupStart(t *testing.T) {
fs.add(m1)
fs.add(m2)
go func() {
g.start(context.Background(), fs, []notifier.Notifier{fn}, nil)
g.start(context.Background(), []notifier.Notifier{fn}, nil)
close(finished)
}()
@@ -221,3 +236,27 @@ func TestGroupStart(t *testing.T) {
g.close()
<-finished
}
func TestResolveDuration(t *testing.T) {
testCases := []struct {
groupInterval time.Duration
maxDuration time.Duration
expected time.Duration
}{
{time.Minute, 0, 3 * time.Minute},
{3 * time.Minute, 0, 9 * time.Minute},
{time.Minute, 2 * time.Minute, 2 * time.Minute},
{0, 0, 0},
}
defaultResolveDuration := *maxResolveDuration
defer func() { *maxResolveDuration = defaultResolveDuration }()
for _, tc := range testCases {
t.Run(fmt.Sprintf("%v-%v-%v", tc.groupInterval, tc.expected, tc.maxDuration), func(t *testing.T) {
*maxResolveDuration = tc.maxDuration
got := getResolveDuration(tc.groupInterval)
if got != tc.expected {
t.Errorf("expected to have %v; got %v", tc.expected, got)
}
})
}
}

View File

@@ -7,6 +7,7 @@ import (
"sort"
"sync"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
@@ -38,7 +39,15 @@ func (fq *fakeQuerier) add(metrics ...datasource.Metric) {
fq.Unlock()
}
func (fq *fakeQuerier) Query(_ context.Context, _ string, _ datasource.Type) ([]datasource.Metric, error) {
func (fq *fakeQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fq
}
func (fq *fakeQuerier) QueryRange(ctx context.Context, q string, _, _ time.Time) ([]datasource.Metric, error) {
return fq.Query(ctx, q)
}
func (fq *fakeQuerier) Query(_ context.Context, _ string) ([]datasource.Metric, error) {
fq.Lock()
defer fq.Unlock()
if fq.err != nil {
@@ -54,6 +63,7 @@ type fakeNotifier struct {
alerts []notifier.Alert
}
func (*fakeNotifier) Addr() string { return "" }
func (fn *fakeNotifier) Send(_ context.Context, alerts []notifier.Alert) error {
fn.Lock()
defer fn.Unlock()
@@ -68,9 +78,16 @@ func (fn *fakeNotifier) getAlerts() []notifier.Alert {
}
func metricWithValueAndLabels(t *testing.T, value float64, labels ...string) datasource.Metric {
return metricWithValuesAndLabels(t, []float64{value}, labels...)
}
func metricWithValuesAndLabels(t *testing.T, values []float64, labels ...string) datasource.Metric {
t.Helper()
m := metricWithLabels(t, labels...)
m.Value = value
m.Values = values
for i := range values {
m.Timestamps = append(m.Timestamps, int64(i))
}
return m
}
@@ -79,7 +96,7 @@ func metricWithLabels(t *testing.T, labels ...string) datasource.Metric {
if len(labels) == 0 || len(labels)%2 != 0 {
t.Fatalf("expected to get even number of labels")
}
m := datasource.Metric{}
m := datasource.Metric{Values: []float64{1}, Timestamps: []int64{1}}
for i := 0; i < len(labels); i += 2 {
m.Labels = append(m.Labels, datasource.Label{
Name: labels[i],
@@ -160,6 +177,9 @@ func compareAlertingRules(t *testing.T, a, b *AlertingRule) error {
if !reflect.DeepEqual(a.Labels, b.Labels) {
return fmt.Errorf("expected to have labels %#v; got %#v", a.Labels, b.Labels)
}
if a.Type.String() != b.Type.String() {
return fmt.Errorf("expected to have Type %#v; got %#v", a.Type.String(), b.Type.String())
}
return nil
}

View File

@@ -34,11 +34,16 @@ Examples:
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
rulesCheckInterval = flag.Duration("rule.configCheckInterval", 0, "Interval for checking for changes in '-rule' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections")
evaluationInterval = flag.Duration("evaluationInterval", time.Minute, "How often to evaluate the rules")
validateTemplates = flag.Bool("rule.validateTemplates", true, "Whether to validate annotation and label templates")
validateExpressions = flag.Bool("rule.validateExpressions", true, "Whether to validate rules expressions via MetricsQL engine")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
"which is by default equal to 3 evaluation intervals of the parent group.")
externalURL = flag.String("external.url", "", "External URL is used as alert's source for sent alerts to the notifier")
externalAlertSource = flag.String("external.alert.source", "", `External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used`)
@@ -47,6 +52,9 @@ eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup.")
disableAlertGroupLabel = flag.Bool("disableAlertgroupLabel", false, "Whether to disable adding group's name as label to generated alerts and time series.")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmalert. The rules file are validated. The `-rule` flag must be specified.")
)
@@ -64,42 +72,57 @@ func main() {
notifier.InitTemplateFunc(u)
groups, err := config.Parse(*rulePath, true, true)
if err != nil {
logger.Fatalf(err.Error())
logger.Fatalf("failed to parse %q: %s", *rulePath, err)
}
if len(groups) == 0 {
logger.Fatalf("No rules for validation. Please specify path to file(s) with alerting and/or recording rules using `-rule` flag")
}
return
}
if *replayFrom != "" || *replayTo != "" {
rw, err := remotewrite.Init(context.Background())
if err != nil {
logger.Fatalf("failed to init remoteWrite: %s", err)
}
eu, err := getExternalURL(*externalURL, *httpListenAddr, httpserver.IsTLS())
if err != nil {
logger.Fatalf("failed to init `external.url`: %s", err)
}
notifier.InitTemplateFunc(eu)
groupsCfg, err := config.Parse(*rulePath, *validateTemplates, *validateExpressions)
if err != nil {
logger.Fatalf("cannot parse configuration file: %s", err)
}
// prevent queries from caching and boundaries aligning
// when querying VictoriaMetrics datasource.
noCache := datasource.Param{Key: "nocache", Value: "1"}
q, err := datasource.Init([]datasource.Param{noCache})
if err != nil {
logger.Fatalf("failed to init datasource: %s", err)
}
if err := replay(groupsCfg, q, rw); err != nil {
logger.Fatalf("replay failed: %s", err)
}
return
}
ctx, cancel := context.WithCancel(context.Background())
manager, err := newManager(ctx)
if err != nil {
logger.Fatalf("failed to init: %s", err)
}
if err := manager.start(ctx, *rulePath, *validateTemplates, *validateExpressions); err != nil {
logger.Infof("reading rules configuration file from %q", strings.Join(*rulePath, ";"))
groupsCfg, err := config.Parse(*rulePath, *validateTemplates, *validateExpressions)
if err != nil {
logger.Fatalf("cannot parse configuration file: %s", err)
}
if err := manager.start(ctx, groupsCfg); err != nil {
logger.Fatalf("failed to start: %s", err)
}
go func() {
// init reload metrics with positive values to improve alerting conditions
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
sigHup := procutil.NewSighupChan()
for {
<-sigHup
configReloads.Inc()
logger.Infof("SIGHUP received. Going to reload rules %q ...", *rulePath)
if err := manager.update(ctx, *rulePath, *validateTemplates, *validateExpressions, false); err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)
logger.Errorf("error while reloading rules: %s", err)
continue
}
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
logger.Infof("Rules reloaded successfully from %q", *rulePath)
}
}()
go configReload(ctx, manager, groupsCfg)
rh := &requestHandler{m: manager}
go httpserver.Serve(*httpListenAddr, rh.handler)
@@ -121,7 +144,7 @@ var (
)
func newManager(ctx context.Context) (*manager, error) {
q, err := datasource.Init()
q, err := datasource.Init(nil)
if err != nil {
return nil, fmt.Errorf("failed to init datasource: %w", err)
}
@@ -140,10 +163,10 @@ func newManager(ctx context.Context) (*manager, error) {
}
manager := &manager{
groups: make(map[uint64]*Group),
querier: q,
notifiers: nts,
labels: map[string]string{},
groups: make(map[uint64]*Group),
querierBuilder: q,
notifiers: nts,
labels: map[string]string{},
}
rw, err := remotewrite.Init(ctx)
if err != nil {
@@ -218,7 +241,71 @@ func usage() {
const s = `
vmalert processes alerts and recording rules.
See the docs at https://victoriametrics.github.io/vmalert.html .
See the docs at https://docs.victoriametrics.com/vmalert.html .
`
flagutil.Usage(s)
}
func configReload(ctx context.Context, m *manager, groupsCfg []config.Group) {
// Register SIGHUP handler for config re-read just before manager.start call.
// This guarantees that the config will be re-read if the signal arrives during manager.start call.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
var configCheckCh <-chan time.Time
if *rulesCheckInterval > 0 {
ticker := time.NewTicker(*rulesCheckInterval)
configCheckCh = ticker.C
defer ticker.Stop()
}
// init reload metrics with positive values to improve alerting conditions
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
for {
select {
case <-ctx.Done():
return
case <-sighupCh:
logger.Infof("SIGHUP received. Going to reload rules %q ...", *rulePath)
configReloads.Inc()
case <-configCheckCh:
}
newGroupsCfg, err := config.Parse(*rulePath, *validateTemplates, *validateExpressions)
if err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)
logger.Errorf("cannot parse configuration file: %s", err)
continue
}
if configsEqual(newGroupsCfg, groupsCfg) {
// set success to 1 since previous reload
// could have been unsuccessful
configSuccess.Set(1)
// config didn't change - skip it
continue
}
groupsCfg = newGroupsCfg
if err := m.update(ctx, groupsCfg, false); err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)
logger.Errorf("error while reloading rules: %s", err)
continue
}
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
logger.Infof("Rules reloaded successfully from %q", *rulePath)
}
}
func configsEqual(a, b []config.Group) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i].Checksum != b[i].Checksum {
return false
}
}
return true
}

View File

@@ -1,12 +1,16 @@
package main
import (
"context"
"fmt"
"io/ioutil"
"net/url"
"os"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
)
func TestGetExternalURL(t *testing.T) {
@@ -51,3 +55,102 @@ func TestGetAlertURLGenerator(t *testing.T) {
t.Errorf("unexpected url want %s, got %s", exp, fn(testAlert))
}
}
func TestConfigReload(t *testing.T) {
originalRulePath := *rulePath
defer func() {
*rulePath = originalRulePath
}()
const (
rules1 = `
groups:
- name: group-1
rules:
- alert: ExampleAlertAlwaysFiring
expr: sum by(job) (up == 1)
- record: handler:requests:rate5m
expr: sum(rate(prometheus_http_requests_total[5m])) by (handler)
`
rules2 = `
groups:
- name: group-1
rules:
- alert: ExampleAlertAlwaysFiring
expr: sum by(job) (up == 1)
- name: group-2
rules:
- record: handler:requests:rate5m
expr: sum(rate(prometheus_http_requests_total[5m])) by (handler)
`
)
f, err := ioutil.TempFile("", "")
if err != nil {
t.Fatal(err)
}
writeToFile(t, f.Name(), rules1)
*rulesCheckInterval = 200 * time.Millisecond
*rulePath = []string{f.Name()}
ctx, cancel := context.WithCancel(context.Background())
m := &manager{
querierBuilder: &fakeQuerier{},
groups: make(map[uint64]*Group),
labels: map[string]string{},
}
syncCh := make(chan struct{})
go func() {
configReload(ctx, m, nil)
close(syncCh)
}()
lenLocked := func(m *manager) int {
m.groupsMu.RLock()
defer m.groupsMu.RUnlock()
return len(m.groups)
}
time.Sleep(*rulesCheckInterval * 2)
groupsLen := lenLocked(m)
if groupsLen != 1 {
t.Fatalf("expected to have exactly 1 group loaded; got %d", groupsLen)
}
writeToFile(t, f.Name(), rules2)
time.Sleep(*rulesCheckInterval * 2)
groupsLen = lenLocked(m)
if groupsLen != 2 {
fmt.Println(m.groups)
t.Fatalf("expected to have exactly 2 groups loaded; got %d", groupsLen)
}
writeToFile(t, f.Name(), rules1)
procutil.SelfSIGHUP()
time.Sleep(*rulesCheckInterval / 2)
groupsLen = lenLocked(m)
if groupsLen != 1 {
t.Fatalf("expected to have exactly 1 group loaded; got %d", groupsLen)
}
writeToFile(t, f.Name(), `corrupted`)
procutil.SelfSIGHUP()
time.Sleep(*rulesCheckInterval / 2)
groupsLen = lenLocked(m)
if groupsLen != 1 { // should remain unchanged
t.Fatalf("expected to have exactly 1 group loaded; got %d", groupsLen)
}
cancel()
<-syncCh
}
func writeToFile(t *testing.T, file, b string) {
t.Helper()
err := ioutil.WriteFile(file, []byte(b), 0644)
if err != nil {
t.Fatal(err)
}
}

View File

@@ -3,7 +3,6 @@ package main
import (
"context"
"fmt"
"strings"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
@@ -15,11 +14,12 @@ import (
// manager controls group states
type manager struct {
querier datasource.Querier
notifiers []notifier.Notifier
querierBuilder datasource.QuerierBuilder
notifiers []notifier.Notifier
rw *remotewrite.Client
rr datasource.Querier
// remote read builder.
rr datasource.QuerierBuilder
wg sync.WaitGroup
labels map[string]string
@@ -49,8 +49,8 @@ func (m *manager) AlertAPI(gID, aID uint64) (*APIAlert, error) {
return nil, fmt.Errorf("can't find alert with id %q in group %q", aID, g.Name)
}
func (m *manager) start(ctx context.Context, path []string, validateTpl, validateExpr bool) error {
return m.update(ctx, path, validateTpl, validateExpr, true)
func (m *manager) start(ctx context.Context, groupsCfg []config.Group) error {
return m.update(ctx, groupsCfg, true)
}
func (m *manager) close() {
@@ -63,10 +63,13 @@ func (m *manager) close() {
m.wg.Wait()
}
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) {
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) error {
if restore && m.rr != nil {
err := group.Restore(ctx, m.rr, *remoteReadLookBack, m.labels)
if err != nil {
if !*remoteReadIgnoreRestoreErrors {
return fmt.Errorf("failed to restore state for group %q: %w", group.Name, err)
}
logger.Errorf("error while restoring state for group %q: %s", group.Name, err)
}
}
@@ -74,22 +77,17 @@ func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) {
m.wg.Add(1)
id := group.ID()
go func() {
group.start(ctx, m.querier, m.notifiers, m.rw)
group.start(ctx, m.notifiers, m.rw)
m.wg.Done()
}()
m.groups[id] = group
return nil
}
func (m *manager) update(ctx context.Context, path []string, validateTpl, validateExpr, restore bool) error {
logger.Infof("reading rules configuration file from %q", strings.Join(path, ";"))
groupsCfg, err := config.Parse(path, validateTpl, validateExpr)
if err != nil {
return fmt.Errorf("cannot parse configuration file: %w", err)
}
func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore bool) error {
groupsRegistry := make(map[uint64]*Group)
for _, cfg := range groupsCfg {
ng := newGroup(cfg, *evaluationInterval, m.labels)
ng := newGroup(cfg, m.querierBuilder, *evaluationInterval, m.labels)
groupsRegistry[ng.ID()] = ng
}
@@ -116,7 +114,9 @@ func (m *manager) update(ctx context.Context, path []string, validateTpl, valida
}
}
for _, ng := range groupsRegistry {
m.startGroup(ctx, ng, restore)
if err := m.startGroup(ctx, ng, restore); err != nil {
return err
}
}
m.groupsMu.Unlock()
@@ -140,12 +140,15 @@ func (g *Group) toAPI() APIGroup {
ag := APIGroup{
// encode as string to avoid rounding
ID: fmt.Sprintf("%d", g.ID()),
Name: g.Name,
Type: g.Type.String(),
File: g.File,
Interval: g.Interval.String(),
Concurrency: g.Concurrency,
ID: fmt.Sprintf("%d", g.ID()),
Name: g.Name,
Type: g.Type.String(),
File: g.File,
Interval: g.Interval.String(),
Concurrency: g.Concurrency,
ExtraFilterLabels: g.ExtraFilterLabels,
Labels: g.Labels,
}
for _, r := range g.Rules {
switch v := r.(type) {

View File

@@ -9,8 +9,8 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
)
@@ -25,9 +25,8 @@ func TestMain(m *testing.M) {
// starting with empty rules folder
func TestManagerEmptyRulesDir(t *testing.T) {
m := &manager{groups: make(map[uint64]*Group)}
path := []string{"foo/bar"}
err := m.update(context.Background(), path, true, true, false)
if err != nil {
cfg := loadCfg(t, []string{"foo/bar"}, true, true)
if err := m.update(context.Background(), cfg, false); err != nil {
t.Fatalf("expected to load succesfully with empty rules dir; got err instead: %v", err)
}
}
@@ -37,9 +36,9 @@ func TestManagerEmptyRulesDir(t *testing.T) {
// Should be executed with -race flag
func TestManagerUpdateConcurrent(t *testing.T) {
m := &manager{
groups: make(map[uint64]*Group),
querier: &fakeQuerier{},
notifiers: []notifier.Notifier{&fakeNotifier{}},
groups: make(map[uint64]*Group),
querierBuilder: &fakeQuerier{},
notifiers: []notifier.Notifier{&fakeNotifier{}},
}
paths := []string{
"config/testdata/dir/rules0-good.rules",
@@ -50,8 +49,11 @@ func TestManagerUpdateConcurrent(t *testing.T) {
"config/testdata/rules1-good.rules",
"config/testdata/rules2-good.rules",
}
evalInterval := *evaluationInterval
defer func() { *evaluationInterval = evalInterval }()
*evaluationInterval = time.Millisecond
if err := m.start(context.Background(), []string{paths[0]}, true, true); err != nil {
cfg := loadCfg(t, []string{paths[0]}, true, true)
if err := m.start(context.Background(), cfg); err != nil {
t.Fatalf("failed to start: %s", err)
}
@@ -64,8 +66,11 @@ func TestManagerUpdateConcurrent(t *testing.T) {
defer wg.Done()
for i := 0; i < iterations; i++ {
rnd := rand.Intn(len(paths))
path := []string{paths[rnd]}
_ = m.update(context.Background(), path, true, true, false)
cfg, err := config.Parse([]string{paths[rnd]}, true, true)
if err != nil { // update can fail and this is expected
continue
}
_ = m.update(context.Background(), cfg, false)
}
}()
}
@@ -143,7 +148,7 @@ func TestManagerUpdate(t *testing.T) {
Name: "VMRows",
Expr: "vm_rows > 0",
For: 5 * time.Minute,
Labels: map[string]string{"label": "bar"},
Labels: map[string]string{"dc": "gcp", "label": "bar"},
Annotations: map[string]string{
"summary": "{{ $value }}",
"description": "{{$labels}}",
@@ -242,14 +247,17 @@ func TestManagerUpdate(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, cancel := context.WithCancel(context.TODO())
m := &manager{groups: make(map[uint64]*Group), querier: &fakeQuerier{}}
path := []string{tc.initPath}
if err := m.update(ctx, path, true, true, false); err != nil {
m := &manager{groups: make(map[uint64]*Group), querierBuilder: &fakeQuerier{}}
cfgInit := loadCfg(t, []string{tc.initPath}, true, true)
if err := m.update(ctx, cfgInit, false); err != nil {
t.Fatalf("failed to complete initial rules update: %s", err)
}
path = []string{tc.updatePath}
_ = m.update(ctx, path, true, true, false)
cfgUpdate, err := config.Parse([]string{tc.updatePath}, true, true)
if err == nil { // update can fail and that's expected
_ = m.update(ctx, cfgUpdate, false)
}
if len(tc.want) != len(m.groups) {
t.Fatalf("\nwant number of groups: %d;\ngot: %d ", len(tc.want), len(m.groups))
}
@@ -267,3 +275,12 @@ func TestManagerUpdate(t *testing.T) {
})
}
}
func loadCfg(t *testing.T, path []string, validateAnnotations, validateExpressions bool) []config.Group {
t.Helper()
cfg, err := config.Parse(path, validateAnnotations, validateExpressions)
if err != nil {
t.Fatal(err)
}
return cfg
}

View File

@@ -14,17 +14,26 @@ import (
// Alert the triggered alert
// TODO: Looks like alert name isn't unique
type Alert struct {
GroupID uint64
Name string
Labels map[string]string
// GroupID contains the ID of the parent rules group
GroupID uint64
// Name represents Alert name
Name string
// Labels is the list of label-value pairs attached to the Alert
Labels map[string]string
// Annotations is the list of annotations generated on Alert evaluation
Annotations map[string]string
State AlertState
Expr string
// State represents the current state of the Alert
State AlertState
// Expr contains expression that was executed to generate the Alert
Expr string
// Start defines the moment of time when Alert has triggered
Start time.Time
End time.Time
// End defines the moment of time when Alert supposed to expire
End time.Time
// Value stores the value returned from evaluating expression from Expr field
Value float64
ID uint64
// ID is the unique identifer for the Alert
ID uint64
}
// AlertState type indicates the Alert state
@@ -90,13 +99,13 @@ func templateAnnotations(annotations map[string]string, data AlertTplData, funcs
eg := new(utils.ErrGroup)
r := make(map[string]string, len(annotations))
for key, text := range annotations {
r[key] = text
buf.Reset()
builder.Reset()
builder.Grow(len(tplHeader) + len(text))
builder.WriteString(tplHeader)
builder.WriteString(text)
if err := templateAnnotation(&buf, builder.String(), data, funcs); err != nil {
r[key] = text
eg.Add(fmt.Errorf("key %q, template %q: %w", key, text, err))
continue
}

View File

@@ -83,14 +83,16 @@ func TestAlert_ExecTemplate(t *testing.T) {
{Name: "foo", Value: "bar"},
{Name: "baz", Value: "qux"},
},
Value: 1,
Values: []float64{1},
Timestamps: []int64{1},
},
{
Labels: []datasource.Label{
{Name: "foo", Value: "garply"},
{Name: "baz", Value: "fred"},
},
Value: 2,
Values: []float64{2},
Timestamps: []int64{1},
},
}, nil
}

View File

@@ -12,6 +12,7 @@ import (
// AlertManager represents integration provider with Prometheus alert manager
// https://github.com/prometheus/alertmanager
type AlertManager struct {
addr string
alertURL string
basicAuthUser string
basicAuthPass string
@@ -19,6 +20,9 @@ type AlertManager struct {
client *http.Client
}
// Addr returns address where alerts are sent.
func (am AlertManager) Addr() string { return am.addr }
// Send an alert or resolve message
func (am *AlertManager) Send(ctx context.Context, alerts []Alert) error {
b := &bytes.Buffer{}
@@ -57,9 +61,10 @@ const alertManagerPath = "/api/v2/alerts"
// NewAlertManager is a constructor for AlertManager
func NewAlertManager(alertManagerURL, user, pass string, fn AlertURLGenerator, c *http.Client) *AlertManager {
addr := strings.TrimSuffix(alertManagerURL, "/") + alertManagerPath
url := strings.TrimSuffix(alertManagerURL, "/") + alertManagerPath
return &AlertManager{
alertURL: addr,
addr: alertManagerURL,
alertURL: url,
argFunc: fn,
client: c,
basicAuthUser: user,

View File

@@ -10,6 +10,14 @@ import (
"time"
)
func TestAlertManager_Addr(t *testing.T) {
const addr = "http://localhost"
am := NewAlertManager(addr, "", "", nil, nil)
if am.Addr() != addr {
t.Errorf("expected to have %q; got %q", addr, am.Addr())
}
}
func TestAlertManager_Send(t *testing.T) {
const baUser, baPass = "foo", "bar"
mux := http.NewServeMux()

View File

@@ -2,7 +2,12 @@ package notifier
import "context"
// Notifier is common interface for alert manager provider
// Notifier is a common interface for alert manager provider
type Notifier interface {
// Send sends the given list of alerts.
// Returns an error if fails to send the alerts.
// Must unblock if the given ctx is cancelled.
Send(ctx context.Context, alerts []Alert) error
// Addr returns address where alerts are sent.
Addr() string
}

View File

@@ -28,44 +28,73 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
)
// metric is private copy of datasource.Metric,
// it is used for templating annotations,
// Labels as map simplifies templates evaluation.
type metric struct {
Labels map[string]string
Timestamp int64
Value float64
}
// datasourceMetricsToTemplateMetrics converts Metrics from datasource package to private copy for templating.
func datasourceMetricsToTemplateMetrics(ms []datasource.Metric) []metric {
mss := make([]metric, 0, len(ms))
for _, m := range ms {
labelsMap := make(map[string]string, len(m.Labels))
for _, labelValue := range m.Labels {
labelsMap[labelValue.Name] = labelValue.Value
}
mss = append(mss, metric{
Labels: labelsMap,
Timestamp: m.Timestamps[0],
Value: m.Values[0]})
}
return mss
}
// QueryFn is used to wrap a call to datasource into simple-to-use function
// for templating functions.
type QueryFn func(query string) ([]datasource.Metric, error)
func funcsWithQuery(query QueryFn) textTpl.FuncMap {
fm := make(textTpl.FuncMap)
for k, fn := range tmplFunc {
fm[k] = fn
}
fm["query"] = func(q string) ([]datasource.Metric, error) {
return query(q)
}
return fm
}
var tmplFunc textTpl.FuncMap
// InitTemplateFunc initiates template helper functions
func InitTemplateFunc(externalURL *url.URL) {
tmplFunc = textTpl.FuncMap{
"args": func(args ...interface{}) map[string]interface{} {
result := make(map[string]interface{})
for i, a := range args {
result[fmt.Sprintf("arg%d", i)] = a
}
return result
},
/* Strings */
// reReplaceAll ReplaceAllString returns a copy of src, replacing matches of the Regexp with
// the replacement string repl. Inside repl, $ signs are interpreted as in Expand,
// so for instance $1 represents the text of the first submatch.
// alias for https://golang.org/pkg/regexp/#Regexp.ReplaceAllString
"reReplaceAll": func(pattern, repl, text string) string {
re := regexp.MustCompile(pattern)
return re.ReplaceAllString(text, repl)
},
"safeHtml": func(text string) htmlTpl.HTML {
return htmlTpl.HTML(text)
},
"match": regexp.MatchString,
"title": strings.Title,
// match reports whether the string s
// contains any match of the regular expression pattern.
// alias for https://golang.org/pkg/regexp/#MatchString
"match": regexp.MatchString,
// title returns a copy of the string s with all Unicode letters
// that begin words mapped to their Unicode title case.
// alias for https://golang.org/pkg/strings/#Title
"title": strings.Title,
// toUpper returns s with all Unicode letters mapped to their upper case.
// alias for https://golang.org/pkg/strings/#ToUpper
"toUpper": strings.ToUpper,
// toLower returns s with all Unicode letters mapped to their lower case.
// alias for https://golang.org/pkg/strings/#ToLower
"toLower": strings.ToLower,
/* Numbers */
// humanize converts given number to a human readable format
// by adding metric prefixes https://en.wikipedia.org/wiki/Metric_prefix
"humanize": func(v float64) string {
if v == 0 || math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -91,6 +120,8 @@ func InitTemplateFunc(externalURL *url.URL) {
}
return fmt.Sprintf("%.4g%s", v, prefix)
},
// humanize1024 converts given number to a human readable format with 1024 as base
"humanize1024": func(v float64) string {
if math.Abs(v) <= 1 || math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -105,6 +136,8 @@ func InitTemplateFunc(externalURL *url.URL) {
}
return fmt.Sprintf("%.4g%s", v, prefix)
},
// humanizeDuration converts given seconds to a human readable duration
"humanizeDuration": func(v float64) string {
if math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -145,9 +178,13 @@ func InitTemplateFunc(externalURL *url.URL) {
}
return fmt.Sprintf("%.4g%ss", v, prefix)
},
// humanizePercentage converts given ratio value to a fraction of 100
"humanizePercentage": func(v float64) string {
return fmt.Sprintf("%.4g%%", v*100)
},
// humanizeTimestamp converts given timestamp to a human readable time equivalent
"humanizeTimestamp": func(v float64) string {
if math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
@@ -155,48 +192,115 @@ func InitTemplateFunc(externalURL *url.URL) {
t := TimeFromUnixNano(int64(v * 1e9)).Time().UTC()
return fmt.Sprint(t)
},
"pathPrefix": func() string {
return externalURL.Path
},
/* URLs */
// externalURL returns value of `external.url` flag
"externalURL": func() string {
return externalURL.String()
},
// pathPrefix returns a Path segment from the URL value in `external.url` flag
"pathPrefix": func() string {
return externalURL.Path
},
// pathEscape escapes the string so it can be safely placed inside a URL path segment,
// replacing special characters (including /) with %XX sequences as needed.
// alias for https://golang.org/pkg/net/url/#PathEscape
"pathEscape": func(u string) string {
return url.PathEscape(u)
},
// queryEscape escapes the string so it can be safely placed
// inside a URL query.
// alias for https://golang.org/pkg/net/url/#QueryEscape
"queryEscape": func(q string) string {
return url.QueryEscape(q)
},
// crlfEscape replaces new line chars to skip URL encoding.
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
"crlfEscape": func(q string) string {
q = strings.Replace(q, "\n", `\n`, -1)
return strings.Replace(q, "\r", `\r`, -1)
},
// quotesEscape escapes quote char
"quotesEscape": func(q string) string {
return strings.Replace(q, `"`, `\"`, -1)
},
// query function supposed to be substituted at funcsWithQuery().
// it is present here only for validation purposes, when there is no
// provided datasource.
"query": func(q string) ([]datasource.Metric, error) {
// query executes the MetricsQL/PromQL query against
// configured `datasource.url` address.
// For example, {{ query "foo" | first | value }} will
// execute "/api/v1/query?query=foo" request and will return
// the first value in response.
"query": func(q string) ([]metric, error) {
// query function supposed to be substituted at funcsWithQuery().
// it is present here only for validation purposes, when there is no
// provided datasource.
//
// return non-empty slice to pass validation with chained functions in template
// see issue #989 for details
return []datasource.Metric{{}}, nil
return []metric{{}}, nil
},
"first": func(metrics []datasource.Metric) (datasource.Metric, error) {
// first returns the first by order element from the given metrics list.
// usually used alongside with `query` template function.
"first": func(metrics []metric) (metric, error) {
if len(metrics) > 0 {
return metrics[0], nil
}
return datasource.Metric{}, errors.New("first() called on vector with no elements")
return metric{}, errors.New("first() called on vector with no elements")
},
"label": func(label string, m datasource.Metric) string {
return m.Label(label)
// label returns the value of the given label name for the given metric.
// usually used alongside with `query` template function.
"label": func(label string, m metric) string {
return m.Labels[label]
},
"value": func(m datasource.Metric) float64 {
// value returns the value of the given metric.
// usually used alongside with `query` template function.
"value": func(m metric) float64 {
return m.Value
},
/* Helpers */
// Converts a list of objects to a map with keys arg0, arg1 etc.
// This is intended to allow multiple arguments to be passed to templates.
"args": func(args ...interface{}) map[string]interface{} {
result := make(map[string]interface{})
for i, a := range args {
result[fmt.Sprintf("arg%d", i)] = a
}
return result
},
// safeHtml marks string as HTML not requiring auto-escaping.
"safeHtml": func(text string) htmlTpl.HTML {
return htmlTpl.HTML(text)
},
}
}
func funcsWithQuery(query QueryFn) textTpl.FuncMap {
fm := make(textTpl.FuncMap)
for k, fn := range tmplFunc {
fm[k] = fn
}
fm["query"] = func(q string) ([]metric, error) {
result, err := query(q)
if err != nil {
return nil, err
}
return datasourceMetricsToTemplateMetrics(result), nil
}
return fm
}
// Time is the number of milliseconds since the epoch
// (1970-01-01 00:00 UTC) excluding leap seconds.
type Time int64

View File

@@ -3,8 +3,8 @@ package main
import (
"context"
"fmt"
"hash/fnv"
"sort"
"strings"
"sync"
"time"
@@ -25,6 +25,8 @@ type RecordingRule struct {
Labels map[string]string
GroupID uint64
q datasource.Querier
// guard status fields
mu sync.RWMutex
// stores last moment of time Exec was called
@@ -33,12 +35,16 @@ type RecordingRule struct {
// resets on every successful Exec
// may be used as Health state
lastExecError error
// stores the number of samples returned during
// the last evaluation
lastExecSamples int
metrics *recordingRuleMetrics
}
type recordingRuleMetrics struct {
errors *gauge
errors *gauge
samples *gauge
}
// String implements Stringer interface
@@ -52,7 +58,7 @@ func (rr *RecordingRule) ID() uint64 {
return rr.RuleID
}
func newRecordingRule(group *Group, cfg config.Rule) *RecordingRule {
func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule) *RecordingRule {
rr := &RecordingRule{
Type: cfg.Type,
RuleID: cfg.ID,
@@ -61,72 +67,108 @@ func newRecordingRule(group *Group, cfg config.Rule) *RecordingRule {
Labels: cfg.Labels,
GroupID: group.ID(),
metrics: &recordingRuleMetrics{},
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: &cfg.Type,
EvaluationInterval: group.Interval,
ExtraLabels: group.ExtraFilterLabels,
}),
}
labels := fmt.Sprintf(`recording=%q, group=%q, id="%d"`, rr.Name, group.Name, rr.ID())
rr.metrics.errors = getOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_error{%s}`, labels),
func() float64 {
rr.mu.Lock()
defer rr.mu.Unlock()
rr.mu.RLock()
defer rr.mu.RUnlock()
if rr.lastExecError == nil {
return 0
}
return 1
})
rr.metrics.samples = getOrCreateGauge(fmt.Sprintf(`vmalert_recording_rules_last_evaluation_samples{%s}`, labels),
func() float64 {
rr.mu.RLock()
defer rr.mu.RUnlock()
return float64(rr.lastExecSamples)
})
return rr
}
// Close unregisters rule metrics
func (rr *RecordingRule) Close() {
metrics.UnregisterMetric(rr.metrics.errors.name)
metrics.UnregisterMetric(rr.metrics.samples.name)
}
// Exec executes RecordingRule expression via the given Querier.
func (rr *RecordingRule) Exec(ctx context.Context, q datasource.Querier, series bool) ([]prompbmarshal.TimeSeries, error) {
if !series {
return nil, nil
}
qMetrics, err := q.Query(ctx, rr.Expr, rr.Type)
rr.mu.Lock()
defer rr.mu.Unlock()
rr.lastExecTime = time.Now()
rr.lastExecError = err
// ExecRange executes recording rule on the given time range similarly to Exec.
// It doesn't update internal states of the Rule and meant to be used just
// to get time series for backfilling.
func (rr *RecordingRule) ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error) {
series, err := rr.q.QueryRange(ctx, rr.Expr, start, end)
if err != nil {
return nil, fmt.Errorf("failed to execute query %q: %w", rr.Expr, err)
return nil, err
}
duplicates := make(map[uint64]prompbmarshal.TimeSeries, len(qMetrics))
duplicates := make(map[string]struct{}, len(series))
var tss []prompbmarshal.TimeSeries
for _, r := range qMetrics {
ts := rr.toTimeSeries(r, rr.lastExecTime)
h := hashTimeSeries(ts)
if _, ok := duplicates[h]; ok {
rr.lastExecError = errDuplicate
return nil, errDuplicate
for _, s := range series {
ts := rr.toTimeSeries(s)
key := stringifyLabels(ts)
if _, ok := duplicates[key]; ok {
return nil, fmt.Errorf("original metric %v; resulting labels %q: %w", s.Labels, key, errDuplicate)
}
duplicates[h] = ts
duplicates[key] = struct{}{}
tss = append(tss, ts)
}
return tss, nil
}
func hashTimeSeries(ts prompbmarshal.TimeSeries) uint64 {
hash := fnv.New64a()
labels := ts.Labels
sort.Slice(labels, func(i, j int) bool {
return labels[i].Name < labels[j].Name
})
for _, l := range labels {
hash.Write([]byte(l.Name))
hash.Write([]byte(l.Value))
hash.Write([]byte("\xff"))
// Exec executes RecordingRule expression via the given Querier.
func (rr *RecordingRule) Exec(ctx context.Context) ([]prompbmarshal.TimeSeries, error) {
qMetrics, err := rr.q.Query(ctx, rr.Expr)
rr.mu.Lock()
defer rr.mu.Unlock()
rr.lastExecTime = time.Now()
rr.lastExecError = err
rr.lastExecSamples = len(qMetrics)
if err != nil {
return nil, fmt.Errorf("failed to execute query %q: %w", rr.Expr, err)
}
return hash.Sum64()
duplicates := make(map[string]struct{}, len(qMetrics))
var tss []prompbmarshal.TimeSeries
for _, r := range qMetrics {
ts := rr.toTimeSeries(r)
key := stringifyLabels(ts)
if _, ok := duplicates[key]; ok {
rr.lastExecError = errDuplicate
return nil, fmt.Errorf("original metric %v; resulting labels %q: %w", r, key, errDuplicate)
}
duplicates[key] = struct{}{}
tss = append(tss, ts)
}
return tss, nil
}
func (rr *RecordingRule) toTimeSeries(m datasource.Metric, timestamp time.Time) prompbmarshal.TimeSeries {
func stringifyLabels(ts prompbmarshal.TimeSeries) string {
labels := ts.Labels
if len(labels) > 1 {
sort.Slice(labels, func(i, j int) bool {
return labels[i].Name < labels[j].Name
})
}
b := strings.Builder{}
for i, l := range labels {
b.WriteString(l.Name)
b.WriteString("=")
b.WriteString(l.Value)
if i != len(labels)-1 {
b.WriteString(",")
}
}
return b.String()
}
func (rr *RecordingRule) toTimeSeries(m datasource.Metric) prompbmarshal.TimeSeries {
labels := make(map[string]string)
for _, l := range m.Labels {
labels[l.Name] = l.Value
@@ -136,12 +178,10 @@ func (rr *RecordingRule) toTimeSeries(m datasource.Metric, timestamp time.Time)
for k, v := range rr.Labels {
labels[k] = v
}
return newTimeSeries(m.Value, labels, timestamp)
return newTimeSeries(m.Values, m.Timestamps, labels)
}
// UpdateWith copies all significant fields.
// alerts state isn't copied since
// it should be updated in next 2 Execs
func (rr *RecordingRule) UpdateWith(r Rule) error {
nr, ok := r.(*RecordingRule)
if !ok {
@@ -149,6 +189,7 @@ func (rr *RecordingRule) UpdateWith(r Rule) error {
}
rr.Expr = nr.Expr
rr.Labels = nr.Labels
rr.q = nr.q
return nil
}
@@ -161,13 +202,14 @@ func (rr *RecordingRule) RuleAPI() APIRecordingRule {
}
return APIRecordingRule{
// encode as strings to avoid rounding
ID: fmt.Sprintf("%d", rr.ID()),
GroupID: fmt.Sprintf("%d", rr.GroupID),
Name: rr.Name,
Type: rr.Type.String(),
Expression: rr.Expr,
LastError: lastErr,
LastExec: rr.lastExecTime,
Labels: rr.Labels,
ID: fmt.Sprintf("%d", rr.ID()),
GroupID: fmt.Sprintf("%d", rr.GroupID),
Name: rr.Name,
Type: rr.Type.String(),
Expression: rr.Expr,
LastError: lastErr,
LastSamples: rr.lastExecSamples,
LastExec: rr.lastExecTime,
Labels: rr.Labels,
}
}

View File

@@ -11,7 +11,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
func TestRecoridngRule_ToTimeSeries(t *testing.T) {
func TestRecoridngRule_Exec(t *testing.T) {
timestamp := time.Now()
testCases := []struct {
rule *RecordingRule
@@ -24,9 +24,9 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
"__name__", "bar",
)},
[]prompbmarshal.TimeSeries{
newTimeSeries(10, map[string]string{
newTimeSeries([]float64{10}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foo",
}, timestamp),
}),
},
},
{
@@ -37,18 +37,18 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
metricWithValueAndLabels(t, 3, "__name__", "baz", "job", "baz"),
},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "foo",
}, timestamp),
newTimeSeries(2, map[string]string{
}),
newTimeSeries([]float64{2}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "bar",
}, timestamp),
newTimeSeries(3, map[string]string{
}),
newTimeSeries([]float64{3}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "baz",
}, timestamp),
}),
},
},
{
@@ -59,16 +59,16 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
metricWithValueAndLabels(t, 2, "__name__", "foo", "job", "foo"),
metricWithValueAndLabels(t, 1, "__name__", "bar", "job", "bar")},
[]prompbmarshal.TimeSeries{
newTimeSeries(2, map[string]string{
newTimeSeries([]float64{2}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "foo",
"source": "test",
}, timestamp),
newTimeSeries(1, map[string]string{
}),
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "bar",
"source": "test",
}, timestamp),
}),
},
},
}
@@ -76,7 +76,8 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
fq.add(tc.metrics...)
tss, err := tc.rule.Exec(context.TODO(), fq, true)
tc.rule.q = fq
tss, err := tc.rule.Exec(context.TODO())
if err != nil {
t.Fatalf("unexpected Exec err: %s", err)
}
@@ -87,7 +88,88 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
}
}
func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
func TestRecoridngRule_ExecRange(t *testing.T) {
timestamp := time.Now()
testCases := []struct {
rule *RecordingRule
metrics []datasource.Metric
expTS []prompbmarshal.TimeSeries
}{
{
&RecordingRule{Name: "foo"},
[]datasource.Metric{metricWithValuesAndLabels(t, []float64{10, 20, 30},
"__name__", "bar",
)},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{10, 20, 30},
[]int64{timestamp.UnixNano(), timestamp.UnixNano(), timestamp.UnixNano()},
map[string]string{
"__name__": "foo",
}),
},
},
{
&RecordingRule{Name: "foobarbaz"},
[]datasource.Metric{
metricWithValuesAndLabels(t, []float64{1}, "__name__", "foo", "job", "foo"),
metricWithValuesAndLabels(t, []float64{2, 3}, "__name__", "bar", "job", "bar"),
metricWithValuesAndLabels(t, []float64{4, 5, 6}, "__name__", "baz", "job", "baz"),
},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "foo",
}),
newTimeSeries([]float64{2, 3}, []int64{timestamp.UnixNano(), timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "bar",
}),
newTimeSeries([]float64{4, 5, 6},
[]int64{timestamp.UnixNano(), timestamp.UnixNano(), timestamp.UnixNano()},
map[string]string{
"__name__": "foobarbaz",
"job": "baz",
}),
},
},
{
&RecordingRule{Name: "job:foo", Labels: map[string]string{
"source": "test",
}},
[]datasource.Metric{
metricWithValueAndLabels(t, 2, "__name__", "foo", "job", "foo"),
metricWithValueAndLabels(t, 1, "__name__", "bar", "job", "bar")},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{2}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "foo",
"source": "test",
}),
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "bar",
"source": "test",
}),
},
},
}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
fq.add(tc.metrics...)
tc.rule.q = fq
tss, err := tc.rule.ExecRange(context.TODO(), time.Now(), time.Now())
if err != nil {
t.Fatalf("unexpected Exec err: %s", err)
}
if err := compareTimeSeries(t, tc.expTS, tss); err != nil {
t.Fatalf("timeseries missmatch: %s", err)
}
})
}
}
func TestRecoridngRule_ExecNegative(t *testing.T) {
rr := &RecordingRule{Name: "job:foo", Labels: map[string]string{
"job": "test",
}}
@@ -95,8 +177,8 @@ func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
fq := &fakeQuerier{}
expErr := "connection reset by peer"
fq.setErr(errors.New(expErr))
_, err := rr.Exec(context.TODO(), fq, true)
rr.q = fq
_, err := rr.Exec(context.TODO())
if err == nil {
t.Fatalf("expected to get err; got nil")
}
@@ -111,7 +193,7 @@ func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "foo"))
fq.add(metricWithValueAndLabels(t, 2, "__name__", "foo", "job", "bar"))
_, err = rr.Exec(context.TODO(), fq, true)
_, err = rr.Exec(context.TODO())
if err == nil {
t.Fatalf("expected to get err; got nil")
}

View File

@@ -10,11 +10,15 @@ import (
)
var (
addr = flag.String("remoteRead.url", "", "Optional URL to Victoria Metrics or VMSelect that will be used to restore alerts"+
" state. This configuration makes sense only if `vmalert` was configured with `remoteWrite.url` before and has been successfully persisted its state."+
" E.g. http://127.0.0.1:8428")
addr = flag.String("remoteRead.url", "", "Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts "+
"state. This configuration makes sense only if `vmalert` was configured with `remoteWrite.url` before and has been successfully persisted its state. "+
"E.g. http://127.0.0.1:8428")
basicAuthUsername = flag.String("remoteRead.basicAuth.username", "", "Optional basic auth username for -remoteRead.url")
basicAuthPassword = flag.String("remoteRead.basicAuth.password", "", "Optional basic auth password for -remoteRead.url")
basicAuthPasswordFile = flag.String("remoteRead.basicAuth.passwordFile", "", "Optional path to basic auth password to use for -remoteRead.url")
bearerToken = flag.String("remoteRead.bearerToken", "", "Optional bearer auth token to use for -remoteRead.url.")
bearerTokenFile = flag.String("remoteRead.bearerTokenFile", "", "Optional path to bearer token file to use for -remoteRead.url.")
tlsInsecureSkipVerify = flag.Bool("remoteRead.tlsInsecureSkipVerify", false, "Whether to skip tls verification when connecting to -remoteRead.url")
tlsCertFile = flag.String("remoteRead.tlsCertFile", "", "Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url")
tlsKeyFile = flag.String("remoteRead.tlsKeyFile", "", "Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url")
@@ -26,7 +30,7 @@ var (
// Init creates a Querier from provided flag values.
// Returns nil if addr flag wasn't set.
func Init() (datasource.Querier, error) {
func Init() (datasource.QuerierBuilder, error) {
if *addr == "" {
return nil, nil
}
@@ -34,6 +38,10 @@ func Init() (datasource.Querier, error) {
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
authCfg, err := utils.AuthConfig(*basicAuthUsername, *basicAuthPassword, *basicAuthPasswordFile, *bearerToken, *bearerTokenFile)
if err != nil {
return nil, fmt.Errorf("failed to configure auth: %w", err)
}
c := &http.Client{Transport: tr}
return datasource.NewVMStorage(*addr, *basicAuthUsername, *basicAuthPassword, 0, 0, false, c), nil
return datasource.NewVMStorage(*addr, authCfg, 0, 0, false, c), nil
}

View File

@@ -10,10 +10,14 @@ import (
)
var (
addr = flag.String("remoteWrite.url", "", "Optional URL to Victoria Metrics or VMInsert where to persist alerts state"+
" and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428")
basicAuthUsername = flag.String("remoteWrite.basicAuth.username", "", "Optional basic auth username for -remoteWrite.url")
basicAuthPassword = flag.String("remoteWrite.basicAuth.password", "", "Optional basic auth password for -remoteWrite.url")
addr = flag.String("remoteWrite.url", "", "Optional URL to VictoriaMetrics or vminsert where to persist alerts state "+
"and recording rules results in form of timeseries. For example, if -remoteWrite.url=http://127.0.0.1:8428 is specified, "+
"then the alerts state will be written to http://127.0.0.1:8428/api/v1/write . See also -remoteWrite.disablePathAppend")
basicAuthUsername = flag.String("remoteWrite.basicAuth.username", "", "Optional basic auth username for -remoteWrite.url")
basicAuthPassword = flag.String("remoteWrite.basicAuth.password", "", "Optional basic auth password for -remoteWrite.url")
basicAuthPasswordFile = flag.String("remoteWrite.basicAuth.passwordFile", "", "Optional path to basic auth password to use for -remoteWrite.url")
bearerToken = flag.String("remoteWrite.bearerToken", "", "Optional bearer auth token to use for -remoteWrite.url.")
bearerTokenFile = flag.String("remoteWrite.bearerTokenFile", "", "Optional path to bearer token file to use for -remoteWrite.url.")
maxQueueSize = flag.Int("remoteWrite.maxQueueSize", 1e5, "Defines the max number of pending datapoints to remote write endpoint")
maxBatchSize = flag.Int("remoteWrite.maxBatchSize", 1e3, "Defines defines max number of timeseries to be flushed at once")
@@ -27,6 +31,7 @@ var (
"By default system CA is used")
tlsServerName = flag.String("remoteWrite.tlsServerName", "", "Optional TLS server name to use for connections to -remoteWrite.url. "+
"By default the server name from -remoteWrite.url is used")
disablePathAppend = flag.Bool("remoteWrite.disablePathAppend", false, "Whether to disable automatic appending of '/api/v1/write' path to the configured -remoteWrite.url.")
)
// Init creates Client object from given flags.
@@ -41,14 +46,19 @@ func Init(ctx context.Context) (*Client, error) {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
authCfg, err := utils.AuthConfig(*basicAuthUsername, *basicAuthPassword, *basicAuthPasswordFile, *bearerToken, *bearerTokenFile)
if err != nil {
return nil, fmt.Errorf("failed to configure auth: %w", err)
}
return NewClient(ctx, Config{
Addr: *addr,
Concurrency: *concurrency,
MaxQueueSize: *maxQueueSize,
MaxBatchSize: *maxBatchSize,
FlushInterval: *flushInterval,
BasicAuthUser: *basicAuthUsername,
BasicAuthPass: *basicAuthPassword,
Transport: t,
Addr: *addr,
AuthCfg: authCfg,
Concurrency: *concurrency,
MaxQueueSize: *maxQueueSize,
MaxBatchSize: *maxBatchSize,
FlushInterval: *flushInterval,
DisablePathAppend: *disablePathAppend,
Transport: t,
})
}

View File

@@ -6,26 +6,30 @@ import (
"fmt"
"io/ioutil"
"net/http"
"path"
"strings"
"sync"
"time"
"github.com/golang/snappy"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/metrics"
"github.com/golang/snappy"
)
// Client is an asynchronous HTTP client for writing
// timeseries via remote write protocol.
type Client struct {
addr string
c *http.Client
input chan prompbmarshal.TimeSeries
baUser, baPass string
flushInterval time.Duration
maxBatchSize int
maxQueueSize int
addr string
c *http.Client
authCfg *promauth.Config
input chan prompbmarshal.TimeSeries
flushInterval time.Duration
maxBatchSize int
maxQueueSize int
disablePathAppend bool
wg sync.WaitGroup
doneCh chan struct{}
@@ -34,10 +38,8 @@ type Client struct {
// Config is config for remote write.
type Config struct {
// Addr of remote storage
Addr string
BasicAuthUser string
BasicAuthPass string
Addr string
AuthCfg *promauth.Config
// Concurrency defines number of readers that
// concurrently read from the queue and flush data
@@ -56,6 +58,8 @@ type Config struct {
WriteTimeout time.Duration
// Transport will be used by the underlying http.Client
Transport *http.Transport
// DisablePathAppend can be used to not automatically append '/api/v1/write' to the remote write url
DisablePathAppend bool
}
const (
@@ -89,24 +93,25 @@ func NewClient(ctx context.Context, cfg Config) (*Client, error) {
if cfg.Transport == nil {
cfg.Transport = http.DefaultTransport.(*http.Transport).Clone()
}
cc := defaultConcurrency
if cfg.Concurrency > 0 {
cc = cfg.Concurrency
}
c := &Client{
c: &http.Client{
Timeout: cfg.WriteTimeout,
Transport: cfg.Transport,
},
addr: strings.TrimSuffix(cfg.Addr, "/"),
baUser: cfg.BasicAuthUser,
baPass: cfg.BasicAuthPass,
flushInterval: cfg.FlushInterval,
maxBatchSize: cfg.MaxBatchSize,
maxQueueSize: cfg.MaxQueueSize,
doneCh: make(chan struct{}),
input: make(chan prompbmarshal.TimeSeries, cfg.MaxQueueSize),
}
cc := defaultConcurrency
if cfg.Concurrency > 0 {
cc = cfg.Concurrency
addr: strings.TrimSuffix(cfg.Addr, "/"),
authCfg: cfg.AuthCfg,
flushInterval: cfg.FlushInterval,
maxBatchSize: cfg.MaxBatchSize,
maxQueueSize: cfg.MaxQueueSize,
doneCh: make(chan struct{}),
input: make(chan prompbmarshal.TimeSeries, cfg.MaxQueueSize),
disablePathAppend: cfg.DisablePathAppend,
}
for i := 0; i < cc; i++ {
c.run(ctx)
}
@@ -179,10 +184,11 @@ func (c *Client) run(ctx context.Context) {
}
var (
sentRows = metrics.NewCounter(`vmalert_remotewrite_sent_rows_total`)
sentBytes = metrics.NewCounter(`vmalert_remotewrite_sent_bytes_total`)
droppedRows = metrics.NewCounter(`vmalert_remotewrite_dropped_rows_total`)
droppedBytes = metrics.NewCounter(`vmalert_remotewrite_dropped_bytes_total`)
sentRows = metrics.NewCounter(`vmalert_remotewrite_sent_rows_total`)
sentBytes = metrics.NewCounter(`vmalert_remotewrite_sent_bytes_total`)
droppedRows = metrics.NewCounter(`vmalert_remotewrite_dropped_rows_total`)
droppedBytes = metrics.NewCounter(`vmalert_remotewrite_dropped_bytes_total`)
bufferFlushDuration = metrics.NewHistogram(`vmalert_remotewrite_flush_duration_seconds`)
)
// flush is a blocking function that marshals WriteRequest and sends
@@ -193,6 +199,7 @@ func (c *Client) flush(ctx context.Context, wr *prompbmarshal.WriteRequest) {
return
}
defer prompbmarshal.ResetWriteRequest(wr)
defer bufferFlushDuration.UpdateDuration(time.Now())
data, err := wr.Marshal()
if err != nil {
@@ -228,17 +235,21 @@ func (c *Client) send(ctx context.Context, data []byte) error {
if err != nil {
return fmt.Errorf("failed to create new HTTP request: %w", err)
}
if c.baPass != "" {
req.SetBasicAuth(c.baUser, c.baPass)
if c.authCfg != nil {
if auth := c.authCfg.GetAuthHeader(); auth != "" {
req.Header.Set("Authorization", auth)
}
}
if !c.disablePathAppend {
req.URL.Path = path.Join(req.URL.Path, writePath)
}
req.URL.Path += writePath
resp, err := c.c.Do(req.WithContext(ctx))
if err != nil {
return fmt.Errorf("error while sending request to %s: %w; Data len %d(%d)",
req.URL, err, len(data), r.Size())
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusNoContent {
if resp.StatusCode != http.StatusNoContent && resp.StatusCode != http.StatusOK {
body, _ := ioutil.ReadAll(resp.Body)
return fmt.Errorf("unexpected response code %d for %s. Response body %q",
resp.StatusCode, req.URL, body)

160
app/vmalert/replay.go Normal file
View File

@@ -0,0 +1,160 @@
package main
import (
"context"
"flag"
"fmt"
"strings"
"time"
"github.com/cheggaaa/pb/v3"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
var (
replayFrom = flag.String("replay.timeFrom", "",
"The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'")
replayTo = flag.String("replay.timeTo", "",
"The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'")
replayRulesDelay = flag.Duration("replay.rulesDelay", time.Second,
"Delay between rules evaluation within the group. Could be important if there are chained rules inside of the group"+
"and processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule."+
"Keep it equal or bigger than -remoteWrite.flushInterval.")
replayMaxDatapoints = flag.Int("replay.maxDatapointsPerQuery", 1e3,
"Max number of data points expected in one request. The higher the value, the less requests will be made during replay.")
replayRuleRetryAttempts = flag.Int("replay.ruleRetryAttempts", 5,
"Defines how many retries to make before giving up on rule if request for it returns an error.")
)
func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw *remotewrite.Client) error {
if *replayMaxDatapoints < 1 {
return fmt.Errorf("replay.maxDatapointsPerQuery can't be lower than 1")
}
tFrom, err := time.Parse(time.RFC3339, *replayFrom)
if err != nil {
return fmt.Errorf("failed to parse %q: %s", *replayFrom, err)
}
tTo, err := time.Parse(time.RFC3339, *replayTo)
if err != nil {
return fmt.Errorf("failed to parse %q: %s", *replayTo, err)
}
if !tTo.After(tFrom) {
return fmt.Errorf("replay.timeTo must be bigger than replay.timeFrom")
}
labels := make(map[string]string)
for _, s := range *externalLabels {
if len(s) == 0 {
continue
}
n := strings.IndexByte(s, '=')
if n < 0 {
return fmt.Errorf("missing '=' in `-label`. It must contain label in the form `name=value`; got %q", s)
}
labels[s[:n]] = s[n+1:]
}
fmt.Printf("Replay mode:"+
"\nfrom: \t%v "+
"\nto: \t%v "+
"\nmax data points per request: %d\n",
tFrom, tTo, *replayMaxDatapoints)
var total int
for _, cfg := range groupsCfg {
ng := newGroup(cfg, qb, *evaluationInterval, labels)
total += ng.replay(tFrom, tTo, rw)
}
logger.Infof("replay finished! Imported %d samples", total)
if rw != nil {
return rw.Close()
}
return nil
}
func (g *Group) replay(start, end time.Time, rw *remotewrite.Client) int {
var total int
step := g.Interval * time.Duration(*replayMaxDatapoints)
ri := rangeIterator{start: start, end: end, step: step}
iterations := int(end.Sub(start)/step) + 1
fmt.Printf("\nGroup %q"+
"\ninterval: \t%v"+
"\nrequests to make: \t%d"+
"\nmax range per request: \t%v\n",
g.Name, g.Interval, iterations, step)
for _, rule := range g.Rules {
fmt.Printf("> Rule %q (ID: %d)\n", rule, rule.ID())
bar := pb.StartNew(iterations)
ri.reset()
for ri.next() {
n, err := replayRule(rule, ri.s, ri.e, rw)
if err != nil {
logger.Fatalf("rule %q: %s", rule, err)
}
total += n
bar.Increment()
}
bar.Finish()
// sleep to let remote storage to flush data on-disk
// so chained rules could be calculated correctly
time.Sleep(*replayRulesDelay)
}
return total
}
func replayRule(rule Rule, start, end time.Time, rw *remotewrite.Client) (int, error) {
var err error
var tss []prompbmarshal.TimeSeries
for i := 0; i < *replayRuleRetryAttempts; i++ {
tss, err = rule.ExecRange(context.Background(), start, end)
if err == nil {
break
}
logger.Errorf("attempt %d to execute rule %q failed: %s", i+1, rule, err)
time.Sleep(time.Second)
}
if err != nil { // means all attempts failed
return 0, err
}
if len(tss) < 1 {
return 0, nil
}
var n int
for _, ts := range tss {
if err := rw.Push(ts); err != nil {
return n, fmt.Errorf("remote write failure: %s", err)
}
n += len(ts.Samples)
}
return n, nil
}
type rangeIterator struct {
step time.Duration
start, end time.Time
iter int
s, e time.Time
}
func (ri *rangeIterator) reset() {
ri.iter = 0
ri.s, ri.e = time.Time{}, time.Time{}
}
func (ri *rangeIterator) next() bool {
ri.s = ri.start.Add(ri.step * time.Duration(ri.iter))
if !ri.end.After(ri.s) {
return false
}
ri.e = ri.s.Add(ri.step)
if ri.e.After(ri.end) {
ri.e = ri.end
}
ri.iter++
return true
}

250
app/vmalert/replay_test.go Normal file
View File

@@ -0,0 +1,250 @@
package main
import (
"context"
"fmt"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
)
type fakeReplayQuerier struct {
fakeQuerier
registry map[string]map[string]struct{}
}
func (fr *fakeReplayQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fr
}
func (fr *fakeReplayQuerier) QueryRange(_ context.Context, q string, from, to time.Time) ([]datasource.Metric, error) {
key := fmt.Sprintf("%s+%s", from.Format("15:04:05"), to.Format("15:04:05"))
dps, ok := fr.registry[q]
if !ok {
return nil, fmt.Errorf("unexpected query received: %q", q)
}
_, ok = dps[key]
if !ok {
return nil, fmt.Errorf("unexpected time range received: %q", key)
}
delete(dps, key)
if len(fr.registry[q]) < 1 {
delete(fr.registry, q)
}
return nil, nil
}
func TestReplay(t *testing.T) {
testCases := []struct {
name string
from, to string
maxDP int
cfg []config.Group
qb *fakeReplayQuerier
}{
{
name: "one rule + one response",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:00.000Z",
maxDP: 10,
cfg: []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {"12:00:00+12:02:00": {}},
},
},
},
{
name: "one rule + multiple responses",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:30.000Z",
maxDP: 1,
cfg: []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
},
},
{
name: "datapoints per step",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T15:02:30.000Z",
maxDP: 60,
cfg: []config.Group{
{Interval: utils.NewPromDuration(time.Minute), Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {
"12:00:00+13:00:00": {},
"13:00:00+14:00:00": {},
"14:00:00+15:00:00": {},
"15:00:00+15:02:30": {},
},
},
},
},
{
name: "multiple recording rules + multiple responses",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:30.000Z",
maxDP: 1,
cfg: []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
{Rules: []config.Rule{{Record: "bar", Expr: "max(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
},
},
{
name: "multiple alerting rules + multiple responses",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:30.000Z",
maxDP: 1,
cfg: []config.Group{
{Rules: []config.Rule{{Alert: "foo", Expr: "sum(up) > 1"}}},
{Rules: []config.Rule{{Alert: "bar", Expr: "max(up) < 1"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up) > 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up) < 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
},
},
}
from, to, maxDP := *replayFrom, *replayTo, *replayMaxDatapoints
retries, delay := *replayRuleRetryAttempts, *replayRulesDelay
defer func() {
*replayFrom, *replayTo = from, to
*replayMaxDatapoints, *replayRuleRetryAttempts = maxDP, retries
*replayRulesDelay = delay
}()
*replayRuleRetryAttempts = 1
*replayRulesDelay = time.Millisecond
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
*replayFrom = tc.from
*replayTo = tc.to
*replayMaxDatapoints = tc.maxDP
if err := replay(tc.cfg, tc.qb, nil); err != nil {
t.Fatalf("replay failed: %s", err)
}
if len(tc.qb.registry) > 0 {
t.Fatalf("not all requests were sent: %#v", tc.qb.registry)
}
})
}
}
func TestRangeIterator(t *testing.T) {
testCases := []struct {
ri rangeIterator
result [][2]time.Time
}{
{
ri: rangeIterator{
start: parseTime(t, "2021-01-01T12:00:00.000Z"),
end: parseTime(t, "2021-01-01T12:30:00.000Z"),
step: 5 * time.Minute,
},
result: [][2]time.Time{
{parseTime(t, "2021-01-01T12:00:00.000Z"), parseTime(t, "2021-01-01T12:05:00.000Z")},
{parseTime(t, "2021-01-01T12:05:00.000Z"), parseTime(t, "2021-01-01T12:10:00.000Z")},
{parseTime(t, "2021-01-01T12:10:00.000Z"), parseTime(t, "2021-01-01T12:15:00.000Z")},
{parseTime(t, "2021-01-01T12:15:00.000Z"), parseTime(t, "2021-01-01T12:20:00.000Z")},
{parseTime(t, "2021-01-01T12:20:00.000Z"), parseTime(t, "2021-01-01T12:25:00.000Z")},
{parseTime(t, "2021-01-01T12:25:00.000Z"), parseTime(t, "2021-01-01T12:30:00.000Z")},
},
},
{
ri: rangeIterator{
start: parseTime(t, "2021-01-01T12:00:00.000Z"),
end: parseTime(t, "2021-01-01T12:30:00.000Z"),
step: 45 * time.Minute,
},
result: [][2]time.Time{
{parseTime(t, "2021-01-01T12:00:00.000Z"), parseTime(t, "2021-01-01T12:30:00.000Z")},
{parseTime(t, "2021-01-01T12:30:00.000Z"), parseTime(t, "2021-01-01T12:30:00.000Z")},
},
},
{
ri: rangeIterator{
start: parseTime(t, "2021-01-01T12:00:12.000Z"),
end: parseTime(t, "2021-01-01T12:00:17.000Z"),
step: time.Second,
},
result: [][2]time.Time{
{parseTime(t, "2021-01-01T12:00:12.000Z"), parseTime(t, "2021-01-01T12:00:13.000Z")},
{parseTime(t, "2021-01-01T12:00:13.000Z"), parseTime(t, "2021-01-01T12:00:14.000Z")},
{parseTime(t, "2021-01-01T12:00:14.000Z"), parseTime(t, "2021-01-01T12:00:15.000Z")},
{parseTime(t, "2021-01-01T12:00:15.000Z"), parseTime(t, "2021-01-01T12:00:16.000Z")},
{parseTime(t, "2021-01-01T12:00:16.000Z"), parseTime(t, "2021-01-01T12:00:17.000Z")},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("case %d", i), func(t *testing.T) {
var j int
for tc.ri.next() {
if len(tc.result) < j+1 {
t.Fatalf("unexpected result for iterator on step %d: %v - %v",
j, tc.ri.s, tc.ri.e)
}
s, e := tc.ri.s, tc.ri.e
expS, expE := tc.result[j][0], tc.result[j][1]
if s != expS {
t.Fatalf("expected to get start=%v; got %v", expS, s)
}
if e != expE {
t.Fatalf("expected to get end=%v; got %v", expE, e)
}
j++
}
})
}
}
func parseTime(t *testing.T, s string) time.Time {
t.Helper()
tt, err := time.Parse("2006-01-02T15:04:05.000Z", s)
if err != nil {
t.Fatal(err)
}
return tt
}

View File

@@ -3,22 +3,21 @@ package main
import (
"context"
"errors"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"time"
)
// Rule represents alerting or recording rule
// that has unique ID, can be Executed and
// updated with other Rule.
type Rule interface {
// Returns unique ID that may be used for
// ID returns unique ID that may be used for
// identifying this Rule among others.
ID() uint64
// Exec executes the rule with given context
// and Querier. If returnSeries is true, Exec
// may return TimeSeries as result of execution
Exec(ctx context.Context, q datasource.Querier, returnSeries bool) ([]prompbmarshal.TimeSeries, error)
Exec(ctx context.Context) ([]prompbmarshal.TimeSeries, error)
// ExecRange executes the rule on the given time range
ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error)
// UpdateWith performs modification of current Rule
// with fields of the given Rule.
UpdateWith(Rule) error

View File

@@ -0,0 +1,36 @@
{% func Footer() %}
</main>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.bundle.min.js" integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM" crossorigin="anonymous"></script>
<script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
<script type="text/javascript">
function expandAll() {
$('.collapse').addClass('show');
}
function collapseAll() {
$('.collapse').removeClass('show');
}
$(document).ready(function() {
// prevent collapse logic on link click
$(".group-heading a").click(function(e) {
e.stopPropagation();
});
$(".group-heading").click(function(e) {
let target = $(this).attr('data-bs-target');
let el = $('#'+target);
new bootstrap.Collapse(el, {
toggle: true
});
});
var hash = window.location.hash.substr(1);
let group = $('#'+hash);
if (group.length > 0) {
group.click();
}
});
</script>
</body>
</html>
{% endfunc %}

View File

@@ -0,0 +1,86 @@
// Code generated by qtc from "footer.qtpl". DO NOT EDIT.
// See https://github.com/valyala/quicktemplate for details.
//line app/vmalert/tpl/footer.qtpl:1
package tpl
//line app/vmalert/tpl/footer.qtpl:1
import (
qtio422016 "io"
qt422016 "github.com/valyala/quicktemplate"
)
//line app/vmalert/tpl/footer.qtpl:1
var (
_ = qtio422016.Copy
_ = qt422016.AcquireByteBuffer
)
//line app/vmalert/tpl/footer.qtpl:1
func StreamFooter(qw422016 *qt422016.Writer) {
//line app/vmalert/tpl/footer.qtpl:1
qw422016.N().S(`
</main>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.bundle.min.js" integrity="sha384-MrcW6ZMFYlzcLA8Nl+NtUVF0sA7MsXsP1UyJoMp4YLEuNSfAP+JcXn/tWtIaxVXM" crossorigin="anonymous"></script>
<script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
<script type="text/javascript">
function expandAll() {
$('.collapse').addClass('show');
}
function collapseAll() {
$('.collapse').removeClass('show');
}
$(document).ready(function() {
// prevent collapse logic on link click
$(".group-heading a").click(function(e) {
e.stopPropagation();
});
$(".group-heading").click(function(e) {
let target = $(this).attr('data-bs-target');
let el = $('#'+target);
new bootstrap.Collapse(el, {
toggle: true
});
});
var hash = window.location.hash.substr(1);
let group = $('#'+hash);
if (group.length > 0) {
group.click();
}
});
</script>
</body>
</html>
`)
//line app/vmalert/tpl/footer.qtpl:36
}
//line app/vmalert/tpl/footer.qtpl:36
func WriteFooter(qq422016 qtio422016.Writer) {
//line app/vmalert/tpl/footer.qtpl:36
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/tpl/footer.qtpl:36
StreamFooter(qw422016)
//line app/vmalert/tpl/footer.qtpl:36
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/tpl/footer.qtpl:36
}
//line app/vmalert/tpl/footer.qtpl:36
func Footer() string {
//line app/vmalert/tpl/footer.qtpl:36
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/tpl/footer.qtpl:36
WriteFooter(qb422016)
//line app/vmalert/tpl/footer.qtpl:36
qs422016 := string(qb422016.B)
//line app/vmalert/tpl/footer.qtpl:36
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/tpl/footer.qtpl:36
return qs422016
//line app/vmalert/tpl/footer.qtpl:36
}

View File

@@ -0,0 +1,43 @@
{% func Header(title string, pages []NavItem) %}
<!DOCTYPE html>
<html lang="en">
<head>
<title>vmalert{% if title != "" %} - {%s title %}{% endif %}</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">
<style>
body{
min-height: 75rem;
padding-top: 4.5rem;
}
pre {
overflow: scroll;
max-width: 600px;
min-height: 30px;
}
.group-heading {
cursor: pointer;
padding: 5px;
margin-top: 5px;
position: relative;
}
.group-heading .anchor {
position:absolute;
top:-60px;
}
.group-heading span {
float: right;
margin-left: 5px;
margin-right: 5px;
}
.group-heading:hover {
background-color: #f8f9fa!important;
}
.table .error-cell{
word-break: break-word;
}
</style>
</head>
<body>
{%= PrintNavItems(title, pages) %}
<main class="px-2">
{% endfunc %}

View File

@@ -0,0 +1,107 @@
// Code generated by qtc from "header.qtpl". DO NOT EDIT.
// See https://github.com/valyala/quicktemplate for details.
//line app/vmalert/tpl/header.qtpl:1
package tpl
//line app/vmalert/tpl/header.qtpl:1
import (
qtio422016 "io"
qt422016 "github.com/valyala/quicktemplate"
)
//line app/vmalert/tpl/header.qtpl:1
var (
_ = qtio422016.Copy
_ = qt422016.AcquireByteBuffer
)
//line app/vmalert/tpl/header.qtpl:1
func StreamHeader(qw422016 *qt422016.Writer, title string, pages []NavItem) {
//line app/vmalert/tpl/header.qtpl:1
qw422016.N().S(`
<!DOCTYPE html>
<html lang="en">
<head>
<title>vmalert`)
//line app/vmalert/tpl/header.qtpl:5
if title != "" {
//line app/vmalert/tpl/header.qtpl:5
qw422016.N().S(` - `)
//line app/vmalert/tpl/header.qtpl:5
qw422016.E().S(title)
//line app/vmalert/tpl/header.qtpl:5
}
//line app/vmalert/tpl/header.qtpl:5
qw422016.N().S(`</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">
<style>
body{
min-height: 75rem;
padding-top: 4.5rem;
}
pre {
overflow: scroll;
max-width: 600px;
min-height: 30px;
}
.group-heading {
cursor: pointer;
padding: 5px;
margin-top: 5px;
position: relative;
}
.group-heading .anchor {
position:absolute;
top:-60px;
}
.group-heading span {
float: right;
margin-left: 5px;
margin-right: 5px;
}
.group-heading:hover {
background-color: #f8f9fa!important;
}
.table .error-cell{
word-break: break-word;
}
</style>
</head>
<body>
`)
//line app/vmalert/tpl/header.qtpl:41
StreamPrintNavItems(qw422016, title, pages)
//line app/vmalert/tpl/header.qtpl:41
qw422016.N().S(`
<main class="px-2">
`)
//line app/vmalert/tpl/header.qtpl:43
}
//line app/vmalert/tpl/header.qtpl:43
func WriteHeader(qq422016 qtio422016.Writer, title string, pages []NavItem) {
//line app/vmalert/tpl/header.qtpl:43
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/tpl/header.qtpl:43
StreamHeader(qw422016, title, pages)
//line app/vmalert/tpl/header.qtpl:43
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/tpl/header.qtpl:43
}
//line app/vmalert/tpl/header.qtpl:43
func Header(title string, pages []NavItem) string {
//line app/vmalert/tpl/header.qtpl:43
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/tpl/header.qtpl:43
WriteHeader(qb422016, title, pages)
//line app/vmalert/tpl/header.qtpl:43
qs422016 := string(qb422016.B)
//line app/vmalert/tpl/header.qtpl:43
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/tpl/header.qtpl:43
return qs422016
//line app/vmalert/tpl/header.qtpl:43
}

25
app/vmalert/tpl/nav.qtpl Normal file
View File

@@ -0,0 +1,25 @@
{% code
type NavItem struct {
Name string
Url string
}
%}
{% func PrintNavItems(current string, items []NavItem) %}
<nav class="navbar navbar-expand-md navbar-dark fixed-top bg-dark">
<div class="container-fluid">
<div class="collapse navbar-collapse" id="navbarCollapse">
<ul class="navbar-nav me-auto mb-2 mb-md-0">
{% for _, item := range items %}
<li class="nav-item">
<a class="nav-link{% if current == item.Name %} active{% endif %}" href="{%s item.Url %}">
{%s item.Name %}
</a>
</li>
{% endfor %}
</ul>
</div>
</nav>
{% endfunc %}

View File

@@ -0,0 +1,96 @@
// Code generated by qtc from "nav.qtpl". DO NOT EDIT.
// See https://github.com/valyala/quicktemplate for details.
//line app/vmalert/tpl/nav.qtpl:1
package tpl
//line app/vmalert/tpl/nav.qtpl:1
import (
qtio422016 "io"
qt422016 "github.com/valyala/quicktemplate"
)
//line app/vmalert/tpl/nav.qtpl:1
var (
_ = qtio422016.Copy
_ = qt422016.AcquireByteBuffer
)
//line app/vmalert/tpl/nav.qtpl:2
type NavItem struct {
Name string
Url string
}
//line app/vmalert/tpl/nav.qtpl:8
func StreamPrintNavItems(qw422016 *qt422016.Writer, current string, items []NavItem) {
//line app/vmalert/tpl/nav.qtpl:8
qw422016.N().S(`
<nav class="navbar navbar-expand-md navbar-dark fixed-top bg-dark">
<div class="container-fluid">
<div class="collapse navbar-collapse" id="navbarCollapse">
<ul class="navbar-nav me-auto mb-2 mb-md-0">
`)
//line app/vmalert/tpl/nav.qtpl:13
for _, item := range items {
//line app/vmalert/tpl/nav.qtpl:13
qw422016.N().S(`
<li class="nav-item">
<a class="nav-link`)
//line app/vmalert/tpl/nav.qtpl:15
if current == item.Name {
//line app/vmalert/tpl/nav.qtpl:15
qw422016.N().S(` active`)
//line app/vmalert/tpl/nav.qtpl:15
}
//line app/vmalert/tpl/nav.qtpl:15
qw422016.N().S(`" href="`)
//line app/vmalert/tpl/nav.qtpl:15
qw422016.E().S(item.Url)
//line app/vmalert/tpl/nav.qtpl:15
qw422016.N().S(`">
`)
//line app/vmalert/tpl/nav.qtpl:16
qw422016.E().S(item.Name)
//line app/vmalert/tpl/nav.qtpl:16
qw422016.N().S(`
</a>
</li>
`)
//line app/vmalert/tpl/nav.qtpl:19
}
//line app/vmalert/tpl/nav.qtpl:19
qw422016.N().S(`
</ul>
</div>
</nav>
`)
//line app/vmalert/tpl/nav.qtpl:23
}
//line app/vmalert/tpl/nav.qtpl:23
func WritePrintNavItems(qq422016 qtio422016.Writer, current string, items []NavItem) {
//line app/vmalert/tpl/nav.qtpl:23
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/tpl/nav.qtpl:23
StreamPrintNavItems(qw422016, current, items)
//line app/vmalert/tpl/nav.qtpl:23
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/tpl/nav.qtpl:23
}
//line app/vmalert/tpl/nav.qtpl:23
func PrintNavItems(current string, items []NavItem) string {
//line app/vmalert/tpl/nav.qtpl:23
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/tpl/nav.qtpl:23
WritePrintNavItems(qb422016, current, items)
//line app/vmalert/tpl/nav.qtpl:23
qs422016 := string(qb422016.B)
//line app/vmalert/tpl/nav.qtpl:23
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/tpl/nav.qtpl:23
return qs422016
//line app/vmalert/tpl/nav.qtpl:23
}

View File

@@ -7,17 +7,21 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
func newTimeSeries(value float64, labels map[string]string, timestamp time.Time) prompbmarshal.TimeSeries {
ts := prompbmarshal.TimeSeries{}
ts.Samples = append(ts.Samples, prompbmarshal.Sample{
Value: value,
Timestamp: timestamp.UnixNano() / 1e6,
})
func newTimeSeries(values []float64, timestamps []int64, labels map[string]string) prompbmarshal.TimeSeries {
ts := prompbmarshal.TimeSeries{
Samples: make([]prompbmarshal.Sample, len(values)),
}
for i := range values {
ts.Samples[i] = prompbmarshal.Sample{
Value: values[i],
Timestamp: time.Unix(timestamps[i], 0).UnixNano() / 1e6,
}
}
keys := make([]string, 0, len(labels))
for k := range labels {
keys = append(keys, k)
}
sort.Strings(keys)
sort.Strings(keys) // make order deterministic
for _, key := range keys {
ts.Labels = append(ts.Labels, prompbmarshal.Label{
Name: key,

18
app/vmalert/utils/auth.go Normal file
View File

@@ -0,0 +1,18 @@
package utils
import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
)
// AuthConfig returns promauth.Config based on the given params
func AuthConfig(baUser, baPass, baFile, bearerToken, bearerTokenFile string) (*promauth.Config, error) {
var baCfg *promauth.BasicAuthConfig
if baUser != "" || baPass != "" || baFile != "" {
baCfg = &promauth.BasicAuthConfig{
Username: baUser,
Password: baPass,
PasswordFile: baFile,
}
}
return promauth.NewConfig(".", nil, baCfg, bearerToken, bearerTokenFile, nil, nil)
}

View File

@@ -0,0 +1,43 @@
package utils
import (
"time"
"github.com/VictoriaMetrics/metricsql"
)
// PromDuration is Prometheus duration.
type PromDuration struct {
milliseconds int64
}
// NewPromDuration returns PromDuration for given d.
func NewPromDuration(d time.Duration) PromDuration {
return PromDuration{
milliseconds: d.Milliseconds(),
}
}
// MarshalYAML implements yaml.Marshaler interface.
func (pd PromDuration) MarshalYAML() (interface{}, error) {
return pd.Duration().String(), nil
}
// UnmarshalYAML implements yaml.Unmarshaler interface.
func (pd *PromDuration) UnmarshalYAML(unmarshal func(interface{}) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
}
ms, err := metricsql.DurationValue(s, 0)
if err != nil {
return err
}
pd.milliseconds = ms
return nil
}
// Duration returns duration for pd.
func (pd *PromDuration) Duration() time.Duration {
return time.Duration(pd.milliseconds) * time.Millisecond
}

View File

@@ -13,6 +13,7 @@ func TestTLSConfig(t *testing.T) {
}
if tlsCfg == nil {
t.Errorf("expected tlsConfig to be set, got nil")
return
}
if tlsCfg.ServerName != serverName {
t.Errorf("unexpected ServerName, want %s, got %s", serverName, tlsCfg.ServerName)

View File

@@ -4,40 +4,67 @@ import (
"encoding/json"
"fmt"
"net/http"
"path"
"sort"
"strconv"
"strings"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/tpl"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
)
var (
once = sync.Once{}
apiLinks [][2]string
navItems []tpl.NavItem
)
func initLinks() {
pathPrefix := httpserver.GetPathPrefix()
apiLinks = [][2]string{
{path.Join(pathPrefix, "api/v1/groups"), "list all loaded groups and rules"},
{path.Join(pathPrefix, "api/v1/alerts"), "list all active alerts"},
{path.Join(pathPrefix, "api/v1/groupID/alertID/status"), "get alert status by ID"},
{path.Join(pathPrefix, "metrics"), "list of application metrics"},
{path.Join(pathPrefix, "-/reload"), "reload configuration"},
}
navItems = []tpl.NavItem{
{Name: "vmalert", Url: path.Join(pathPrefix, "/")},
{Name: "Groups", Url: path.Join(pathPrefix, "groups")},
{Name: "Alerts", Url: path.Join(pathPrefix, "/alerts")},
{Name: "Docs", Url: "https://docs.victoriametrics.com/vmalert.html"},
}
}
type requestHandler struct {
m *manager
}
var pathList = [][]string{
{"/api/v1/groups", "list all loaded groups and rules"},
{"/api/v1/alerts", "list all active alerts"},
{"/api/v1/groupID/alertID/status", "get alert status by ID"},
// /metrics is served by httpserver by default
{"/metrics", "list of application metrics"},
{"/-/reload", "reload configuration"},
}
func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
once.Do(func() {
initLinks()
})
switch r.URL.Path {
case "/":
for _, path := range pathList {
p, doc := path[0], path[1]
fmt.Fprintf(w, "<a href='%s'>%q</a> - %s<br/>", p, p, doc)
if r.Method != "GET" {
return false
}
WriteWelcome(w)
return true
case "/alerts":
WriteListAlerts(w, rh.groupAlerts())
return true
case "/groups":
WriteListGroups(w, rh.groups())
return true
case "/api/v1/groups":
data, err := rh.listGroups()
if err != nil {
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
@@ -46,7 +73,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/api/v1/alerts":
data, err := rh.listAlerts()
if err != nil {
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
@@ -61,14 +88,26 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
if !strings.HasSuffix(r.URL.Path, "/status") {
return false
}
// /api/v1/<groupName>/<alertID>/status
data, err := rh.alert(r.URL.Path)
alert, err := rh.alertByPath(strings.TrimPrefix(r.URL.Path, "/api/v1/"))
if err != nil {
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.Write(data)
// /api/v1/<groupID>/<alertID>/status
if strings.HasPrefix(r.URL.Path, "/api/v1/") {
data, err := json.Marshal(alert)
if err != nil {
httpserver.Errorf(w, r, "failed to marshal alert: %s", err)
return true
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.Write(data)
return true
}
// <groupID>/<alertID>/status
WriteAlert(w, alert)
return true
}
}
@@ -80,20 +119,25 @@ type listGroupsResponse struct {
Status string `json:"status"`
}
func (rh *requestHandler) listGroups() ([]byte, error) {
func (rh *requestHandler) groups() []APIGroup {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
lr := listGroupsResponse{Status: "success"}
var groups []APIGroup
for _, g := range rh.m.groups {
lr.Data.Groups = append(lr.Data.Groups, g.toAPI())
groups = append(groups, g.toAPI())
}
// sort list of alerts for deterministic output
sort.Slice(lr.Data.Groups, func(i, j int) bool {
return lr.Data.Groups[i].Name < lr.Data.Groups[j].Name
sort.Slice(groups, func(i, j int) bool {
return groups[i].Name < groups[j].Name
})
return groups
}
func (rh *requestHandler) listGroups() ([]byte, error) {
lr := listGroupsResponse{Status: "success"}
lr.Data.Groups = rh.groups()
b, err := json.Marshal(lr)
if err != nil {
return nil, &httpserver.ErrorWithStatusCode{
@@ -111,6 +155,30 @@ type listAlertsResponse struct {
Status string `json:"status"`
}
func (rh *requestHandler) groupAlerts() []GroupAlerts {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
var groupAlerts []GroupAlerts
for _, g := range rh.m.groups {
var alerts []*APIAlert
for _, r := range g.Rules {
a, ok := r.(*AlertingRule)
if !ok {
continue
}
alerts = append(alerts, a.AlertsAPI()...)
}
if len(alerts) > 0 {
groupAlerts = append(groupAlerts, GroupAlerts{
Group: g.toAPI(),
Alerts: alerts,
})
}
}
return groupAlerts
}
func (rh *requestHandler) listAlerts() ([]byte, error) {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
@@ -141,18 +209,17 @@ func (rh *requestHandler) listAlerts() ([]byte, error) {
return b, nil
}
func (rh *requestHandler) alert(path string) ([]byte, error) {
func (rh *requestHandler) alertByPath(path string) (*APIAlert, error) {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
parts := strings.SplitN(strings.TrimPrefix(path, "/api/v1/"), "/", 3)
parts := strings.SplitN(strings.TrimLeft(path, "/"), "/", 3)
if len(parts) != 3 {
return nil, &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf(`path %q cointains /status suffix but doesn't match pattern "/group/alert/status"`, path),
Err: fmt.Errorf(`path %q cointains /status suffix but doesn't match pattern "/groupID/alertID/status"`, path),
StatusCode: http.StatusBadRequest,
}
}
groupID, err := uint64FromPath(parts[0])
if err != nil {
return nil, badRequest(fmt.Errorf(`cannot parse groupID: %w`, err))
@@ -165,7 +232,7 @@ func (rh *requestHandler) alert(path string) ([]byte, error) {
if err != nil {
return nil, errResponse(err, http.StatusNotFound)
}
return json.Marshal(resp)
return resp, nil
}
func uint64FromPath(path string) (uint64, error) {

276
app/vmalert/web.qtpl Normal file
View File

@@ -0,0 +1,276 @@
{% package main %}
{% import (
"time"
"sort"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/tpl"
) %}
{% func Welcome() %}
{%= tpl.Header("vmalert", navItems) %}
<p>
API:<br>
{% for _, p := range apiLinks %}
{%code
p, doc := p[0], p[1]
%}
<a href="{%s p %}">{%s p %}</a> - {%s doc %}<br/>
{% endfor %}
</p>
{%= tpl.Footer() %}
{% endfunc %}
{% func ListGroups(groups []APIGroup) %}
{%= tpl.Header("Groups", navItems) %}
{% if len(groups) > 0 %}
{%code
rOk := make(map[string]int)
rNotOk := make(map[string]int)
for _, g := range groups {
for _, r := range g.AlertingRules{
if r.LastError != "" {
rNotOk[g.Name]++
} else {
rOk[g.Name]++
}
}
for _, r := range g.RecordingRules{
if r.LastError != "" {
rNotOk[g.Name]++
} else {
rOk[g.Name]++
}
}
}
%}
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
{% for _, g := range groups %}
<div class="group-heading{% if rNotOk[g.Name] > 0 %} alert-danger{% endif %}" data-bs-target="rules-{%s g.ID %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%s g.Interval %})</a>
{% if rNotOk[g.Name] > 0 %}<span class="badge bg-danger" title="Number of rules withs status Error">{%d rNotOk[g.Name] %}</span> {% endif %}
<span class="badge bg-success" title="Number of rules withs status Ok">{%d rOk[g.Name] %}</span>
<p class="fs-6 fw-lighter">{%s g.File %}</p>
{% if len(g.ExtraFilterLabels) > 0 %}
<div class="fs-6 fw-lighter">Extra filter labels
{% for k, v := range g.ExtraFilterLabels %}
<span class="float-left badge bg-primary">{%s k %}={%s v %}</span>
{% endfor %}
</div>
{% endif %}
</div>
<div class="collapse" id="rules-{%s g.ID %}">
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Rule</th>
<th scope="col" title="Shows if rule's execution ended with error">Error</th>
<th scope="col" title="How many samples were produced by the rule">Samples</th>
<th scope="col" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
{% for _, ar := range g.AlertingRules %}
<tr{% if ar.LastError != "" %} class="alert-danger"{% endif %}>
<td>
<b>alert:</b> {%s ar.Name %} (for: {%v ar.For %})<br>
<code><pre>{%s ar.Expression %}</pre></code><br>
{% if len(ar.Labels) > 0 %} <b>Labels:</b>{% endif %}
{% for k, v := range ar.Labels %}
<span class="ms-1 badge bg-primary">{%s k %}={%s v %}</span>
{% endfor %}
</td>
<td><div class="error-cell">{%s ar.LastError %}</div></td>
<td>{%d ar.LastSamples %}</td>
<td>{%f.3 time.Since(ar.LastExec).Seconds() %}s ago</td>
</tr>
{% endfor %}
{% for _, rr := range g.RecordingRules %}
<tr>
<td>
<b>record:</b> {%s rr.Name %}<br>
<code><pre>{%s rr.Expression %}</pre></code>
{% if len(rr.Labels) > 0 %} <b>Labels:</b>{% endif %}
{% for k, v := range rr.Labels %}
<span class="ms-1 badge bg-primary">{%s k %}={%s v %}</span>
{% endfor %}
</td>
<td><div class="error-cell">{%s rr.LastError %}</div></td>
<td>{%d rr.LastSamples %}</td>
<td>{%f.3 time.Since(rr.LastExec).Seconds() %}s ago</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% endfor %}
{% else %}
<div>
<p>No items...</p>
</div>
{% endif %}
{%= tpl.Footer() %}
{% endfunc %}
{% func ListAlerts(groupAlerts []GroupAlerts) %}
{%= tpl.Header("Alerts", navItems) %}
{% if len(groupAlerts) > 0 %}
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
{% for _, ga := range groupAlerts %}
{%code g := ga.Group %}
<div class="group-heading alert-danger" data-bs-target="rules-{%s g.ID %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %}</a>
<span class="badge bg-danger" title="Number of active alerts">{%d len(ga.Alerts) %}</span>
<br>
<p class="fs-6 fw-lighter">{%s g.File %}</p>
</div>
{%code
var keys []string
alertsByRule := make(map[string][]*APIAlert)
for _, alert := range ga.Alerts {
if len(alertsByRule[alert.RuleID]) < 1 {
keys = append(keys, alert.RuleID)
}
alertsByRule[alert.RuleID] = append(alertsByRule[alert.RuleID], alert)
}
sort.Strings(keys)
%}
<div class="collapse" id="rules-{%s g.ID %}">
{% for _, ruleID := range keys %}
{%code
defaultAR := alertsByRule[ruleID][0]
var labelKeys []string
for k := range defaultAR.Labels {
labelKeys = append(labelKeys, k)
}
sort.Strings(labelKeys)
%}
<br>
<b>alert:</b> {%s defaultAR.Name %} ({%d len(alertsByRule[ruleID]) %})<br>
<b>expr:</b><code><pre>{%s defaultAR.Expression %}</pre></code>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Labels</th>
<th scope="col">State</th>
<th scope="col">Active at</th>
<th scope="col">Value</th>
<th scope="col">Link</th>
</tr>
</thead>
<tbody>
{% for _, ar := range alertsByRule[ruleID] %}
<tr>
<td>
{% for _, k := range labelKeys %}
<span class="ms-1 badge bg-primary">{%s k %}={%s ar.Labels[k] %}</span>
{% endfor %}
</td>
<td><span class="badge {% if ar.State=="firing" %}bg-danger{% else %} bg-warning text-dark{% endif %}">{%s ar.State %}</span></td>
<td>{%s ar.ActiveAt.Format("2006-01-02T15:04:05Z07:00") %}</td>
<td>{%s ar.Value %}</td>
<td>
<a href="/{%s g.ID %}/{%s ar.ID %}/status">Details</a>
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% endfor %}
</div>
<br>
{% endfor %}
{% else %}
<div>
<p>No items...</p>
</div>
{% endif %}
{%= tpl.Footer() %}
{% endfunc %}
{% func Alert(alert *APIAlert) %}
{%= tpl.Header("", navItems) %}
{%code
var labelKeys []string
for k := range alert.Labels {
labelKeys = append(labelKeys, k)
}
sort.Strings(labelKeys)
var annotationKeys []string
for k := range alert.Annotations {
annotationKeys = append(annotationKeys, k)
}
sort.Strings(annotationKeys)
%}
<div class="display-6 pb-3 mb-3">{%s alert.Name %}<span class="ms-2 badge {% if alert.State=="firing" %}bg-danger{% else %} bg-warning text-dark{% endif %}">{%s alert.State %}</span></div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Active at
</div>
<div class="col">
{%s alert.ActiveAt.Format("2006-01-02T15:04:05Z07:00") %}
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Expr
</div>
<div class="col">
<code><pre>{%s alert.Expression %}</pre></code>
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Labels
</div>
<div class="col">
{% for _, k := range labelKeys %}
<span class="m-1 badge bg-primary">{%s k %}={%s alert.Labels[k] %}</span>
{% endfor %}
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Annotations
</div>
<div class="col">
{% for _, k := range annotationKeys %}
<b>{%s k %}:</b><br>
<p>{%s alert.Annotations[k] %}</p>
{% endfor %}
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Group
</div>
<div class="col">
<a target="_blank" href="/groups#group-{%s alert.GroupID %}">{%s alert.GroupID %}</a>
</div>
</div>
</div>
{%= tpl.Footer() %}
{% endfunc %}

912
app/vmalert/web.qtpl.go Normal file
View File

@@ -0,0 +1,912 @@
// Code generated by qtc from "web.qtpl". DO NOT EDIT.
// See https://github.com/valyala/quicktemplate for details.
//line app/vmalert/web.qtpl:1
package main
//line app/vmalert/web.qtpl:3
import (
"sort"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/tpl"
)
//line app/vmalert/web.qtpl:11
import (
qtio422016 "io"
qt422016 "github.com/valyala/quicktemplate"
)
//line app/vmalert/web.qtpl:11
var (
_ = qtio422016.Copy
_ = qt422016.AcquireByteBuffer
)
//line app/vmalert/web.qtpl:11
func StreamWelcome(qw422016 *qt422016.Writer) {
//line app/vmalert/web.qtpl:11
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:12
tpl.StreamHeader(qw422016, "vmalert", navItems)
//line app/vmalert/web.qtpl:12
qw422016.N().S(`
<p>
API:<br>
`)
//line app/vmalert/web.qtpl:15
for _, p := range apiLinks {
//line app/vmalert/web.qtpl:15
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:17
p, doc := p[0], p[1]
//line app/vmalert/web.qtpl:18
qw422016.N().S(`
<a href="`)
//line app/vmalert/web.qtpl:19
qw422016.E().S(p)
//line app/vmalert/web.qtpl:19
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:19
qw422016.E().S(p)
//line app/vmalert/web.qtpl:19
qw422016.N().S(`</a> - `)
//line app/vmalert/web.qtpl:19
qw422016.E().S(doc)
//line app/vmalert/web.qtpl:19
qw422016.N().S(`<br/>
`)
//line app/vmalert/web.qtpl:20
}
//line app/vmalert/web.qtpl:20
qw422016.N().S(`
</p>
`)
//line app/vmalert/web.qtpl:22
tpl.StreamFooter(qw422016)
//line app/vmalert/web.qtpl:22
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:23
}
//line app/vmalert/web.qtpl:23
func WriteWelcome(qq422016 qtio422016.Writer) {
//line app/vmalert/web.qtpl:23
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:23
StreamWelcome(qw422016)
//line app/vmalert/web.qtpl:23
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:23
}
//line app/vmalert/web.qtpl:23
func Welcome() string {
//line app/vmalert/web.qtpl:23
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:23
WriteWelcome(qb422016)
//line app/vmalert/web.qtpl:23
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:23
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:23
return qs422016
//line app/vmalert/web.qtpl:23
}
//line app/vmalert/web.qtpl:25
func StreamListGroups(qw422016 *qt422016.Writer, groups []APIGroup) {
//line app/vmalert/web.qtpl:25
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:26
tpl.StreamHeader(qw422016, "Groups", navItems)
//line app/vmalert/web.qtpl:26
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:27
if len(groups) > 0 {
//line app/vmalert/web.qtpl:27
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:29
rOk := make(map[string]int)
rNotOk := make(map[string]int)
for _, g := range groups {
for _, r := range g.AlertingRules {
if r.LastError != "" {
rNotOk[g.Name]++
} else {
rOk[g.Name]++
}
}
for _, r := range g.RecordingRules {
if r.LastError != "" {
rNotOk[g.Name]++
} else {
rOk[g.Name]++
}
}
}
//line app/vmalert/web.qtpl:47
qw422016.N().S(`
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
`)
//line app/vmalert/web.qtpl:50
for _, g := range groups {
//line app/vmalert/web.qtpl:50
qw422016.N().S(`
<div class="group-heading`)
//line app/vmalert/web.qtpl:51
if rNotOk[g.Name] > 0 {
//line app/vmalert/web.qtpl:51
qw422016.N().S(` alert-danger`)
//line app/vmalert/web.qtpl:51
}
//line app/vmalert/web.qtpl:51
qw422016.N().S(`" data-bs-target="rules-`)
//line app/vmalert/web.qtpl:51
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:51
qw422016.N().S(`">
<span class="anchor" id="group-`)
//line app/vmalert/web.qtpl:52
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:52
qw422016.N().S(`"></span>
<a href="#group-`)
//line app/vmalert/web.qtpl:53
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:53
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:53
qw422016.E().S(g.Name)
//line app/vmalert/web.qtpl:53
if g.Type != "prometheus" {
//line app/vmalert/web.qtpl:53
qw422016.N().S(` (`)
//line app/vmalert/web.qtpl:53
qw422016.E().S(g.Type)
//line app/vmalert/web.qtpl:53
qw422016.N().S(`)`)
//line app/vmalert/web.qtpl:53
}
//line app/vmalert/web.qtpl:53
qw422016.N().S(` (every `)
//line app/vmalert/web.qtpl:53
qw422016.E().S(g.Interval)
//line app/vmalert/web.qtpl:53
qw422016.N().S(`)</a>
`)
//line app/vmalert/web.qtpl:54
if rNotOk[g.Name] > 0 {
//line app/vmalert/web.qtpl:54
qw422016.N().S(`<span class="badge bg-danger" title="Number of rules withs status Error">`)
//line app/vmalert/web.qtpl:54
qw422016.N().D(rNotOk[g.Name])
//line app/vmalert/web.qtpl:54
qw422016.N().S(`</span> `)
//line app/vmalert/web.qtpl:54
}
//line app/vmalert/web.qtpl:54
qw422016.N().S(`
<span class="badge bg-success" title="Number of rules withs status Ok">`)
//line app/vmalert/web.qtpl:55
qw422016.N().D(rOk[g.Name])
//line app/vmalert/web.qtpl:55
qw422016.N().S(`</span>
<p class="fs-6 fw-lighter">`)
//line app/vmalert/web.qtpl:56
qw422016.E().S(g.File)
//line app/vmalert/web.qtpl:56
qw422016.N().S(`</p>
`)
//line app/vmalert/web.qtpl:57
if len(g.ExtraFilterLabels) > 0 {
//line app/vmalert/web.qtpl:57
qw422016.N().S(`
<div class="fs-6 fw-lighter">Extra filter labels
`)
//line app/vmalert/web.qtpl:59
for k, v := range g.ExtraFilterLabels {
//line app/vmalert/web.qtpl:59
qw422016.N().S(`
<span class="float-left badge bg-primary">`)
//line app/vmalert/web.qtpl:60
qw422016.E().S(k)
//line app/vmalert/web.qtpl:60
qw422016.N().S(`=`)
//line app/vmalert/web.qtpl:60
qw422016.E().S(v)
//line app/vmalert/web.qtpl:60
qw422016.N().S(`</span>
`)
//line app/vmalert/web.qtpl:61
}
//line app/vmalert/web.qtpl:61
qw422016.N().S(`
</div>
`)
//line app/vmalert/web.qtpl:63
}
//line app/vmalert/web.qtpl:63
qw422016.N().S(`
</div>
<div class="collapse" id="rules-`)
//line app/vmalert/web.qtpl:65
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:65
qw422016.N().S(`">
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Rule</th>
<th scope="col" title="Shows if rule's execution ended with error">Error</th>
<th scope="col" title="How many samples were produced by the rule">Samples</th>
<th scope="col" title="How many seconds ago rule was executed">Updated</th>
</tr>
</thead>
<tbody>
`)
//line app/vmalert/web.qtpl:76
for _, ar := range g.AlertingRules {
//line app/vmalert/web.qtpl:76
qw422016.N().S(`
<tr`)
//line app/vmalert/web.qtpl:77
if ar.LastError != "" {
//line app/vmalert/web.qtpl:77
qw422016.N().S(` class="alert-danger"`)
//line app/vmalert/web.qtpl:77
}
//line app/vmalert/web.qtpl:77
qw422016.N().S(`>
<td>
<b>alert:</b> `)
//line app/vmalert/web.qtpl:79
qw422016.E().S(ar.Name)
//line app/vmalert/web.qtpl:79
qw422016.N().S(` (for: `)
//line app/vmalert/web.qtpl:79
qw422016.E().V(ar.For)
//line app/vmalert/web.qtpl:79
qw422016.N().S(`)<br>
<code><pre>`)
//line app/vmalert/web.qtpl:80
qw422016.E().S(ar.Expression)
//line app/vmalert/web.qtpl:80
qw422016.N().S(`</pre></code><br>
`)
//line app/vmalert/web.qtpl:81
if len(ar.Labels) > 0 {
//line app/vmalert/web.qtpl:81
qw422016.N().S(` <b>Labels:</b>`)
//line app/vmalert/web.qtpl:81
}
//line app/vmalert/web.qtpl:81
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:82
for k, v := range ar.Labels {
//line app/vmalert/web.qtpl:82
qw422016.N().S(`
<span class="ms-1 badge bg-primary">`)
//line app/vmalert/web.qtpl:83
qw422016.E().S(k)
//line app/vmalert/web.qtpl:83
qw422016.N().S(`=`)
//line app/vmalert/web.qtpl:83
qw422016.E().S(v)
//line app/vmalert/web.qtpl:83
qw422016.N().S(`</span>
`)
//line app/vmalert/web.qtpl:84
}
//line app/vmalert/web.qtpl:84
qw422016.N().S(`
</td>
<td><div class="error-cell">`)
//line app/vmalert/web.qtpl:86
qw422016.E().S(ar.LastError)
//line app/vmalert/web.qtpl:86
qw422016.N().S(`</div></td>
<td>`)
//line app/vmalert/web.qtpl:87
qw422016.N().D(ar.LastSamples)
//line app/vmalert/web.qtpl:87
qw422016.N().S(`</td>
<td>`)
//line app/vmalert/web.qtpl:88
qw422016.N().FPrec(time.Since(ar.LastExec).Seconds(), 3)
//line app/vmalert/web.qtpl:88
qw422016.N().S(`s ago</td>
</tr>
`)
//line app/vmalert/web.qtpl:90
}
//line app/vmalert/web.qtpl:90
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:91
for _, rr := range g.RecordingRules {
//line app/vmalert/web.qtpl:91
qw422016.N().S(`
<tr>
<td>
<b>record:</b> `)
//line app/vmalert/web.qtpl:94
qw422016.E().S(rr.Name)
//line app/vmalert/web.qtpl:94
qw422016.N().S(`<br>
<code><pre>`)
//line app/vmalert/web.qtpl:95
qw422016.E().S(rr.Expression)
//line app/vmalert/web.qtpl:95
qw422016.N().S(`</pre></code>
`)
//line app/vmalert/web.qtpl:96
if len(rr.Labels) > 0 {
//line app/vmalert/web.qtpl:96
qw422016.N().S(` <b>Labels:</b>`)
//line app/vmalert/web.qtpl:96
}
//line app/vmalert/web.qtpl:96
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:97
for k, v := range rr.Labels {
//line app/vmalert/web.qtpl:97
qw422016.N().S(`
<span class="ms-1 badge bg-primary">`)
//line app/vmalert/web.qtpl:98
qw422016.E().S(k)
//line app/vmalert/web.qtpl:98
qw422016.N().S(`=`)
//line app/vmalert/web.qtpl:98
qw422016.E().S(v)
//line app/vmalert/web.qtpl:98
qw422016.N().S(`</span>
`)
//line app/vmalert/web.qtpl:99
}
//line app/vmalert/web.qtpl:99
qw422016.N().S(`
</td>
<td><div class="error-cell">`)
//line app/vmalert/web.qtpl:101
qw422016.E().S(rr.LastError)
//line app/vmalert/web.qtpl:101
qw422016.N().S(`</div></td>
<td>`)
//line app/vmalert/web.qtpl:102
qw422016.N().D(rr.LastSamples)
//line app/vmalert/web.qtpl:102
qw422016.N().S(`</td>
<td>`)
//line app/vmalert/web.qtpl:103
qw422016.N().FPrec(time.Since(rr.LastExec).Seconds(), 3)
//line app/vmalert/web.qtpl:103
qw422016.N().S(`s ago</td>
</tr>
`)
//line app/vmalert/web.qtpl:105
}
//line app/vmalert/web.qtpl:105
qw422016.N().S(`
</tbody>
</table>
</div>
`)
//line app/vmalert/web.qtpl:109
}
//line app/vmalert/web.qtpl:109
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:111
} else {
//line app/vmalert/web.qtpl:111
qw422016.N().S(`
<div>
<p>No items...</p>
</div>
`)
//line app/vmalert/web.qtpl:115
}
//line app/vmalert/web.qtpl:115
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:117
tpl.StreamFooter(qw422016)
//line app/vmalert/web.qtpl:117
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:119
}
//line app/vmalert/web.qtpl:119
func WriteListGroups(qq422016 qtio422016.Writer, groups []APIGroup) {
//line app/vmalert/web.qtpl:119
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:119
StreamListGroups(qw422016, groups)
//line app/vmalert/web.qtpl:119
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:119
}
//line app/vmalert/web.qtpl:119
func ListGroups(groups []APIGroup) string {
//line app/vmalert/web.qtpl:119
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:119
WriteListGroups(qb422016, groups)
//line app/vmalert/web.qtpl:119
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:119
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:119
return qs422016
//line app/vmalert/web.qtpl:119
}
//line app/vmalert/web.qtpl:122
func StreamListAlerts(qw422016 *qt422016.Writer, groupAlerts []GroupAlerts) {
//line app/vmalert/web.qtpl:122
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:123
tpl.StreamHeader(qw422016, "Alerts", navItems)
//line app/vmalert/web.qtpl:123
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:124
if len(groupAlerts) > 0 {
//line app/vmalert/web.qtpl:124
qw422016.N().S(`
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
`)
//line app/vmalert/web.qtpl:127
for _, ga := range groupAlerts {
//line app/vmalert/web.qtpl:127
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:128
g := ga.Group
//line app/vmalert/web.qtpl:128
qw422016.N().S(`
<div class="group-heading alert-danger" data-bs-target="rules-`)
//line app/vmalert/web.qtpl:129
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:129
qw422016.N().S(`">
<span class="anchor" id="group-`)
//line app/vmalert/web.qtpl:130
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:130
qw422016.N().S(`"></span>
<a href="#group-`)
//line app/vmalert/web.qtpl:131
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:131
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:131
qw422016.E().S(g.Name)
//line app/vmalert/web.qtpl:131
if g.Type != "prometheus" {
//line app/vmalert/web.qtpl:131
qw422016.N().S(` (`)
//line app/vmalert/web.qtpl:131
qw422016.E().S(g.Type)
//line app/vmalert/web.qtpl:131
qw422016.N().S(`)`)
//line app/vmalert/web.qtpl:131
}
//line app/vmalert/web.qtpl:131
qw422016.N().S(`</a>
<span class="badge bg-danger" title="Number of active alerts">`)
//line app/vmalert/web.qtpl:132
qw422016.N().D(len(ga.Alerts))
//line app/vmalert/web.qtpl:132
qw422016.N().S(`</span>
<br>
<p class="fs-6 fw-lighter">`)
//line app/vmalert/web.qtpl:134
qw422016.E().S(g.File)
//line app/vmalert/web.qtpl:134
qw422016.N().S(`</p>
</div>
`)
//line app/vmalert/web.qtpl:137
var keys []string
alertsByRule := make(map[string][]*APIAlert)
for _, alert := range ga.Alerts {
if len(alertsByRule[alert.RuleID]) < 1 {
keys = append(keys, alert.RuleID)
}
alertsByRule[alert.RuleID] = append(alertsByRule[alert.RuleID], alert)
}
sort.Strings(keys)
//line app/vmalert/web.qtpl:146
qw422016.N().S(`
<div class="collapse" id="rules-`)
//line app/vmalert/web.qtpl:147
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:147
qw422016.N().S(`">
`)
//line app/vmalert/web.qtpl:148
for _, ruleID := range keys {
//line app/vmalert/web.qtpl:148
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:150
defaultAR := alertsByRule[ruleID][0]
var labelKeys []string
for k := range defaultAR.Labels {
labelKeys = append(labelKeys, k)
}
sort.Strings(labelKeys)
//line app/vmalert/web.qtpl:156
qw422016.N().S(`
<br>
<b>alert:</b> `)
//line app/vmalert/web.qtpl:158
qw422016.E().S(defaultAR.Name)
//line app/vmalert/web.qtpl:158
qw422016.N().S(` (`)
//line app/vmalert/web.qtpl:158
qw422016.N().D(len(alertsByRule[ruleID]))
//line app/vmalert/web.qtpl:158
qw422016.N().S(`)<br>
<b>expr:</b><code><pre>`)
//line app/vmalert/web.qtpl:159
qw422016.E().S(defaultAR.Expression)
//line app/vmalert/web.qtpl:159
qw422016.N().S(`</pre></code>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Labels</th>
<th scope="col">State</th>
<th scope="col">Active at</th>
<th scope="col">Value</th>
<th scope="col">Link</th>
</tr>
</thead>
<tbody>
`)
//line app/vmalert/web.qtpl:171
for _, ar := range alertsByRule[ruleID] {
//line app/vmalert/web.qtpl:171
qw422016.N().S(`
<tr>
<td>
`)
//line app/vmalert/web.qtpl:174
for _, k := range labelKeys {
//line app/vmalert/web.qtpl:174
qw422016.N().S(`
<span class="ms-1 badge bg-primary">`)
//line app/vmalert/web.qtpl:175
qw422016.E().S(k)
//line app/vmalert/web.qtpl:175
qw422016.N().S(`=`)
//line app/vmalert/web.qtpl:175
qw422016.E().S(ar.Labels[k])
//line app/vmalert/web.qtpl:175
qw422016.N().S(`</span>
`)
//line app/vmalert/web.qtpl:176
}
//line app/vmalert/web.qtpl:176
qw422016.N().S(`
</td>
<td><span class="badge `)
//line app/vmalert/web.qtpl:178
if ar.State == "firing" {
//line app/vmalert/web.qtpl:178
qw422016.N().S(`bg-danger`)
//line app/vmalert/web.qtpl:178
} else {
//line app/vmalert/web.qtpl:178
qw422016.N().S(` bg-warning text-dark`)
//line app/vmalert/web.qtpl:178
}
//line app/vmalert/web.qtpl:178
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:178
qw422016.E().S(ar.State)
//line app/vmalert/web.qtpl:178
qw422016.N().S(`</span></td>
<td>`)
//line app/vmalert/web.qtpl:179
qw422016.E().S(ar.ActiveAt.Format("2006-01-02T15:04:05Z07:00"))
//line app/vmalert/web.qtpl:179
qw422016.N().S(`</td>
<td>`)
//line app/vmalert/web.qtpl:180
qw422016.E().S(ar.Value)
//line app/vmalert/web.qtpl:180
qw422016.N().S(`</td>
<td>
<a href="/`)
//line app/vmalert/web.qtpl:182
qw422016.E().S(g.ID)
//line app/vmalert/web.qtpl:182
qw422016.N().S(`/`)
//line app/vmalert/web.qtpl:182
qw422016.E().S(ar.ID)
//line app/vmalert/web.qtpl:182
qw422016.N().S(`/status">Details</a>
</td>
</tr>
`)
//line app/vmalert/web.qtpl:185
}
//line app/vmalert/web.qtpl:185
qw422016.N().S(`
</tbody>
</table>
`)
//line app/vmalert/web.qtpl:188
}
//line app/vmalert/web.qtpl:188
qw422016.N().S(`
</div>
<br>
`)
//line app/vmalert/web.qtpl:191
}
//line app/vmalert/web.qtpl:191
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:193
} else {
//line app/vmalert/web.qtpl:193
qw422016.N().S(`
<div>
<p>No items...</p>
</div>
`)
//line app/vmalert/web.qtpl:197
}
//line app/vmalert/web.qtpl:197
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:199
tpl.StreamFooter(qw422016)
//line app/vmalert/web.qtpl:199
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:201
}
//line app/vmalert/web.qtpl:201
func WriteListAlerts(qq422016 qtio422016.Writer, groupAlerts []GroupAlerts) {
//line app/vmalert/web.qtpl:201
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:201
StreamListAlerts(qw422016, groupAlerts)
//line app/vmalert/web.qtpl:201
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:201
}
//line app/vmalert/web.qtpl:201
func ListAlerts(groupAlerts []GroupAlerts) string {
//line app/vmalert/web.qtpl:201
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:201
WriteListAlerts(qb422016, groupAlerts)
//line app/vmalert/web.qtpl:201
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:201
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:201
return qs422016
//line app/vmalert/web.qtpl:201
}
//line app/vmalert/web.qtpl:203
func StreamAlert(qw422016 *qt422016.Writer, alert *APIAlert) {
//line app/vmalert/web.qtpl:203
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:204
tpl.StreamHeader(qw422016, "", navItems)
//line app/vmalert/web.qtpl:204
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:206
var labelKeys []string
for k := range alert.Labels {
labelKeys = append(labelKeys, k)
}
sort.Strings(labelKeys)
var annotationKeys []string
for k := range alert.Annotations {
annotationKeys = append(annotationKeys, k)
}
sort.Strings(annotationKeys)
//line app/vmalert/web.qtpl:217
qw422016.N().S(`
<div class="display-6 pb-3 mb-3">`)
//line app/vmalert/web.qtpl:218
qw422016.E().S(alert.Name)
//line app/vmalert/web.qtpl:218
qw422016.N().S(`<span class="ms-2 badge `)
//line app/vmalert/web.qtpl:218
if alert.State == "firing" {
//line app/vmalert/web.qtpl:218
qw422016.N().S(`bg-danger`)
//line app/vmalert/web.qtpl:218
} else {
//line app/vmalert/web.qtpl:218
qw422016.N().S(` bg-warning text-dark`)
//line app/vmalert/web.qtpl:218
}
//line app/vmalert/web.qtpl:218
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:218
qw422016.E().S(alert.State)
//line app/vmalert/web.qtpl:218
qw422016.N().S(`</span></div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Active at
</div>
<div class="col">
`)
//line app/vmalert/web.qtpl:225
qw422016.E().S(alert.ActiveAt.Format("2006-01-02T15:04:05Z07:00"))
//line app/vmalert/web.qtpl:225
qw422016.N().S(`
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Expr
</div>
<div class="col">
<code><pre>`)
//line app/vmalert/web.qtpl:235
qw422016.E().S(alert.Expression)
//line app/vmalert/web.qtpl:235
qw422016.N().S(`</pre></code>
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Labels
</div>
<div class="col">
`)
//line app/vmalert/web.qtpl:245
for _, k := range labelKeys {
//line app/vmalert/web.qtpl:245
qw422016.N().S(`
<span class="m-1 badge bg-primary">`)
//line app/vmalert/web.qtpl:246
qw422016.E().S(k)
//line app/vmalert/web.qtpl:246
qw422016.N().S(`=`)
//line app/vmalert/web.qtpl:246
qw422016.E().S(alert.Labels[k])
//line app/vmalert/web.qtpl:246
qw422016.N().S(`</span>
`)
//line app/vmalert/web.qtpl:247
}
//line app/vmalert/web.qtpl:247
qw422016.N().S(`
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Annotations
</div>
<div class="col">
`)
//line app/vmalert/web.qtpl:257
for _, k := range annotationKeys {
//line app/vmalert/web.qtpl:257
qw422016.N().S(`
<b>`)
//line app/vmalert/web.qtpl:258
qw422016.E().S(k)
//line app/vmalert/web.qtpl:258
qw422016.N().S(`:</b><br>
<p>`)
//line app/vmalert/web.qtpl:259
qw422016.E().S(alert.Annotations[k])
//line app/vmalert/web.qtpl:259
qw422016.N().S(`</p>
`)
//line app/vmalert/web.qtpl:260
}
//line app/vmalert/web.qtpl:260
qw422016.N().S(`
</div>
</div>
</div>
<div class="container border-bottom p-2">
<div class="row">
<div class="col-2">
Group
</div>
<div class="col">
<a target="_blank" href="/groups#group-`)
//line app/vmalert/web.qtpl:270
qw422016.E().S(alert.GroupID)
//line app/vmalert/web.qtpl:270
qw422016.N().S(`">`)
//line app/vmalert/web.qtpl:270
qw422016.E().S(alert.GroupID)
//line app/vmalert/web.qtpl:270
qw422016.N().S(`</a>
</div>
</div>
</div>
`)
//line app/vmalert/web.qtpl:274
tpl.StreamFooter(qw422016)
//line app/vmalert/web.qtpl:274
qw422016.N().S(`
`)
//line app/vmalert/web.qtpl:276
}
//line app/vmalert/web.qtpl:276
func WriteAlert(qq422016 qtio422016.Writer, alert *APIAlert) {
//line app/vmalert/web.qtpl:276
qw422016 := qt422016.AcquireWriter(qq422016)
//line app/vmalert/web.qtpl:276
StreamAlert(qw422016, alert)
//line app/vmalert/web.qtpl:276
qt422016.ReleaseWriter(qw422016)
//line app/vmalert/web.qtpl:276
}
//line app/vmalert/web.qtpl:276
func Alert(alert *APIAlert) string {
//line app/vmalert/web.qtpl:276
qb422016 := qt422016.AcquireByteBuffer()
//line app/vmalert/web.qtpl:276
WriteAlert(qb422016, alert)
//line app/vmalert/web.qtpl:276
qs422016 := string(qb422016.B)
//line app/vmalert/web.qtpl:276
qt422016.ReleaseByteBuffer(qb422016)
//line app/vmalert/web.qtpl:276
return qs422016
//line app/vmalert/web.qtpl:276
}

View File

@@ -9,6 +9,7 @@ import (
type APIAlert struct {
ID string `json:"id"`
Name string `json:"name"`
RuleID string `json:"rule_id"`
GroupID string `json:"group_id"`
Expression string `json:"expression"`
State string `json:"state"`
@@ -20,14 +21,16 @@ type APIAlert struct {
// APIGroup represents Group for WEB view
type APIGroup struct {
Name string `json:"name"`
Type string `json:"type"`
ID string `json:"id"`
File string `json:"file"`
Interval string `json:"interval"`
Concurrency int `json:"concurrency"`
AlertingRules []APIAlertingRule `json:"alerting_rules"`
RecordingRules []APIRecordingRule `json:"recording_rules"`
Name string `json:"name"`
Type string `json:"type"`
ID string `json:"id"`
File string `json:"file"`
Interval string `json:"interval"`
Concurrency int `json:"concurrency"`
ExtraFilterLabels map[string]string `json:"extra_filter_labels"`
Labels map[string]string `json:"labels,omitempty"`
AlertingRules []APIAlertingRule `json:"alerting_rules"`
RecordingRules []APIRecordingRule `json:"recording_rules"`
}
// APIAlertingRule represents AlertingRule for WEB view
@@ -39,6 +42,7 @@ type APIAlertingRule struct {
Expression string `json:"expression"`
For string `json:"for"`
LastError string `json:"last_error"`
LastSamples int `json:"last_samples"`
LastExec time.Time `json:"last_exec"`
Labels map[string]string `json:"labels"`
Annotations map[string]string `json:"annotations"`
@@ -46,12 +50,19 @@ type APIAlertingRule struct {
// APIRecordingRule represents RecordingRule for WEB view
type APIRecordingRule struct {
ID string `json:"id"`
Name string `json:"name"`
Type string `json:"type"`
GroupID string `json:"group_id"`
Expression string `json:"expression"`
LastError string `json:"last_error"`
LastExec time.Time `json:"last_exec"`
Labels map[string]string `json:"labels"`
ID string `json:"id"`
Name string `json:"name"`
Type string `json:"type"`
GroupID string `json:"group_id"`
Expression string `json:"expression"`
LastError string `json:"last_error"`
LastSamples int `json:"last_samples"`
LastExec time.Time `json:"last_exec"`
Labels map[string]string `json:"labels"`
}
// GroupAlerts represents a group of alerts for WEB view
type GroupAlerts struct {
Group APIGroup
Alerts []*APIAlert
}

View File

@@ -1,8 +1,8 @@
## vmauth
# vmauth
`vmauth` is a simple auth proxy and router for [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics).
It reads username and password from [Basic Auth headers](https://en.wikipedia.org/wiki/Basic_access_authentication),
matches them against configs pointed by `-auth.config` command-line flag and proxies incoming HTTP requests to the configured per-user `url_prefix` on successful match.
`vmauth` is a simple auth proxy, router and [load balancer](#load-balancing) for [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics).
It reads auth credentials from `Authorization` http header ([Basic Auth](https://en.wikipedia.org/wiki/Basic_access_authentication) and `Bearer token` is supported),
matches them against configs pointed by [-auth.config](#auth-config) command-line flag and proxies incoming HTTP requests to the configured per-user `url_prefix` on successful match.
## Quick start
@@ -17,18 +17,24 @@ and pass the following flag to `vmauth` binary in order to start authorizing and
After that `vmauth` starts accepting HTTP requests on port `8427` and routing them according to the provided [-auth.config](#auth-config).
The port can be modified via `-httpListenAddr` command-line flag.
The auth config can be reloaded by passing `SIGHUP` signal to `vmauth`.
The auth config can be reloaded either by passing `SIGHUP` signal to `vmauth` or by querying `/-/reload` http endpoint.
Docker images for `vmauth` are available [here](https://hub.docker.com/r/victoriametrics/vmauth/tags).
Pass `-help` to `vmauth` in order to see all the supported command-line flags with their descriptions.
Feel free [contacting us](mailto:info@victoriametrics.com) if you need customized auth proxy for VictoriaMetrics with the support of LDAP, SSO, RBAC, SAML, accounting, limits, etc.
Feel free [contacting us](mailto:info@victoriametrics.com) if you need customized auth proxy for VictoriaMetrics with the support of LDAP, SSO, RBAC, SAML,
accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.com/vmgateway.html).
## Load balancing
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
## Auth config
Auth config is represented in the following simple `yml` format:
`-auth.config` is represented in the following simple `yml` format:
```yml
@@ -36,43 +42,71 @@ Auth config is represented in the following simple `yml` format:
# Usernames must be unique.
users:
# Requests with the 'Authorization: Bearer XXXX' header are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- bearer_token: "XXXX"
url_prefix: "http://localhost:8428"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query
# will be proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is routed to http://vmselect:8481/select/123/prometheus/api/v1/select
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
- username: "cluster-select-account-123"
password: "***"
url_prefix: "http://vmselect:8481/select/123/prometheus"
url_prefix:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is routed to http://vminsert:8480/insert/42/prometheus/api/v1/write
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
- username: "cluster-insert-account-42"
password: "***"
url_prefix: "http://vminsert:8480/insert/42/prometheus"
url_prefix:
- "http://vminsert1:8480/insert/42/prometheus"
- "http://vminsert2:8480/insert/42/prometheus"
# A single user for querying and inserting data:
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are routed to http://vmselect:8481/select/42/prometheus.
# For example, http://vmauth:8427/api/v1/query is routed to http://vmselect:8480/select/42/prometheus/api/v1/query
# - Requests to http://vmauth:8427/api/v1/write are routed to http://vminsert:8480/insert/42/prometheus/api/v1/write
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/42/prometheus
# - http://vmselect2:8481/select/42/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect1:8480/select/42/prometheus/api/v1/query
# or to http://vmselect2:8480/select/42/prometheus/api/v1/query .
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "foobar"
url_map:
- src_paths: ["/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^/]+/values"]
url_prefix: "http://vmselect:8481/select/42/prometheus"
- src_paths:
- "/api/v1/query"
- "/api/v1/query_range"
- "/api/v1/label/[^/]+/values"
url_prefix:
- "http://vmselect1:8481/select/42/prometheus"
- "http://vmselect2:8481/select/42/prometheus"
- src_paths: ["/api/v1/write"]
url_prefix: "http://vminsert:8480/insert/42/prometheus"
```
@@ -96,11 +130,13 @@ Do not transfer Basic Auth headers in plaintext over untrusted networks. Enable
Alternatively, [https termination proxy](https://en.wikipedia.org/wiki/TLS_termination_proxy) may be put in front of `vmauth`.
It is recommended protecting `/-/reload` endpoint with `-reloadAuthKey` command-line flag, so external users couldn't trigger config reload.
## Monitoring
`vmauth` exports various metrics in Prometheus exposition format at `http://vmauth-host:8427/metrics` page. It is recommended setting up regular scraping of this page
either via [vmagent](https://victoriametrics.github.io/vmagent.html) or via Prometheus, so the exported metrics could be analyzed later.
either via [vmagent](https://docs.victoriametrics.com/vmagent.html) or via Prometheus, so the exported metrics could be analyzed later.
## How to build from sources
@@ -110,7 +146,7 @@ It is recommended using [binary releases](https://github.com/VictoriaMetrics/Vic
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmauth` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmauth` binary and puts it into the `bin` folder.
@@ -164,40 +200,42 @@ Pass `-help` command-line arg to `vmauth` in order to see all the configuration
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
See the docs at https://victoriametrics.github.io/vmauth.html .
See the docs at https://docs.victoriametrics.com/vmauth.html .
-auth.config string
Path to auth config. See https://victoriametrics.github.io/vmauth.html for details on the format of this auth config
Path to auth config. See https://docs.victoriametrics.com/vmauth.html for details on the format of this auth config
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help spreading incoming load among a cluster of services behind load balancer. Note that the real timeout may be bigger by up to 10% as a protection from Thundering herd problem (default 2m0s)
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for graceful shutdown of HTTP server. Highly loaded server may require increased value for graceful shutdown (default 7s)
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealy the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8427")
-logInvalidAuthTokens
Whether to log requests with invalid auth tokens. Such requests are always counted at vmagent_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
@@ -207,20 +245,24 @@ See the docs at https://victoriametrics.github.io/vmauth.html .
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxIdleConnsPerBackend int
The maximum number of idle connections vmauth can open per each backend host (default 100)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-reloadAuthKey string
Auth key for /-/reload http endpoint. It must be passed as authKey=...
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version

View File

@@ -1,11 +1,14 @@
package main
import (
"encoding/base64"
"flag"
"fmt"
"io/ioutil"
"net/url"
"os"
"regexp"
"strconv"
"strings"
"sync"
"sync/atomic"
@@ -18,7 +21,7 @@ import (
)
var (
authConfigPath = flag.String("auth.config", "", "Path to auth config. See https://victoriametrics.github.io/vmauth.html "+
authConfigPath = flag.String("auth.config", "", "Path to auth config. See https://docs.victoriametrics.com/vmauth.html "+
"for details on the format of this auth config")
)
@@ -29,10 +32,11 @@ type AuthConfig struct {
// UserInfo is user information read from authConfigPath
type UserInfo struct {
Username string `yaml:"username"`
Password string `yaml:"password"`
URLPrefix string `yaml:"url_prefix"`
URLMap []URLMap `yaml:"url_map"`
BearerToken string `yaml:"bearer_token"`
Username string `yaml:"username"`
Password string `yaml:"password"`
URLPrefix *URLPrefix `yaml:"url_prefix"`
URLMap []URLMap `yaml:"url_map"`
requests *metrics.Counter
}
@@ -40,7 +44,7 @@ type UserInfo struct {
// URLMap is a mapping from source paths to target urls.
type URLMap struct {
SrcPaths []*SrcPath `yaml:"src_paths"`
URLPrefix string `yaml:"url_prefix"`
URLPrefix *URLPrefix `yaml:"url_prefix"`
}
// SrcPath represents an src path
@@ -49,6 +53,76 @@ type SrcPath struct {
re *regexp.Regexp
}
// URLPrefix represents pased `url_prefix`
type URLPrefix struct {
n uint32
urls []*url.URL
}
func (up *URLPrefix) getNextURL() *url.URL {
n := atomic.AddUint32(&up.n, 1)
idx := n % uint32(len(up.urls))
return up.urls[idx]
}
// UnmarshalYAML unmarshals up from yaml.
func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
var v interface{}
if err := f(&v); err != nil {
return err
}
var urls []string
switch x := v.(type) {
case string:
urls = []string{x}
case []interface{}:
if len(x) == 0 {
return fmt.Errorf("`url_prefix` must contain at least a single url")
}
us := make([]string, len(x))
for i, xx := range x {
s, ok := xx.(string)
if !ok {
return fmt.Errorf("`url_prefix` must contain array of strings; got %T", xx)
}
us[i] = s
}
urls = us
default:
return fmt.Errorf("unexpected type for `url_prefix`: %T; want string or []string", v)
}
pus := make([]*url.URL, len(urls))
for i, u := range urls {
pu, err := url.Parse(u)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", u, err)
}
pus[i] = pu
}
up.urls = pus
return nil
}
// MarshalYAML marshals up to yaml.
func (up *URLPrefix) MarshalYAML() (interface{}, error) {
var b []byte
if len(up.urls) == 1 {
u := up.urls[0].String()
b = strconv.AppendQuote(b, u)
return string(b), nil
}
b = append(b, '[')
for i, pu := range up.urls {
u := pu.String()
b = strconv.AppendQuote(b, u)
if i+1 < len(up.urls) {
b = append(b, ',')
}
}
b = append(b, ']')
return string(b), nil
}
func (sp *SrcPath) match(s string) bool {
prefix, ok := sp.re.LiteralPrefix()
if ok {
@@ -86,6 +160,12 @@ func initAuthConfig() {
if len(*authConfigPath) == 0 {
logger.Fatalf("missing required `-auth.config` command-line flag")
}
// Register SIGHUP handler for config re-read just before readAuthConfig call.
// This guarantees that the config will be re-read if the signal arrives during readAuthConfig call.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
m, err := readAuthConfig(*authConfigPath)
if err != nil {
logger.Fatalf("cannot load auth config from `-auth.config=%s`: %s", *authConfigPath, err)
@@ -95,7 +175,7 @@ func initAuthConfig() {
authConfigWG.Add(1)
go func() {
defer authConfigWG.Done()
authConfigReloader()
authConfigReloader(sighupCh)
}()
}
@@ -104,8 +184,7 @@ func stopAuthConfig() {
authConfigWG.Wait()
}
func authConfigReloader() {
sighupCh := procutil.NewSighupChan()
func authConfigReloader(sighupCh <-chan os.Signal) {
for {
select {
case <-stopCh:
@@ -150,53 +229,93 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
if len(uis) == 0 {
return nil, fmt.Errorf("`users` section cannot be empty in AuthConfig")
}
m := make(map[string]*UserInfo, len(uis))
byAuthToken := make(map[string]*UserInfo, len(uis))
byUsername := make(map[string]bool, len(uis))
byBearerToken := make(map[string]bool, len(uis))
for i := range uis {
ui := &uis[i]
if m[ui.Username] != nil {
if ui.BearerToken == "" && ui.Username == "" {
return nil, fmt.Errorf("either bearer_token or username must be set")
}
if ui.BearerToken != "" && ui.Username != "" {
return nil, fmt.Errorf("bearer_token=%q and username=%q cannot be set simultaneously", ui.BearerToken, ui.Username)
}
if byBearerToken[ui.BearerToken] {
return nil, fmt.Errorf("duplicate bearer_token found; bearer_token: %q", ui.BearerToken)
}
if byUsername[ui.Username] {
return nil, fmt.Errorf("duplicate username found; username: %q", ui.Username)
}
if len(ui.URLPrefix) > 0 {
urlPrefix, err := sanitizeURLPrefix(ui.URLPrefix)
if err != nil {
authToken := getAuthToken(ui.BearerToken, ui.Username, ui.Password)
if byAuthToken[authToken] != nil {
return nil, fmt.Errorf("duplicate auth token found for bearer_token=%q, username=%q: %q", authToken, ui.BearerToken, ui.Username)
}
if ui.URLPrefix != nil {
if err := ui.URLPrefix.sanitize(); err != nil {
return nil, err
}
ui.URLPrefix = urlPrefix
}
for _, e := range ui.URLMap {
if len(e.SrcPaths) == 0 {
return nil, fmt.Errorf("missing `src_paths`")
return nil, fmt.Errorf("missing `src_paths` in `url_map`")
}
urlPrefix, err := sanitizeURLPrefix(e.URLPrefix)
if err != nil {
if e.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix` in `url_map`")
}
if err := e.URLPrefix.sanitize(); err != nil {
return nil, err
}
e.URLPrefix = urlPrefix
}
if len(ui.URLMap) == 0 && len(ui.URLPrefix) == 0 {
if len(ui.URLMap) == 0 && ui.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix`")
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, ui.Username))
m[ui.Username] = ui
if ui.BearerToken != "" {
if ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
ui.requests = metrics.GetOrCreateCounter(`vmauth_user_requests_total{username="bearer_token"}`)
byBearerToken[ui.BearerToken] = true
}
if ui.Username != "" {
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, ui.Username))
byUsername[ui.Username] = true
}
byAuthToken[authToken] = ui
}
return m, nil
return byAuthToken, nil
}
func sanitizeURLPrefix(urlPrefix string) (string, error) {
func getAuthToken(bearerToken, username, password string) string {
if bearerToken != "" {
return "Bearer " + bearerToken
}
token := username + ":" + password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
return "Basic " + token64
}
func (up *URLPrefix) sanitize() error {
for i, pu := range up.urls {
puNew, err := sanitizeURLPrefix(pu)
if err != nil {
return err
}
up.urls[i] = puNew
}
return nil
}
func sanitizeURLPrefix(urlPrefix *url.URL) (*url.URL, error) {
// Remove trailing '/' from urlPrefix
for strings.HasSuffix(urlPrefix, "/") {
urlPrefix = urlPrefix[:len(urlPrefix)-1]
for strings.HasSuffix(urlPrefix.Path, "/") {
urlPrefix.Path = urlPrefix.Path[:len(urlPrefix.Path)-1]
}
// Validate urlPrefix
target, err := url.Parse(urlPrefix)
if err != nil {
return "", fmt.Errorf("invalid `url_prefix: %q`: %w", urlPrefix, err)
if urlPrefix.Scheme != "http" && urlPrefix.Scheme != "https" {
return nil, fmt.Errorf("unsupported scheme for `url_prefix: %q`: %q; must be `http` or `https`", urlPrefix, urlPrefix.Scheme)
}
if target.Scheme != "http" && target.Scheme != "https" {
return "", fmt.Errorf("unsupported scheme for `url_prefix: %q`: %q; must be `http` or `https`", urlPrefix, target.Scheme)
}
if target.Host == "" {
return "", fmt.Errorf("missing hostname in `url_prefix %q`", urlPrefix)
if urlPrefix.Host == "" {
return nil, fmt.Errorf("missing hostname in `url_prefix %q`", urlPrefix.Host)
}
return urlPrefix, nil
}

View File

@@ -3,6 +3,7 @@ package main
import (
"bytes"
"fmt"
"net/url"
"regexp"
"testing"
@@ -55,6 +56,41 @@ users:
- username: foo
url_prefix: http:///bar
`)
f(`
users:
- username: foo
url_prefix:
bar: baz
`)
f(`
users:
- username: foo
url_prefix:
- [foo]
`)
// empty url_prefix
f(`
users:
- username: foo
url_prefix: []
`)
// Username and bearer_token in a single config
f(`
users:
- username: foo
bearer_token: bbb
url_prefix: http://foo.bar
`)
// Bearer_token and password in a single config
f(`
users:
- password: foo
bearer_token: bbb
url_prefix: http://foo.bar
`)
// Duplicate users
f(`
@@ -67,6 +103,17 @@ users:
url_prefix: https://sss.sss
`)
// Duplicate bearer_tokens
f(`
users:
- bearer_token: foo
url_prefix: http://foo.bar
- username: bar
url_prefix: http://xxx.yyy
- bearer_token: foo
url_prefix: https://sss.sss
`)
// Missing url_prefix in url_map
f(`
users:
@@ -75,6 +122,24 @@ users:
- src_paths: ["/foo/bar"]
`)
// Invalid url_prefix in url_map
f(`
users:
- username: a
url_map:
- src_paths: ["/foo/bar"]
url_prefix: foo.bar
`)
// empty url_prefix in url_map
f(`
users:
- username: a
url_map:
- src_paths: ['/foo/bar']
url_prefix: []
`)
// Missing src_paths in url_map
f(`
users:
@@ -113,10 +178,29 @@ users:
password: bar
url_prefix: http://aaa:343/bbb
`, map[string]*UserInfo{
"foo": {
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: "http://aaa:343/bbb",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
},
})
// Multiple url_prefix entries
f(`
users:
- username: foo
password: bar
url_prefix:
- http://node1:343/bbb
- http://node2:343/bbb
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURLs([]string{
"http://node1:343/bbb",
"http://node2:343/bbb",
}),
},
})
@@ -128,36 +212,39 @@ users:
- username: bar
url_prefix: https://bar/x///
`, map[string]*UserInfo{
"foo": {
getAuthToken("", "foo", ""): {
Username: "foo",
URLPrefix: "http://foo",
URLPrefix: mustParseURL("http://foo"),
},
"bar": {
getAuthToken("", "bar", ""): {
Username: "bar",
URLPrefix: "https://bar/x",
URLPrefix: mustParseURL("https://bar/x"),
},
})
// non-empty URLMap
f(`
users:
- username: foo
- bearer_token: foo
url_map:
- src_paths: ["/api/v1/query","/api/v1/query_range","/api/v1/label/[^./]+/.+"]
url_prefix: http://vmselect/select/0/prometheus
- src_paths: ["/api/v1/write"]
url_prefix: http://vminsert/insert/0/prometheus
url_prefix: ["http://vminsert1/insert/0/prometheus","http://vminsert2/insert/0/prometheus"]
`, map[string]*UserInfo{
"foo": {
Username: "foo",
getAuthToken("foo", "", ""): {
BearerToken: "foo",
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^./]+/.+"}),
URLPrefix: "http://vmselect/select/0/prometheus",
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: "http://vminsert/insert/0/prometheus",
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURLs([]string{
"http://vminsert1/insert/0/prometheus",
"http://vminsert2/insert/0/prometheus",
}),
},
},
},
@@ -195,3 +282,21 @@ func areEqualConfigs(a, b map[string]*UserInfo) error {
}
return nil
}
func mustParseURL(u string) *URLPrefix {
return mustParseURLs([]string{u})
}
func mustParseURLs(us []string) *URLPrefix {
pus := make([]*url.URL, len(us))
for i, u := range us {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
}
pus[i] = pu
}
return &URLPrefix{
urls: pus,
}
}

View File

@@ -2,30 +2,70 @@
# Usernames must be unique.
users:
# Requests with the 'Authorization: Bearer XXXX' header are proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- bearer_token: "XXXX"
url_prefix: "http://localhost:8428"
# The user for querying local single-node VictoriaMetrics.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query
# will be proxied to http://localhost:8428 .
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is routed to http://vmselect:8481/select/123/prometheus/api/v1/select
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
- username: "local-single-node"
password: "***"
url_prefix: "http://localhost:8428?extra_label=team=dev"
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
- username: "cluster-select-account-123"
password: "***"
url_prefix: "http://vmselect:8481/select/123/prometheus"
url_prefix:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://victoriametrics.github.io/Cluster-VictoriaMetrics.html#url-format
# All the reuqests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be routed to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is routed to http://vminsert:8480/insert/42/prometheus/api/v1/write
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
- username: "cluster-insert-account-42"
password: "***"
url_prefix: "http://vminsert:8480/insert/42/prometheus"
url_prefix:
- "http://vminsert1:8480/insert/42/prometheus"
- "http://vminsert2:8480/insert/42/prometheus"
# A single user for querying and inserting data:
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/42/prometheus
# - http://vmselect2:8481/select/42/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect1:8480/select/42/prometheus/api/v1/query
# or to http://vmselect2:8480/select/42/prometheus/api/v1/query .
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "foobar"
url_map:
- src_paths:
- "/api/v1/query"
- "/api/v1/query_range"
- "/api/v1/label/[^/]+/values"
url_prefix:
- "http://vmselect1:8481/select/42/prometheus"
- "http://vmselect2:8481/select/42/prometheus"
- src_paths: ["/api/v1/write"]
url_prefix: "http://vminsert:8480/insert/42/prometheus"

View File

@@ -2,6 +2,7 @@ package main
import (
"flag"
"fmt"
"net/http"
"net/http/httputil"
"net/url"
@@ -14,10 +15,15 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/metrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
`Such requests are always counted at vmagent_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
)
func main() {
@@ -47,16 +53,34 @@ func main() {
}
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
username, password, ok := r.BasicAuth()
if !ok {
switch r.URL.Path {
case "/-/reload":
authKey := r.FormValue("authKey")
if authKey != *reloadAuthKey {
httpserver.Errorf(w, r, "invalid authKey %q. It must match the value from -reloadAuthKey command line flag", authKey)
return true
}
configReloadRequests.Inc()
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusOK)
return true
}
authToken := r.Header.Get("Authorization")
if authToken == "" {
w.Header().Set("WWW-Authenticate", `Basic realm="Restricted"`)
http.Error(w, "missing `Authorization: Basic *` header", http.StatusUnauthorized)
http.Error(w, "missing `Authorization` request header", http.StatusUnauthorized)
return true
}
ac := authConfig.Load().(map[string]*UserInfo)
ui := ac[username]
if ui == nil || ui.Password != password {
httpserver.Errorf(w, r, "cannot find the provided username %q or password in config", username)
ui := ac[authToken]
if ui == nil {
invalidAuthTokenRequests.Inc()
if *logInvalidAuthTokens {
httpserver.Errorf(w, r, "cannot find the provided auth token %q in config", authToken)
} else {
errStr := fmt.Sprintf("cannot find the provided auth token %q in config", authToken)
http.Error(w, errStr, http.StatusBadRequest)
}
return true
}
ui.requests.Inc()
@@ -65,15 +89,31 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.Errorf(w, r, "cannot determine targetURL: %s", err)
return true
}
if _, err := url.Parse(targetURL); err != nil {
httpserver.Errorf(w, r, "invalid targetURL=%q: %s", targetURL, err)
return true
}
r.Header.Set("vm-target-url", targetURL)
reverseProxy.ServeHTTP(w, r)
r.Header.Set("vm-target-url", targetURL.String())
proxyRequest(w, r)
return true
}
func proxyRequest(w http.ResponseWriter, r *http.Request) {
defer func() {
err := recover()
if err == nil || err == http.ErrAbortHandler {
// Suppress http.ErrAbortHandler panic.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
return
}
// Forward other panics to the caller.
panic(err)
}()
reverseProxy.ServeHTTP(w, r)
}
var (
configReloadRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/-/reload"}`)
invalidAuthTokenRequests = metrics.NewCounter(`vmagent_http_request_errors_total{reason="invalid_auth_token"}`)
missingRouteRequests = metrics.NewCounter(`vmagent_http_request_errors_total{reason="missing_route"}`)
)
var reverseProxy = &httputil.ReverseProxy{
Director: func(r *http.Request) {
targetURL := r.Header.Get("vm-target-url")
@@ -89,6 +129,7 @@ var reverseProxy = &httputil.ReverseProxy{
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
return tr
}(),
FlushInterval: time.Second,
@@ -99,7 +140,7 @@ func usage() {
const s = `
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
See the docs at https://victoriametrics.github.io/vmauth.html .
See the docs at https://docs.victoriametrics.com/vmauth.html .
`
flagutil.Usage(s)
}

View File

@@ -7,25 +7,52 @@ import (
"strings"
)
func createTargetURL(ui *UserInfo, uOrig *url.URL) (string, error) {
u, err := url.Parse(uOrig.String())
if err != nil {
return "", fmt.Errorf("cannot make a copy of %q: %w", u, err)
func (up *URLPrefix) mergeURLs(requestURI *url.URL) *url.URL {
pu := up.getNextURL()
return mergeURLs(pu, requestURI)
}
func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
targetURL := *uiURL
targetURL.Path += requestURI.Path
requestParams := requestURI.Query()
// fast path
if len(requestParams) == 0 {
return &targetURL
}
// merge query parameters from requests.
uiParams := targetURL.Query()
for k, v := range requestParams {
// skip clashed query params from original request
if exist := uiParams.Get(k); len(exist) > 0 {
continue
}
for i := range v {
uiParams.Add(k, v[i])
}
}
targetURL.RawQuery = uiParams.Encode()
return &targetURL
}
func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, error) {
u := *uOrig
// Prevent from attacks with using `..` in r.URL.Path
u.Path = path.Clean(u.Path)
if !strings.HasPrefix(u.Path, "/") {
u.Path = "/" + u.Path
}
u.Path = strings.TrimSuffix(u.Path, "/")
for _, e := range ui.URLMap {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return e.URLPrefix + u.RequestURI(), nil
return e.URLPrefix.mergeURLs(&u), nil
}
}
}
if len(ui.URLPrefix) > 0 {
return ui.URLPrefix + u.RequestURI(), nil
if ui.URLPrefix != nil {
return ui.URLPrefix.mergeURLs(&u), nil
}
return "", fmt.Errorf("missing route for %q", u)
missingRouteRequests.Inc()
return nil, fmt.Errorf("missing route for %q", u.String())
}

View File

@@ -16,43 +16,46 @@ func TestCreateTargetURLSuccess(t *testing.T) {
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
if target != expectedTarget {
if target.String() != expectedTarget {
t.Fatalf("unexpected target; got %q; want %q", target, expectedTarget)
}
}
// Simple routing with `url_prefix`
f(&UserInfo{
URLPrefix: "http://foo.bar",
URLPrefix: mustParseURL("http://foo.bar"),
}, "", "http://foo.bar/.")
f(&UserInfo{
URLPrefix: "http://foo.bar",
}, "/", "http://foo.bar/")
URLPrefix: mustParseURL("http://foo.bar"),
}, "/", "http://foo.bar")
f(&UserInfo{
URLPrefix: "http://foo.bar",
URLPrefix: mustParseURL("http://foo.bar/federate"),
}, "/", "http://foo.bar/federate")
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar"),
}, "a/b?c=d", "http://foo.bar/a/b?c=d")
f(&UserInfo{
URLPrefix: "https://sss:3894/x/y",
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/z", "https://sss:3894/x/y/z")
f(&UserInfo{
URLPrefix: "https://sss:3894/x/y",
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/../../aaa", "https://sss:3894/x/y/aaa")
f(&UserInfo{
URLPrefix: "https://sss:3894/x/y",
}, "/./asd/../../aaa?a=d&s=s/../d", "https://sss:3894/x/y/aaa?a=d&s=s/../d")
URLPrefix: mustParseURL("https://sss:3894/x/y"),
}, "/./asd/../../aaa?a=d&s=s/../d", "https://sss:3894/x/y/aaa?a=d&s=s%2F..%2Fd")
// Complex routing with `url_map`
ui := &UserInfo{
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: "http://vmselect/0/prometheus",
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: "http://vminsert/0/prometheus",
URLPrefix: mustParseURL("http://vminsert/0/prometheus"),
},
},
URLPrefix: "http://default-server",
URLPrefix: mustParseURL("http://default-server"),
}
f(ui, "/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up")
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write")
@@ -63,20 +66,27 @@ func TestCreateTargetURLSuccess(t *testing.T) {
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query(_range)?", "/api/v1/label/[^/]+/values"}),
URLPrefix: "http://vmselect/0/prometheus",
URLPrefix: mustParseURL("http://vmselect/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: "http://vminsert/0/prometheus",
URLPrefix: mustParseURL("http://vminsert/0/prometheus"),
},
},
URLPrefix: "http://default-server",
URLPrefix: mustParseURL("http://default-server"),
}
f(ui, "/api/v1/query?query=up", "http://vmselect/0/prometheus/api/v1/query?query=up")
f(ui, "/api/v1/query_range?query=up", "http://vmselect/0/prometheus/api/v1/query_range?query=up")
f(ui, "/api/v1/label/foo/values", "http://vmselect/0/prometheus/api/v1/label/foo/values")
f(ui, "/api/v1/write", "http://vminsert/0/prometheus/api/v1/write")
f(ui, "/api/v1/foo/bar", "http://default-server/api/v1/foo/bar")
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar?extra_label=team=dev"),
}, "/api/v1/query", "http://foo.bar/api/v1/query?extra_label=team=dev")
f(&UserInfo{
URLPrefix: mustParseURL("http://foo.bar?extra_label=team=mobile"),
}, "/api/v1/query?extra_label=team=dev", "http://foo.bar/api/v1/query?extra_label=team%3Dmobile")
}
func TestCreateTargetURLFailure(t *testing.T) {
@@ -90,7 +100,7 @@ func TestCreateTargetURLFailure(t *testing.T) {
if err == nil {
t.Fatalf("expecting non-nil error")
}
if target != "" {
if target != nil {
t.Fatalf("unexpected target=%q; want empty string", target)
}
}
@@ -99,7 +109,7 @@ func TestCreateTargetURLFailure(t *testing.T) {
URLMap: []URLMap{
{
SrcPaths: getSrcPaths([]string{"/api/v1/query"}),
URLPrefix: "http://foobar/baz",
URLPrefix: mustParseURL("http://foobar/baz"),
},
},
}, "/api/v1/write")

View File

@@ -1,6 +1,6 @@
## vmbackup
# vmbackup
`vmbackup` creates VictoriaMetrics data backups from [instant snapshots](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
`vmbackup` creates VictoriaMetrics data backups from [instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
Supported storage systems for backups:
@@ -15,11 +15,11 @@ data between the existing backup and new backup. It saves time and costs on data
Backup process can be interrupted at any time. It is automatically resumed from the interruption point when restarting `vmbackup` with the same args.
Backed up data can be restored with [vmrestore](https://victoriametrics.github.io/vmrestore.html).
Backed up data can be restored with [vmrestore](https://docs.victoriametrics.com/vmrestore.html).
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
See also [vmbackupmanager](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) tool built on top of `vmbackup`. This tool simplifies
See also [vmbackupmanager](https://docs.victoriametrics.com/vmbackupmanager.html) tool built on top of `vmbackup`. This tool simplifies
creation of hourly, daily, weekly and monthly backups.
@@ -34,8 +34,8 @@ vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-
```
* `</path/to/victoria-metrics-data>` - path to VictoriaMetrics data pointed by `-storageDataPath` command-line flag in single-node VictoriaMetrics or in cluster `vmstorage`.
There is no need to stop VictoriaMetrics for creating backups, since they are performed from immutable [instant snapshots](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
* `<local-snapshot>` is the snapshot to back up. See [how to create instant snapshots](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
There is no need to stop VictoriaMetrics for creating backups, since they are performed from immutable [instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
* `<local-snapshot>` is the snapshot to back up. See [how to create instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots). `vmbackup` can create the snapshot on itself if `-snapshot.createURL` command-line flag is set to an url for creating snapshots. In this case `-snapshotName` flag isn't needed.
* `<bucket>` is an already existing name for [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets).
* `<path/to/new/backup>` is the destination path where new backup will be placed.
@@ -72,7 +72,7 @@ Smart backups mean storing full daily backups into `YYYYMMDD` folders and creati
vmbackup -snapshotName=<latest-snapshot> -dst=gcs://<bucket>/latest
```
Where `<latest-snapshot>` is the latest [snapshot](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
Where `<latest-snapshot>` is the latest [snapshot](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
The command will upload only changed data to `gcs://<bucket>/latest`.
* Run the following command once a day:
@@ -89,7 +89,7 @@ or from any day (`YYYYMMDD` backups). Note that hourly backup shouldn't run when
Do not forget removing old snapshots and backups when they are no longer needed for saving storage costs.
See also [vmbackupmanager tool](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/466) for automating smart backups.
See also [vmbackupmanager tool](https://docs.victoriametrics.com/vmbackupmanager.html) for automating smart backups.
## How does it work?
@@ -123,8 +123,8 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
* If the backup is slow, then try setting higher value for `-concurrency` flag. This will increase the number of concurrent workers that upload data to backup storage.
* If `vmbackup` eats all the network bandwidth, then set `-maxBytesPerSecond` to the desired value.
* If `vmbackup` has been interrupted due to temporary error, then just restart it with the same args. It will resume the backup process.
* Backups created from [single-node VictoriaMetrics](https://victoriametrics.github.io/Single-server-VictoriaMetrics.html) cannot be restored
at [cluster VictoriaMetrics](https://victoriametrics.github.io/Cluster-VictoriaMetrics.html) and vice versa.
* Backups created from [single-node VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) cannot be restored
at [cluster VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) and vice versa.
## Advanced usage
@@ -186,7 +186,7 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
Where to put the backup on the remote storage. Example: gcs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir
-dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-fs.disableMmap
@@ -194,7 +194,7 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, then the remaining errors are suppressed. Zero value disables the rate limit
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
@@ -204,23 +204,23 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero value disables the rate limit
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxBytesPerSecond size
The maximum upload speed. There is no limit if it is set to 0
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to non-zero value. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low value may increase cache miss rate, which usually results in higher CPU and disk IO usage. Too high value may evict too much data from OS page cache, which will result in higher disk IO usage (default 60)
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-origin string
Optional origin directory on the remote storage with old backup for server-side copying when performing full backup. This speeds up full backups
-snapshot.createURL string
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snaphsot/create
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set
-snapshot.deleteURL string
VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snaphosts will be automatically deleted. Example: http://victoriametrics:8428/snaphsot/delete
VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete
-snapshotName string
Name for the snapshot to backup. See https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots
Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set
-storageDataPath string
Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage (default "victoria-metrics-data")
-version
@@ -235,7 +235,7 @@ It is recommended using [binary releases](https://github.com/VictoriaMetrics/Vic
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmbackup` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmbackup` binary and puts it into the `bin` folder.

View File

@@ -19,11 +19,11 @@ import (
var (
storageDataPath = flag.String("storageDataPath", "victoria-metrics-data", "Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage")
snapshotName = flag.String("snapshotName", "", "Name for the snapshot to backup. See https://victoriametrics.github.io/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots")
snapshotName = flag.String("snapshotName", "", "Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set")
snapshotCreateURL = flag.String("snapshot.createURL", "", "VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. "+
"Example: http://victoriametrics:8428/snaphsot/create")
"Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set")
snapshotDeleteURL = flag.String("snapshot.deleteURL", "", "VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. "+
"All created snaphosts will be automatically deleted. Example: http://victoriametrics:8428/snaphsot/delete")
"All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete")
dst = flag.String("dst", "", "Where to put the backup on the remote storage. "+
"Example: gcs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir\n"+
"-dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded")
@@ -41,7 +41,9 @@ func main() {
logger.Init()
if len(*snapshotCreateURL) > 0 {
logger.Infof("Snapshots enabled")
if len(*snapshotName) > 0 {
logger.Fatalf("-snapshotName shouldn't be set if -snapshot.createURL is set, since snapshots are created automatically in this case")
}
logger.Infof("Snapshot create url %s", *snapshotCreateURL)
if len(*snapshotDeleteURL) <= 0 {
err := flag.Set("snapshot.deleteURL", strings.Replace(*snapshotCreateURL, "/create", "/delete", 1))
@@ -99,7 +101,7 @@ func usage() {
vmbackup performs backups for VictoriaMetrics data from instant snapshots to gcs, s3
or local filesystem. Backed up data can be restored with vmrestore.
See the docs at https://victoriametrics.github.io/vbackup.html .
See the docs at https://docs.victoriametrics.com/vmbackup.html .
`
flagutil.Usage(s)
}

View File

@@ -1,13 +1,15 @@
## Victoria Metrics Backup Manager
## vmbackupmanager
This service automates regular backup procedures. It supports the following backup intervals: **hourly**, **daily**, **weekly** and **monthly**. Multiple backup intervals may be configured simultaneously. I.e. the backup manager creates hourly backups every hour, while it creates daily backups every day, etc. Backup manager must have read access to the storage data, so best practice is to install it on the same machine (or as a sidecar) where the storage node is installed.
***vmbackupmanager is a part of [enterprise package](https://victoriametrics.com/enterprise.html). It is available for download and evaluation at [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)***
The VictoriaMetrics backup manager automates regular backup procedures. It supports the following backup intervals: **hourly**, **daily**, **weekly** and **monthly**. Multiple backup intervals may be configured simultaneously. I.e. the backup manager creates hourly backups every hour, while it creates daily backups every day, etc. Backup manager must have read access to the storage data, so best practice is to install it on the same machine (or as a sidecar) where the storage node is installed.
The backup service makes a backup every hour and puts it to the latest folder and then copies data to the folders which represent the backup intervals (hourly, daily, weekly and monthly)
The required flags for running the service are as follows:
* -eula - should be true and means that you have the legal right to run a backup manager. That can either be a signed contract or an email with confirmation to run the service in a trial period
* -storageDataPath - path to VictoriaMetrics or vmstorage data path to make backup from
* -snapshot.createURL - VictoriaMetrics creates snapshot URL which will automatically be created during backup. Example: http://victoriametrics:8428/snaphsot/create
* -snapshot.createURL - VictoriaMetrics creates snapshot URL which will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create
* -dst - backup destination at s3, gcs or local filesystem
* -credsFilePath - path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set. See [https://cloud.google.com/iam/docs/creating-managing-service-account-keys](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) and [https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html](https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html)
@@ -47,7 +49,7 @@ There are two flags which could help with performance tuning:
* -concurrency - The number of concurrent workers. Higher concurrency may improve upload speed (default 10)
### Example of Usage
## Example of Usage
GCS and cluster version. You need to have a credentials file in json format with following structure

View File

@@ -1,92 +1,178 @@
# vmctl
Victoria metrics command-line tool
VictoriaMetrics command-line tool
Features:
- [x] Prometheus: migrate data from Prometheus to VictoriaMetrics using snapshot API
- [x] Thanos: migrate data from Thanos to VictoriaMetrics
- [ ] ~~Prometheus: migrate data from Prometheus to VictoriaMetrics by query~~(discarded)
- [x] InfluxDB: migrate data from InfluxDB to VictoriaMetrics
- [x] OpenTSDB: migrate data from OpenTSDB to VictoriaMetrics
- [ ] Storage Management: data re-balancing between nodes
# Table of contents
vmctl acts as a proxy between data source ([Prometheus](#migrating-data-from-prometheus),
[InfluxDB](#migrating-data-from-influxdb-1x), [VictoriaMetrics](##migrating-data-from-victoriametrics), etc.)
and destination - VictoriaMetrics single or cluster version. To see the full list of supported modes
run the following command:
```
./vmctl --help
NAME:
vmctl - VictoriaMetrics command-line tool
* [Articles](#articles)
* [How to build](#how-to-build)
* [Migrating data from InfluxDB 1.x](#migrating-data-from-influxdb-1x)
* [Data mapping](#data-mapping)
* [Configuration](#configuration)
* [Filtering](#filtering)
* [Migrating data from InfluxDB 2.x](#migrating-data-from-influxdb-2x)
* [Migrating data from Prometheus](#migrating-data-from-prometheus)
* [Data mapping](#data-mapping-1)
* [Configuration](#configuration-1)
* [Filtering](#filtering-1)
* [Migrating data from Thanos](#migrating-data-from-thanos)
* [Current data](#current-data)
* [Historical data](#historical-data)
* [Migrating data from VictoriaMetrics](#migrating-data-from-victoriametrics)
* [Native protocol](#native-protocol)
* [Tuning](#tuning)
* [Influx mode](#influx-mode)
* [Prometheus mode](#prometheus-mode)
* [VictoriaMetrics importer](#victoriametrics-importer)
* [Importer stats](#importer-stats)
* [Significant figures](#significant-figures)
* [Adding extra labels](#adding-extra-labels)
USAGE:
vmctl [global options] command [command options] [arguments...]
COMMANDS:
opentsdb Migrate timeseries from OpenTSDB
influx Migrate timeseries from InfluxDB
prometheus Migrate timeseries from Prometheus
vm-native Migrate time series between VictoriaMetrics installations via native binary format
```
Each mode has its own unique set of flags specific (e.g. prefixed with `influx` for influx mode)
to the data source and common list of flags for destination (prefixed with `vm` for VictoriaMetrics):
```
./vmctl influx --help
OPTIONS:
--influx-addr value InfluxDB server addr (default: "http://localhost:8086")
--influx-user value InfluxDB user [$INFLUX_USERNAME]
...
--vm-addr vmctl VictoriaMetrics address to perform import requests.
Should be the same as --httpListenAddr value for single-node version or vminsert component.
When importing into the clustered version do not forget to set additionally --vm-account-id flag.
Please note, that vmctl performs initial readiness check for the given address by checking `/health` endpoint. (default: "http://localhost:8428")
--vm-user value VictoriaMetrics username for basic auth [$VM_USERNAME]
--vm-password value VictoriaMetrics password for basic auth [$VM_PASSWORD]
```
When doing a migration user needs to specify flags for source (where and how to fetch data) and for
destination (where to migrate data). Every mode has additional details and nuances, please see
them below in corresponding sections.
For the destination flags see the full description by running the following command:
```
./vmctl influx --help | grep vm-
```
Some flags like [--vm-extra-label](#adding-extra-labels) or [--vm-significant-figures](#significant-figures)
has additional sections with description below. Details about tweaking and adjusting settings
are explained in [Tuning](#tuning) section.
Please note, that if you're going to import data into VictoriaMetrics cluster do not
forget to specify the `--vm-account-id` flag. See more details for cluster version
[here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
## Articles
* [How to migrate data from Prometheus](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043)
* [How to migrate data from Prometheus. Filtering and modifying time series](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-filtering-and-modifying-time-series-6d40cea4bf21)
## How to build
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmctl` is located in `vmutils-*` archives there.
## Migrating data from OpenTSDB
`vmctl` supports the `opentsdb` mode to migrate data from OpenTSDB to VictoriaMetrics time-series database.
### Development build
See `./vmctl opentsdb --help` for details and full list of flags.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
2. Run `make vmctl` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl` binary and puts it into the `bin` folder.
*OpenTSDB migration is not possible without a functioning [meta](http://opentsdb.net/docs/build/html/user_guide/metadata.html) table to search for metrics/series.*
### Production build
OpenTSDB migration works like so:
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmctl-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-prod` binary and puts it into the `bin` folder.
1. Find metrics based on selected filters (or the default filter set ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'])
* e.g. `curl -Ss "http://opentsdb:4242/api/suggest?type=metrics&q=sys"`
2. Find series associated with each returned metric
* e.g. `curl -Ss "http://opentsdb:4242/api/search/lookup?m=system.load5&limit=1000000"`
3. Download data for each series in chunks defined in the CLI switches
* e.g. `-retention=sum-1m-avg:1h:90d` ==
* `curl -Ss "http://opentsdb:4242/api/query?start=1h-ago&end=now&m=sum:1m-avg-none:system.load5\{host=host1\}"`
* `curl -Ss "http://opentsdb:4242/api/query?start=2h-ago&end=1h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
* `curl -Ss "http://opentsdb:4242/api/query?start=3h-ago&end=2h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
* ...
* `curl -Ss "http://opentsdb:4242/api/query?start=2160h-ago&end=2159h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
### Building docker images
This means that we must stream data from OpenTSDB to VictoriaMetrics in chunks. This is where concurrency for OpenTSDB comes in. We can query multiple chunks at once, but we shouldn't perform too many chunks at a time to avoid overloading the OpenTSDB cluster.
Run `make package-vmctl`. It builds `victoriametrics/vmctl:<PKG_TAG>` docker image locally.
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmctl`.
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
```bash
ROOT_IMAGE=scratch make package-vmctl
```
$ bin/vmctl opentsdb --otsdb-addr http://opentsdb:4242/ --otsdb-retentions sum-1m-avg:1h:1d --otsdb-filters system --otsdb-normalize --vm-addr http://victoria/
OpenTSDB import mode
2021/04/09 11:52:50 Will collect data starting at TS 1617990770
2021/04/09 11:52:50 Loading all metrics from OpenTSDB for filters: [system]
Found 9 metrics to import. Continue? [Y/n]
2021/04/09 11:52:51 Starting work on system.load1
23 / 402200 [>____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________] 0.01% 2 p/s
```
### ARM build
### Retention strings
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
Starting with a relatively simple retention string (`sum-1m-avg:1h:30d`), let's describe how this is converted into actual queries.
#### Development ARM build
There are two essential parts of a retention string:
1. [aggregation](#aggregation)
2. [windows/time ranges](#windows)
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.14.
2. Run `make vmctl-arm` or `make vmctl-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-arm` or `vmctl-arm64` binary respectively and puts it into the `bin` folder.
#### Aggregation
#### Production ARM build
Retention strings essentially define the two levels of aggregation for our collected series.
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmctl-arm-prod` or `make vmctl-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-arm-prod` or `vmctl-arm64-prod` binary respectively and puts it into the `bin` folder.
`sum-1m-avg` would become:
* First order: `sum`
* Second order: `1m-avg-none`
##### First Order Aggregations
First-order aggregation addresses how to aggregate any un-mentioned tags.
This is, conceptually, directly opposite to how PromQL deals with tags. In OpenTSDB, if a tag isn't explicitly mentioned, all values assocaited with that tag will be aggregated.
It is recommended to use `sum` for the first aggregation because it is relatively quick and should not cause any changes to the incoming data (because we collect each individual series).
##### Second Order Aggregations
Second-order aggregation (`1m-avg` in our example) defines any windowing that should occur before returning the data
It is recommended to match the stat collection interval so we again avoid transforming incoming data.
We do not allow for defining the "null value" portion of the rollup window (e.g. in the aggreagtion, `1m-avg-none`, the user cannot change `none`), as the goal of this tool is to avoid modifying incoming data.
#### Windows
There are two important windows we define in a retention string:
1. the "chunk" range of each query
2. The time range we will be querying on with that "chunk"
From our example, our windows are `1h:30d`.
##### Window "chunks"
The window `1h` means that each individual query to OpenTSDB should only span 1 hour of time (e.g. `start=2h-ago&end=1h-ago`).
It is important to ensure this window somewhat matches the row size in HBase to help improve query times.
For example, if the query is hitting a rollup table with a 4 hour row size, we should set a chunk size of a multiple of 4 hours (e.g. `4h`, `8h`, etc.) to avoid requesting data across row boundaries. Landing on row boundaries allows for more consistent request times to HBase.
The default table created in HBase for OpenTSDB has a 1 hour row size, so if you aren't sure on a correct row size to use, `1h` is a reasonable choice.
##### Time range
The time range `30d` simply means we are asking for the last 30 days of data. This time range can be written using `h`, `d`, `w`, or `y`. (We can't use `m` for month because it already means `minute` in time parsing).
#### Results of retention string
The resultant queries that will be created, based on our example retention string of `sum-1m-avg:1h:30d` look like this:
```
http://opentsdb:4242/api/query?start=1h-ago&end=now&m=sum:1m-avg-none:<series>
http://opentsdb:4242/api/query?start=2h-ago&end=1h-ago&m=sum:1m-avg-none:<series>
http://opentsdb:4242/api/query?start=3h-ago&end=2h-ago&m=sum:1m-avg-none:<series>
...
http://opentsdb:4242/api/query?start=721h-ago&end=720h-ago&m=sum:1m-avg-none:<series>
```
Chunking the data like this means each individual query returns faster, so we can start populating data into VictoriaMetrics quicker.
### Restarting OpenTSDB migrations
One important note for OpenTSDB migration: Queries/HBase scans can "get stuck" within OpenTSDB itself. This can cause instability and performance issues within an OpenTSDB cluster, so stopping the migrator to deal with it may be necessary. Because of this, we provide the timstamp we started collecting data from at thebeginning of the run. You can stop and restart the importer using this "hard timestamp" to ensure you collect data from the same time range over multiple runs.
## Migrating data from InfluxDB (1.x)
@@ -96,7 +182,7 @@ See `./vmctl influx --help` for details and full list of flags.
To use migration tool please specify the InfluxDB address `--influx-addr`, the database `--influx-database` and VictoriaMetrics address `--vm-addr`.
Flag `--vm-addr` for single-node VM is usually equal to `--httpListenAddr`, and for cluster version
is equal to `--httpListenAddr` flag of VMInsert component. Please note, that vmctl performs initial readiness check for the given address
is equal to `--httpListenAddr` flag of vminsert component. Please note, that vmctl performs initial readiness check for the given address
by checking `/health` endpoint. For cluster version it is additionally required to specify the `--vm-account-id` flag.
See more details for cluster version [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
@@ -130,16 +216,16 @@ Found 40000 timeseries to import. Continue? [Y/n] y
### Data mapping
Vmctl maps Influx data the same way as VictoriaMetrics does by using the following rules:
Vmctl maps InfluxDB data the same way as VictoriaMetrics does by using the following rules:
* `influx-database` arg is mapped into `db` label value unless `db` tag exists in the Influx line.
* `influx-database` arg is mapped into `db` label value unless `db` tag exists in the InfluxDB line.
* Field names are mapped to time series names prefixed with {measurement}{separator} value,
where {separator} equals to _ by default.
It can be changed with `--influx-measurement-field-separator` command-line flag.
* Field values are mapped to time series values.
* Tags are mapped to Prometheus labels format as-is.
For example, the following Influx line:
For example, the following InfluxDB line:
```
foo,tag1=value1,tag2=value2 field1=12,field2=40
```
@@ -191,14 +277,14 @@ You may find useful a 3rd party solution for this - https://github.com/jonppe/in
## Migrating data from Prometheus
`vmctl` supports the `prometheus` mode for migrating data from Prometheus to VictoriaMetrics time-series database.
Migration is based on reading Prometheus snapshot, which is basically a hard-link to Prometheus data files.
Migration is based on reading Prometheus snapshot, which is basically a hard-link to Prometheus data files.
See `./vmctl prometheus --help` for details and full list of flags.
See `./vmctl prometheus --help` for details and full list of flags. Also see Prometheus related articles [here](#articles).
To use migration tool please specify the path to Prometheus snapshot `--prom-snapshot` and VictoriaMetrics address `--vm-addr`.
More about Prometheus snapshots may be found [here](https://www.robustperception.io/taking-snapshots-of-prometheus-data).
To use migration tool please specify the file path to Prometheus snapshot `--prom-snapshot` (see how to make a snapshot [here](https://www.robustperception.io/taking-snapshots-of-prometheus-data)) and VictoriaMetrics address `--vm-addr`.
Please note, that `vmctl` *do not make a snapshot from Prometheus*, it uses an already prepared snapshot. More about Prometheus snapshots may be found [here](https://www.robustperception.io/taking-snapshots-of-prometheus-data) and [here](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043).
Flag `--vm-addr` for single-node VM is usually equal to `--httpListenAddr`, and for cluster version
is equal to `--httpListenAddr` flag of VMInsert component. Please note, that vmctl performs initial readiness check for the given address
is equal to `--httpListenAddr` flag of vminsert component. Please note, that vmctl performs initial readiness check for the given address
by checking `/health` endpoint. For cluster version it is additionally required to specify the `--vm-account-id` flag.
See more details for cluster version [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
@@ -208,7 +294,7 @@ if flags `--prom-filter-time-start` or `--prom-filter-time-end` were set. The ex
Please note that stats are not taking into account timeseries or samples filtering. This will be done during importing process.
The importing process takes the snapshot blocks revealed from Explore procedure and processes them one by one
accumulating timeseries and samples. Please note, that `vmctl` relies on responses from Influx on this stage,
accumulating timeseries and samples. Please note, that `vmctl` relies on responses from InfluxDB on this stage,
so ensure that Explore queries are executed without errors or limits. Please see this
[issue](https://github.com/VictoriaMetrics/vmctl/issues/30) for details.
The data processed in chunks and then sent to VM.
@@ -359,7 +445,7 @@ then import it into VM using `vmctl` in `prometheus` mode.
### Native protocol
The [native binary protocol](https://victoriametrics.github.io/#how-to-export-data-in-native-format)
The [native binary protocol](https://docs.victoriametrics.com/#how-to-export-data-in-native-format)
was introduced in [1.42.0 release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.42.0)
and provides the most efficient way to migrate data between VM instances: single to single, cluster to cluster,
single to cluster and vice versa. Please note that both instances (source and destination) should be of v1.42.0
@@ -393,11 +479,12 @@ To avoid such situation try to filter out VM process metrics via `--vm-native-fi
[Backfilling tips](https://github.com/VictoriaMetrics/VictoriaMetrics#backfilling) section.
3. `vmctl` doesn't provide relabeling or other types of labels management in this mode.
Instead, use [relabeling in VictoriaMetrics](https://github.com/VictoriaMetrics/vmctl/issues/4#issuecomment-683424375).
4. When importing in or from cluster version remember to use correct [URL format](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format)
and specify `accountID` param.
## Tuning
### Influx mode
### InfluxDB mode
The flag `--influx-concurrency` controls how many concurrent requests may be sent to InfluxDB while fetching
timeseries. Please set it wisely to avoid InfluxDB overwhelming.
@@ -472,4 +559,49 @@ results such as `average`, `rate`, etc.
`vmctl` allows to add extra labels to all imported series. It can be achived with flag `--vm-extra-label label=value`.
If multiple labels needs to be added, set flag for each label, for example, `--vm-extra-label label1=value1 --vm-extra-label label2=value2`.
If timeseries already have label, that must be added with `--vm-extra-label` flag, flag has priority and will override label value from timeseries.
## How to build
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmctl` is located in `vmutils-*` archives there.
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmctl` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl` binary and puts it into the `bin` folder.
### Production build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmctl-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-prod` binary and puts it into the `bin` folder.
### Building docker images
Run `make package-vmctl`. It builds `victoriametrics/vmctl:<PKG_TAG>` docker image locally.
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmctl`.
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
```bash
ROOT_IMAGE=scratch make package-vmctl
```
### ARM build
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
#### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmctl-arm` or `make vmctl-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-arm` or `vmctl-arm64` binary respectively and puts it into the `bin` folder.
#### Production ARM build
1. [Install docker](https://docs.docker.com/install/).
2. Run `make vmctl-arm-prod` or `make vmctl-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-arm-prod` or `vmctl-arm64-prod` binary respectively and puts it into the `bin` folder.

View File

@@ -39,7 +39,8 @@ var (
Name: vmAddr,
Value: "http://localhost:8428",
Usage: "VictoriaMetrics address to perform import requests. \n" +
"Should be the same as --httpListenAddr value for single-node version or VMInsert component. \n" +
"Should be the same as --httpListenAddr value for single-node version or vminsert component. \n" +
"When importing into the clustered version do not forget to set additionally --vm-account-id flag. \n" +
"Please note, that `vmctl` performs initial readiness check for the given address by checking `/health` endpoint.",
},
&cli.StringFlag{
@@ -55,6 +56,7 @@ var (
&cli.StringFlag{
Name: vmAccountID,
Usage: "AccountID is an arbitrary 32-bit integer identifying namespace for data ingestion (aka tenant). \n" +
"AccountID is required when importing into the clustered version of VictoriaMetrics. \n" +
"It is possible to set it as accountID:projectID, where projectID is also arbitrary 32-bit integer. \n" +
"If projectID isn't set, then it equals to 0",
},
@@ -96,6 +98,78 @@ var (
}
)
const (
otsdbAddr = "otsdb-addr"
otsdbConcurrency = "otsdb-concurrency"
otsdbQueryLimit = "otsdb-query-limit"
otsdbOffsetDays = "otsdb-offset-days"
otsdbHardTSStart = "otsdb-hard-ts-start"
otsdbRetentions = "otsdb-retentions"
otsdbFilters = "otsdb-filters"
otsdbNormalize = "otsdb-normalize"
otsdbMsecsTime = "otsdb-msecstime"
)
var (
otsdbFlags = []cli.Flag{
&cli.StringFlag{
Name: otsdbAddr,
Value: "http://localhost:4242",
Required: true,
Usage: "OpenTSDB server addr",
},
&cli.IntFlag{
Name: otsdbConcurrency,
Usage: "Number of concurrently running fetch queries to OpenTSDB per metric",
Value: 1,
},
&cli.StringSliceFlag{
Name: otsdbRetentions,
Value: nil,
Required: true,
Usage: "Retentions patterns to collect on. Each pattern should describe the aggregation performed " +
"for the query, the row size (in HBase) that will define how long each individual query is, " +
"and the time range to query for. e.g. sum-1m-avg:1h:3d. " +
"The first time range defined should be a multiple of the row size in HBase. " +
"e.g. if the row size is 2 hours, 4h is good, 5h less so. We want each query to land on unique rows.",
},
&cli.StringSliceFlag{
Name: otsdbFilters,
Value: cli.NewStringSlice("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"),
Usage: "Filters to process for discovering metrics in OpenTSDB",
},
&cli.Int64Flag{
Name: otsdbOffsetDays,
Usage: "Days to offset our 'starting' point for collecting data from OpenTSDB",
Value: 0,
},
&cli.Int64Flag{
Name: otsdbHardTSStart,
Usage: "A specific timestamp to start from, will override using an offset",
Value: 0,
},
/*
because the defaults are set *extremely* low in OpenTSDB (10-25 results), we will
set a larger default limit, but still allow a user to increase/decrease it
*/
&cli.IntFlag{
Name: otsdbQueryLimit,
Usage: "Result limit on meta queries to OpenTSDB (affects both metric name and tag value queries, recommended to use a value exceeding your largest series)",
Value: 100e3,
},
&cli.BoolFlag{
Name: otsdbMsecsTime,
Value: false,
Usage: "Whether OpenTSDB is writing values in milliseconds or seconds",
},
&cli.BoolFlag{
Name: otsdbNormalize,
Value: false,
Usage: "Whether to normalize all data received to lower case before forwarding to VictoriaMetrics",
},
}
)
const (
influxAddr = "influx-addr"
influxUser = "influx-user"
@@ -115,26 +189,26 @@ var (
&cli.StringFlag{
Name: influxAddr,
Value: "http://localhost:8086",
Usage: "Influx server addr",
Usage: "InfluxDB server addr",
},
&cli.StringFlag{
Name: influxUser,
Usage: "Influx user",
Usage: "InfluxDB user",
EnvVars: []string{"INFLUX_USERNAME"},
},
&cli.StringFlag{
Name: influxPassword,
Usage: "Influx user password",
Usage: "InfluxDB user password",
EnvVars: []string{"INFLUX_PASSWORD"},
},
&cli.StringFlag{
Name: influxDB,
Usage: "Influx database",
Usage: "InfluxDB database",
Required: true,
},
&cli.StringFlag{
Name: influxRetention,
Usage: "Influx retention policy",
Usage: "InfluxDB retention policy",
Value: "autogen",
},
&cli.IntFlag{
@@ -149,7 +223,7 @@ var (
},
&cli.StringFlag{
Name: influxFilterSeries,
Usage: "Influx filter expression to select series. E.g. \"from cpu where arch='x86' AND hostname='host_2753'\".\n" +
Usage: "InfluxDB filter expression to select series. E.g. \"from cpu where arch='x86' AND hostname='host_2753'\".\n" +
"See for details https://docs.influxdata.com/influxdb/v1.7/query_language/schema_exploration#show-series",
},
&cli.StringFlag{
@@ -243,8 +317,8 @@ var (
&cli.StringFlag{
Name: vmNativeSrcAddr,
Usage: "VictoriaMetrics address to perform export from. \n" +
" Should be the same as --httpListenAddr value for single-node version or VMSelect component." +
" If exporting from cluster version - include the tenet token in address.",
" Should be the same as --httpListenAddr value for single-node version or vmselect component." +
" If exporting from cluster version see https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format",
Required: true,
},
&cli.StringFlag{
@@ -260,8 +334,8 @@ var (
&cli.StringFlag{
Name: vmNativeDstAddr,
Usage: "VictoriaMetrics address to perform import to. \n" +
" Should be the same as --httpListenAddr value for single-node version or VMInsert component." +
" If importing into cluster version - include the tenet token in address.",
" Should be the same as --httpListenAddr value for single-node version or vminsert component." +
" If importing into cluster version see https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format",
Required: true,
},
&cli.StringFlag{

View File

@@ -61,7 +61,7 @@ func (s Series) fetchQuery(timeFilter string) string {
}
for i, pair := range s.LabelPairs {
pairV := valueEscaper.Replace(pair.Value)
fmt.Fprintf(f, " %q='%s'", pair.Name, pairV)
fmt.Fprintf(f, " %q::tag='%s'", pair.Name, pairV)
if i != len(s.LabelPairs)-1 {
f.WriteString(" and")
}

View File

@@ -19,7 +19,7 @@ func TestFetchQuery(t *testing.T) {
},
},
},
expected: `select "value" from "cpu" where "foo"='bar'`,
expected: `select "value" from "cpu" where "foo"::tag='bar'`,
},
{
s: Series{
@@ -36,7 +36,7 @@ func TestFetchQuery(t *testing.T) {
},
},
},
expected: `select "value" from "cpu" where "foo"='bar' and "baz"='qux'`,
expected: `select "value" from "cpu" where "foo"::tag='bar' and "baz"::tag='qux'`,
},
{
s: Series{
@@ -50,7 +50,7 @@ func TestFetchQuery(t *testing.T) {
},
},
timeFilter: "time >= now()",
expected: `select "value" from "cpu" where "foo"='b\'ar' and time >= now()`,
expected: `select "value" from "cpu" where "foo"::tag='b\'ar' and time >= now()`,
},
{
s: Series{
@@ -68,7 +68,7 @@ func TestFetchQuery(t *testing.T) {
},
},
timeFilter: "time >= now()",
expected: `select "value" from "cpu" where "name"='dev-mapper-centos\\x2dswap.swap' and "state"='dev-mapp\'er-c\'en\'tos' and time >= now()`,
expected: `select "value" from "cpu" where "name"::tag='dev-mapper-centos\\x2dswap.swap' and "state"::tag='dev-mapp\'er-c\'en\'tos' and time >= now()`,
},
{
s: Series{

View File

@@ -10,6 +10,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/influx"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
@@ -20,9 +21,41 @@ func main() {
start := time.Now()
app := &cli.App{
Name: "vmctl",
Usage: "Victoria metrics command-line tool",
Usage: "VictoriaMetrics command-line tool",
Version: buildinfo.Version,
Commands: []*cli.Command{
{
Name: "opentsdb",
Usage: "Migrate timeseries from OpenTSDB",
Flags: mergeFlags(globalFlags, otsdbFlags, vmFlags),
Action: func(c *cli.Context) error {
fmt.Println("OpenTSDB import mode")
oCfg := opentsdb.Config{
Addr: c.String(otsdbAddr),
Limit: c.Int(otsdbQueryLimit),
Offset: c.Int64(otsdbOffsetDays),
HardTS: c.Int64(otsdbHardTSStart),
Retentions: c.StringSlice(otsdbRetentions),
Filters: c.StringSlice(otsdbFilters),
Normalize: c.Bool(otsdbNormalize),
MsecsTime: c.Bool(otsdbMsecsTime),
}
otsdbClient, err := opentsdb.NewClient(oCfg)
if err != nil {
return fmt.Errorf("failed to create opentsdb client: %s", err)
}
vmCfg := initConfigVM(c)
importer, err := vm.NewImporter(vmCfg)
if err != nil {
return fmt.Errorf("failed to create VM importer: %s", err)
}
otsdbProcessor := newOtsdbProcessor(otsdbClient, importer, c.Int(otsdbConcurrency))
return otsdbProcessor.run(c.Bool(globalSilent))
},
},
{
Name: "influx",
Usage: "Migrate timeseries from InfluxDB",

164
app/vmctl/opentsdb.go Normal file
View File

@@ -0,0 +1,164 @@
package main
import (
"fmt"
"log"
"sync"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/opentsdb"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/cheggaaa/pb/v3"
)
type otsdbProcessor struct {
oc *opentsdb.Client
im *vm.Importer
otsdbcc int
}
type queryObj struct {
Series opentsdb.Meta
Rt opentsdb.RetentionMeta
Tr opentsdb.TimeRange
StartTime int64
}
func newOtsdbProcessor(oc *opentsdb.Client, im *vm.Importer, otsdbcc int) *otsdbProcessor {
if otsdbcc < 1 {
otsdbcc = 1
}
return &otsdbProcessor{
oc: oc,
im: im,
otsdbcc: otsdbcc,
}
}
func (op *otsdbProcessor) run(silent bool) error {
log.Println("Loading all metrics from OpenTSDB for filters: ", op.oc.Filters)
var metrics []string
for _, filter := range op.oc.Filters {
q := fmt.Sprintf("%s/api/suggest?type=metrics&q=%s&max=%d", op.oc.Addr, filter, op.oc.Limit)
m, err := op.oc.FindMetrics(q)
if err != nil {
return fmt.Errorf("metric discovery failed for %q: %s", q, err)
}
metrics = append(metrics, m...)
}
if len(metrics) < 1 {
return fmt.Errorf("found no timeseries to import with filters %q", op.oc.Filters)
}
question := fmt.Sprintf("Found %d metrics to import. Continue?", len(metrics))
if !silent && !prompt(question) {
return nil
}
op.im.ResetStats()
var startTime int64
if op.oc.HardTS != 0 {
startTime = op.oc.HardTS
} else {
startTime = time.Now().Unix()
}
queryRanges := 0
// pre-calculate the number of query ranges we'll be processing
for _, rt := range op.oc.Retentions {
queryRanges += len(rt.QueryRanges)
}
for _, metric := range metrics {
log.Println(fmt.Sprintf("Starting work on %s", metric))
serieslist, err := op.oc.FindSeries(metric)
if err != nil {
return fmt.Errorf("couldn't retrieve series list for %s : %s", metric, err)
}
/*
Create channels for collecting/processing series and errors
We'll create them per metric to reduce pressure against OpenTSDB
Limit the size of seriesCh so we can't get too far ahead of actual processing
*/
seriesCh := make(chan queryObj, op.otsdbcc)
errCh := make(chan error)
// we're going to make serieslist * queryRanges queries, so we should represent that in the progress bar
bar := pb.StartNew(len(serieslist) * queryRanges)
var wg sync.WaitGroup
wg.Add(op.otsdbcc)
for i := 0; i < op.otsdbcc; i++ {
go func() {
defer wg.Done()
for s := range seriesCh {
if err := op.do(s); err != nil {
errCh <- fmt.Errorf("couldn't retrieve series for %s : %s", metric, err)
return
}
bar.Increment()
}
}()
}
/*
Loop through all series for this metric, processing all retentions and time ranges
requested. This loop is our primary "collect data from OpenTSDB loop" and should
be async, sending data to VictoriaMetrics over time.
The idea with having the select at the inner-most loop is to ensure quick
short-circuiting on error.
*/
for _, series := range serieslist {
for _, rt := range op.oc.Retentions {
for _, tr := range rt.QueryRanges {
select {
case otsdbErr := <-errCh:
return fmt.Errorf("opentsdb error: %s", otsdbErr)
case vmErr := <-op.im.Errors():
return fmt.Errorf("Import process failed: \n%s", wrapErr(vmErr))
case seriesCh <- queryObj{
Tr: tr, StartTime: startTime,
Series: series, Rt: opentsdb.RetentionMeta{
FirstOrder: rt.FirstOrder, SecondOrder: rt.SecondOrder, AggTime: rt.AggTime}}:
}
}
}
}
// Drain channels per metric
close(seriesCh)
wg.Wait()
close(errCh)
// check for any lingering errors on the query side
for otsdbErr := range errCh {
return fmt.Errorf("Import process failed: \n%s", otsdbErr)
}
bar.Finish()
log.Print(op.im.Stats())
}
op.im.Close()
for vmErr := range op.im.Errors() {
return fmt.Errorf("Import process failed: \n%s", wrapErr(vmErr))
}
log.Println("Import finished!")
log.Print(op.im.Stats())
return nil
}
func (op *otsdbProcessor) do(s queryObj) error {
start := s.StartTime - s.Tr.Start
end := s.StartTime - s.Tr.End
data, err := op.oc.GetData(s.Series, s.Rt, start, end)
if err != nil {
return fmt.Errorf("failed to collect data for %v in %v:%v :: %v", s.Series, s.Rt, s.Tr, err)
}
if len(data.Timestamps) < 1 || len(data.Values) < 1 {
return nil
}
labels := make([]vm.LabelPair, len(data.Tags))
for k, v := range data.Tags {
labels = append(labels, vm.LabelPair{Name: k, Value: v})
}
op.im.Input() <- &vm.TimeSeries{
Name: data.Metric,
LabelPairs: labels,
Timestamps: data.Timestamps,
Values: data.Values,
}
return nil
}

View File

@@ -0,0 +1,349 @@
package opentsdb
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"log"
"net/http"
"strings"
"time"
)
// Retention objects contain meta data about what to query for our run
type Retention struct {
/*
OpenTSDB has two levels of aggregation,
First, we aggregate any un-mentioned tags into the last result
Second, we aggregate into buckets over time
To simulate this with config, we have
FirstOrder (e.g. sum/avg/max/etc.)
SecondOrder (e.g. sum/avg/max/etc.)
AggTime (e.g. 1m/10m/1d/etc.)
This will build into m=<FirstOrder>:<AggTime>-<SecondOrder>-none:
Or an example: m=sum:1m-avg-none
*/
FirstOrder string
SecondOrder string
AggTime string
// The actual ranges will will attempt to query (as offsets from now)
QueryRanges []TimeRange
}
// RetentionMeta objects exist to pass smaller subsets (only one retention range) of a full Retention object around
type RetentionMeta struct {
FirstOrder string
SecondOrder string
AggTime string
}
// Client object holds general config about how queries should be performed
type Client struct {
Addr string
// The meta query limit for series returned
Limit int
Retentions []Retention
Filters []string
Normalize bool
HardTS int64
}
// Config contains fields required
// for Client configuration
type Config struct {
Addr string
Limit int
Offset int64
HardTS int64
Retentions []string
Filters []string
Normalize bool
MsecsTime bool
}
// TimeRange contains data about time ranges to query
type TimeRange struct {
Start int64
End int64
}
// MetaResults contains return data from search series lookup queries
type MetaResults struct {
Type string `json:"type"`
Results []Meta `json:"results"`
//metric string
//tags interface{}
//limit int
//time int
//startIndex int
//totalResults int
}
// Meta A meta object about a metric
// only contain the tags/etc. and no data
type Meta struct {
//tsuid string
Metric string `json:"metric"`
Tags map[string]string `json:"tags"`
}
// Metric holds the time series data
type Metric struct {
Metric string
Tags map[string]string
Timestamps []int64
Values []float64
}
// ExpressionOutput contains results from actual data queries
type ExpressionOutput struct {
Outputs []qoObj `json:"outputs"`
Query interface{} `json:"query"`
}
// QoObj contains actual timeseries data from the returned data query
type qoObj struct {
ID string `json:"id"`
Alias string `json:"alias"`
Dps [][]float64 `json:"dps"`
//dpsMeta interface{}
//meta interface{}
}
// Expression objects format our data queries
/*
All of the following structs are to build a OpenTSDB expression object
*/
type Expression struct {
Time timeObj `json:"time"`
Filters []filterObj `json:"filters"`
Metrics []metricObj `json:"metrics"`
// this just needs to be an empty object, so the value doesn't matter
Expressions []int `json:"expressions"`
Outputs []outputObj `json:"outputs"`
}
type timeObj struct {
Start int64 `json:"start"`
End int64 `json:"end"`
Aggregator string `json:"aggregator"`
Downsampler dSObj `json:"downsampler"`
}
type dSObj struct {
Interval string `json:"interval"`
Aggregator string `json:"aggregator"`
FillPolicy fillObj `json:"fillPolicy"`
}
type fillObj struct {
// we'll always hard-code to NaN here, so we don't need value
Policy string `json:"policy"`
}
type filterObj struct {
Tags []tagObj `json:"tags"`
ID string `json:"id"`
}
type tagObj struct {
Type string `json:"type"`
Tagk string `json:"tagk"`
Filter string `json:"filter"`
GroupBy bool `json:"groupBy"`
}
type metricObj struct {
ID string `json:"id"`
Metric string `json:"metric"`
Filter string `json:"filter"`
FillPolicy fillObj `json:"fillPolicy"`
}
type outputObj struct {
ID string `json:"id"`
Alias string `json:"alias"`
}
/* End expression object structs */
var (
exprOutput = outputObj{ID: "a", Alias: "query"}
exprFillPolicy = fillObj{Policy: "nan"}
)
// FindMetrics discovers all metrics that OpenTSDB knows about (given a filter)
// e.g. /api/suggest?type=metrics&q=system&max=100000
func (c Client) FindMetrics(q string) ([]string, error) {
resp, err := http.Get(q)
if err != nil {
return nil, fmt.Errorf("failed to send GET request to %q: %s", q, err)
}
if resp.StatusCode != 200 {
return nil, fmt.Errorf("Bad return from OpenTSDB: %q: %v", resp.StatusCode, resp)
}
defer func() { _ = resp.Body.Close() }()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("could not retrieve metric data from %q: %s", q, err)
}
var metriclist []string
err = json.Unmarshal(body, &metriclist)
if err != nil {
return nil, fmt.Errorf("failed to read response from %q: %s", q, err)
}
return metriclist, nil
}
// FindSeries discovers all series associated with a metric
// e.g. /api/search/lookup?m=system.load5&limit=1000000
func (c Client) FindSeries(metric string) ([]Meta, error) {
q := fmt.Sprintf("%s/api/search/lookup?m=%s&limit=%d", c.Addr, metric, c.Limit)
resp, err := http.Get(q)
if err != nil {
return nil, fmt.Errorf("failed to set GET request to %q: %s", q, err)
}
if resp.StatusCode != 200 {
return nil, fmt.Errorf("Bad return from OpenTSDB: %q: %v", resp.StatusCode, resp)
}
defer func() { _ = resp.Body.Close() }()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("could not retrieve series data from %q: %s", q, err)
}
var results MetaResults
err = json.Unmarshal(body, &results)
if err != nil {
return nil, fmt.Errorf("failed to read response from %q: %s", q, err)
}
return results.Results, nil
}
// GetData actually retrieves data for a series at a specified time range
func (c Client) GetData(series Meta, rt RetentionMeta, start int64, end int64) (Metric, error) {
/*
Here we build the actual exp query we'll send to OpenTSDB
This is comprised of a number of different settings. We hard-code
a few to simplify the JSON object creation.
There are examples queries available, so not too much detail here...
*/
expr := Expression{}
expr.Outputs = []outputObj{exprOutput}
expr.Metrics = append(expr.Metrics, metricObj{ID: "a", Metric: series.Metric,
Filter: "f1", FillPolicy: exprFillPolicy})
expr.Time = timeObj{Start: start, End: end, Aggregator: rt.FirstOrder,
Downsampler: dSObj{Interval: rt.AggTime,
Aggregator: rt.SecondOrder,
FillPolicy: exprFillPolicy}}
var TagList []tagObj
for k, v := range series.Tags {
/*
every tag should be a literal_or because that's the closest to a full "==" that
this endpoint allows for
*/
TagList = append(TagList, tagObj{Type: "literal_or", Tagk: k,
Filter: v, GroupBy: true})
}
expr.Filters = append(expr.Filters, filterObj{ID: "f1", Tags: TagList})
// "expressions" is required in the query object or we get a 5xx, so force it to exist
expr.Expressions = make([]int, 0)
inputData, err := json.Marshal(expr)
if err != nil {
return Metric{}, fmt.Errorf("failed to marshal query JSON %s", err)
}
q := fmt.Sprintf("%s/api/query/exp", c.Addr)
resp, err := http.Post(q, "application/json", bytes.NewBuffer(inputData))
if err != nil {
return Metric{}, fmt.Errorf("failed to send GET request to %q: %s", q, err)
}
if resp.StatusCode != 200 {
return Metric{}, fmt.Errorf("Bad return from OpenTSDB: %q: %v", resp.StatusCode, resp)
}
defer func() { _ = resp.Body.Close() }()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
return Metric{}, fmt.Errorf("could not retrieve series data from %q: %s", q, err)
}
var output ExpressionOutput
err = json.Unmarshal(body, &output)
if err != nil {
return Metric{}, fmt.Errorf("failed to unmarshal response from %q: %s", q, err)
}
if len(output.Outputs) < 1 {
// no results returned...return an empty object without error
return Metric{}, nil
}
data := Metric{}
data.Metric = series.Metric
data.Tags = series.Tags
/*
We evaluate data for correctness before formatting the actual values
to skip a little bit of time if the series has invalid formatting
First step is to enforce Prometheus' data model
*/
data, err = modifyData(data, c.Normalize)
if err != nil {
return Metric{}, fmt.Errorf("invalid series data from %q: %s", q, err)
}
/*
Convert data from OpenTSDB's output format ([[ts,val],[ts,val]...])
to VictoriaMetrics format: {"timestamps": [ts,ts,ts...], "values": [val,val,val...]}
The nasty part here is that because an object in each array
can be a float64, we have to initially cast _all_ objects that way
then convert the timestamp back to something reasonable.
*/
for _, tsobj := range output.Outputs[0].Dps {
data.Timestamps = append(data.Timestamps, int64(tsobj[0]))
data.Values = append(data.Values, tsobj[1])
}
return data, nil
}
// NewClient creates and returns OpenTSDB client
// configured with passed Config
func NewClient(cfg Config) (*Client, error) {
var retentions []Retention
offsetPrint := int64(time.Now().Unix())
if cfg.MsecsTime {
// 1000000 == Nanoseconds -> Milliseconds difference
offsetPrint = int64(time.Now().UnixNano() / 1000000)
}
if cfg.HardTS > 0 {
/*
HardTS is a specific timestamp we'll be starting at.
Just present that if it is defined
*/
offsetPrint = cfg.HardTS
} else if cfg.Offset > 0 {
/*
Our "offset" is the number of days we should step
back before starting to scan for data
*/
if cfg.MsecsTime {
offsetPrint = offsetPrint - (cfg.Offset * 24 * 60 * 60 * 1000)
} else {
offsetPrint = offsetPrint - (cfg.Offset * 24 * 60 * 60)
}
}
log.Println(fmt.Sprintf("Will collect data starting at TS %v", offsetPrint))
for _, r := range cfg.Retentions {
ret, err := convertRetention(r, cfg.Offset, cfg.MsecsTime)
if err != nil {
return &Client{}, fmt.Errorf("Couldn't parse retention %q :: %v", r, err)
}
retentions = append(retentions, ret)
}
client := &Client{
Addr: strings.Trim(cfg.Addr, "/"),
Retentions: retentions,
Limit: cfg.Limit,
Filters: cfg.Filters,
Normalize: cfg.Normalize,
HardTS: cfg.HardTS,
}
return client, nil
}

View File

@@ -0,0 +1,173 @@
package opentsdb
import (
"fmt"
"regexp"
"strconv"
"strings"
"time"
)
var (
allowedNames = regexp.MustCompile("^[a-zA-Z][a-zA-Z0-9_:]*$")
allowedFirstChar = regexp.MustCompile("^[a-zA-Z]")
replaceChars = regexp.MustCompile("[^a-zA-Z0-9_:]")
allowedTagKeys = regexp.MustCompile("^[a-zA-Z][a-zA-Z0-9_]*$")
)
func convertDuration(duration string) (time.Duration, error) {
/*
Golang's time library doesn't support many different
string formats (year, month, week, day) because they
aren't consistent ranges. But Java's library _does_.
Consequently, we'll need to handle all the custom
time ranges, and, to make the internal API call consistent,
we'll need to allow for durations that Go supports, too.
The nice thing is all the "broken" time ranges are > 1 hour,
so we can just make assumptions to convert them to a range in hours.
They aren't *good* assumptions, but they're reasonable
for this function.
*/
var actualDuration time.Duration
var err error
var timeValue int
if strings.HasSuffix(duration, "y") {
timeValue, err = strconv.Atoi(strings.Trim(duration, "y"))
if err != nil {
return 0, fmt.Errorf("invalid time range: %q", duration)
}
timeValue = timeValue * 365 * 24
actualDuration, err = time.ParseDuration(fmt.Sprintf("%vh", timeValue))
if err != nil {
return 0, fmt.Errorf("invalid time range: %q", duration)
}
} else if strings.HasSuffix(duration, "w") {
timeValue, err = strconv.Atoi(strings.Trim(duration, "w"))
if err != nil {
return 0, fmt.Errorf("invalid time range: %q", duration)
}
timeValue = timeValue * 7 * 24
actualDuration, err = time.ParseDuration(fmt.Sprintf("%vh", timeValue))
if err != nil {
return 0, fmt.Errorf("invalid time range: %q", duration)
}
} else if strings.HasSuffix(duration, "d") {
timeValue, err = strconv.Atoi(strings.Trim(duration, "d"))
if err != nil {
return 0, fmt.Errorf("invalid time range: %q", duration)
}
timeValue = timeValue * 24
actualDuration, err = time.ParseDuration(fmt.Sprintf("%vh", timeValue))
if err != nil {
return 0, fmt.Errorf("invalid time range: %q", duration)
}
} else if strings.HasSuffix(duration, "h") || strings.HasSuffix(duration, "m") || strings.HasSuffix(duration, "s") || strings.HasSuffix(duration, "ms") {
actualDuration, err = time.ParseDuration(duration)
if err != nil {
return 0, fmt.Errorf("invalid time range: %q", duration)
}
} else {
return 0, fmt.Errorf("invalid time duration string: %q", duration)
}
return actualDuration, nil
}
// Convert an incoming retention "string" into the component parts
func convertRetention(retention string, offset int64, msecTime bool) (Retention, error) {
/*
A retention string coming in looks like
sum-1m-avg:1h:30d
So we:
1. split on the :
2. split on the - in slice 0
3. create the time ranges we actually need
*/
chunks := strings.Split(retention, ":")
if len(chunks) != 3 {
return Retention{}, fmt.Errorf("invalid retention string: %q", retention)
}
rowLengthDuration, err := convertDuration(chunks[1])
if err != nil {
return Retention{}, fmt.Errorf("invalid row length (first order) duration string: %q: %s", chunks[1], err)
}
// set length of each row in milliseconds, unless we aren't using millisecond time in OpenTSDB...then use seconds
rowLength := rowLengthDuration.Milliseconds()
if !msecTime {
rowLength = rowLength / 1000
}
ttlDuration, err := convertDuration(chunks[2])
if err != nil {
return Retention{}, fmt.Errorf("invalid ttl (second order) duration string: %q: %s", chunks[2], err)
}
// set ttl in milliseconds, unless we aren't using millisecond time in OpenTSDB...then use seconds
ttl := ttlDuration.Milliseconds()
if !msecTime {
ttl = ttl / 1000
}
// bump by the offset so we don't look at empty ranges any time offset > ttl
ttl += offset
var timeChunks []TimeRange
var i int64
for i = offset; i <= ttl; i = i + rowLength {
timeChunks = append(timeChunks, TimeRange{Start: i + rowLength, End: i})
}
// first/second order aggregations for queries defined in chunk 0...
aggregates := strings.Split(chunks[0], "-")
if len(aggregates) != 3 {
return Retention{}, fmt.Errorf("invalid aggregation string: %q", chunks[0])
}
ret := Retention{FirstOrder: aggregates[0],
SecondOrder: aggregates[2],
AggTime: aggregates[1],
QueryRanges: timeChunks}
return ret, nil
}
// This ensures any incoming data from OpenTSDB matches the Prometheus data model
// https://prometheus.io/docs/concepts/data_model
func modifyData(msg Metric, normalize bool) (Metric, error) {
finalMsg := Metric{
Metric: "", Tags: make(map[string]string),
Timestamps: msg.Timestamps, Values: msg.Values,
}
// if the metric name has invalid characters, the data model says to drop it
if !allowedFirstChar.MatchString(msg.Metric) {
return Metric{}, fmt.Errorf("%s has a bad first character", msg.Metric)
}
name := msg.Metric
// if normalization requested, lowercase the name
if normalize {
name = strings.ToLower(name)
}
/*
replace bad characters in metric name with _ per the data model
only replace if needed to reduce string processing time
*/
if !allowedNames.MatchString(name) {
finalMsg.Metric = replaceChars.ReplaceAllString(name, "_")
} else {
finalMsg.Metric = name
}
// replace bad characters in tag keys with _ per the data model
for key, value := range msg.Tags {
// if normalization requested, lowercase the key and value
if normalize {
key = strings.ToLower(key)
value = strings.ToLower(value)
}
/*
replace all explicitly bad characters with _
only replace if needed to reduce string processing time
*/
if !allowedTagKeys.MatchString(key) {
key = replaceChars.ReplaceAllString(key, "_")
}
// tags that start with __ are considered custom stats for internal prometheus stuff, we should drop them
if !strings.HasPrefix(key, "__") {
finalMsg.Tags[key] = value
}
}
return finalMsg, nil
}

View File

@@ -0,0 +1,217 @@
package opentsdb
import (
"testing"
)
func TestConvertRetention(t *testing.T) {
/*
2592000 seconds in 30 days
3600 in one hour
2592000 / 3600 = 720 individual query "ranges" should exist, plus one because time ranges can be weird
First order should == "sum"
Second order should == "avg"
AggTime should == "1m"
*/
res, err := convertRetention("sum-1m-avg:1h:30d", 0, false)
if err != nil {
t.Fatalf("Error parsing valid retention string: %v", err)
}
if len(res.QueryRanges) != 721 {
t.Fatalf("Found %v query ranges. Should have found 720", len(res.QueryRanges))
}
if res.FirstOrder != "sum" {
t.Fatalf("Incorrect first order aggregation %q. Should have been 'sum'", res.FirstOrder)
}
if res.SecondOrder != "avg" {
t.Fatalf("Incorrect second order aggregation %q. Should have been 'avg'", res.SecondOrder)
}
if res.AggTime != "1m" {
t.Fatalf("Incorrect aggregation time length %q. Should have been '1m'", res.AggTime)
}
/*
Invalid retention string
*/
res, err = convertRetention("sum-1m-avg:30d", 0, false)
if err == nil {
t.Fatalf("Bad retention string (sum-1m-avg:30d) didn't fail: %v", res)
}
/*
Invalid aggregation string
*/
res, err = convertRetention("sum-1m:1h:30d", 0, false)
if err == nil {
t.Fatalf("Bad aggregation string (sum-1m:1h:30d) didn't fail: %v", res)
}
}
func TestModifyData(t *testing.T) {
/*
Good metric metadata
*/
m := Metric{
Metric: "cpu",
Tags: map[string]string{
"core": "0",
},
Values: []float64{
0,
},
Timestamps: []int64{
0,
},
}
res, err := modifyData(m, false)
if err != nil {
t.Fatalf("Valid metric %v failed to parse: %v", m, err)
}
if res.Metric != "cpu" {
t.Fatalf("Valid metric name %q was converted: %q", m.Metric, res.Metric)
}
found := false
for k := range res.Tags {
if k == "core" {
found = true
break
}
}
if !found {
t.Fatalf("Valid metric tag name 'core' missing: %v", res.Tags)
}
/*
Bad first character in metric name
metric names cannot start with _, so this
metric should fail entirely
*/
m = Metric{
Metric: "_cpu",
Tags: map[string]string{
"core": "0",
},
Values: []float64{
0,
},
Timestamps: []int64{
0,
},
}
res, err = modifyData(m, false)
if err == nil {
t.Fatalf("Invalid metric %v parsed?", m)
}
/*
Bad character in metric name
metric names cannot have `.`, so this
should be converted to `_`
*/
m = Metric{
Metric: "cpu.name",
Tags: map[string]string{
"core": "0",
},
Values: []float64{
0,
},
Timestamps: []int64{
0,
},
}
res, err = modifyData(m, false)
if err != nil {
t.Fatalf("Valid metric failed to parse? %v", err)
}
if res.Metric != "cpu_name" {
t.Fatalf("Metric name not correctly converted from 'cpu.name' to 'cpu_name': %q", res.Metric)
}
/*
bad tag prefix (__)
Prometheus considers tags beginning with __
to be internal use only. They should not show up in incoming data.
this tag should be dropped from the result
*/
m = Metric{
Metric: "cpu",
Tags: map[string]string{
"__core": "0",
},
Values: []float64{
0,
},
Timestamps: []int64{
0,
},
}
res, err = modifyData(m, false)
if err != nil {
t.Fatalf("Valid metric failed to parse? %v", err)
}
found = false
for k := range res.Tags {
if k == "__core" {
found = true
break
}
}
if found {
t.Fatalf("Bad tag key prefix (__) found")
}
/*
bad tag key
tag keys cannot contain `.`, this should be
replaced with `_`
*/
m = Metric{
Metric: "cpu",
Tags: map[string]string{
"core.name": "0",
},
Values: []float64{
0,
},
Timestamps: []int64{
0,
},
}
res, err = modifyData(m, false)
if err != nil {
t.Fatalf("Valid metric failed to parse? %v", err)
}
found = false
for k := range res.Tags {
if k == "core.name" {
found = true
break
}
}
if found {
t.Fatalf("Bad tag key 'core.name' not converted")
}
/*
test normalize
All characters should be returned lowercase
*/
m = Metric{
Metric: "CPU",
Tags: map[string]string{
"core": "0",
},
Values: []float64{
0,
},
Timestamps: []int64{
0,
},
}
res, err = modifyData(m, true)
if err != nil {
t.Fatalf("Valid metric failed to parse? %v", err)
}
if res.Metric != "cpu" {
t.Fatalf("Normalization of metric name didn't happen!")
}
}

View File

@@ -0,0 +1,398 @@
{
"outputs": [
{
"id": "a",
"alias": "query",
"dps": [
[
1614099600000,
0.28
],
[
1614099660000,
0.22
],
[
1614099720000,
0.18
],
[
1614099780000,
0.14
],
[
1614099840000,
0.24
],
[
1614099900000,
0.19
],
[
1614099960000,
0.22
],
[
1614100020000,
0.2
],
[
1614100080000,
0.18
],
[
1614100140000,
0.22
],
[
1614100200000,
0.17
],
[
1614100260000,
0.16
],
[
1614100320000,
0.22
],
[
1614100380000,
0.3
],
[
1614100440000,
0.28
],
[
1614100500000,
0.27
],
[
1614100560000,
0.26
],
[
1614100620000,
0.23
],
[
1614100680000,
0.18
],
[
1614100740000,
0.3
],
[
1614100800000,
0.24
],
[
1614100860000,
0.19
],
[
1614100920000,
0.16
],
[
1614100980000,
0.19
],
[
1614101040000,
0.23
],
[
1614101100000,
0.18
],
[
1614101160000,
0.15
],
[
1614101220000,
0.12
],
[
1614101280000,
0.1
],
[
1614101340000,
0.24
],
[
1614101400000,
0.19
],
[
1614101460000,
0.16
],
[
1614101520000,
0.14
],
[
1614101580000,
0.12
],
[
1614101640000,
0.14
],
[
1614101700000,
0.12
],
[
1614101760000,
0.13
],
[
1614101820000,
0.12
],
[
1614101880000,
0.11
],
[
1614101940000,
0.36
],
[
1614102000000,
0.35
],
[
1614102060000,
0.3
],
[
1614102120000,
0.32
],
[
1614102180000,
0.27
],
[
1614102240000,
0.26
],
[
1614102300000,
0.21
],
[
1614102360000,
0.18
],
[
1614102420000,
0.15
],
[
1614102480000,
0.12
],
[
1614102540000,
0.24
],
[
1614102600000,
0.2
],
[
1614102660000,
0.17
],
[
1614102720000,
0.18
],
[
1614102780000,
0.14
],
[
1614102840000,
0.39
],
[
1614102900000,
0.31
],
[
1614102960000,
0.3
],
[
1614103020000,
0.24
],
[
1614103080000,
0.26
],
[
1614103140000,
0.21
],
[
1614103200000,
0.17
],
[
1614103260000,
0.15
],
[
1614103320000,
0.2
],
[
1614103380000,
0.2
],
[
1614103440000,
0.22
],
[
1614103500000,
0.19
],
[
1614103560000,
0.22
],
[
1614103620000,
0.29
],
[
1614103680000,
0.31
],
[
1614103740000,
0.28
],
[
1614103800000,
0.23
]
],
"dpsMeta": {
"firstTimestamp": 1614099600000,
"lastTimestamp": 1614103800000,
"setCount": 71,
"series": 1
},
"meta": [
{
"index": 0,
"metrics": [
"timestamp"
]
},
{
"index": 1,
"metrics": [
"system.load5"
],
"commonTags": {
"rack": "undef",
"host": "use1-mon-metrics-1",
"row": "undef",
"dc": "us-east-1",
"group": "monitoring"
},
"aggregatedTags": []
}
]
}
],
"query": {
"name": null,
"time": {
"start": "1h-ago",
"end": null,
"timezone": null,
"downsampler": {
"interval": "1m",
"aggregator": "avg",
"fillPolicy": {
"policy": "nan",
"value": "NaN"
}
},
"aggregator": "sum",
"rate": false
},
"filters": [
{
"id": "f1",
"tags": [
{
"tagk": "host",
"filter": "use1-mon-metrics-1",
"group_by": true,
"type": "literal_or"
},
{
"tagk": "group",
"filter": "monitoring",
"group_by": true,
"type": "literal_or"
},
{
"tagk": "dc",
"filter": "us-east-1",
"group_by": true,
"type": "literal_or"
},
{
"tagk": "rack",
"filter": "undef",
"group_by": true,
"type": "literal_or"
},
{
"tagk": "row",
"filter": "undef",
"group_by": true,
"type": "literal_or"
}
],
"explicitTags": false
}
],
"metrics": [
{
"metric": "system.load5",
"id": "a",
"filter": "f1",
"aggregator": null,
"timeOffset": null,
"fillPolicy": {
"policy": "nan",
"value": "NaN"
}
}
],
"expressions": [],
"outputs": [
{
"id": "a",
"alias": "query"
}
]
}
}

Some files were not shown because too many files have changed in this diff Show More