Compare commits

...

293 Commits

Author SHA1 Message Date
Aliaksandr Valialkin
61cc13c16f docs/CHANGELOG.md: cut v1.63.0 2021-07-15 14:02:13 +03:00
Aliaksandr Valialkin
b060d8bf53 vendor: make vendor-update 2021-07-15 12:55:40 +03:00
Aliaksandr Valialkin
d472b03e34 lib/storage: make sure the second call to DeduplicateSamples and deduplicateSamplesDuringMerge doesnt change samples 2021-07-15 12:17:45 +03:00
Aliaksandr Valialkin
682662b2ae lib/storage: remove cache directory if it contains reset_cache_on_startup file
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447
2021-07-13 17:58:51 +03:00
Aliaksandr Valialkin
2b0e3efa5c .github/workflows/wiki.yml: properly copy subdirectories 2021-07-13 17:35:02 +03:00
Aliaksandr Valialkin
8d99e94a52 docs: clarify why spare CPU and RAM resources are needed in capacity planning 2021-07-13 15:48:25 +03:00
Aliaksandr Valialkin
8d0ec47be9 docs/Cluster-VictoriaMetrics.md: clarify the docs about the needed values for -dedup.minScrapeInterval at vmselect during replication when the data is pushed from HA pair 2021-07-13 15:43:02 +03:00
Aliaksandr Valialkin
2df66dad7b lib/httpserver: add is_set label to flag metrics
This label allows determining the set flags with the query `flag{is_set="true"}`
2021-07-13 15:10:13 +03:00
Aliaksandr Valialkin
5bc240bffe vendor: make vendor-update 2021-07-13 14:30:19 +03:00
Aliaksandr Valialkin
244d0fe5d7 deployment/docker: update Go builder from v1.16.5 to v1.16.6
Ths Go release has the following bugfixes: https://github.com/golang/go/issues?q=milestone%3AGo1.16.6+label%3ACherryPickApproved
2021-07-13 14:25:41 +03:00
Aliaksandr Valialkin
a925d5a3e1 app/vmselect/promql: duration handling improvements in MetricsQL queries
- Support durations anywhere in MetricsQL queries. E.g. sum_over_time(m[1h])/1h is equivalent to sum_over_time(m[1h])/3600
- Support durations without suffix. E.g. rate(m[300]) is equivalent to rate(m[5m])
2021-07-12 17:16:41 +03:00
Aliaksandr Valialkin
f9de546139 lib/storage: reset perKeyMisses stats less frequently
This should reduce CPU usage for queries executed with intervals higher than 30 seconds
2021-07-12 14:33:42 +03:00
Aliaksandr Valialkin
4f80b2f230 lib/storage: properly limit the size of storage/date_metricID cache 2021-07-12 14:25:44 +03:00
Aliaksandr Valialkin
f3a5465ece docs/CHANGELOG.md: document the change from bfba4c28a4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1444
2021-07-12 12:44:06 +03:00
Aliaksandr Valialkin
bfba4c28a4 app/vmalert: accept Prometheus-like durations in interval config option inside group section 2021-07-12 12:35:17 +03:00
Aliaksandr Valialkin
a684408e27 docs: update http://slack.victoriametrics.com to https://slack.victoriametrics.com 2021-07-12 10:57:16 +03:00
Aliaksandr Valialkin
8ca2799478 lib/storage: properly determine when the deduplication is needed in needsDedup
Previously needsDedup() could return true if the de-duplication wasn't needed for the following case:

         d < interval
           /     \
   |        v | v        |
     interval   interval

Now it properly returns false for this case
2021-07-12 10:53:30 +03:00
Aliaksandr Valialkin
f539772ca6 docs: sync with the cluster branch 2021-07-10 12:45:38 +03:00
Aliaksandr Valialkin
8c764e88f0 app/vmui: move source code from https://github.com/VictoriaMetrics/vmui to app/vmui
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
2021-07-09 17:15:23 +03:00
Aliaksandr Valialkin
2be340a4c9 docs: clarify what does "workload" mean in capacity planning docs 2021-07-09 12:49:53 +03:00
Aliaksandr Valialkin
c5f0b454f0 app/vmselect: follow-up after aa11ef6d3b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413
2021-07-07 17:43:35 +03:00
tony
e9e35a7d6a add vmui for vmselect component (#1431)
Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-07-07 17:33:02 +03:00
Aliaksandr Valialkin
92b8380fdf docs/Cluster-VictoriaMetrics.md: improve capacity planning recommendations 2021-07-07 16:22:51 +03:00
Aliaksandr Valialkin
1152de4321 vendor: make vendor-update 2021-07-07 16:05:04 +03:00
Aliaksandr Valialkin
7e77980608 vendor: update github.com/VictoriaMetrics/metrics from v1.17.2 to v1.17.3 2021-07-07 16:00:25 +03:00
Aliaksandr Valialkin
6e0553c92e lib/mergeset: cache indexBlock items only on the second request
This should reduce the indexdb/indexBlocks cache size, since it won't contain one-time-wonders items.
2021-07-07 15:23:06 +03:00
Aliaksandr Valialkin
ed944313b0 app/{vminsert,vmselect}: export vminsert_request_duration_seconds and vmselect_request_duration_seconds histograms 2021-07-07 13:25:21 +03:00
Aliaksandr Valialkin
766edbc421 lib/httpserver: print full requestURI in httpserver.Errorf
This should simplify debugging.
2021-07-07 13:09:40 +03:00
Aliaksandr Valialkin
c55a64ba08 docs: clarify capacity planning docs 2021-07-07 12:46:55 +03:00
Aliaksandr Valialkin
e843bd7bd7 lib/storage: do not cache inmemoryBlock entries requested only once (aka one-time-wonder items)
This should reduce the cache size and memory usage for the indexdb/dataBlocks cache
2021-07-07 10:58:51 +03:00
Aliaksandr Valialkin
8b262d4ba7 lib/storage: periodically reset prefetchedMetricIDs cache in order to limit its size under high churn rate 2021-07-07 10:58:51 +03:00
Roman Khavronenko
2f54559c89 alerts: sync alert expression for DiskRunsOutOfSpaceIn3Days with dashboard (#1436) 2021-07-07 10:31:09 +03:00
Aliaksandr Valialkin
a7694092b8 Revert "lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method"
This reverts commit 7c6d3981bf.

Reason for revert: high contention at bucket16Pool on systems with big number of CPU cores.
This slows down query processing significantly.
2021-07-06 18:21:35 +03:00
Aliaksandr Valialkin
8aa9bba9bd lib/{mergeset,storage}: switch from sync.Pool to chan-based pool for inmemoryPart objects
This should reduce memory usage on systems with big number of CPU cores,
since every inmemoryPart object occupies at least 64KB of memory and sync.Pool maintains
a separate pool inmemoryPart objects per each CPU core.

Though the new scheme for the pool worsens per-cpu cache locality, this should be amortized
by big sizes of inmemoryPart objects.
2021-07-06 16:28:41 +03:00
Aliaksandr Valialkin
7c6d3981bf lib/uint64set: allow reusing bucket16 structs inside uint64set.Set via uint64set.Release method
This reduces the load on memory allocator in Go runtime in production workload.
2021-07-06 15:35:03 +03:00
Aliaksandr Valialkin
78c9174682 lib/mergeset: increase pool capacity for inmemoryBlock according to collected profiles from production workload
CPU and memory profiles show that the pool capacity for inmemoryBlock objects is too small.
This results in the increased load on memory allocation code in Go runtime.
Increase the pool capacity in order to reduce the load on Go runtime.
2021-07-06 13:41:34 +03:00
Aliaksandr Valialkin
f71e4d1853 lib/mergeset: limit the frequency for flushCallback calls to once per 10 seconds
This should improve hit ratio for tagFiltersCache when big number of new time series are constantly registered
(aka high churn rate). This, in turn, should reduce CPU usage for queries over such time series.
2021-07-06 12:17:17 +03:00
Aliaksandr Valialkin
f3acf065c9 lib/storage: consistency renaming: tagCache -> tagFiltersCache
This improves code readability
2021-07-06 11:03:51 +03:00
Aliaksandr Valialkin
0020b9f904 lib/workingsetcache: properly update stats for requests and cache misses
Previously the stats for cache misses could be improperly counted, because it had inflated cache misses
if the entry was missing in the curr cache, but was existing in the prev cache.

The same applies to cache requests - they were inflated if the entry was missing in the curr cache.
2021-07-06 10:53:32 +03:00
Roman Khavronenko
75f12bfe78 add option to add Copy button for code snippets (#1433)
To add a Copy button wrap code snippet with the following element:
```
<div class="with-copy" markdown="1">

<your-code-snippet>

</div>
```

See the changes to `Kubernetes monitoring with VictoriaMetrics Single` for details.
2021-07-06 08:23:39 +03:00
Aliaksandr Valialkin
4cf47163c1 lib/workingsetcache: fix cache capacity calculations after 4f0003f182 2021-07-05 17:11:57 +03:00
Aliaksandr Valialkin
4f0003f182 lib/workingsetcache: typo fixes after d0c830039d 2021-07-05 15:35:37 +03:00
Aliaksandr Valialkin
d0c830039d lib/storage: tune cache sizes according to production workload 2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
8f973e34fb lib/workingsetcache: properly switch to whole mode
Previously the switch from `split` to `whole` mode had been performed too early,
e.g. when the current cache size became bigger than 1/4 of the allowed cache size.

Now it is performed when the current cache size becomes bigger than 1/2 of the allowed cache size.

This change can reduce memory usage for data ingestion path when big number of active time series are ingested.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
43103be011 lib/{storage,mergeset}: increase cache timeout for data and index blocks from a minute to two minutes
One minute cache timeout result in slower queries in some production workloads where the interval
between query execution is in the range 1 minute - 2 minutes.
2021-07-05 15:16:11 +03:00
Aliaksandr Valialkin
54b9e1d3cb lib/cgroup: set GOGC to 50 by default if it isn't set
This should reduce memory usage for typical VictoriaMetrics workloads by up to 50%
2021-07-05 15:16:11 +03:00
Roman Khavronenko
bd6b8f7e31 move github-pages docs to the main repo (#1432)
* move github-pages docs to the main repo

* rm github actions for copying docs to VictoriaMetrics/VictoriaMetrics.github.io
2021-07-05 14:34:10 +03:00
Aliaksandr Valialkin
888d62e40c docs/CHANGELOG.md: document the bugfix for vm_merge_need_free_disk_space metric at 9a83e9018d 2021-07-05 12:01:08 +03:00
Aliaksandr Valialkin
6bef227388 docs/Articles.md: add an url to https://medium.com/ibm-garage/monitoring-of-multiple-openshift-clusters-with-victoriametrics-d4f0979e2544 2021-07-05 11:51:37 +03:00
Aliaksandr Valialkin
9a83e9018d lib/storage: properly detect free disk space shortage during data merge
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373
2021-07-02 17:40:54 +03:00
Aliaksandr Valialkin
f6bb130898 app/{vmselect,vmstorage}: clarify the description for -dedup.minScrapeInterval command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1426
2021-07-02 15:05:57 +03:00
Aliaksandr Valialkin
7088f17494 lib/promscrape/discovery/consul: use case-insensitive comparison for service names
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1424
2021-07-02 14:49:27 +03:00
Aliaksandr Valialkin
6e406083f2 lib/promauth: cache the client TLS certificate for up to a second
This should reduce CPU usage when TLS connections are established at a high rate.
2021-07-02 13:21:51 +03:00
Aliaksandr Valialkin
a93da746c0 vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.15 to v1.0.16
This should help with proper copying of tls.Config struct after it has been set up in lib/promauth.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2021-07-02 12:03:32 +03:00
Aliaksandr Valialkin
1e52f11e87 docs/Cluster-VictoriaMetrics.md: typo fix: siplify -> simplify 2021-07-02 10:50:55 +03:00
Aliaksandr Valialkin
5beb099ff9 docs/Cluster-VictoriaMetrics.md: add a chapter describing a toy cluster setup on a single host
While at it, refer to available tools, which can simplify cluster setup
2021-07-02 10:48:56 +03:00
Aliaksandr Valialkin
158c50c0ee lib/promauth: reload TLS certificates from disk on every mTLS connection as Prometheus does
This allows updating client certificates without the need to restart vmagent and/or single-node VictoriaMetrics.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/470
2021-07-01 15:27:47 +03:00
Aliaksandr Valialkin
02ed4b8b46 docs/CHANGELOG.md: document ae485c2bfd 2021-07-01 11:51:19 +03:00
Aliaksandr Valialkin
c25b839078 lib/workingsetcache: reset the cache mode when the cache is reset
This should reduce memory usage if the working set is reduced after the cache reset.
2021-07-01 11:50:11 +03:00
Nikolay
ae485c2bfd fixes /targets button style (#1423)
* fixes /targets button style
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1422

* updates boostrap version
2021-07-01 11:48:07 +03:00
Aliaksandr Valialkin
00bbe1608b deployment/docker: upgrade alpine image from v3.13.5 to v3.14.0 2021-07-01 10:56:56 +03:00
Roman Khavronenko
a38a6fe8ad dashboard: move panel Disk writes/reads to Resource usage row (#1417)
* dashboard: move panel `Disk writes/reads` to `Resource usage` row

* dashboard: make Stats panel consistent with Cluster dashboard
2021-07-01 05:46:26 +03:00
Aliaksandr Valialkin
c93cee8de8 lib/{mergeset,storage}: reduce the maximum lifetime for cached indexdb and data blocks from 2 minutes to a minute
This should reduce memory usage on a system with high number of active time series and a high churn rate.
One minute is enough for caching the blocks needed for repeated queries (e.g. alerting rules, recording rules and dashboard refreshes).
2021-06-29 19:57:07 +03:00
Aliaksandr Valialkin
fc12484734 lib/mergeset: switch from sync.Pool to a channel for a pool for inmemoryBlock structs
This should reduce memory usage for the pool on systems with big number of CPU cores.

The sync.Pool maintains per-CPU pools, so the total number of objects in the pool
is proportional to the number of available CPU cores. The channel limits the number
of pooled objects by its own capacity. This means smaller number of pooled objects on average.
2021-06-29 19:56:59 +03:00
Aliaksandr Valialkin
9ce211a514 lib/promscrape/discovery/docker: fix golint warning: struct field Id should be ID 2021-06-29 13:12:28 +03:00
Aliaksandr Valialkin
5506cff76e lib/storage: put indexDBName into the key for dateTagFilter cache and for uselessTagFilters cache
This should prevent from stats overwriting when the previous indexdb is queried.
2021-06-29 12:40:05 +03:00
Aliaksandr Valialkin
1b0501a09e lib/promscrape: typo fix in /targets output
The typo has been introduced in fb72a2133f

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1408
2021-06-28 21:26:37 +03:00
Aliaksandr Valialkin
3af2162085 docs/vmagent.md: mention about docker_sd_config support 2021-06-25 20:52:15 +03:00
Aliaksandr Valialkin
008033a374 docs/CHANGELOG.md: cut v1.62.0 2021-06-25 13:29:38 +03:00
Aliaksandr Valialkin
cb5453953f lib/promscrape: split docker and dockerswarm service discovery code bases, since they have very little in common
This is a follow up after c85a5b7fcb
2021-06-25 13:20:20 +03:00
Aliaksandr Valialkin
a69045e440 lib/promscrape: consistently sort service discovery routines
This should simplify further maintenance of the code
2021-06-25 12:10:46 +03:00
Lu Jiajing
c85a5b7fcb Support Docker ServiceDiscovery (#1402)
* add docker discovery

* add test

* add labels test and add scrape work

* remove TODO

* refactor to merge apiConfig and sdConfig

* apply suggestion
2021-06-25 11:42:47 +03:00
Nikolay
434e33da9b adds missing MustStop call to do and http sd (#1404) 2021-06-25 11:39:18 +03:00
Aliaksandr Valialkin
b50e2ec88c vendor: make vendor-update 2021-06-24 17:33:42 +03:00
Aliaksandr Valialkin
4345c07777 docs: consistently put the link to articles and slides about VictoriaMetrics after the links to case studies 2021-06-24 15:37:38 +03:00
Aliaksandr Valialkin
b2fca1ab22 docs/CaseStudies.md: add a case study for DFKI 2021-06-24 15:24:39 +03:00
Aliaksandr Valialkin
906fca9e88 docs/CaseStudies.md: add Groove X case study 2021-06-24 15:04:49 +03:00
Aliaksandr Valialkin
3c3694f72a README.md: add missing linke to Sensedia case study 2021-06-24 14:37:49 +03:00
Aliaksandr Valialkin
d25a161579 docs/CaseStudies.md: add Sensedia case study 2021-06-24 14:36:13 +03:00
Aliaksandr Valialkin
ca42410afd docs/CHANGELOG.md: document the bugfix in increase_pure() function from the commit fb4f758715 2021-06-24 12:05:39 +03:00
Aliaksandr Valialkin
5d64ed73c5 lib/protoparser/clusternative: do not pool unmarshalWork structs, since they can occupy big amounts of memory (more than 100MB per each struct)
This should reduce memory usage for vmstorage under high ingestion rate when the vmstorage runs on a system with big number of CPU cores
2021-06-23 15:46:50 +03:00
Aliaksandr Valialkin
cdfae0117a app/vmselect/promql: return the last timestamp for the max / min value from tmax_over_time() and tmin_over_time() function as most users expect 2021-06-23 14:19:00 +03:00
Aliaksandr Valialkin
70e2852376 docs/CHANGELOG.md: document the bugfix for incorrect stats collection for concurrently executed tag filter
Follow up for c22114c6f0
2021-06-23 14:05:28 +03:00
Aliaksandr Valialkin
94f3e40ab3 app/vminsert/netstorage: sort the -storageNode list passed to vminsert nodes
This should reduce resource usage (CPU, RAM, disk IO) at vmstorage nodes
if the addresses of vmstorage nodes are passed in random order to vminsert nodes.
2021-06-23 14:01:41 +03:00
Aliaksandr Valialkin
c22114c6f0 lib/storage: tune tag filters search logic
Tune the logic according to the logs provided at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-864293624

The previous logic had a race when multiple concurrent queries execute the same tag filter without prior stats.
This could result in incorrectly stored stats for such tag filter, which then could result in non-optimal sorting of tag filters
for further queries.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-23 13:29:39 +03:00
Aliaksandr Valialkin
e8a5bb92b7 lib/promscrape/discovery/consul: properly pass namespace to Consul watcher
Follow-up for 58a2989fe7
2021-06-22 17:42:41 +03:00
Aliaksandr Valialkin
ac54f34f9e lib/promscrape/discovery/http: follow up after e307bbb29a 2021-06-22 13:40:33 +03:00
Aliaksandr Valialkin
755040a171 lib/promscrape/discovery: support generic auth configs in Consul service discovery in the same way as Prometheus 2.28 does 2021-06-22 13:34:02 +03:00
Aliaksandr Valialkin
59e7755df9 docs/CHANGELOG.md: document the support for Consul namepsace
See 58a2989fe7
2021-06-22 13:34:02 +03:00
Nikolay
e307bbb29a adds http_sd (#1399)
* adds http_sd

* adds X-Prometheus-Refresh-Interval-Seconds header

* Update lib/promscrape/discovery/http/api.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 13:33:37 +03:00
Nikolay
58a2989fe7 adds consul enterprise namespace support (#1400)
* adds consul enterprise namespace support

* Update lib/promscrape/discovery/consul/consul.go

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
2021-06-22 12:49:44 +03:00
Aliaksandr Valialkin
3bc83d3b17 docs/PerTenantStatistic.md: document that the per-tenant statistic is a part of cluster version of VictoriaMetrics 2021-06-22 12:43:01 +03:00
Roman Khavronenko
da8c901fab vmctl: add more context to flags description in vm-native mode (#1395) 2021-06-18 19:20:01 +03:00
Aliaksandr Valialkin
a3262daac0 docs/CHANGELOG.md: typo fixes 2021-06-18 19:14:53 +03:00
Aliaksandr Valialkin
83a4db813e app/vmselect: log slow requests to all the /api/v1/* handlers if their execution time exceeds -search.logSlowQueryDuration 2021-06-18 19:04:42 +03:00
Aliaksandr Valialkin
4cfedc5931 docs/Single-server-VictoriaMetrics.md: mention that it is recommended to use a single scrape_interval across all the scrape targets 2021-06-18 15:39:33 +03:00
Aliaksandr Valialkin
570f36b344 app/vmctl: limit JSON line size by 10K samples (#1394)
This should reduce the maximum memory usage at VictoriaMetrics when importing time series with big number of samples.
2021-06-18 15:26:47 +03:00
Aliaksandr Valialkin
eb1af09a04 docs/Cluster-VictoriaMetrics.md: clarify docs about VictoriaMetrics cluster architecture 2021-06-18 14:36:39 +03:00
Aliaksandr Valialkin
cbd9159a22 docs/CHANGELOG.md: document the reduced disk write IO usage
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-18 14:02:59 +03:00
Aliaksandr Valialkin
fb72a2133f lib/promscrape: show jobs with empty scrape targets on /targets page 2021-06-18 10:53:52 +03:00
Aliaksandr Valialkin
d8ab409418 docs/{vmgateway,vmbackupmanager}: explicitly mention that these components are a part of an enterprise package 2021-06-17 17:19:49 +03:00
Nikolay
6c434b260e fixes DO service discovery labels (#1389)
adds test for digitalocean sd
2021-06-17 15:12:20 +03:00
Aliaksandr Valialkin
dcbc22552f lib/storage: fix infinite loop introduced in aa9b56a046 2021-06-17 14:28:10 +03:00
Aliaksandr Valialkin
aa9b56a046 lib/{mergeset,storage}: reduce the number of fsync calls on data ingestion path on systems with many cpu cores
VictoriaMetrics maintains a buffer per CPU core for the ingested data. These buffers are flushed to disk every second.
These buffers are flushed to disk in parallel starting from the commit 56b6b893ce .
This resulted in increased write disk IO usage on systems with many cpu cores
as described at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-863046999 .

This commit merges the per-CPU buffers into bigger in-memory buffers before flushing them to disk.
This should reduce the rate of fsync syscalls and, consequently, the write disk IO on systems with many CPU cores.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
See also https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244
2021-06-17 13:52:08 +03:00
Aliaksandr Valialkin
12a83d25bf app/vmagent/remotewrite: go fmt after 0a796f7c3a 2021-06-17 13:52:06 +03:00
Aliaksandr Valialkin
9eb3fc346f docs/vmagent.md: sync with app/vmagent/README.md via make docs-sync 2021-06-16 12:36:49 +03:00
Aliaksandr Valialkin
6d17a4e12d docs/CHANGELOG.md: document the changed -remoteWrite.queues value
This is a follow-up for 0a796f7c3a

See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1385
2021-06-16 12:35:46 +03:00
Zongyang
0a796f7c3a Change default value of '-remoteWrite.queues' to cgroup.AvailableCPUs * 2 (#1385)
* Change default value of '-remoteWrite.queues' to cgroup.AvailableCPUS() * 2 to reduce scrape interval

Default value of vmagent option '-remotewrite.queues' is 4 and default
size of vmagent ScheudleUnmarshalWorkers is number of CPUs, when available
CPUs is much greater than 4, e.g 32, worker are competing push queues
which will increase scrape interval and may cause scrape timeout.

* Update README and flag description

Co-authored-by: xiaozy <xiaozy01@fenbi.com>
2021-06-16 12:16:44 +03:00
Roman Khavronenko
fb4f758715 promql: fix increase_pure calculation for cases with stale series (#1381)
Due to staleness handling, increase_pure were using incorrect previous value
during calculation in cases where series disappears for period longer
than staleness period and then returns back. The fix suppose to account
for a real datapoint value before staleness takes place. The fix should
remove unexpected spikes while using `increase_pure` for staled series.
2021-06-15 17:37:19 +03:00
Aliaksandr Valialkin
6a8369f0fc docs/Single-server-VictoriaMetrics.md: mention that VictoriaMetrics works great with APM workloads (aka Application Performance Monitoring) 2021-06-15 17:32:52 +03:00
Aliaksandr Valialkin
84fb59b0ba lib/storage: move deletedMetricIDs set from indexDB to Storage
This makes consitent the list of deleted metricIDs when it is used from both the current indexDB and the previous indexDB (aka extDB).
This should fix the issue, which could lead to storing new samples under deleted metricIDs after indexDB rotation.
See more details at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136 .

Thanks to @tangqipengleoo for the initial analysis and the pull request - https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

This commit resolves the issue in more generic way compared to https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1383 .

The downside of the commit is the deletedMetricIDs set isn't cleaned from the metricIDs outside the retention. It needs app restart.
This should be OK in most cases.
2021-06-15 15:04:30 +03:00
Aliaksandr Valialkin
e028ad241a lib/protoparser: stop reading the input stream as soon as the callback provided by the caller returns error
This is a follow-up for af90c3c43b
2021-06-14 15:18:49 +03:00
faceair
af90c3c43b lib/protoparser: stop read when callback error (#1380) 2021-06-14 15:10:58 +03:00
Aliaksandr Valialkin
36d55bff66 lib/promscrape: show the number of samples collected during the last scrape at /targets and /api/v1/targets pages
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1377
2021-06-14 14:04:00 +03:00
Aliaksandr Valialkin
7b283ee91c vendor: update github.com/klauspost/compress from v1.13.0 to v1.13.1 2021-06-14 13:42:46 +03:00
Roman Khavronenko
a90012ef26 dashboard: bump version requirements (#1378) 2021-06-14 13:31:59 +03:00
Aliaksandr Valialkin
05bc9667c1 docs/CHANGELOG.md: document the addition of DigitalOcean service discovery
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367
2021-06-14 13:18:19 +03:00
Nikolay
729c4eeb9c adds digital ocean sd (#1376)
* adds digital ocean sd config

* adds digital ocean sd
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367

* typo fix
2021-06-14 13:15:04 +03:00
Roman Khavronenko
b8526e88d3 Dashboard single (#1374)
* dashboard: update single version dash

The update contains the following changes:
* display anonymous memory usage metric. This metric suppose to reflect
memory usage of the process which can't be freed by OS;
* add legends to all panels. This is important for cases when users share
the screenshots;
* modify panels for Grafana v8.0.0

* dashboard: update single version dash tags

* dashboard: update vmagent dash

The update contains the following changes:
* display anonymous memory usage metric. This metric suppose to reflect
memory usage of the process which can't be freed by OS;
* add legends to all panels. This is important for cases when users share
the screenshots;
* modify panels for Grafana v8.0.0
2021-06-14 13:03:23 +03:00
Aliaksandr Valialkin
06b8e7d148 lib/promscrape: increase the duration for reading the full response in stream parsing mode
Increase the duration from 10x to 30x of the configured `scrape_interval'.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:28:09 +03:00
Aliaksandr Valialkin
48210130ac lib/protoparser: measure the duration for reading the whole block of data instead of a single read operation
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:25:52 +03:00
Aliaksandr Valialkin
3e5b6bae66 docs/Cluster-VictoriaMetrics.md: add lists for command-line flags for cluster components 2021-06-14 12:22:05 +03:00
Aliaksandr Valialkin
3c4366806c lib/protoparser/common: log the duration for reading a block of data in ReadLinesBlockExt on error
This may help debugging issues like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1365
2021-06-14 12:22:04 +03:00
Aliaksandr Valialkin
8a519f1518 docs/vmalert.md: follow-up after 6d5a8c28cd 2021-06-14 11:37:26 +03:00
Roman Khavronenko
6d5a8c28cd Vmalert docs (#1372)
* vmalert: mention what happens if `for` is set to 0 or omitted

* vmalert: add more context to docs
2021-06-11 13:25:53 +03:00
Aliaksandr Valialkin
2ef2e3d6f8 docs/CHANGELOG.md: cut v1.61.1 2021-06-11 13:01:31 +03:00
Aliaksandr Valialkin
ed83558646 app/vmauth: properly handle http.ErrAbortHandler panic
This panic can be raised by the reverseProxy on aborted request to the backend.
So handle it (e.g. suppress) at reverseProxy.ServeHTTP call.

Do not suppress the panic at lib/httpserver generic HTTP handler,
since it may result in an inconsistent state left after the panicking handler.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
2021-06-11 12:50:25 +03:00
Aliaksandr Valialkin
c4f3fbfa5d lib/storage: reset cache on disk during series deletion and during indexdb rotation
This should prevent from inconsistent behavior (aka partially missing data for some time series) after unclean shutdown.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347
2021-06-11 12:42:28 +03:00
Aliaksandr Valialkin
d979e14da2 docs/CHANGELOG.md: document the bugfix from 7adfe878e1 2021-06-11 11:28:10 +03:00
Roman Khavronenko
7adfe878e1 vmalert: fix mistake with object reuse while parsing response (#1370)
* vmalert: fix mistake with object reuse while parsing response

During the refactoring, the wrong optimisations was applied in
parse function which caused metric fields reset. The change removes
optimisation.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1369

* vmalert: add test to cover multiple metrics in one response
2021-06-11 11:22:05 +03:00
Aliaksandr Valialkin
69b1482bdb lib/storage: consistency renaming: getMaxRawRowsPerPartition -> getMaxRawRowsPerShard 2021-06-11 10:57:23 +03:00
Aliaksandr Valialkin
044ab46824 lib/storage: reduce the amounts of memory which can be occupied by rawRow items during data ingestion on a system with many CPU cores 2021-06-11 10:57:23 +03:00
John Belmonte
67b17cdd68 spelling fix: synonym (#1363) 2021-06-10 08:32:52 +03:00
Aliaksandr Valialkin
b4008c1e65 vendor: make vendor-update 2021-06-09 20:40:50 +03:00
Aliaksandr Valialkin
b83a51366e docs/CHANGELOG.md: cut v1.61.0 2021-06-09 19:04:44 +03:00
Aliaksandr Valialkin
10d105ee25 docs/FAQ.md: add a chapter comparing VictoriaMetrics to QuestDB 2021-06-09 19:03:23 +03:00
Aliaksandr Valialkin
d0dca62026 app/vmselect/promql: typo fix in the comment 2021-06-09 18:34:31 +03:00
Aliaksandr Valialkin
46d2c7d640 docs/Articles.md: update the broken link to https://nordicapis.com/api-monitoring-with-prometheus-grafana-alertmanager-and-victoriametrics/ 2021-06-09 16:39:56 +03:00
Aliaksandr Valialkin
2422b5091f app/vmauth: improve readability for a config with multiple src_paths 2021-06-09 15:39:32 +03:00
Aliaksandr Valialkin
329b6cd146 docs/CHANGELOG.md: document the enterprise bugfix for the target property in Graphite Render API 2021-06-09 13:50:52 +03:00
Aliaksandr Valialkin
db1c548cb4 docs/CHANGELOG.md: document improvements in re-routing handling in vminsert
See the following commits:

* 1c09e71f5b
* 0d067eb112
* 2c6b917749
2021-06-09 13:40:37 +03:00
Aliaksandr Valialkin
f93a31b490 docs/vmagent.md: mention that vmagent supports scrape targets sharding 2021-06-09 12:29:34 +03:00
Aliaksandr Valialkin
ab15bf8c90 docs: document rules replay feature for vmalert
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

This is a follow-up for 2a259ef5e7
2021-06-09 12:27:34 +03:00
Roman Khavronenko
2a259ef5e7 vmalert: support rules backfilling (aka replay) (#1358)
* vmalert: support rules backfilling (aka `replay`)

vmalert can `replay` configured rules in the past
and backfill results via remote write protocol.
It supports MetricsQL/PromQL storage as data source,
and can backfill data to remote write compatible
storage.

Supports recording and alerting rules `replay`. See more
details in README.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836

* vmalert: review fixes

* vmalert: readme fixes
2021-06-09 12:20:38 +03:00
dependabot[bot]
408fc90b40 build(deps): bump codecov/codecov-action from 1.5.0 to 1.5.2 (#1362)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 1.5.0 to 1.5.2.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v1.5.0...v1.5.2)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-06-09 12:16:46 +03:00
Roman Khavronenko
5e9f3777bf alerts: add new alert LabelsLimitExceededOnIngestion (#1359) 2021-06-09 12:15:36 +03:00
Aliaksandr Valialkin
c16edf8287 docs/CHANGELOG.md: document the bugfix, which prevents panics for aborted http requests in vmauth
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353

This is a follow-up for 6b29b955c0
2021-06-09 12:11:53 +03:00
Nikolay
6b29b955c0 disables panic for net/httpAbortHandler (#1355) 2021-06-09 12:08:58 +03:00
Aliaksandr Valialkin
28c44ef065 deployment/docker/docker-compose.yml: update Grafana from v7.5.2 to v8.0.0
See https://github.com/grafana/grafana/releases/tag/v8.0.0
2021-06-09 02:25:24 +03:00
k1rk
668165f53d rename serviceHealth group name to vm-health (#1360)
this causes conflicts in `victoria-metrics-k8s-stack` chart =)
2021-06-08 23:34:38 +03:00
Aliaksandr Valialkin
c0feafefc8 vendor: make vendor-update 2021-06-08 15:49:30 +03:00
Aliaksandr Valialkin
8a7e6ad5cc deployment/docker: update Go builder from v1.16.4 to v1.16.5
See the fixed isses at https://github.com/golang/go/issues?q=milestone%3AGo1.16.5+label%3ACherryPickApproved
2021-06-08 15:45:44 +03:00
Aliaksandr Valialkin
645e18dd88 vendor: update github.com/klauspost/compress from v1.12.3 to v1.13.0 2021-06-08 15:45:43 +03:00
Aliaksandr Valialkin
96b691a0ab lib/storage: properly account the number of loops spent when matching for or suffixes
This may help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338
2021-06-08 13:06:12 +03:00
Aliaksandr Valialkin
661d2668f8 lib/promrelabel: add tests for labelsToString() function 2021-06-04 20:42:46 +03:00
Aliaksandr Valialkin
78f83dc5ad app/{vmagent,vminsert}: follow-up after 2fe045e2a4
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1343
2021-06-04 20:27:58 +03:00
jelmd
2fe045e2a4 new feature: debug relabeling (#1344)
* new feature: relabel logging

Use scrape_configs[x].relabel_debug = true to log metric names inkl.
labels before and after relabeling. After relabeling related metrics
get dropped, i.e. not submitted to servers.

* vminsert wants relabel logging, too.
2021-06-04 17:50:23 +03:00
Aliaksandr Valialkin
5ac25d2585 docs/CHANGELOG.md: document the bugfix from 6f19bb23a1 2021-06-04 11:53:00 +03:00
Hason Chan
6f19bb23a1 fix eureka_sd_configs HTTPClientConfig incorrect parsing (#1350) 2021-06-04 11:47:17 +03:00
Aliaksandr Valialkin
f963f04d3d app/vminsert: add -disableRerouting command-line flag for disabling re-routing if some vmstorage nodes have lower performance than the others 2021-06-04 04:42:01 +03:00
Aliaksandr Valialkin
2d8bd41f8a lib/storage: reduce memory allocations when syncing dateMetricIDCache 2021-06-03 16:20:42 +03:00
Aliaksandr Valialkin
d2d746c4fc docs/CHANGELOG.md: document that it is possible to build VictoriaMetrics components for Solaris
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1322

This is a follow-up for ddc8022702
2021-05-31 09:33:29 +03:00
Nikolay
ddc8022702 fixes solaris build (#1345) 2021-05-31 09:21:23 +03:00
Aliaksandr Valialkin
de8e4b6223 vendor: update github.com/VictoriaMetrics/fastcache from v1.5.8 to v1.6.0 2021-05-31 09:14:25 +03:00
Aliaksandr Valialkin
b22e380a34 app/vmauth: allow balancing the load among multiple backend nodes by specifying multiple urls in url_prefix config 2021-05-29 01:03:37 +03:00
Aliaksandr Valialkin
ffa4cd6fa5 docs/Articles.md: add a link to https://www.percona.com/blog/2021/05/26/compiling-a-percona-monitoring-and-management-v2-client-in-arm-raspberry-pi-3/ 2021-05-28 14:34:31 +03:00
Aliaksandr Valialkin
9315e3ded3 vendor: make vendor-update 2021-05-28 13:18:37 +03:00
Aliaksandr Valialkin
cecbdee00d docs/MetricsQL.md: add a link to technical details about rate() and increase() calculations in Prometheus and VictoriaMetrics 2021-05-28 13:13:50 +03:00
Aliaksandr Valialkin
634eeac804 docs/Single-server-VictoriaMetrics.md: remove misleading wording about querying Graphite metrics with MetricsQL 2021-05-28 02:39:14 +03:00
Aliaksandr Valialkin
a52a20659a lib/promscrape: fix tests after f0c21b6300 2021-05-28 01:32:50 +03:00
Aliaksandr Valialkin
d088923aef Revert "lib/mergeset: remove a pool for inmemoryBlock structs"
This reverts commit 793fe39921.

Reason to revert: production testing revealed possible slowdown when registering big number of new time series
2021-05-28 01:09:32 +03:00
Aliaksandr Valialkin
793fe39921 lib/mergeset: remove a pool for inmemoryBlock structs
The pool for inmemoryBlock struct doesn't give any performance gains in production workloads,
while it may result in excess memory usage for inmemoryBlock structs inside the pool during
background merge of indexdb.
2021-05-27 21:57:33 +03:00
Aliaksandr Valialkin
60341722d5 docs: document f0c21b6300 2021-05-27 15:03:30 +03:00
faceair
f0c21b6300 lib/promscrape: apply body size & sample limit to stream parse (#1331)
* lib/promscrape: apply body size limit to stream parse

Signed-off-by: faceair <git@faceair.me>

* lib/promscrape: apply sample limit to stream parse

Signed-off-by: faceair <git@faceair.me>
2021-05-27 14:52:44 +03:00
Aliaksandr Valialkin
eda6ee40b6 docs: make docs-sync after 2bbb1cc7c1 2021-05-26 12:32:16 +03:00
Roman Khavronenko
2bbb1cc7c1 Docs review (#1330)
* re-order components by prioritizing Cluster-VictoriaMetrics.md

* drop Home.md since it just duplicates other links
2021-05-26 12:28:58 +03:00
Roman Khavronenko
7ecaa2fe2c update the issue template (#1329)
The main changes are:
* ask for Grafana's dashboard screenshots;
* ask only for non-default cmd-line flags;
* explicitly ask about logs;
2021-05-26 12:26:45 +03:00
Aliaksandr Valialkin
7f531e3a60 docs/CHANGELOG.md: document changes from 2233d6ed8a and d210958fd0 2021-05-26 12:23:22 +03:00
Roman Khavronenko
d210958fd0 vmalert: automatically reload configuration on file change (#1326)
New flag `-rule.configCheckInterval` defines how often `vmalert` will re-read
config file. If it detects any changes, the config will be reloaded.
This behaviour is turned off by default.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512
2021-05-25 14:27:22 +01:00
Aliaksandr Valialkin
2233d6ed8a lib/uint64set: store pointers to bucket16 instead of bucket16 objects in bucket32
This speeds up bucket32.addBucketAtPos() when bucket32.buckets contains big number of items,
since the copying of bucket16 pointers is much faster than the copying of bucket16 objects.

This is a cpu profile for copying bucket16 objects:

      10ms     13.43s (flat, cum) 32.01% of Total
      10ms      120ms    650:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    651:	b.b16his[pos] = hi
         .     13.31s    652:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .          .    653:	b16 := &b.buckets[pos]
         .          .    654:	*b16 = bucket16{}
         .          .    655:	return b16
         .          .    656:}

This is a cpu profile for copying pointers to bucket16:

      10ms      1.14s (flat, cum)  2.19% of Total
         .      100ms    647:	b.b16his = append(b.b16his[:pos+1], b.b16his[pos:]...)
         .          .    648:	b.b16his[pos] = hi
      10ms      700ms    649:	b.buckets = append(b.buckets[:pos+1], b.buckets[pos:]...)
         .      330ms    650:	b16 := &bucket16{}
         .          .    651:	b.buckets[pos] = b16
         .          .    652:	return b16
         .          .    653:}
2021-05-25 14:20:52 +03:00
Aliaksandr Valialkin
fd264477bf snap: update Go builder from Go 1.15 to Go 1.16 2021-05-25 12:12:37 +03:00
Dan Fredell
c78d7dde92 Fix quote difference on label_move example (#1321)
Fix quote difference on label_move example
2021-05-24 23:20:38 +03:00
Aliaksandr Valialkin
08234aa7a0 docs/CHANGELOG.md: cut v1.60.0 2021-05-24 15:55:08 +03:00
Aliaksandr Valialkin
a47d4927d2 docs/Single-server-VictoriaMetrics.md: clarify that the storage size depends on the number of samples per series 2021-05-24 15:47:44 +03:00
Aliaksandr Valialkin
b1e8d92577 docs/vmalert.md: sync with app/vmalert/README.md via make docs-sync 2021-05-24 15:47:20 +03:00
Aliaksandr Valialkin
8e7d1f8824 app/vmagent/remotewrite: use WARN level instead of ERROR level for couldnt send a block with size ... bytes to ... log message
This is really warning, since vmagent re-tries sending the data block until success.
2021-05-24 15:43:59 +03:00
Aliaksandr Valialkin
39ef1e7a51 lib/storage: do not stop data ingestion on the first error in Storage.AddRows
Continue data ingestion for the rest of blocks.
2021-05-24 15:32:47 +03:00
Aliaksandr Valialkin
4b01c9fb2e lib/storage: limit the number of rows per each block in Storage.AddRows()
This should reduce memory usage when ingesting big blocks or rows.
2021-05-24 15:24:07 +03:00
Aliaksandr Valialkin
a4ff4b8e65 lib/storage: allow filling all the rows up to their capacity in rawRowsShard.addRows
This should reduce memory usage a bit on data ingestion path
2021-05-24 15:22:59 +03:00
Aliaksandr Valialkin
a46551245c lib/bloomfilter: fix TestLimiterConcurrent 2021-05-24 05:17:36 +03:00
Aliaksandr Valialkin
eafed5335d vendor: update github.com/valyala/gozstd from v1.10.0 to v1.11.0 2021-05-24 04:59:55 +03:00
Aliaksandr Valialkin
93d81b486d lib/fs: do not pass done callback to tryRemoveAll() func
This improves code readability a bit.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-24 04:51:57 +03:00
Aliaksandr Valialkin
f54133b200 lib/storage: do not populate MetricID->MetricName cache during data ingestion
This cache isn't needed during data ingestion, so there is no need in spending RAM on it.

This reduces RAM usage on data ingestion path by 30%
2021-05-24 03:02:46 +03:00
Aliaksandr Valialkin
24858820b5 docs/CHANGELOG.md: small typo fix 2021-05-23 14:15:01 +03:00
Aliaksandr Valialkin
04eb37a590 docs/CHANGELOG.md: document the addition of extra_filter_labels at 84cc0513e1 2021-05-23 14:10:33 +03:00
Aliaksandr Valialkin
ec79abc382 lib/{mergeset,storage}: reduce the number of IFNO log messages like merged ... items across ... blocks in ... seconds
Log these messages if the merge takes more than 30 seconds instead of 10 seconds.
2021-05-23 14:03:21 +03:00
Roman Khavronenko
84cc0513e1 vmalert: support extra_filter_labels setting per-group (#1319)
The new setting `extra_filter_labels` may be assigned to group.
If it is, then all rules within a group will automatically filter
for configured labels. The feature is well-described here
https://docs.victoriametrics.com#prometheus-querying-api-enhancements

New setting is compatible only with VM datasource.
2021-05-23 00:26:01 +03:00
Aliaksandr Valialkin
78dddfb98f lib/promauth: follow-up after 5b8176c68e 2021-05-22 18:01:11 +03:00
Nikolay
5b8176c68e basic OAuth2 support for remoteWrite and scrape targets (#1316)
* adds OAuth2 support for remoteWrite and scrapping

* adds tests
changes init
2021-05-22 16:20:18 +03:00
Aliaksandr Valialkin
e05dd475f0 lib/fs: concurrently remove up to 1024 blocked NFS directories
Previously the blocked directories were removed sequentially by a single goroutine.
This can be not enough for highly loaded VictoriaMetrics that accepts millions of sample per second,
when big number of LSM parts are created and removed at high rate.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:57:46 +03:00
Aliaksandr Valialkin
8e2985b53d lib/fs: wait for a while before giving up on NFS file removal if the removal queue is full
This should reduce the probability of the panic on a highly loaded VictoriaMetrics
accepting millions of samples per second.

See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313
2021-05-21 17:21:00 +03:00
Aliaksandr Valialkin
d173a9348c docs/MetricsQL.md: add a link to a list of supported timezones that can be passed to timezone_offset() function
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1306
2021-05-21 16:55:24 +03:00
Aliaksandr Valialkin
14ec2b9f26 docs/CHANGELOG.md: mention the bugfix from d626c5c2a9
Updates https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 16:37:14 +03:00
Aliaksandr Valialkin
c54bb73867 all: do not skip SIGHUP signal during service initialization
This can lead to stale or incomplete configs like in the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-21 16:34:06 +03:00
Nikolay
d626c5c2a9 changes vmalert query function (#1307)
* changes vmalert query function
for prometheus rules compatibility its better to use labels as map.
it simplifies template evaluation and allow to ignore can't evaluate field error
because map will return default value.
fixes https://github.com/VictoriaMetrics/operator/issues/243
2021-05-21 13:55:43 +03:00
Aliaksandr Valialkin
060664e221 docs/FAQ.md: re-order questions to be more attractive to visitors 2021-05-20 19:49:34 +03:00
Aliaksandr Valialkin
49ecbc765d app/vmauth: add ability to protect /-/reload endpoint with authKey 2021-05-20 18:47:01 +03:00
Aliaksandr Valialkin
362a49bdd1 vendor: make vendor-update 2021-05-20 18:46:26 +03:00
Aliaksandr Valialkin
4c7bb75fa2 Makefile: update golangci-lint from v1.29.0 to v1.40.1 2021-05-20 18:27:10 +03:00
Aliaksandr Valialkin
0d3e78b9ee docs/CHANGELOG.md: move tip to proper place 2021-05-20 17:56:03 +03:00
Aliaksandr Valialkin
480087944a docs/FAQ.md: add a question on how to run VictoriaMetrics on FreeBSD
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:16:35 +03:00
Aliaksandr Valialkin
49c39ab388 docs/FAQ.md: add can I use VictoriaMetrics instead of Prometheus?
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:10:36 +03:00
Aliaksandr Valialkin
6cc6a032cd docs/FAQ.md: add a question about memory limits for VictoriaMetrics components
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 16:09:41 +03:00
Aliaksandr Valialkin
d06aae9454 docs/FAQ.md: add a question about multi-tenancy
The question has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1284
2021-05-20 15:52:40 +03:00
Aliaksandr Valialkin
e394ff6466 app/vmagent/remotewrite: expose metrics with the current number of active series per day and per hour
These numbers are exposed via the following metrics:

- vmagent_hourly_series_limit_current_series
- vmagent_daily_series_limit_current_series

Expose also the limits via the following metrics:

- vmagent_hourly_series_limit_max_series
- vmagent_daily_series_limit_max_series
2021-05-20 15:28:09 +03:00
Aliaksandr Valialkin
ad73f226ff app/vmstorage: add ability to limit series cardinality via -storage.maxHourlySeries and -storage.maxDailySeries command-line flags 2021-05-20 14:15:19 +03:00
Aliaksandr Valialkin
7e526effaa app/vmagent: add ability to limit series cardinality on a per-hour and per-day basis 2021-05-20 13:13:40 +03:00
Aliaksandr Valialkin
3cd8606abd docs/CHANGELOG.md: document the bugfix in vmctl import for InfluxDB lines with identical names for field and tag
See dcf8803bbd

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299
2021-05-20 12:06:16 +03:00
Aliaksandr Valialkin
98e425ee09 docs/CHANGELOG.md: refer to the issue related to timezone_offset() function 2021-05-20 12:03:39 +03:00
Roman Khavronenko
dcf8803bbd vmctl: explicitly set ::tag type for labels selector in influx mode (#1310)
The `::tag` type is needed in cases when field and tag names are equal, which
results into unexpected results in InfluxQL. Setting the type explicitly helps
InfluxDB to understand which exact column we apply filter to.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299
2021-05-20 12:03:16 +03:00
Aliaksandr Valialkin
22585531ad lib/promscrape/discovery/kubernetes: make golangci-lint happy by removing empty branches 2021-05-20 12:00:29 +03:00
Aliaksandr Valialkin
0842bb9294 app/vmselect/promql: add timezone_offset(tz) function
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1306
2021-05-20 11:53:09 +03:00
Aliaksandr Valialkin
009e136d88 lib/storage: remove possible data race when logging dropped labels 2021-05-20 02:47:22 +03:00
Aliaksandr Valialkin
f7e73fbe8b app/vmagent/remotewrite: sort labels before sending the series to per-remoteWrite.url queues 2021-05-20 02:12:36 +03:00
Aliaksandr Valialkin
eb8093ca6b lib/promscrape/discovery/kubernetes: reload objects on object parse error
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 23:25:48 +03:00
Neo He
e8a6c6927d app/{vmbackup,vmrestore},docs/vmrestore.md: typo fix: vbackup -> vmbackup (#1305) 2021-05-18 16:37:28 +03:00
Aliaksandr Valialkin
f4719889da lib/httpserver: typo fix in -http.shutdownDelay command-line flag description: servier -> server 2021-05-18 16:26:16 +03:00
Aliaksandr Valialkin
b30925738b docs/vmalert.md: document multitenant support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740
2021-05-18 16:26:14 +03:00
Aliaksandr Valialkin
0cf1fe19e6 lib/promscrape/discovery/kubernetes: simplify the reload logic for urlWatcher.objectsByKey 2021-05-18 15:40:46 +03:00
Aliaksandr Valialkin
bfb61de606 lib/promscrape/discovery/kubernetes: properly update vm_promscrape_discovery_kubernetes_scrape_works metric
Previously it wasn't descreased during config update.
2021-05-18 15:33:45 +03:00
Aliaksandr Valialkin
3166994244 lib/promscrape/discovery/kubernetes: log errors and stop service discovery when unexpected updates are received from Kubernetes API server
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-18 15:10:48 +03:00
Aliaksandr Valialkin
66aba00549 app/vmauth: reload -auth.config on the request to /-/reload
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1194
2021-05-18 02:23:55 +03:00
Aliaksandr Valialkin
3339ea41e7 docs/vmbackup.md: typo fix: snaphosts -> snapshots
Thanks to @jelmd - see 1ab27582a3 (r50884395)
2021-05-18 01:12:36 +03:00
Aliaksandr Valialkin
6c944b86d8 docs: dealay -> delay
Thanks to @jelmd . See 0b7e3510c8 (r50884991)
2021-05-18 01:07:52 +03:00
Aliaksandr Valialkin
ec6a284978 lib/promrelabel: add tests for conditional removal of label on another label match
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1294
2021-05-18 00:23:03 +03:00
Aliaksandr Valialkin
2eed410466 lib/promscrape/discovery/kubernetes: key ScrapeWork objects by urlWatcher instead of namespace
This makes the code less fragile if urlWatcher would depend on additional to namepsace properties.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-17 23:47:08 +03:00
Aliaksandr Valialkin
ede2ba5a45 docs/CHANGELOG.md: document b38edec7ee
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-17 01:57:34 +03:00
Aliaksandr Valialkin
55ed77760b vendor: make vendor-update 2021-05-17 01:47:33 +03:00
Aliaksandr Valialkin
2ec6f81b10 vendor: update github.com/valyala/gozstd from v1.9.0 to v1.10.0 2021-05-17 01:47:33 +03:00
Roman Khavronenko
b96b19f040 Docs update from victoriaMetrics.github.io (#1302)
* port change from 11ca65677b

* port change from afb41dfa43

* port change from f82e3733c9

* port change from d499ab0502
2021-05-16 18:43:09 +01:00
Roman Khavronenko
b38edec7ee vmalert: use stringified label keys for duplicates map in recroding rules (#1301)
duplicates map helps to determine wheter extra labels has overriden
labels which make time series unique. It was using a sorted hashed
labels sequence as a key. But hashing algorithm could have collisions,
so it is more convenient to not use hashing at all.

Log message for recording rules duplicates was improved as well.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293
2021-05-15 13:25:57 +03:00
Aliaksandr Valialkin
733706e6c6 lib/promscrape: reload auth tokens from files every second
Previously auth tokens were loaded at startup and couldn't be updated without vmagent restart.
Now there is no need in vmagent restart.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1297
2021-05-14 20:00:08 +03:00
Aliaksandr Valialkin
a15145e597 docs/Articles.md: add a link to https://fly.io/blog/measuring-fly/ 2021-05-14 19:47:43 +03:00
Aliaksandr Valialkin
24cbf8b66c docs/vmauth.md: sync with app/vmauth/README.md after 10a47af631 2021-05-14 18:13:10 +03:00
Aliaksandr Valialkin
10a47af631 app/{vmalert,vmauth}: explicitly set MaxIdleConnsPerHost in net/http.Client.Transport
By default MaxIdleConnsPerHost is set to 2. This limits the possibility to re-use http keep-alive connections.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300
2021-05-14 18:12:24 +03:00
Aliaksandr Valialkin
e6ec442e96 docs/Single-server-VictoriaMetrics.md: document how to reduce memory usage when importing too long JSON lines into VictoriaMetrics
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1295
2021-05-14 17:22:08 +03:00
Denys Holius
70165a6758 Fix Cortex typo 2021-05-13 16:26:05 +03:00
Aliaksandr Valialkin
a422165dc6 app/vmagent/remotewrite: clarify the comment explaining why vmagent drops blocks if remote storage returns 400 or 409 status code 2021-05-13 16:16:16 +03:00
Aliaksandr Valialkin
fc3519fa26 lib/promscrape: limit scrape_timeout by scrape_interval like Prometheus does 2021-05-13 16:09:45 +03:00
Aliaksandr Valialkin
1f75ae6006 docs/CHANGELOG.md: document the bugfix from b4f5be8bd8 2021-05-13 11:18:39 +03:00
匠心零度
b4f5be8bd8 fix vagent imbalance problem (#1292)
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...

Co-authored-by: lirenzuo <lirenzuo@shein.com>
2021-05-13 11:14:51 +03:00
Aliaksandr Valialkin
e6fda03e8f vendor: update github.com/VictoriaMetrics/fasthttp from v1.0.14 to v1.0.15
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:44:14 +03:00
Aliaksandr Valialkin
f6a641de62 lib/promscrape: exponentially increase retry interval on unsuccesful requests to scrape targets or to service discovery services
This should reduce CPU load at vmagent and at remote side when the remote side doesn't accept HTTP requests.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289
2021-05-13 10:38:50 +03:00
Aliaksandr Valialkin
c0ec541559 lib/cgroup: document the ability to detect cgroup v2 memory and cpu limits. This is follow-up for b50024812e 2021-05-13 09:26:20 +03:00
Aliaksandr Valialkin
d7be2753c0 lib/storage: substitute GetTSDBStatusForDate with GetTSDBStatusWithFiltersForDate with nil tfss 2021-05-13 09:02:33 +03:00
Nikolay
b50024812e adds cgroupsv2 support (#1283)
* adds cgroupv2 limits support
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1269

* small fix

* changes Atoi to ParseUint
2021-05-13 09:02:13 +03:00
Aliaksandr Valialkin
a22a17dc66 lib/storage: merge getTSDBStatusForDate with getTSDBStatusWithFiltersForDate
These functions are non-trivial, while their code has minimal differences.
It is better from maintainability PoV to merge these functions into a single function.
2021-05-12 17:56:53 +03:00
Aliaksandr Valialkin
beddc0c0d5 docs/Single-server-VictoriaMetrics.md: typo fix: retuns->returns 2021-05-12 17:22:34 +03:00
Aliaksandr Valialkin
832651c6c2 app/vmselect: follow up after 8a0678678b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168
2021-05-12 17:18:30 +03:00
Nikolay
8a0678678b Adds tsdb match filters (#1282)
* init work on filters

* init propose for status filters

* fixes tsdb status
adds test

* fix bug

* removes checks from test
2021-05-12 15:18:45 +03:00
Aliaksandr Valialkin
cfd6aa28e1 lib/promscrape/discovery/kubernetes: refresh endpoints and endpointslices scrape targets every 5 seconds, since they may depend on changed service and pod objects
This should make endpoints and endpointslices scrape targets eventually consistent with the maximum delay of 5 seconds after the related service or pod object changes.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-12 14:10:34 +03:00
Aliaksandr Valialkin
afec68ad13 lib/httpserver: add new X-Server-Hostname header instead of overwriting already exsiting header
This makes possible tracking origins of chained requests over multiple hops.
2021-05-11 23:48:59 +03:00
Aliaksandr Valialkin
f2d5c4e2d0 lib/httpserver: return X-Server-Hostname http header in all the responses for better debuggability 2021-05-11 22:03:48 +03:00
Aliaksandr Valialkin
33f7bacb01 lib/storage: properly apply time range when matching an empty filter
It must match all the time series on the given time range.
Previously it was matched to all the time series without the restriction on the given time range.
2021-05-11 01:12:05 +03:00
Aliaksandr Valialkin
3fb3ce2a6d Revert ".github/dependabot.yml: remove automated dependency version checks"
This reverts commit 5b986c95dd.

This check verifies only dependencies needed for github-actions. This is OK.
2021-05-10 12:05:09 +03:00
Aliaksandr Valialkin
8b65920a8b Add make check-licenses rule for the ability to manually check licenses in vendored dependencies
This is a follow-up for c687536956
2021-05-10 11:54:43 +03:00
Aliaksandr Valialkin
5b986c95dd .github/dependabot.yml: remove automated dependency version checks
Dependency updates must be under manual control, since the resulting code diffs must be reviewed manually for the sake of security.
It is done with `make vendor-update` now.
2021-05-10 11:41:23 +03:00
Artem Navoiev
c687536956 Add vendor license checker, update codecov action, add dependbot for … (#1280)
* Add vendor license checker, update codecov action, add dependbot for github actions

* update gitingore, temprorary turn on check

* fix action name

* change action rules to trigger only when vendor changes

* remove obsolete line from main action
2021-05-10 11:38:56 +03:00
Aliaksandr Valialkin
229d9d6dd7 docs/CHANGELOG.md: document -datasource.roundDigits added at 5c448126dc 2021-05-10 11:18:26 +03:00
Roman Khavronenko
5c448126dc vmalert: add support for round_digits param in datasource package (#1278)
Starting from v1.56.0 VM supports `round_digits` which allows to limit
the number of digits after the decimal point in response value. The feature
can be used to reduce entropy of produced by recording rules values
and significantly improve the compression. See more details in link below.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525
2021-05-10 11:11:45 +03:00
Aliaksandr Valialkin
3b0966c00c docs/CHANGELOG.md: document vmalert fix for state restoration on startup 2021-05-10 11:10:04 +03:00
Roman Khavronenko
4247168a2d vmalert: fix error when rule didn't start if restore failed (#1279)
Previously, `startGroup` could exit on restore errors despite the
`remoteRead.ignoreRestoreErrors` flag value. Now vmalert checks the
flag value before deciding whether to return error or just log it.
2021-05-10 11:06:31 +03:00
Aliaksandr Valialkin
6ff19096be docs/vmagent.md: add stream parsing mode chapter 2021-05-08 23:14:07 +03:00
Aliaksandr Valialkin
cbd0569ce2 docs/CHANGELOG.md: mention the comment, which gives an example of multi-level vminsert setup 2021-05-08 22:57:04 +03:00
Aliaksandr Valialkin
120927ab65 vendor: make vendor-update 2021-05-08 20:21:57 +03:00
Aliaksandr Valialkin
75c2c813fc lib/storage: remove dead code after the commit 3ccf7ea20c 2021-05-08 20:14:11 +03:00
Aliaksandr Valialkin
f8d50e9641 deployment/dm: update Go builder from v1.16.3 to v1.16.4
See https://github.com/golang/go/issues?q=milestone%3AGo1.16.4+label%3ACherryPickApproved for details
2021-05-08 20:04:05 +03:00
Aliaksandr Valialkin
7aea5f58c4 lib/ingestserver: properly close incoming connections during graceful shutdown 2021-05-08 19:52:58 +03:00
Aliaksandr Valialkin
12d733dd5d app/vminsert: add support for data ingestion via other vminsert nodes 2021-05-08 19:52:57 +03:00
Aliaksandr Valialkin
237e9f9fd7 app/vmalert: add missing comment for ErrStateRestore 2021-05-08 15:59:35 +03:00
Aliaksandr Valialkin
d6439490f5 docs/Single-server-VictoriaMetrics.md: add links to vmauth and vmgateway as auth proxy examples 2021-05-07 10:46:05 +03:00
Aliaksandr Valialkin
79ec35cef9 app/vmbackup: make sure that -snapshotName isnt set if -snapshot.createURL is set 2021-05-07 08:44:25 +03:00
Aliaksandr Valialkin
d9e3872b1c docs/CHANGELOG.md: document 904bbffc7f
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 20:33:05 +03:00
Aliaksandr Valialkin
4bea1afc6d docs/CHANGELOG.md: document 9cdd4696fe
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 20:28:58 +03:00
Aliaksandr Valialkin
904bbffc7f lib/promscrape/discovery/kubernetes: start watchers for pods and services before starting watchers for endpoints
This should eliminate possible race when an update on endpoints depends on pods and/or services, which are missing in the cache yet.
This could result in missing targets based on endpoints or endpointslices.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
2021-05-05 12:23:50 +03:00
Roman Khavronenko
9cdd4696fe vmalert: add flag to control behaviour on startup for state restore errors (#1265)
Alerting rules now can return specific error type ErrStateRestore to indicate
whether restore state procedure failed. Such errors were returned and logged
before as well. But now user can specify whether to just log these errors
(remoteRead.ignoreRestoreErrors=true) or to stop the process
(remoteRead.ignoreRestoreErrors=false). The latter is important when VM isn't
ready yet to serve queries from vmalert and it needs to wait.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252
2021-05-05 10:07:19 +03:00
Denis Fondras
cdd1b06473 Update to OpenBSD 6.9 and VictoriaMetrics 1.59 (#1263)
* Update to OpenBSD 6.9 and VictoriaMetrics 1.58

While at it, also build vmagent

* Bump to v1.59 and remove zstd dependency
2021-05-03 14:14:13 +03:00
Aliaksandr Valialkin
76027093ca lib/storage: use WARNING instead of INFO level for logging dropped labels 2021-05-03 13:56:43 +03:00
Aliaksandr Valialkin
ce9e163e94 lib/httpserver: stop the process on panics in request handlers
Panics may leave the process in inconsistent state. That's why it is better to stop the process after the panic
instead of recovering from the panic. Unfortunately, the standard net/http.Server recovers panics in request handlers.
See https://github.com/golang/go/issues/16542 . That's lib/httpserver must stop the process on itself after the panic.
2021-05-03 11:59:40 +03:00
Aliaksandr Valialkin
988c3b386f docs/CHANGELOG.md: document the bugfix for proper removal of stale parts (477369b62f) 2021-05-03 11:38:15 +03:00
Nikolay
477369b62f adds stalePartsRemover (#1261)
for new created partitions
2021-05-03 11:34:00 +03:00
Aliaksandr Valialkin
9a930720b3 lib/promrelabel: add tests for removing the specified {label="value"} pair 2021-05-03 11:26:53 +03:00
Aliaksandr Valialkin
0969b446b3 deployment/docker: update base docker image from alpine:3.13.2 to alpine:3.13.5 2021-05-01 10:50:09 +03:00
675 changed files with 46632 additions and 21224 deletions

View File

@@ -4,12 +4,12 @@ about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
It would be a great [upgrading](https://docs.victoriametrics.com/#how-to-upgrade) to [the latest avaialble release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
It would be a great [upgrading](https://docs.victoriametrics.com/#how-to-upgrade)
to [the latest available release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
and verifying whether the bug is reproducible there.
It is also recommended reading [troubleshooting docs](https://docs.victoriametrics.com/#troubleshooting).
@@ -19,9 +19,22 @@ Steps to reproduce the behavior.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Logs**
Check if any warnings or errors were logged by VictoriaMetrics components
or components in communication with VictoriaMetrics (e.g. Prometheus, Grafana).
**Screenshots**
If applicable, add screenshots to help explain your problem.
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)
See how to setup monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [montioring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
**Version**
The line returned when passing `--version` command line flag to binary. For example:
```
@@ -30,15 +43,5 @@ victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
```
**Used command-line flags**
Command-line flags are listed as `flag{name="httpListenAddr", value=":443"} 1` lines at the `/metrics` page.
See the following docs for details:
Please provide applied command-line flags used for running VictoriaMetrics and its components.
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)
* [montioring for VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#monitoring)
**Additional context**
Add any other context about the problem here such as error logs from VictoriaMetrics and Prometheus,
`/metrics` output, screenshots from the official Grafana dashboards for VictoriaMetrics:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/dashboards/10229)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)

6
.github/dependabot.yml vendored Normal file
View File

@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"

23
.github/workflows/check-licenses.yml vendored Normal file
View File

@@ -0,0 +1,23 @@
name: license-check
on:
push:
paths:
- 'vendor'
pull_request:
paths:
- 'vendor'
jobs:
build:
name: Build
runs-on: ubuntu-latest
steps:
- name: Setup Go
uses: actions/setup-go@main
with:
go-version: 1.16
id: go
- name: Code checkout
uses: actions/checkout@master
- name: Check License
run: |
make check-licenses

View File

@@ -1,30 +0,0 @@
name: github-pages
on:
push:
paths:
- 'docs/*'
- 'README.md'
branches:
- master
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
- name: publish
shell: bash
env:
TOKEN: ${{secrets.CI_TOKEN}}
run: |
git clone https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.github.io.git gpages
cp docs/* gpages
cp README.md gpages
cd gpages
git config --local user.email "info@victoriametrics.com"
git config --local user.name "Vika"
git add .
git commit -m "update github pages"
remote_repo="https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.github.io.git"
git push "${remote_repo}"
cd ..
rm -rf gpages

View File

@@ -60,7 +60,7 @@ jobs:
GOOS=darwin go build -mod=vendor ./app/vmctl
CGO_ENABLED=0 GOOS=windows go build -mod=vendor ./app/vmagent
- name: Publish coverage
uses: codecov/codecov-action@v1.0.6
uses: codecov/codecov-action@v1.5.2
with:
file: ./coverage.txt

View File

@@ -16,7 +16,7 @@ jobs:
TOKEN: ${{secrets.CI_TOKEN}}
run: |
git clone https://vika:${TOKEN}@github.com/VictoriaMetrics/VictoriaMetrics.wiki.git wiki
cp docs/* wiki
cp -r docs/* wiki
cd wiki
git config --local user.email "info@victoriametrics.com"
git config --local user.name "Vika"

1
.gitignore vendored
View File

@@ -15,3 +15,4 @@
/package/temp-rpm-*
/package/*.deb
/package/*.rpm
.DS_store

5
.wwhrd.yml Normal file
View File

@@ -0,0 +1,5 @@
allowlist:
- Apache-2.0
- MIT
- BSD-3-Clause
- BSD-2-Clause

View File

@@ -190,7 +190,7 @@ lint: install-golint
golint app/...
install-golint:
which golint || go install golang.org/x/lint/golint
which golint || GO111MODULE=off go get golang.org/x/lint/golint
errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./lib/...
@@ -205,7 +205,7 @@ errcheck: install-errcheck
errcheck -exclude=errcheck_excludes.txt ./app/vmctl/...
install-errcheck:
which errcheck || go install github.com/kisielk/errcheck
which errcheck || GO111MODULE=off go get github.com/kisielk/errcheck
check-all: fmt vet lint errcheck golangci-lint
@@ -254,14 +254,20 @@ quicktemplate-gen: install-qtc
qtc
install-qtc:
which qtc || go install github.com/valyala/quicktemplate/qtc
which qtc || GO111MODULE=off go get github.com/valyala/quicktemplate/qtc
golangci-lint: install-golangci-lint
golangci-lint run --exclude '(SA4003|SA1019|SA5011):' -D errcheck -D structcheck --timeout 2m
install-golangci-lint:
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.29.0
which golangci-lint || curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(shell go env GOPATH)/bin v1.40.1
install-wwhrd:
which wwhrd || GO111MODULE=off go get github.com/frapposelli/wwhrd
check-licenses: install-wwhrd
wwhrd check -f .wwhrd.yml
copy-docs:
echo "---\nsort: ${ORDER}\n---\n" > ${DST}
@@ -271,12 +277,13 @@ copy-docs:
# Cluster docs are supposed to be ordered as 9th.
# For The rest of docs is ordered manually.t
docs-sync:
cp README.md docs/README.md
SRC=README.md DST=docs/Single-server-VictoriaMetrics.md ORDER=1 $(MAKE) copy-docs
SRC=app/vmagent/README.md DST=docs/vmagent.md ORDER=2 $(MAKE) copy-docs
SRC=app/vmalert/README.md DST=docs/vmalert.md ORDER=3 $(MAKE) copy-docs
SRC=app/vmauth/README.md DST=docs/vmauth.md ORDER=4 $(MAKE) copy-docs
SRC=app/vmbackup/README.md DST=docs/vmbackup.md ORDER=5 $(MAKE) copy-docs
SRC=app/vmrestore/README.md DST=docs/vmrestore.md ORDER=6 $(MAKE) copy-docs
SRC=app/vmctl/README.md DST=docs/vmctl.md ORDER=7 $(MAKE) copy-docs
SRC=app/vmgateway/README.md DST=docs/vmgateway.md ORDER=8 $(MAKE) copy-docs
SRC=app/vmbackupmanager/README.md DST=docs/vmbackupmanager.md ORDER=9 $(MAKE) copy-docs
SRC=app/vmagent/README.md DST=docs/vmagent.md ORDER=3 $(MAKE) copy-docs
SRC=app/vmalert/README.md DST=docs/vmalert.md ORDER=4 $(MAKE) copy-docs
SRC=app/vmauth/README.md DST=docs/vmauth.md ORDER=5 $(MAKE) copy-docs
SRC=app/vmbackup/README.md DST=docs/vmbackup.md ORDER=6 $(MAKE) copy-docs
SRC=app/vmrestore/README.md DST=docs/vmrestore.md ORDER=7 $(MAKE) copy-docs
SRC=app/vmctl/README.md DST=docs/vmctl.md ORDER=8 $(MAKE) copy-docs
SRC=app/vmgateway/README.md DST=docs/vmgateway.md ORDER=9 $(MAKE) copy-docs
SRC=app/vmbackupmanager/README.md DST=docs/vmbackupmanager.md ORDER=10 $(MAKE) copy-docs

151
README.md
View File

@@ -2,7 +2,7 @@
[![Latest Release](https://img.shields.io/github/release/VictoriaMetrics/VictoriaMetrics.svg?style=flat-square)](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
[![Docker Pulls](https://img.shields.io/docker/pulls/victoriametrics/victoria-metrics.svg?maxAge=604800)](https://hub.docker.com/r/victoriametrics/victoria-metrics)
[![Slack](https://img.shields.io/badge/join%20slack-%23victoriametrics-brightgreen.svg)](http://slack.victoriametrics.com/)
[![Slack](https://img.shields.io/badge/join%20slack-%23victoriametrics-brightgreen.svg)](https://slack.victoriametrics.com/)
[![GitHub license](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics.svg)](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/LICENSE)
[![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics)](https://goreportcard.com/report/github.com/VictoriaMetrics/VictoriaMetrics)
[![Build Status](https://github.com/VictoriaMetrics/VictoriaMetrics/workflows/main/badge.svg)](https://github.com/VictoriaMetrics/VictoriaMetrics/actions)
@@ -28,7 +28,7 @@ See [features available for enterprise customers](https://victoriametrics.com/en
## Case studies and talks
Alphabetically sorted links to case studies:
Case studies:
* [adidas](https://docs.victoriametrics.com/CaseStudies.html#adidas)
* [Adsterra](https://docs.victoriametrics.com/CaseStudies.html#adsterra)
@@ -37,14 +37,19 @@ Alphabetically sorted links to case studies:
* [CERN](https://docs.victoriametrics.com/CaseStudies.html#cern)
* [COLOPL](https://docs.victoriametrics.com/CaseStudies.html#colopl)
* [Dreamteam](https://docs.victoriametrics.com/CaseStudies.html#dreamteam)
* [German Research Center for Artificial Intelligence](https://docs.victoriametrics.com/CaseStudies.html#german-research-center-for-artificial-intelligence)
* [Groove X](https://docs.victoriametrics.com/CaseStudies.html#groove-x)
* [Idealo.de](https://docs.victoriametrics.com/CaseStudies.html#idealode)
* [MHI Vestas Offshore Wind](https://docs.victoriametrics.com/CaseStudies.html#mhi-vestas-offshore-wind)
* [Sensedia](https://docs.victoriametrics.com/CaseStudies.html#sensedia)
* [Synthesio](https://docs.victoriametrics.com/CaseStudies.html#synthesio)
* [Wedos.com](https://docs.victoriametrics.com/CaseStudies.html#wedoscom)
* [Wix.com](https://docs.victoriametrics.com/CaseStudies.html#wixcom)
* [Zerodha](https://docs.victoriametrics.com/CaseStudies.html#zerodha)
* [zhihu](https://docs.victoriametrics.com/CaseStudies.html#zhihu)
See also [articles and slides about VictoriaMetrics from our users](https://docs.victoriametrics.com/Articles.html#third-party-articles-and-slides-about-victoriametrics)
## Prominent features
@@ -93,7 +98,8 @@ Alphabetically sorted links to case studies:
* [Prometheus exposition format](#how-to-import-data-in-prometheus-exposition-format).
* [Arbitrary CSV data](#how-to-import-csv-data).
* Supports metrics' relabeling. See [these docs](#relabeling) for details.
* Ideally works with big amounts of time series data from Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data and various Enterprise workloads.
* Can deal with high cardinality and high churn rate issues using [series limiter](#cardinality-limiter).
* Ideally works with big amounts of time series data from APM, Kubernetes, IoT sensors, connected cars, industrial telemetry, financial data and various Enterprise workloads.
* Has open source [cluster version](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
* See also technical [Articles about VictoriaMetrics](https://docs.victoriametrics.com/Articles.html).
@@ -154,6 +160,8 @@ Alphabetically sorted links to case studies:
* [Security](#security)
* [Tuning](#tuning)
* [Monitoring](#monitoring)
* [TSDB stats](#tsdb-stats)
* [Cardinality limiter](#cardinality-limiter)
* [Troubleshooting](#troubleshooting)
* [Data migration](#data-migration)
* [Backfilling](#backfilling)
@@ -338,8 +346,11 @@ Currently the following [scrape_config](https://prometheus.io/docs/prometheus/la
* [consul_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config)
* [dns_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config)
* [openstack_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config)
* [docker_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config)
* [dockerswarm_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config)
* [eureka_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config)
* [digitalocean_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config)
* [http_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config)
Other `*_sd_config` types will be supported in the future.
@@ -456,11 +467,7 @@ The `/api/v1/export` endpoint should return the following response:
Data sent to VictoriaMetrics via `Graphite plaintext protocol` may be read via the following APIs:
* [Graphite API](#graphite-api-usage)
* [Prometheus querying API](#prometheus-querying-api-usage). Graphite metric names may special chars such as `-`, which may clash
with [MetricsQL operations](https://docs.victoriametrics.com/MetricsQL.html). Such metrics can be queries via `{__name__="foo-bar.baz"}`.
VictoriaMetrics supports `__graphite__` pseudo-label for selecting time series with Graphite-compatible filters in [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html).
For example, `{__graphite__="foo.*.bar"}` is equivalent to `{__name__=~"foo[.][^.]*[.]bar"}`, but it works faster
and it is easier to use when migrating from Graphite to VictoriaMetrics.
* [Prometheus querying API](#prometheus-querying-api-usage). VictoriaMetrics supports `__graphite__` pseudo-label for selecting time series with Graphite-compatible filters in [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). For example, `{__graphite__="foo.*.bar"}` is equivalent to `{__name__=~"foo[.][^.]*[.]bar"}`, but it works faster and it is easier to use when migrating from Graphite to VictoriaMetrics.
* [go-graphite/carbonapi](https://github.com/go-graphite/carbonapi/blob/main/cmd/carbonapi/carbonapi.example.victoriametrics.yaml)
## How to send data from OpenTSDB-compatible agents
@@ -549,9 +556,7 @@ VictoriaMetrics supports the following handlers from [Prometheus querying API](h
* [/api/v1/series](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers)
* [/api/v1/labels](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names)
* [/api/v1/label/.../values](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values)
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats). VictoriaMetrics accepts optional `topN=N` and `date=YYYY-MM-DD`
query args for this handler, where `N` is the number of top entries to return in the response and `YYYY-MM-DD` is the date for collecting the stats.
By default top 10 entries are returned and the stats is collected for the current day.
* [/api/v1/status/tsdb](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats). See [these docs](#tsdb-stats) for details.
* [/api/v1/targets](https://prometheus.io/docs/prometheus/latest/querying/api/#targets) - see [these docs](#how-to-scrape-prometheus-exporters-such-as-node-exporter) for more details.
These handlers can be queried from Prometheus-compatible clients such as Grafana or curl.
@@ -563,7 +568,7 @@ All the Prometheus querying API handlers can be prepended with `/prometheus` pre
VictoriaMetrics accepts optional `extra_label=<label_name>=<label_value>` query arg, which can be used for enforcing additional label filters for queries. For example,
`/api/v1/query_range?extra_label=user_id=123&query=<query>` would automatically add `{user_id="123"}` label filter to the given `<query>`. This functionality can be used
for limiting the scope of time series visible to the given tenant. It is expected that the `extra_label` query arg is automatically set by auth proxy sitting
in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if you need assistance with such a proxy.
in front of VictoriaMetrics. See [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmgateway](https://docs.victoriametrics.com/vmgateway.html) as examples of such proxies.
VictoriaMetrics accepts relative times in `time`, `start` and `end` query args additionally to unix timestamps and [RFC3339](https://www.ietf.org/rfc/rfc3339.txt).
For example, the following query would return data for the last 30 minutes: `/api/v1/query_range?start=-30m&query=...`.
@@ -572,15 +577,9 @@ VictoriaMetrics accepts `round_digits` query arg for `/api/v1/query` and `/api/v
By default, VictoriaMetrics returns time series for the last 5 minutes from `/api/v1/series`, while the Prometheus API defaults to all time. Use `start` and `end` to select a different time range.
VictoriaMetrics accepts additional args for `/api/v1/labels` and `/api/v1/label/.../values` handlers.
* Any number [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) via `match[]` query arg.
* Optional `start` and `end` query args for limiting the time range for the selected labels or label values.
See [this feature request](https://github.com/prometheus/prometheus/issues/6178) for details.
Additionally VictoriaMetrics provides the following handlers:
* `/vmui` - Basic Web UI
* `/api/v1/series/count` - returns the total number of time series in the database. Some notes:
* the handler scans all the inverted index, so it can be slow if the database contains tens of millions of time series;
* the handler may count [deleted time series](#how-to-delete-time-series) additionally to normal time series due to internal implementation restrictions;
@@ -659,7 +658,7 @@ to your needs or when testing bugfixes.
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make victoria-metrics` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics` binary and puts it into the `bin` folder.
@@ -675,7 +674,7 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make victoria-metrics-arm` or `make victoria-metrics-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics-arm` or `victoria-metrics-arm64` binary respectively and puts it into the `bin` folder.
@@ -689,7 +688,7 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
`Pure Go` mode builds only Go code without [cgo](https://golang.org/cmd/cgo/) dependencies.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make victoria-metrics-pure` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `victoria-metrics-pure` binary and puts it into the `bin` folder.
@@ -959,6 +958,8 @@ For example, `/api/v1/import?extra_label=foo=bar` would add `"foo":"bar"` label
Note that it could be required to flush response cache after importing historical data. See [these docs](#backfilling) for detail.
VictoriaMetrics parses input JSON lines one-by-one. It loads the whole JSON line in memory, then parses it and then saves the parsed samples into persistent storage. This means that VictoriaMetrics can occupy big amounts of RAM when importing too long JSON lines. The solution is to split too long JSON lines into smaller lines. It is OK if samples for a single time series are split among multiple JSON lines.
### How to import CSV data
@@ -1088,45 +1089,28 @@ on the interval `[now - max_lookback ... now]` is scraped for each time series.
For instance, `/federate?match[]=up&max_lookback=1h` would return last points on the `[now - 1h ... now]` interval. This may be useful for time series federation
with scrape intervals exceeding `5m`.
## Capacity planning
A rough estimation of the required resources for ingestion path:
VictoriaMetrics uses lower amounts of CPU, RAM and storage space on production workloads compared to competing solutions (Prometheus, Thanos, Cortex, TimescaleDB, InfluxDB, QuestDB, M3DB) according to [our case studies](https://docs.victoriametrics.com/CaseStudies.html).
* RAM size: less than 1KB per active time series. So, ~1GB of RAM is required for 1M active time series.
Time series is considered active if new data points have been added to it recently or if it has been recently queried.
The number of active time series may be obtained from `vm_cache_entries{type="storage/hour_metric_ids"}` metric
exported on the `/metrics` page.
VictoriaMetrics stores various caches in RAM. Memory size for these caches may be limited with `-memory.allowedPercent` or `-memory.allowedBytes` flags.
VictoriaMetrics capacity scales linearly with the available resources. The needed amounts of CPU and RAM highly depends on the workload - the number of active time series, series churn rate, query types, query qps, etc. It is recommended setting up a test VictoriaMetrics for your production workload and iteratively scaling CPU and RAM resources until it becomes stable according to [troubleshooting docs](#troubleshooting). A single-node VictoriaMetrics works perfectly with the following production workload according to [our case studies](https://docs.victoriametrics.com/CaseStudies.html):
* CPU cores: a CPU core per 300K inserted data points per second. So, ~4 CPU cores are required for processing
the insert stream of 1M data points per second. The ingestion rate may be lower for high cardinality data or for time series with high number of labels.
See [this article](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) for details.
If you see lower numbers per CPU core, then it is likely active time series info doesn't fit caches,
so you need more RAM for lowering CPU usage.
* Ingestion rate: 1.5+ million samples per second
* Active time series: 50+ million
* Total time series: 5+ billion
* Time series churn rate: 150+ million of new series per day
* Total number of samples: 10+ trillion
* Queries: 200+ qps
* Query latency (99th percentile): 1 second
* Storage space: less than a byte per data point on average. So, ~260GB is required for storing a month-long insert stream
of 100K data points per second.
The actual storage size heavily depends on data randomness (entropy). Higher randomness means higher storage size requirements.
Read [this article](https://medium.com/faun/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932)
for details.
The needed storage space for the given retention (the retention is set via `-retentionPeriod` command-line flag) can be extrapolated from disk space usage in a test run. For example, if `-storageDataPath` directory size becomes 10GB after a day-long test run on a production workload, then it will need at least `10GB*100=1TB` of disk space for `-retentionPeriod=100d` (100-days retention period).
* Network usage: outbound traffic is negligible. Ingress traffic is ~100 bytes per ingested data point via
[Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
The actual ingress bandwidth usage depends on the average number of labels per ingested metric and the average size
of label values. The higher number of per-metric labels and longer label values mean the higher ingress bandwidth.
It is recommended leaving the following amounts of spare resources:
The required resources for query path:
* RAM size: depends on the number of time series to scan in each query and the `step`
argument passed to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
The higher number of scanned time series and lower `step` argument results in the higher RAM usage.
* CPU cores: a CPU core per 30 millions of scanned data points per second.
This means that heavy queries that touch big number of time series (over 10K) and/or big number data points (over 100M)
usually require more CPU resources than tiny queries that touch a few time series with small number of data points.
* Network usage: depends on the frequency and the type of incoming requests. Typical Grafana dashboards usually
require negligible network bandwidth.
* 50% of free RAM for reducing the probability of OOM (out of memory) crashes and slowdowns during temporary spikes in workload.
* 50% of spare CPU for reducing the probability of slowdowns during temporary spikes in workload.
* At least 30% of free storage space at the directory pointed by `-storageDataPath` command-line flag.
## High availability
@@ -1176,7 +1160,7 @@ VictoriaMetrics de-duplicates data points if `-dedup.minScrapeInterval` command-
is set to positive duration. For example, `-dedup.minScrapeInterval=60s` would de-duplicate data points
on the same time series if they fall within the same discrete 60s bucket. The earliest data point will be kept. In the case of equal timestamps, an arbitrary data point will be kept.
The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs.
The recommended value for `-dedup.minScrapeInterval` must equal to `scrape_interval` config from Prometheus configs. It is recommended to have a single `scrape_interval` across all the scrape targets. See [this article](https://www.robustperception.io/keep-it-simple-scrape_interval-id) for details.
The de-duplication reduces disk space usage if multiple identically configured [vmagent](https://docs.victoriametrics.com/vmagent.html) or Prometheus instances in HA pair
write data to the same VictoriaMetrics instance. These vmagent or Prometheus instances must have identical
@@ -1324,6 +1308,33 @@ VictoriaMetrics also exposes currently running queries with their execution time
See the example of alerting rules for VM components [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts.yml).
## TSDB stats
VictoriaMetrics returns TSDB stats at `/api/v1/status/tsdb` page in the way similar to Prometheus - see [these Prometheus docs](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats). VictoriaMetrics accepts the following optional query args at `/api/v1/status/tsdb` page:
* `topN=N` where `N` is the number of top entries to return in the response. By default top 10 entries are returned.
* `date=YYYY-MM-DD` where `YYYY-MM-DD` is the date for collecting the stats. By default the stats is collected for the current day.
* `match[]=SELECTOR` where `SELECTOR` is an arbitrary [time series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) for series to take into account during stats calculation. By default all the series are taken into account.
* `extra_label=LABEL=VALUE`. See [these docs](#prometheus-querying-api-enhancements) for more details.
## Cardinality limiter
By default VictoriaMetrics doesn't limit the number of stored time series. The limit can be enforced by setting the following command-line flags:
* `-storage.maxHourlySeries` - limits the number of time series that can be added during the last hour. Useful for limiting the number of active time series.
* `-storage.maxDailySeries` - limits the number of time series that can be added during the last day. Useful for limiting daily churn rate.
Both limits can be set simultaneously. If any of these limits is reached, then incoming samples for new time series are dropped. A sample of dropped series is put in the log with `WARNING` level.
The exceeded limits can be [monitored](#monitoring) with the following metrics:
* `vm_hourly_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded hourly limit on the number of unique time series.
* `vm_daily_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded daily limit on the number of unique time series.
These limits are approximate, so VictoriaMetrics can underflow/overflow the limit by a small percentage (usually less than 1%).
## Troubleshooting
* It is recommended to use default command-line flag values (i.e. don't set them explicitly) until the need
@@ -1380,10 +1391,7 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
It may be needed in order to suppress default gap filling algorithm used by VictoriaMetrics - by default it assumes
each time series is continuous instead of discrete, so it fills gaps between real samples with regular intervals.
* Metrics and labels leading to high cardinality or high churn rate can be determined at `/api/v1/status/tsdb` page.
See [these docs](https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats) for details.
VictoriaMetrics accepts optional `date=YYYY-MM-DD` and `topN=42` args on this page. By default `date` equals to the current date,
while `topN` equals to 10.
* Metrics and labels leading to high cardinality or high churn rate can be determined at `/api/v1/status/tsdb` page. See [these docs](#tsdb-stats) for details.
* New time series can be logged if `-logNewSeries` command-line flag is passed to VictoriaMetrics.
@@ -1397,6 +1405,11 @@ See the example of alerting rules for VM components [here](https://github.com/Vi
* VictoriaMetrics ignores `NaN` values during data ingestion.
## Cache removal
VictoriaMetrics uses various internal caches. These caches are stored to `<-storageDataPath>/cache` directory during graceful shutdown (e.g. when VictoriaMetrics is stopped by sending `SIGINT` signal). The caches are read on the next VictoriaMetrics startup. Sometimes it is needed to remove such caches on the next startup. This can be performed by placing `reset_cache_on_startup` file inside the `<-storageDataPath>/cache` directory before the restart of VictoriaMetrics. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447) for details.
## Data migration
Use [vmctl](https://docs.victoriametrics.com/vmctl.html) for data migration. It supports the following data migration types:
@@ -1502,7 +1515,7 @@ Contact us with any questions regarding VictoriaMetrics at [info@victoriametrics
Feel free asking any questions regarding VictoriaMetrics:
* [slack](http://slack.victoriametrics.com/)
* [slack](https://slack.victoriametrics.com/)
* [reddit](https://www.reddit.com/r/VictoriaMetrics/)
* [telegram-en](https://t.me/VictoriaMetrics_en)
* [telegram-ru](https://t.me/VictoriaMetrics_ru1)
@@ -1570,7 +1583,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-csvTrimTimestamp duration
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
-dedup.minScrapeInterval duration
Remove superflouos samples from time series if they are located closer to each other than this duration. This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. Deduplication is disabled if the -dedup.minScrapeInterval is 0
Leave only the first sample in every time series per each discrete interval equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication for details
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries
-denyQueriesOutsideRetention
@@ -1606,7 +1619,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealay, the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
@@ -1697,6 +1710,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.digitaloceanSDCheckInterval duration
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
@@ -1707,6 +1722,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
-promscrape.dnsSDCheckInterval duration
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
-promscrape.dockerSDCheckInterval duration
Interval for checking for changes in docker. This works only if docker_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config for details (default 30s)
-promscrape.dockerswarmSDCheckInterval duration
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
-promscrape.dropOriginalLabels
@@ -1719,6 +1736,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.httpSDCheckInterval duration
Interval for checking for changes in http endpoint service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kuberntes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
@@ -1738,6 +1757,8 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
-relabelConfig string
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. See https://docs.victoriametrics.com/#relabeling for details
-relabelDebug
Whether to log metrics before and after relabeling with -relabelConfig. If the -relabelDebug is enabled, then the metrics aren't sent to storage. This is useful for debugging the relabeling configs
-retentionPeriod value
Data with timestamps outside the retentionPeriod is automatically deleted
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
@@ -1754,7 +1775,7 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
-search.maxExportDuration duration
The maximum duration for /api/v1/export call (default 720h0m0s)
-search.maxLookback duration
Synonim to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaining due to historical reasons
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaining due to historical reasons
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values bigger than the horizontal resolution of the graph (default 30000)
-search.maxQueryDuration duration
@@ -1800,6 +1821,10 @@ Pass `-help` to VictoriaMetrics in order to see the list of supported command-li
authKey, which must be passed in query string to /snapshot* pages
-sortLabels
Whether to sort labels for incoming samples before writing them to storage. This may be needed for reducing memory usage at storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}. Enabled sorting for labels can slow down ingestion performance a bit
-storage.maxDailySeries int
The maximum number of unique series can be added to the storage during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -storage.maxHourlySeries
-storage.maxHourlySeries int
The maximum number of unique series can be added to the storage during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -storage.maxDailySeries
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-tls

View File

@@ -24,9 +24,8 @@ import (
var (
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Remove superflouos samples from time series if they are located closer to each other than this duration. "+
"This may be useful for reducing overhead when multiple identically configured Prometheus instances write data to the same VictoriaMetrics. "+
"Deduplication is disabled if the -dedup.minScrapeInterval is 0")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Leave only the first sample in every time series per each discrete interval "+
"equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication for details")
dryRun = flag.Bool("dryRun", false, "Whether to check only -promscrape.config and then exit. "+
"Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse")
)
@@ -91,6 +90,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
fmt.Fprintf(w, "See docs at <a href='https://docs.victoriametrics.com/'>https://docs.victoriametrics.com/</a></br>")
fmt.Fprintf(w, "Useful endpoints:</br>")
httpserver.WriteAPIHelp(w, [][2]string{
{"/vmui", "Web UI"},
{"/targets", "discovered targets list"},
{"/api/v1/targets", "advanced information about discovered targets in JSON format"},
{"/metrics", "available service metrics"},

View File

@@ -34,7 +34,9 @@ to `vmagent` such as the ability to push metrics instead of pulling them. We did
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
* Uses lower amounts of RAM, CPU, disk IO and network bandwidth compared with Prometheus.
* Scrape targets can be spread among multiple `vmagent` instances when big number of targets must be scraped. See [these docs](#scraping-big-number-of-targets) for details.
* Scrape targets can be spread among multiple `vmagent` instances when big number of targets must be scraped. See [these docs](#scraping-big-number-of-targets).
* Can efficiently scrape targets that expose millions of time series such as [/federate endpoint in Prometheus](https://prometheus.io/docs/prometheus/latest/federation/). See [these docs](#stream-parsing-mode).
* Can deal with high cardinality and high churn rate issues by limiting the number of unique time series sent to remote storage systems. See [these docs](#cardinality-limiter).
## Quick Start
@@ -171,10 +173,16 @@ The following scrape types in [scrape_config](https://prometheus.io/docs/prometh
* `openstack_sd_configs` - is for scraping OpenStack targets.
See [openstack_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config) for details.
[OpenStack identity API v3](https://docs.openstack.org/api-ref/identity/v3/) is supported only.
* `docker_sd_configs` - is for scraping Docker targets.
See [docker_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config) for details.
* `dockerswarm_sd_configs` - is for scraping Docker Swarm targets.
See [dockerswarm_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config) for details.
* `eureka_sd_configs` - is for scraping targets registered in [Netflix Eureka](https://github.com/Netflix/eureka).
See [eureka_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config) for details.
* `digitalocean_sd_configs` is for scraping targerts registered in [DigitalOcean](https://www.digitalocean.com/)
See [digitalocean_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config) for details.
* `http_sd_configs` is for scraping targerts registered in http service discovery.
See [http_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config) for details.
Please file feature requests to [our issue tracker](https://github.com/VictoriaMetrics/VictoriaMetrics/issues) if you need other service discovery mechanisms to be supported by `vmagent`.
@@ -184,7 +192,7 @@ Please file feature requests to [our issue tracker](https://github.com/VictoriaM
to save network bandwidth.
* `disable_keepalive: true` - to disable [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection) on a per-job basis.
By default, `vmagent` uses keep-alive connections to scrape targets to reduce overhead on connection re-establishing.
* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics.
* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics. See [these docs](#stream-parsing-mode).
Note that `vmagent` doesn't support `refresh_interval` option for these scrape configs. Use the corresponding `-promscrape.*CheckInterval`
command-line flag instead. For example, `-promscrape.consulSDCheckInterval=60s` sets `refresh_interval` for all the `consul_sd_configs`
@@ -217,10 +225,10 @@ and also provides the following actions:
The relabeling can be defined in the following places:
* At the `scrape_config -> relabel_configs` section in `-promscrape.config` file. This relabeling is applied to target labels.
* At the `scrape_config -> metric_relabel_configs` section in `-promscrape.config` file. This relabeling is applied to all the scraped metrics in the given `scrape_config`.
* At the `-remoteWrite.relabelConfig` file. This relabeling is aplied to all the collected metrics before sending them to remote storage.
* At the `-remoteWrite.urlRelabelConfig` files. This relabeling is applied to metrics before sending them to the corresponding `-remoteWrite.url`.
* At the `scrape_config -> relabel_configs` section in `-promscrape.config` file. This relabeling is applied to target labels. This relabeling can be debugged by passing `relabel_debug: true` option to the corresponding `scrape_config` section. In this case `vmagent` logs target labels before and after the relabeling and then drops the logged target.
* At the `scrape_config -> metric_relabel_configs` section in `-promscrape.config` file. This relabeling is applied to all the scraped metrics in the given `scrape_config`. This relabeling can be debugged by passing `metric_relabel_debug: true` option to the corresponding `scrape_config` section. In this case `vmagent` logs metrics before and after the relabeling and then drops the logged metrics.
* At the `-remoteWrite.relabelConfig` file. This relabeling is aplied to all the collected metrics before sending them to remote storage. This relabeling can be debugged by passing `-remoteWrite.relabelDebug` command-line option to `vmagent`. In this case `vmagent` logs metrics before and after the relabeling and then drops all the logged metrics instead of sending them to remote storage.
* At the `-remoteWrite.urlRelabelConfig` files. This relabeling is applied to metrics before sending them to the corresponding `-remoteWrite.url`. This relabeling can be debugged by passing `-remoteWrite.urlRelabelDebug` command-line options to `vmagent`. In this case `vmagent` logs metrics before and after the relabeling and then drops all the logged metrics instead of sending them to the corresponding `-remoteWrite.url`.
You can read more about relabeling in the following articles:
@@ -232,10 +240,31 @@ You can read more about relabeling in the following articles:
* [relabel_configs vs metric_relabel_configs](https://www.robustperception.io/relabel_configs-vs-metric_relabel_configs)
## Stream parsing mode
By default `vmagent` reads the full response from scrape target into memory, then parses it, applies [relabeling](#relabeling) and then pushes the resulting metrics to the configured `-remoteWrite.url`. This mode works good for the majority of cases when the scrape target exposes small number of metrics (e.g. less than 10 thousand). But this mode may take big amounts of memory when the scrape target exposes big number of metrics. In this case it is recommended enabling stream parsing mode. When this mode is enabled, then `vmagent` reads response from scrape target in chunks, then immediately processes every chunk and pushes the processed metrics to remote storage. This allows saving memory when scraping targets that expose millions of metrics. Stream parsing mode may be enabled either globally for all of the scrape targets by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with `stream_parse: true` option. For example:
```yml
scrape_configs:
- job_name: 'big-federate'
stream_parse: true
static_configs:
- targets:
- big-prometeus1
- big-prometeus2
honor_labels: true
metrics_path: /federate
params:
'match[]': ['{__name__!=""}']
```
Note that `sample_limit` option doesn't prevent from data push to remote storage if stream parsing is enabled because the parsed data is pushed to remote storage as soon as it is parsed.
## Scraping big number of targets
A single `vmagent` instance can scrape tens of thousands of scrape targets. Sometimes this isn't enough due to limitations on CPU, network, RAM, etc.
In this case scrape targets can be split among multiple `vmagent` instances (aka `vmagent` horizontal scaling and clustering).
In this case scrape targets can be split among multiple `vmagent` instances (aka `vmagent` horizontal scaling, sharding and clustering).
Each `vmagent` instance in the cluster must use identical `-promscrape.config` files with distinct `-promscrape.cluster.memberNum` values.
The flag value must be in the range `0 ... N-1`, where `N` is the number of `vmagent` instances in the cluster.
The number of `vmagent` instances in the cluster must be passed to `-promscrape.cluster.membersCount` command-line flag. For example, the following commands
@@ -295,6 +324,22 @@ scrape_configs:
server_name: real-server-name
```
## Cardinality limiter
By default `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`. The limit can be enforced by setting the following command-line flags:
* `-remoteWrite.maxHourlySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last hour. Useful for limiting the number of active time series.
* `-remoteWrite.maxDailySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last day. Useful for limiting daily churn rate.
Both limits can be set simultaneously. If any of these limits is reached, then samples for new time series are dropped instead of sending them to remote storage systems. A sample of dropped series is put in the log with `WARNING` level.
The exceeded limits can be [monitored](#monitoring) with the following metrics:
* `vmagent_hourly_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded hourly limit on the number of unique time series.
* `vmagent_daily_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded daily limit on the number of unique time series.
These limits are approximate, so `vmagent` can underflow/overflow the limit by a small percentage (usually less than 1%).
## Monitoring
@@ -322,6 +367,12 @@ It may be useful to perform `vmagent` rolling update without any scrape loss.
* We recommend you increase the maximum number of open files in the system (`ulimit -n`) when scraping a big number of targets,
as `vmagent` establishes at least a single TCP connection per target.
* If `vmagent` uses too big amounts of memory, then the following options can help:
* Enabling stream parsing. See [these docs](#stream-parsing-mode).
* Reducing the number of output queues with `-remoteWrite.queues` command-line option.
* Reducing the amounts of RAM vmagent can use for in-memory buffering with `-memory.allowedPercent` or `-memory.allowedBytes` command-line option. Another option is to reduce memory limits in Docker and/or Kuberntes if `vmagent` runs under these systems.
* Reducing the number of CPU cores vmagent can use by passing `GOMAXPROCS=N` environment variable to `vmagent`, where `N` is the desired limit on CPU cores. Another option is to reduce CPU limits in Docker or Kubernetes if `vmagent` runs under these systems.
* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
and `http://vmagent-host:8429/api/v1/targets`.
@@ -333,25 +384,7 @@ It may be useful to perform `vmagent` rolling update without any scrape loss.
This option drops `"discoveredLabels"` and `"droppedTargets"` lists at `/api/v1/targets` page, which may result in reduced debuggability for improperly configured per-target relabeling.
* If `vmagent` scrapes targets with millions of metrics per target (for example, when scraping [federation endpoints](https://prometheus.io/docs/prometheus/latest/federation/)),
we recommend enabling `stream parsing mode` in order to reduce memory usage during scraping. This mode may be enabled either globally for all of the scrape targets
by passing `-promscrape.streamParse` command-line flag or on a per-scrape target basis with `stream_parse: true` option. For example:
```yml
scrape_configs:
- job_name: 'big-federate'
stream_parse: true
static_configs:
- targets:
- big-prometeus1
- big-prometeus2
honor_labels: true
metrics_path: /federate
params:
'match[]': ['{__name__!=""}']
```
Note that `sample_limit` option doesn't work if stream parsing is enabled because the parsed data is pushed to remote storage as soon as it is parsed. Therefore the `sample_limit` option
doesn't make sense during stream parsing.
we recommend enabling [stream parsing mode](#stream-parsing-mode) in order to reduce memory usage during scraping.
* We recommend you increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page grows constantly.
@@ -416,7 +449,7 @@ We recommend using [binary releases](https://github.com/VictoriaMetrics/Victoria
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmagent` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds the `vmagent` binary and puts it into the `bin` folder.
@@ -445,7 +478,7 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmagent-arm` or `make vmagent-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics)
It builds `vmagent-arm` or `vmagent-arm64` binary respectively and puts it into the `bin` folder.
@@ -515,7 +548,7 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealay, the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
@@ -600,6 +633,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Wait time used by Consul service discovery. Default value is used if not set
-promscrape.consulSDCheckInterval duration
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
-promscrape.digitaloceanSDCheckInterval duration
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
-promscrape.disableCompression
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
-promscrape.disableKeepAlive
@@ -622,6 +657,8 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
-promscrape.gceSDCheckInterval duration
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
-promscrape.httpSDCheckInterval duration
Interval for checking for changes in http service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
-promscrape.kubernetes.apiServerTimeout duration
How frequently to reload the full state from Kuberntes API server (default 30m0s)
-promscrape.kubernetesSDCheckInterval duration
@@ -642,12 +679,18 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.basicAuth.password array
Optional basic auth password to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.basicAuth.passwordFile array
Optional path to basic auth password to use for -remoteWrite.url. The file is re-read every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.basicAuth.username array
Optional basic auth username to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.bearerToken array
Optional bearer auth token to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.bearerTokenFile array
Optional path to bearer token file to use for -remoteWrite.url. The token is re-read from the file every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.flushInterval duration
Interval for flushing the data to remote storage. This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url (default 1s)
-remoteWrite.label array
@@ -656,19 +699,40 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.maxBlockSize size
The maximum size in bytes of unpacked request to send to remote storage. It shouldn't exceed -maxInsertRequestSize from VictoriaMetrics
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
-remoteWrite.maxDailySeries int
The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -remoteWrite.maxHourlySeries
-remoteWrite.maxDiskUsagePerURL size
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-remoteWrite.maxHourlySeries int
The maximum number of unique series vmagent can send to remote storage systems during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -remoteWrite.maxDailySeries
-remoteWrite.oauth2.clientID array
Optional OAuth2 clientID to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.clientSecret array
Optional OAuth2 clientSecret to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.clientSecretFile array
Optional OAuth2 clientSecretFile to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.scopes array
Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.oauth2.tokenUrl array
Optional OAuth2 tokenURL to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.proxyURL array
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.queues int
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage (default 4)
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage (default 2 * numberOfAvailableCPUs)
-remoteWrite.rateLimit array
Optional rate limit in bytes per second for data sent to -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
Supports array of values separated by comma or specified via multiple flags.
-remoteWrite.relabelConfig string
Optional path to file with relabel_config entries. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details
-remoteWrite.relabelDebug
Whether to log metrics before and after relabeling with -remoteWrite.relabelConfig. If the -remoteWrite.relabelDebug is enabled, then the metrics aren't sent to remote storage. This is useful for debugging the relabeling configs
-remoteWrite.roundDigits array
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
Supports array of values separated by comma or specified via multiple flags.
@@ -703,6 +767,9 @@ See the docs at https://docs.victoriametrics.com/vmagent.html .
-remoteWrite.urlRelabelConfig array
Optional path to relabel config for the corresponding -remoteWrite.url
Supports an array of values separated by comma or specified via multiple flags.
-remoteWrite.urlRelabelDebug array
Whether to log metrics before and after relabeling with -remoteWrite.urlRelabelConfig. If the -remoteWrite.urlRelabelDebug is enabled, then the metrics aren't sent to the corresponding -remoteWrite.url. This is useful for debugging the relabeling configs
Supports array of values separated by comma or specified via multiple flags.
-sortLabels
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}Enabled sorting for labels can slow down ingestion performance a bit
-tls

View File

@@ -165,7 +165,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(r); err != nil {
prometheusWriteErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -174,7 +174,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
vmimportRequests.Inc()
if err := vmimport.InsertHandler(r); err != nil {
vmimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -183,7 +183,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
csvimportRequests.Inc()
if err := csvimport.InsertHandler(r); err != nil {
csvimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -192,7 +192,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
prometheusimportRequests.Inc()
if err := prometheusimport.InsertHandler(r); err != nil {
prometheusimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -201,7 +201,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
nativeimportRequests.Inc()
if err := native.InsertHandler(r); err != nil {
nativeimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -210,7 +210,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
influxWriteRequests.Inc()
if err := influx.InsertHandlerForHTTP(r); err != nil {
influxWriteErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)

View File

@@ -2,8 +2,6 @@ package remotewrite
import (
"bytes"
"crypto/tls"
"encoding/base64"
"fmt"
"io/ioutil"
"net/http"
@@ -42,17 +40,35 @@ var (
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPassword = flagutil.NewArray("remoteWrite.basicAuth.password", "Optional basic auth password to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
basicAuthPasswordFile = flagutil.NewArray("remoteWrite.basicAuth.passwordFile", "Optional path to basic auth password to use for -remoteWrite.url. "+
"The file is re-read every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerToken = flagutil.NewArray("remoteWrite.bearerToken", "Optional bearer auth token to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
bearerTokenFile = flagutil.NewArray("remoteWrite.bearerTokenFile", "Optional path to bearer token file to use for -remoteWrite.url. "+
"The token is re-read from the file every second. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientID = flagutil.NewArray("remoteWrite.oauth2.clientID", "Optional OAuth2 clientID to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecret = flagutil.NewArray("remoteWrite.oauth2.clientSecret", "Optional OAuth2 clientSecret to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2ClientSecretFile = flagutil.NewArray("remoteWrite.oauth2.clientSecretFile", "Optional OAuth2 clientSecretFile to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2TokenURL = flagutil.NewArray("remoteWrite.oauth2.tokenUrl", "Optional OAuth2 tokenURL to use for -remoteWrite.url. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
oauth2Scopes = flagutil.NewArray("remoteWrite.oauth2.scopes", "Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. "+
"If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url")
)
type client struct {
sanitizedURL string
remoteWriteURL string
authHeader string
fq *persistentqueue.FastQueue
hc *http.Client
authCfg *promauth.Config
rl rateLimiter
bytesSent *metrics.Counter
@@ -68,10 +84,11 @@ type client struct {
}
func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqueue.FastQueue, concurrency int) *client {
tlsCfg, err := getTLSConfig(argIdx)
authCfg, err := getAuthConfig(argIdx)
if err != nil {
logger.Panicf("FATAL: cannot initialize TLS config: %s", err)
logger.Panicf("FATAL: cannot initialize auth config: %s", err)
}
tlsCfg := authCfg.NewTLSConfig()
tr := &http.Transport{
Dial: statDial,
TLSClientConfig: tlsCfg,
@@ -92,26 +109,10 @@ func newClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persistentqu
}
tr.Proxy = http.ProxyURL(urlProxy)
}
authHeader := ""
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
if len(username) > 0 || len(password) > 0 {
// See https://en.wikipedia.org/wiki/Basic_access_authentication
token := username + ":" + password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
authHeader = "Basic " + token64
}
token := bearerToken.GetOptionalArg(argIdx)
if len(token) > 0 {
if authHeader != "" {
logger.Fatalf("`-remoteWrite.bearerToken`=%q cannot be set when `-remoteWrite.basicAuth.*` flags are set", token)
}
authHeader = "Bearer " + token
}
c := &client{
sanitizedURL: sanitizedURL,
remoteWriteURL: remoteWriteURL,
authHeader: authHeader,
authCfg: authCfg,
fq: fq,
hc: &http.Client{
Transport: tr,
@@ -149,23 +150,48 @@ func (c *client) MustStop() {
logger.Infof("stopped client for -remoteWrite.url=%q", c.sanitizedURL)
}
func getTLSConfig(argIdx int) (*tls.Config, error) {
c := &promauth.TLSConfig{
func getAuthConfig(argIdx int) (*promauth.Config, error) {
username := basicAuthUsername.GetOptionalArg(argIdx)
password := basicAuthPassword.GetOptionalArg(argIdx)
passwordFile := basicAuthPasswordFile.GetOptionalArg(argIdx)
var basicAuthCfg *promauth.BasicAuthConfig
if username != "" || password != "" || passwordFile != "" {
basicAuthCfg = &promauth.BasicAuthConfig{
Username: username,
Password: password,
PasswordFile: passwordFile,
}
}
token := bearerToken.GetOptionalArg(argIdx)
tokenFile := bearerTokenFile.GetOptionalArg(argIdx)
var oauth2Cfg *promauth.OAuth2Config
clientSecret := oauth2ClientSecret.GetOptionalArg(argIdx)
clientSecretFile := oauth2ClientSecretFile.GetOptionalArg(argIdx)
if clientSecretFile != "" || clientSecret != "" {
oauth2Cfg = &promauth.OAuth2Config{
ClientID: oauth2ClientID.GetOptionalArg(argIdx),
ClientSecret: clientSecret,
ClientSecretFile: clientSecretFile,
TokenURL: oauth2TokenURL.GetOptionalArg(argIdx),
Scopes: strings.Split(oauth2Scopes.GetOptionalArg(argIdx), ";"),
}
}
tlsCfg := &promauth.TLSConfig{
CAFile: tlsCAFile.GetOptionalArg(argIdx),
CertFile: tlsCertFile.GetOptionalArg(argIdx),
KeyFile: tlsKeyFile.GetOptionalArg(argIdx),
ServerName: tlsServerName.GetOptionalArg(argIdx),
InsecureSkipVerify: tlsInsecureSkipVerify.GetOptionalArg(argIdx),
}
if c.CAFile == "" && c.CertFile == "" && c.KeyFile == "" && c.ServerName == "" && !c.InsecureSkipVerify {
return nil, nil
}
cfg, err := promauth.NewConfig(".", nil, nil, "", "", c)
authCfg, err := promauth.NewConfig(".", nil, basicAuthCfg, token, tokenFile, oauth2Cfg, tlsCfg)
if err != nil {
return nil, fmt.Errorf("cannot populate TLS config: %w", err)
return nil, fmt.Errorf("cannot populate OAuth2 config for remoteWrite idx: %d, err: %w", argIdx, err)
}
tlsCfg := cfg.NewTLSConfig()
return tlsCfg, nil
return authCfg, nil
}
func (c *client) runWorker() {
@@ -226,8 +252,8 @@ again:
h.Set("Content-Type", "application/x-protobuf")
h.Set("Content-Encoding", "snappy")
h.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
if c.authHeader != "" {
req.Header.Set("Authorization", c.authHeader)
if ah := c.authCfg.GetAuthHeader(); ah != "" {
req.Header.Set("Authorization", ah)
}
startTime := time.Now()
@@ -239,7 +265,7 @@ again:
if retryDuration > time.Minute {
retryDuration = time.Minute
}
logger.Errorf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
logger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
t := timerpool.Get(retryDuration)
select {
@@ -260,11 +286,9 @@ again:
}
metrics.GetOrCreateCounter(fmt.Sprintf(`vmagent_remotewrite_requests_total{url=%q, status_code="%d"}`, c.sanitizedURL, statusCode)).Inc()
if statusCode == 409 || statusCode == 400 {
// Just drop block on 409 status code like Prometheus does.
// Just drop block on 409 and 400 status codes like Prometheus does.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
// drop block on 400 status code,
// not expected that remote server will be able to handle it on retry
// should fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
// and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
_ = resp.Body.Close()
c.packetsDropped.Inc()
return true

View File

@@ -128,7 +128,6 @@ func (wr *writeRequest) reset() {
}
func (wr *writeRequest) flush() {
sortLabelsIfNeeded(wr.tss)
wr.wr.Timeseries = wr.tss
wr.adjustSampleValues()
atomic.StoreUint64(&wr.lastFlushTime, fasttime.UnixTimestamp())

View File

@@ -17,7 +17,12 @@ var (
"Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage")
relabelConfigPathGlobal = flag.String("remoteWrite.relabelConfig", "", "Optional path to file with relabel_config entries. These entries are applied to all the metrics "+
"before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details")
relabelDebugGlobal = flag.Bool("remoteWrite.relabelDebug", false, "Whether to log metrics before and after relabeling with -remoteWrite.relabelConfig. "+
"If the -remoteWrite.relabelDebug is enabled, then the metrics aren't sent to remote storage. This is useful for debugging the relabeling configs")
relabelConfigPaths = flagutil.NewArray("remoteWrite.urlRelabelConfig", "Optional path to relabel config for the corresponding -remoteWrite.url")
relabelDebug = flagutil.NewArrayBool("remoteWrite.urlRelabelDebug", "Whether to log metrics before and after relabeling with -remoteWrite.urlRelabelConfig. "+
"If the -remoteWrite.urlRelabelDebug is enabled, then the metrics aren't sent to the corresponding -remoteWrite.url. "+
"This is useful for debugging the relabeling configs")
)
var labelsGlobal []prompbmarshal.Label
@@ -31,7 +36,7 @@ func CheckRelabelConfigs() error {
func loadRelabelConfigs() (*relabelConfigs, error) {
var rcs relabelConfigs
if *relabelConfigPathGlobal != "" {
global, err := promrelabel.LoadRelabelConfigs(*relabelConfigPathGlobal)
global, err := promrelabel.LoadRelabelConfigs(*relabelConfigPathGlobal, *relabelDebugGlobal)
if err != nil {
return nil, fmt.Errorf("cannot load -remoteWrite.relabelConfig=%q: %w", *relabelConfigPathGlobal, err)
}
@@ -47,7 +52,7 @@ func loadRelabelConfigs() (*relabelConfigs, error) {
// Skip empty relabel config.
continue
}
prc, err := promrelabel.LoadRelabelConfigs(path)
prc, err := promrelabel.LoadRelabelConfigs(path, relabelDebug.GetOptionalArg(i))
if err != nil {
return nil, fmt.Errorf("cannot load relabel configs from -remoteWrite.urlRelabelConfig=%q: %w", path, err)
}

View File

@@ -3,9 +3,13 @@ package remotewrite
import (
"flag"
"fmt"
"strconv"
"sync"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bloomfilter"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -13,6 +17,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/metrics"
xxhash "github.com/cespare/xxhash/v2"
)
@@ -23,8 +28,8 @@ var (
"Pass multiple -remoteWrite.url flags in order to write data concurrently to multiple remote storage systems")
tmpDataPath = flag.String("remoteWrite.tmpDataPath", "vmagent-remotewrite-data", "Path to directory where temporary data for remote write component is stored. "+
"See also -remoteWrite.maxDiskUsagePerURL")
queues = flag.Int("remoteWrite.queues", 4, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage")
queues = flag.Int("remoteWrite.queues", cgroup.AvailableCPUs()*2, "The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues "+
"isn't enough for sending high volume of collected data to remote storage. Default value if 2 * numberOfAvailableCPUs")
showRemoteWriteURL = flag.Bool("remoteWrite.showURL", false, "Whether to show -remoteWrite.url in the exported metrics. "+
"It is hidden by default, since it can contain sensitive info such as auth key")
maxPendingBytesPerURL = flagutil.NewBytes("remoteWrite.maxDiskUsagePerURL", 0, "The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath "+
@@ -38,6 +43,14 @@ var (
"Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. "+
"By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. "+
"This option may be used for improving data compression for the stored metrics")
sortLabels = flag.Bool("sortLabels", false, `Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. `+
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+
`Enabled sorting for labels can slow down ingestion performance a bit`)
maxHourlySeries = flag.Int("remoteWrite.maxHourlySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -remoteWrite.maxDailySeries")
maxDailySeries = flag.Int("remoteWrite.maxDailySeries", 0, "The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -remoteWrite.maxHourlySeries")
)
var rwctxs []*remoteWriteCtx
@@ -66,6 +79,24 @@ func Init() {
if len(*remoteWriteURLs) == 0 {
logger.Fatalf("at least one `-remoteWrite.url` command-line flag must be set")
}
if *maxHourlySeries > 0 {
hourlySeriesLimiter = bloomfilter.NewLimiter(*maxHourlySeries, time.Hour)
_ = metrics.NewGauge(`vmagent_hourly_series_limit_max_series`, func() float64 {
return float64(hourlySeriesLimiter.MaxItems())
})
_ = metrics.NewGauge(`vmagent_hourly_series_limit_current_series`, func() float64 {
return float64(hourlySeriesLimiter.CurrentItems())
})
}
if *maxDailySeries > 0 {
dailySeriesLimiter = bloomfilter.NewLimiter(*maxDailySeries, 24*time.Hour)
_ = metrics.NewGauge(`vmagent_daily_series_limit_max_series`, func() float64 {
return float64(dailySeriesLimiter.MaxItems())
})
_ = metrics.NewGauge(`vmagent_daily_series_limit_current_series`, func() float64 {
return float64(dailySeriesLimiter.CurrentItems())
})
}
if *queues > maxQueues {
*queues = maxQueues
}
@@ -73,6 +104,12 @@ func Init() {
*queues = 1
}
initLabelsGlobal()
// Register SIGHUP handler for config reload before loadRelabelConfigs.
// This guarantees that the config will be re-read if the signal arrives just after loadRelabelConfig.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
rcs, err := loadRelabelConfigs()
if err != nil {
logger.Fatalf("cannot load relabel configs: %s", err)
@@ -99,7 +136,6 @@ func Init() {
}
// Start config reloader.
sighupCh := procutil.NewSighupChan()
configReloaderWG.Add(1)
go func() {
defer configReloaderWG.Done()
@@ -173,8 +209,12 @@ func Push(wr *prompbmarshal.WriteRequest) {
tssBlock = rctx.applyRelabeling(tssBlock, labelsGlobal, pcsGlobal)
globalRelabelMetricsDropped.Add(tssBlockLen - len(tssBlock))
}
for _, rwctx := range rwctxs {
rwctx.Push(tssBlock)
sortLabelsIfNeeded(tssBlock)
tssBlock = limitSeriesCardinality(tssBlock)
if len(tssBlock) > 0 {
for _, rwctx := range rwctxs {
rwctx.Push(tssBlock)
}
}
if rctx != nil {
rctx.reset()
@@ -185,6 +225,87 @@ func Push(wr *prompbmarshal.WriteRequest) {
}
}
// sortLabelsIfNeeded sorts labels if -sortLabels command-line flag is set.
func sortLabelsIfNeeded(tss []prompbmarshal.TimeSeries) {
if !*sortLabels {
return
}
for i := range tss {
promrelabel.SortLabels(tss[i].Labels)
}
}
func limitSeriesCardinality(tss []prompbmarshal.TimeSeries) []prompbmarshal.TimeSeries {
if hourlySeriesLimiter == nil && dailySeriesLimiter == nil {
return tss
}
dst := make([]prompbmarshal.TimeSeries, 0, len(tss))
for i := range tss {
labels := tss[i].Labels
h := getLabelsHash(labels)
if hourlySeriesLimiter != nil && !hourlySeriesLimiter.Add(h) {
hourlySeriesLimitRowsDropped.Add(len(tss[i].Samples))
logSkippedSeries(labels, "-remoteWrite.maxHourlySeries", hourlySeriesLimiter.MaxItems())
continue
}
if dailySeriesLimiter != nil && !dailySeriesLimiter.Add(h) {
dailySeriesLimitRowsDropped.Add(len(tss[i].Samples))
logSkippedSeries(labels, "-remoteWrite.maxDailySeries", dailySeriesLimiter.MaxItems())
continue
}
dst = append(dst, tss[i])
}
return dst
}
var (
hourlySeriesLimiter *bloomfilter.Limiter
dailySeriesLimiter *bloomfilter.Limiter
hourlySeriesLimitRowsDropped = metrics.NewCounter(`vmagent_hourly_series_limit_rows_dropped_total`)
dailySeriesLimitRowsDropped = metrics.NewCounter(`vmagent_daily_series_limit_rows_dropped_total`)
)
func getLabelsHash(labels []prompbmarshal.Label) uint64 {
bb := labelsHashBufPool.Get()
b := bb.B[:0]
for _, label := range labels {
b = append(b, label.Name...)
b = append(b, label.Value...)
}
h := xxhash.Sum64(b)
bb.B = b
labelsHashBufPool.Put(bb)
return h
}
var labelsHashBufPool bytesutil.ByteBufferPool
func logSkippedSeries(labels []prompbmarshal.Label, flagName string, flagValue int) {
select {
case <-logSkippedSeriesTicker.C:
logger.Warnf("skip series %s because %s=%d reached", labelsToString(labels), flagName, flagValue)
default:
}
}
var logSkippedSeriesTicker = time.NewTicker(5 * time.Second)
func labelsToString(labels []prompbmarshal.Label) string {
var b []byte
b = append(b, '{')
for i, label := range labels {
b = append(b, label.Name...)
b = append(b, '=')
b = strconv.AppendQuote(b, label.Value)
if i+1 < len(labels) {
b = append(b, ',')
}
}
b = append(b, '}')
return string(b)
}
var globalRelabelMetricsDropped = metrics.NewCounter("vmagent_remotewrite_global_relabel_metrics_dropped_total")
type remoteWriteCtx struct {

View File

@@ -1,51 +0,0 @@
package remotewrite
import (
"flag"
"sort"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
var sortLabels = flag.Bool("sortLabels", false, `Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. `+
`This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. `+
`For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}`+
`Enabled sorting for labels can slow down ingestion performance a bit`)
// sortLabelsIfNeeded sorts labels if -sortLabels command-line flag is set.
func sortLabelsIfNeeded(tss []prompbmarshal.TimeSeries) {
if !*sortLabels {
return
}
// The slc is used for avoiding memory allocation when passing labels to sort.Sort.
slc := sortLabelsCtxPool.Get().(*sortLabelsCtx)
for i := range tss {
slc.labels = tss[i].Labels
sort.Sort(&slc.labels)
}
slc.labels = nil
sortLabelsCtxPool.Put(slc)
}
type sortLabelsCtx struct {
labels sortedLabels
}
var sortLabelsCtxPool = &sync.Pool{
New: func() interface{} {
return &sortLabelsCtx{}
},
}
type sortedLabels []prompbmarshal.Label
func (sl *sortedLabels) Len() int { return len(*sl) }
func (sl *sortedLabels) Less(i, j int) bool {
a := *sl
return a[i].Name < a[j].Name
}
func (sl *sortedLabels) Swap(i, j int) {
a := *sl
a[i], a[j] = a[j], a[i]
}

View File

@@ -66,7 +66,17 @@ run-vmalert: vmalert
-remoteRead.url=http://localhost:8428 \
-external.label=cluster=east-1 \
-external.label=replica=a \
-evaluationInterval=3s
-evaluationInterval=3s \
-rule.configCheckInterval=10s
replay-vmalert: vmalert
./bin/vmalert -rule=app/vmalert/config/testdata/rules-replay-good.rules \
-datasource.url=http://localhost:8428 \
-remoteWrite.url=http://localhost:8428 \
-external.label=cluster=east-1 \
-external.label=replica=a \
-replay.timeFrom=2021-05-11T07:21:43Z \
-replay.timeTo=2021-05-29T18:40:43Z
vmalert-amd64:
CGO_ENABLED=1 GOARCH=amd64 $(MAKE) vmalert-local-with-goarch

View File

@@ -1,8 +1,9 @@
# vmalert
`vmalert` executes a list of given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
`vmalert` executes a list of the given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
rules against configured address.
rules against configured address. It is heavily inspired by [Prometheus](https://prometheus.io/docs/alerting/latest/overview/)
implementation and aims to be compatible with its syntax.
## Features
* Integration with [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) TSDB;
@@ -12,7 +13,8 @@ rules against configured address.
support;
* Integration with [Alertmanager](https://github.com/prometheus/alertmanager);
* Keeps the alerts [state on restarts](#alerts-state-on-restarts);
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite) for details.
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite);
* Recording and Alerting rules backfilling (aka `replay`). See [these docs](#rules-backfilling);
* Lightweight without extra dependencies.
## Limitations
@@ -39,21 +41,23 @@ To start using `vmalert` you will need the following things:
* datasource address - reachable VictoriaMetrics instance for rules execution;
* notifier address - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
aggregating alerts and sending notifications.
* remote write address - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage address for storing recording rules results and alerts state in for of timeseries. This is optional.
* remote write address [optional] - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
compatible storage address for storing recording rules results and alerts state in for of timeseries.
Then configure `vmalert` accordingly:
```
./bin/vmalert -rule=alert.rules \
./bin/vmalert -rule=alert.rules \ # Path to the file with rules configuration. Supports wildcard
-datasource.url=http://localhost:8428 \ # PromQL compatible datasource
-notifier.url=http://localhost:9093 \ # AlertManager URL
-notifier.url=http://127.0.0.1:9093 \ # AlertManager replica URL
-remoteWrite.url=http://localhost:8428 \ # remote write compatible storage to persist rules
-remoteRead.url=http://localhost:8428 \ # PromQL compatible datasource to restore alerts state from
-remoteWrite.url=http://localhost:8428 \ # Remote write compatible storage to persist rules
-remoteRead.url=http://localhost:8428 \ # MetricsQL compatible datasource to restore alerts state from
-external.label=cluster=east-1 \ # External label to be applied for each rule
-external.label=replica=a # Multiple external labels may be set
```
See the fill list of configuration flags in [configuration](#configuration) section.
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
@@ -61,7 +65,7 @@ Configuration for [recording](https://prometheus.io/docs/prometheus/latest/confi
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
similar to Prometheus rules and configured using YAML. Configuration examples may be found
in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder.
Every `rule` belongs to `group` and every configuration file may contain arbitrary number of groups:
Every `rule` belongs to a `group` and every configuration file may contain arbitrary number of groups:
```yaml
groups:
[ - <rule_group> ]
@@ -69,15 +73,15 @@ groups:
### Groups
Each group has following attributes:
Each group has the following attributes:
```yaml
# The name of the group. Must be unique within a file.
name: <string>
# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]
[ interval: <duration> | default = -evaluationInterval flag ]
# How many rules execute at once. Increasing concurrency may speed
# How many rules execute at once within a group. Increasing concurrency may speed
# up round execution speed.
[ concurrency: <integer> | default = 1 ]
@@ -85,26 +89,37 @@ name: <string>
# By default "prometheus" rule type is used.
[ type: <string> ]
# Optional list of label filters applied to every rule's
# request withing a group. Is compatible only with VM datasource.
# See more details at https://docs.victoriametrics.com#prometheus-querying-api-enhancements
extra_filter_labels:
[ <labelname>: <labelvalue> ... ]
rules:
[ - <rule> ... ]
```
### Rules
Every rule contains `expr` field for [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/)
or [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) expression. Vmalert will execute the configured
expression and then act according to the Rule type.
There are two types of Rules:
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
Alerting rules allows to define alert conditions via [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html)
and to send notifications about firing alerts to [Alertmanager](https://github.com/prometheus/alertmanager).
Alerting rules allows to define alert conditions via `expr` field and to send notifications
[Alertmanager](https://github.com/prometheus/alertmanager) if execution result is not empty.
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
Recording rules allow you to precompute frequently needed or computationally expensive expressions
and save their result as a new set of time series.
Recording rules allows to define `expr` which result will be than backfilled to configured
`-remoteWrite.url`. Recording rules are used to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
`vmalert` forbids to define duplicates - rules with the same combination of name, expression and labels
within one group.
#### Alerting rules
The syntax for alerting rule is following:
The syntax for alerting rule is the following:
```yaml
# The name of the alert. Must be a valid metric name.
alert: <string>
@@ -114,12 +129,14 @@ alert: <string>
[ type: <string> ]
# The expression to evaluate. The expression language depends on the type value.
# By default MetricsQL expression is used. If type="graphite", then the expression
# By default PromQL/MetricsQL expression is used. If type="graphite", then the expression
# must contain valid Graphite expression.
expr: <string>
# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
# If param is omitted or set to 0 then alerts will be immediately considered
# as firing once they return.
[ for: <duration> | default = 0s ]
# Labels to add or overwrite for each alert.
@@ -157,12 +174,12 @@ labels:
[ <labelname>: <labelvalue> ]
```
For recording rules to work `-remoteWrite.url` must specified.
For recording rules to work `-remoteWrite.url` must be specified.
### Alerts state on restarts
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after reloading of `vmalert`
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after restart of `vmalert`
the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags:
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or vminsert (Cluster). `vmalert` will persist alerts state
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
@@ -172,8 +189,48 @@ The state stored to the configured address on every rule evaluation.
from configured address by querying time series with name `ALERTS_FOR_STATE`.
Both flags are required for the proper state restoring. Restore process may fail if time series are missing
in configured `-remoteRead.url`, weren't updated in the last `1h` or received state doesn't match current `vmalert`
rules configuration.
in configured `-remoteRead.url`, weren't updated in the last `1h` (controlled by `-remoteRead.lookback`)
or received state doesn't match current `vmalert` rules configuration.
### Multitenancy
There are the following approaches for alerting and recording rules across
[multiple tenants](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy):
* To run a separate `vmalert` instance per each tenant.
The corresponding tenant must be specified in `-datasource.url` command-line flag
according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format).
For example, `/path/to/vmalert -datasource.url=http://vmselect:8481/select/123/prometheus`
would run alerts against `AccountID=123`. For recording rules the `-remoteWrite.url` command-line
flag must contain the url for the specific tenant as well.
For example, `-remoteWrite.url=http://vminsert:8480/insert/123/prometheus` would write recording
rules to `AccountID=123`.
* To specify `tenant` parameter per each alerting and recording group if
[enterprise version of vmalert](https://victoriametrics.com/enterprise.html) is used
with `-clusterMode` command-line flag. For example:
```yaml
groups:
- name: rules_for_tenant_123
tenant: "123"
rules:
# Rules for accountID=123
- name: rules_for_tenant_456:789
tenant: "456:789"
rules:
# Rules for accountID=456, projectID=789
```
If `-clusterMode` is enabled, then `-datasource.url`, `-remoteRead.url` and `-remoteWrite.url` must
contain only the hostname without tenant id. For example: `-datasource.url=http://vmselect:8481`.
`vmselect` automatically adds the specified tenant to urls per each recording rule in this case.
The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz` files
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
### WEB
@@ -195,190 +252,296 @@ implements [Graphite Render API](https://graphite.readthedocs.io/en/stable/rende
When using vmalert with both `graphite` and `prometheus` rules configured against cluster version of VM do not forget
to set `-datasource.appendTypePrefix` flag to `true`, so vmalert can adjust URL prefix automatically based on query type.
## Rules backfilling
vmalert supports alerting and recording rules backfilling (aka `replay`). In replay mode vmalert
can read the same rules configuration as normally, evaluate them on the given time range and backfill
results via remote write to the configured storage. vmalert supports any PromQL/MetricsQL compatible
data source for backfilling.
### How it works
In `replay` mode vmalert works as a cli-tool and exits immediately after work is done.
To run vmalert in `replay` mode:
```
./bin/vmalert -rule=path/to/your.rules \ # path to files with rules you usually use with vmalert
-datasource.url=http://localhost:8428 \ # PromQL/MetricsQL compatible datasource
-remoteWrite.url=http://localhost:8428 \ # remote write compatible storage to persist results
-replay.timeFrom=2021-05-11T07:21:43Z \ # time from begin replay
-replay.timeTo=2021-05-29T18:40:43Z # time to finish replay
```
The output of the command will look like the following:
```
Replay mode:
from: 2021-05-11 07:21:43 +0000 UTC # set by -replay.timeFrom
to: 2021-05-29 18:40:43 +0000 UTC # set by -replay.timeTo
max data points per request: 1000 # set by -replay.maxDatapointsPerQuery
Group "ReplayGroup"
interval: 1m0s
requests to make: 27
max range per request: 16h40m0s
> Rule "type:vm_cache_entries:rate5m" (ID: 1792509946081842725)
27 / 27 [----------------------------------------------------------------------------------------------------] 100.00% 78 p/s
> Rule "go_cgo_calls_count:rate5m" (ID: 17958425467471411582)
27 / 27 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
Group "vmsingleReplay"
interval: 30s
requests to make: 54
max range per request: 8h20m0s
> Rule "RequestErrorsToAPI" (ID: 17645863024999990222)
54 / 54 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
> Rule "TooManyLogs" (ID: 9042195394653477652)
54 / 54 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
2021-06-07T09:59:12.098Z info app/vmalert/replay.go:68 replay finished! Imported 511734 samples
```
In `replay` mode all groups are executed sequentially one-by-one. Rules within the group are
executed sequentially as well (`concurrency` setting is ignored). Vmalert sends rule's expression
to [/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) endpoint
of the configured `-datasource.url`. Returned data then processed according to the rule type and
backfilled to `-remoteWrite.url` via [Remote Write protocol](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations).
Vmalert respects `evaluationInterval` value set by flag or per-group during the replay.
#### Recording rules
Result of recording rules `replay` should match with results of normal rules evaluation.
#### Alerting rules
Result of alerting rules `replay` is time series reflecting [alert's state](#alerts-state-on-restarts).
To see if `replayed` alert has fired in the past use the following PromQL/MetricsQL expression:
```
ALERTS{alertname="your_alertname", alertstate="firing"}
```
Execute the query against storage which was used for `-remoteWrite.url` during the `replay`.
### Additional configuration
There are following non-required `replay` flags:
* `-replay.maxDatapointsPerQuery` - the max number of data points expected to receive in one request.
In two words, it affects the max time range for every `/query_range` request. The higher the value,
the less requests will be issued during `replay`.
* `-replay.ruleRetryAttempts` - when datasource fails to respond vmalert will make this number of retries
per rule before giving up.
* `-replay.rulesDelay` - delay between sequential rules execution. Important in cases if there are chaining
(rules which depend on each other) rules. It is expected, that remote storage will be able to persist
previously accepted data during the delay, so data will be available for the subsequent queries.
Keep it equal or bigger than `-remoteWrite.flushInterval`.
See full description for these flags in `./vmalert --help`.
### Limitations
* Graphite engine isn't supported yet;
* `query` template function is disabled for performance reasons (might be changed in future);
## Configuration
The shortlist of configuration flags is the following:
```
-datasource.appendTypePrefix
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
-datasource.basicAuth.password string
Optional basic auth password for -datasource.url
-datasource.basicAuth.username string
Optional basic auth username for -datasource.url
-datasource.lookback duration
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
-datasource.maxIdleConnections int
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
-datasource.queryStep duration
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query.If queryStep isn't specified, rule's evaluationInterval will be used instead.
-datasource.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
-datasource.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
-datasource.tlsInsecureSkipVerify
Whether to skip tls verification when connecting to -datasource.url
-datasource.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
-datasource.tlsServerName string
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
-datasource.url string
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
-dryRun -rule
Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-evaluationInterval duration
How often to evaluate the rules (default 1m0s)
-external.alert.source string
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
-external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports an array of values separated by comma or specified via multiple flags.
-external.url string
External URL is used as alert's source for sent alerts to the notifier
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealay, the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
Address to listen for http connections (default ":8880")
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-notifier.basicAuth.password array
Optional basic auth password for -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.basicAuth.username array
Optional basic auth username for -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsInsecureSkipVerify array
Whether to skip tls verification when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
-notifier.tlsKeyFile array
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsServerName array
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
Supports an array of values separated by comma or specified via multiple flags.
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-remoteRead.basicAuth.password string
Optional basic auth password for -remoteRead.url
-remoteRead.basicAuth.username string
Optional basic auth username for -remoteRead.url
-remoteRead.lookback duration
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default system CA is used
-remoteRead.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url
-remoteRead.tlsInsecureSkipVerify
Whether to skip tls verification when connecting to -remoteRead.url
-remoteRead.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url
-remoteRead.tlsServerName string
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
-remoteRead.url vmalert
Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428
-remoteWrite.basicAuth.password string
Optional basic auth password for -remoteWrite.url
-remoteWrite.basicAuth.username string
Optional basic auth username for -remoteWrite.url
-remoteWrite.concurrency int
Defines number of writers for concurrent writing into remote querier (default 1)
-remoteWrite.flushInterval duration
Defines interval of flushes to remote write endpoint (default 5s)
-remoteWrite.maxBatchSize int
Defines defines max number of timeseries to be flushed at once (default 1000)
-remoteWrite.maxQueueSize int
Defines the max number of pending datapoints to remote write endpoint (default 100000)
-remoteWrite.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used
-remoteWrite.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url
-remoteWrite.tlsInsecureSkipVerify
Whether to skip tls verification when connecting to -remoteWrite.url
-remoteWrite.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url
-remoteWrite.tlsServerName string
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
-remoteWrite.url string
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
-rule array
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Supports an array of values separated by comma or specified via multiple flags.
-rule.validateExpressions
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
Whether to validate annotation and label templates (default true)
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
Pass `-help` to `vmalert` in order to see the full list of supported
command-line flags with their descriptions.
To reload configuration without `vmalert` restart send SIGHUP signal
or send GET request to `/-/reload` endpoint.
The shortlist of configuration flags is the following:
```
-datasource.appendTypePrefix
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
-datasource.basicAuth.password string
Optional basic auth password for -datasource.url
-datasource.basicAuth.username string
Optional basic auth username for -datasource.url
-datasource.lookback duration
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
-datasource.maxIdleConnections int
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
-datasource.queryStep duration
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query.If queryStep isn't specified, rule's evaluationInterval will be used instead.
-datasource.roundDigits int
Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
-datasource.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
-datasource.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
-datasource.tlsInsecureSkipVerify
Whether to skip tls verification when connecting to -datasource.url
-datasource.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
-datasource.tlsServerName string
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
-datasource.url string
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
-dryRun -rule
Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
-envflag.enable
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
-envflag.prefix string
Prefix for environment variables if -envflag.enable is set
-evaluationInterval duration
How often to evaluate the rules (default 1m0s)
-external.alert.source string
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
-external.label array
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
Supports an array of values separated by comma or specified via multiple flags.
-external.url string
External URL is used as alert's source for sent alerts to the notifier
-fs.disableMmap
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
-http.connTimeout duration
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
-http.disableResponseCompression
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
-http.idleConnTimeout duration
Timeout for incoming idle http connections (default 1m0s)
-http.maxGracefulShutdownDuration duration
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
Address to listen for http connections (default ":8880")
-loggerDisableTimestamps
Whether to disable writing timestamps in logs
-loggerErrorsPerSecondLimit int
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
-loggerFormat string
Format for logs. Possible values: default, json (default "default")
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
-loggerOutput string
Output for the logs. Supported values: stderr, stdout (default "stderr")
-loggerTimezone string
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-notifier.basicAuth.password array
Optional basic auth password for -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.basicAuth.username array
Optional basic auth username for -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCAFile array
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsCertFile array
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsInsecureSkipVerify array
Whether to skip tls verification when connecting to -notifier.url
Supports array of values separated by comma or specified via multiple flags.
-notifier.tlsKeyFile array
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
Supports an array of values separated by comma or specified via multiple flags.
-notifier.tlsServerName array
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
Supports an array of values separated by comma or specified via multiple flags.
-notifier.url array
Prometheus alertmanager URL. Required parameter. e.g. http://127.0.0.1:9093
Supports an array of values separated by comma or specified via multiple flags.
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-remoteRead.basicAuth.password string
Optional basic auth password for -remoteRead.url
-remoteRead.basicAuth.username string
Optional basic auth username for -remoteRead.url
-remoteRead.ignoreRestoreErrors
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
-remoteRead.lookback duration
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
-remoteRead.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default system CA is used
-remoteRead.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url
-remoteRead.tlsInsecureSkipVerify
Whether to skip tls verification when connecting to -remoteRead.url
-remoteRead.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url
-remoteRead.tlsServerName string
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
-remoteRead.url vmalert
Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428
-remoteWrite.basicAuth.password string
Optional basic auth password for -remoteWrite.url
-remoteWrite.basicAuth.username string
Optional basic auth username for -remoteWrite.url
-remoteWrite.concurrency int
Defines number of writers for concurrent writing into remote querier (default 1)
-remoteWrite.flushInterval duration
Defines interval of flushes to remote write endpoint (default 5s)
-remoteWrite.maxBatchSize int
Defines defines max number of timeseries to be flushed at once (default 1000)
-remoteWrite.maxQueueSize int
Defines the max number of pending datapoints to remote write endpoint (default 100000)
-remoteWrite.tlsCAFile string
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used
-remoteWrite.tlsCertFile string
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url
-remoteWrite.tlsInsecureSkipVerify
Whether to skip tls verification when connecting to -remoteWrite.url
-remoteWrite.tlsKeyFile string
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url
-remoteWrite.tlsServerName string
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
-remoteWrite.url string
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. E.g. http://127.0.0.1:8428
-replay.maxDatapointsPerQuery int
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
-replay.ruleRetryAttempts int
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
-replay.rulesDelay duration
Delay between rules evaluation within the group. Could be important if there are chained rules inside of the groupand processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule. Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
-replay.timeFrom string
The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
-replay.timeTo string
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
-rule array
Path to the file with alert rules.
Supports patterns. Flag can be specified multiple times.
Examples:
-rule="/path/to/file". Path to a single file with alerting rules
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
Supports an array of values separated by comma or specified via multiple flags.
-rule.configCheckInterval duration
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
-rule.validateExpressions
Whether to validate rules expressions via MetricsQL engine (default true)
-rule.validateTemplates
Whether to validate annotation and label templates (default true)
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
-tlsKeyFile string
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
```
`vmalert` supports "hot" config reload via the following methods:
* send SIGHUP signal to `vmalert` process;
* send GET request to `/-/reload` endpoint;
* configure `-rule.configCheckInterval` flag for periodic reload
on config change.
## Contributing
@@ -395,7 +558,7 @@ It is recommended using
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmalert` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert` binary and puts it into the `bin` folder.
@@ -412,7 +575,7 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmalert-arm` or `make vmalert-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmalert-arm` or `vmalert-arm64` binary respectively and puts it into the `bin` folder.

View File

@@ -19,15 +19,16 @@ import (
// AlertingRule is basic alert entity
type AlertingRule struct {
Type datasource.Type
RuleID uint64
Name string
Expr string
For time.Duration
Labels map[string]string
Annotations map[string]string
GroupID uint64
GroupName string
Type datasource.Type
RuleID uint64
Name string
Expr string
For time.Duration
Labels map[string]string
Annotations map[string]string
GroupID uint64
GroupName string
EvalInterval time.Duration
q datasource.Querier
@@ -53,18 +54,20 @@ type alertingRuleMetrics struct {
func newAlertingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rule) *AlertingRule {
ar := &AlertingRule{
Type: cfg.Type,
RuleID: cfg.ID,
Name: cfg.Alert,
Expr: cfg.Expr,
For: cfg.For.Duration(),
Labels: cfg.Labels,
Annotations: cfg.Annotations,
GroupID: group.ID(),
GroupName: group.Name,
Type: cfg.Type,
RuleID: cfg.ID,
Name: cfg.Alert,
Expr: cfg.Expr,
For: cfg.For.Duration(),
Labels: cfg.Labels,
Annotations: cfg.Annotations,
GroupID: group.ID(),
GroupName: group.Name,
EvalInterval: group.Interval,
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: &cfg.Type,
EvaluationInterval: group.Interval,
ExtraLabels: group.ExtraFilterLabels,
}),
alerts: make(map[uint64]*notifier.Alert),
metrics: &alertingRuleMetrics{},
@@ -125,9 +128,66 @@ func (ar *AlertingRule) ID() uint64 {
return ar.RuleID
}
// ExecRange executes alerting rule on the given time range similarly to Exec.
// It doesn't update internal states of the Rule and meant to be used just
// to get time series for backfilling.
// It returns ALERT and ALERT_FOR_STATE time series as result.
func (ar *AlertingRule) ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error) {
series, err := ar.q.QueryRange(ctx, ar.Expr, start, end)
if err != nil {
return nil, err
}
var result []prompbmarshal.TimeSeries
qFn := func(query string) ([]datasource.Metric, error) {
return nil, fmt.Errorf("`query` template isn't supported in replay mode")
}
for _, s := range series {
// extra labels could contain templates, so we expand them first
labels, err := expandLabels(s, qFn, ar)
if err != nil {
return nil, fmt.Errorf("failed to expand labels: %s", err)
}
for k, v := range labels {
// apply extra labels to datasource
// so the hash key will be consistent on restore
s.SetLabel(k, v)
}
a, err := ar.newAlert(s, time.Time{}, qFn) // initial alert
if err != nil {
return nil, fmt.Errorf("failed to create alert: %s", err)
}
if ar.For == 0 { // if alert is instant
a.State = notifier.StateFiring
for i := range s.Values {
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
}
continue
}
// if alert with For > 0
prevT := time.Time{}
//activeAt := time.Time{}
for i := range s.Values {
at := time.Unix(s.Timestamps[i], 0)
if at.Sub(prevT) > ar.EvalInterval {
// reset to Pending if there are gaps > EvalInterval between DPs
a.State = notifier.StatePending
//activeAt = at
a.Start = at
} else if at.Sub(a.Start) >= ar.For {
a.State = notifier.StateFiring
}
prevT = at
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
}
}
return result, nil
}
// Exec executes AlertingRule expression via the given Querier.
// Based on the Querier results AlertingRule maintains notifier.Alerts
func (ar *AlertingRule) Exec(ctx context.Context, series bool) ([]prompbmarshal.TimeSeries, error) {
func (ar *AlertingRule) Exec(ctx context.Context) ([]prompbmarshal.TimeSeries, error) {
qMetrics, err := ar.q.Query(ctx, ar.Expr)
ar.mu.Lock()
defer ar.mu.Unlock()
@@ -167,9 +227,9 @@ func (ar *AlertingRule) Exec(ctx context.Context, series bool) ([]prompbmarshal.
}
updated[h] = struct{}{}
if a, ok := ar.alerts[h]; ok {
if a.Value != m.Value {
if a.Value != m.Values[0] {
// update Value field with latest value
a.Value = m.Value
a.Value = m.Values[0]
// and re-exec template since Value can be used
// in annotations
a.Annotations, err = a.ExecTemplate(qFn, ar.Annotations)
@@ -207,10 +267,7 @@ func (ar *AlertingRule) Exec(ctx context.Context, series bool) ([]prompbmarshal.
alertsFired.Inc()
}
}
if series {
return ar.toTimeSeries(ar.lastExecTime), nil
}
return nil, nil
return ar.toTimeSeries(ar.lastExecTime.Unix()), nil
}
func expandLabels(m datasource.Metric, q notifier.QueryFn, ar *AlertingRule) (map[string]string, error) {
@@ -220,13 +277,13 @@ func expandLabels(m datasource.Metric, q notifier.QueryFn, ar *AlertingRule) (ma
}
tpl := notifier.AlertTplData{
Labels: metricLabels,
Value: m.Value,
Value: m.Values[0],
Expr: ar.Expr,
}
return notifier.ExecTemplate(q, ar.Labels, tpl)
}
func (ar *AlertingRule) toTimeSeries(timestamp time.Time) []prompbmarshal.TimeSeries {
func (ar *AlertingRule) toTimeSeries(timestamp int64) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
for _, a := range ar.alerts {
if a.State == notifier.StateInactive {
@@ -250,6 +307,8 @@ func (ar *AlertingRule) UpdateWith(r Rule) error {
ar.For = nr.For
ar.Labels = nr.Labels
ar.Annotations = nr.Annotations
ar.EvalInterval = nr.EvalInterval
ar.q = nr.q
return nil
}
@@ -277,13 +336,15 @@ func (ar *AlertingRule) newAlert(m datasource.Metric, start time.Time, qFn notif
GroupID: ar.GroupID,
Name: ar.Name,
Labels: map[string]string{},
Value: m.Value,
Value: m.Values[0],
Start: start,
Expr: ar.Expr,
}
// label defined here to make override possible by
// time series labels.
a.Labels[alertGroupNameLabel] = ar.GroupName
if ar.GroupName != "" {
a.Labels[alertGroupNameLabel] = ar.GroupName
}
for _, l := range m.Labels {
// drop __name__ to be consistent with Prometheus alerting
if l.Name == "__name__" {
@@ -372,7 +433,7 @@ const (
)
// alertToTimeSeries converts the given alert with the given timestamp to timeseries
func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp time.Time) []prompbmarshal.TimeSeries {
func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp int64) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
tss = append(tss, alertToTimeSeries(ar.Name, a, timestamp))
if ar.For > 0 {
@@ -381,7 +442,7 @@ func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp time.Time
return tss
}
func alertToTimeSeries(name string, a *notifier.Alert, timestamp time.Time) prompbmarshal.TimeSeries {
func alertToTimeSeries(name string, a *notifier.Alert, timestamp int64) prompbmarshal.TimeSeries {
labels := make(map[string]string)
for k, v := range a.Labels {
labels[k] = v
@@ -389,19 +450,19 @@ func alertToTimeSeries(name string, a *notifier.Alert, timestamp time.Time) prom
labels["__name__"] = alertMetricName
labels[alertNameLabel] = name
labels[alertStateLabel] = a.State.String()
return newTimeSeries(1, labels, timestamp)
return newTimeSeries([]float64{1}, []int64{timestamp}, labels)
}
// alertForToTimeSeries returns a timeseries that represents
// state of active alerts, where value is time when alert become active
func alertForToTimeSeries(name string, a *notifier.Alert, timestamp time.Time) prompbmarshal.TimeSeries {
func alertForToTimeSeries(name string, a *notifier.Alert, timestamp int64) prompbmarshal.TimeSeries {
labels := make(map[string]string)
for k, v := range a.Labels {
labels[k] = v
}
labels["__name__"] = alertForStateMetricName
labels[alertNameLabel] = name
return newTimeSeries(float64(a.Start.Unix()), labels, timestamp)
return newTimeSeries([]float64{float64(a.Start.Unix())}, []int64{timestamp}, labels)
}
// Restore restores the state of active alerts basing on previously written timeseries.
@@ -443,7 +504,7 @@ func (ar *AlertingRule) Restore(ctx context.Context, q datasource.Querier, lookb
m.Labels = append(m.Labels, l)
}
a, err := ar.newAlert(m, time.Unix(int64(m.Value), 0), qFn)
a, err := ar.newAlert(m, time.Unix(int64(m.Values[0]), 0), qFn)
if err != nil {
return fmt.Errorf("failed to create alert: %w", err)
}

View File

@@ -24,11 +24,11 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
newTestAlertingRule("instant", 0),
&notifier.Alert{State: notifier.StateFiring},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "instant",
}, timestamp),
}),
},
},
{
@@ -38,13 +38,13 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
"instance": "bar",
}},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "instant extra labels",
"job": "foo",
"instance": "bar",
}, timestamp),
}),
},
},
{
@@ -54,48 +54,52 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
"__name__": "bar",
}},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "instant labels override",
}, timestamp),
}),
},
},
{
newTestAlertingRule("for", time.Second),
&notifier.Alert{State: notifier.StateFiring, Start: timestamp.Add(time.Second)},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
alertNameLabel: "for",
}, timestamp),
newTimeSeries(float64(timestamp.Add(time.Second).Unix()), map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for",
}, timestamp),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for",
}),
},
},
{
newTestAlertingRule("for pending", 10*time.Second),
&notifier.Alert{State: notifier.StatePending, Start: timestamp.Add(time.Second)},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StatePending.String(),
alertNameLabel: "for pending",
}, timestamp),
newTimeSeries(float64(timestamp.Add(time.Second).Unix()), map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for pending",
}, timestamp),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
alertNameLabel: "for pending",
}),
},
},
}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
tc.rule.alerts[tc.alert.ID] = tc.alert
tss := tc.rule.toTimeSeries(timestamp)
tss := tc.rule.toTimeSeries(timestamp.Unix())
if err := compareTimeSeries(t, tc.expTS, tss); err != nil {
t.Fatalf("timeseries missmatch: %s", err)
}
@@ -118,7 +122,7 @@ func TestAlertingRule_Exec(t *testing.T) {
{
newTestAlertingRule("empty labels", 0),
[][]datasource.Metric{
{datasource.Metric{}},
{datasource.Metric{Values: []float64{1}, Timestamps: []int64{1}}},
},
map[uint64]*notifier.Alert{
hash(datasource.Metric{}): {State: notifier.StateFiring},
@@ -299,7 +303,7 @@ func TestAlertingRule_Exec(t *testing.T) {
for _, step := range tc.steps {
fq.reset()
fq.add(step...)
if _, err := tc.rule.Exec(context.TODO(), false); err != nil {
if _, err := tc.rule.Exec(context.TODO()); err != nil {
t.Fatalf("unexpected err: %s", err)
}
// artificial delay between applying steps
@@ -321,6 +325,166 @@ func TestAlertingRule_Exec(t *testing.T) {
}
}
func TestAlertingRule_ExecRange(t *testing.T) {
testCases := []struct {
rule *AlertingRule
data []datasource.Metric
expAlerts []*notifier.Alert
}{
{
newTestAlertingRule("empty", 0),
[]datasource.Metric{},
nil,
},
{
newTestAlertingRule("empty labels", 0),
[]datasource.Metric{
{Values: []float64{1}, Timestamps: []int64{1}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
},
},
{
newTestAlertingRule("single-firing", 0),
[]datasource.Metric{
metricWithLabels(t, "name", "foo"),
},
[]*notifier.Alert{
{
Labels: map[string]string{"name": "foo"},
State: notifier.StateFiring,
},
},
},
{
newTestAlertingRule("single-firing-on-range", 0),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1e3, 2e3, 3e3}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
{State: notifier.StateFiring},
{State: notifier.StateFiring},
},
},
{
newTestAlertingRule("for-pending", time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 3, 5}},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(3, 0)},
{State: notifier.StatePending, Start: time.Unix(5, 0)},
},
},
{
newTestAlertingRule("for-firing", 3*time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 3, 5}},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StateFiring, Start: time.Unix(1, 0)},
},
},
{
newTestAlertingRule("for=>pending=>firing=>pending=>firing=>pending", time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1, 1, 1}, Timestamps: []int64{1, 2, 5, 6, 20}},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StateFiring, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(5, 0)},
{State: notifier.StateFiring, Start: time.Unix(5, 0)},
{State: notifier.StatePending, Start: time.Unix(20, 0)},
},
},
{
newTestAlertingRule("multi-series-for=>pending=>pending=>firing", 3*time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1}, Timestamps: []int64{1, 3, 5}},
{Values: []float64{1, 1}, Timestamps: []int64{1, 5},
Labels: []datasource.Label{{Name: "foo", Value: "bar"}},
},
},
[]*notifier.Alert{
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StatePending, Start: time.Unix(1, 0)},
{State: notifier.StateFiring, Start: time.Unix(1, 0)},
//
{State: notifier.StatePending, Start: time.Unix(1, 0),
Labels: map[string]string{
"foo": "bar",
}},
{State: notifier.StatePending, Start: time.Unix(5, 0),
Labels: map[string]string{
"foo": "bar",
}},
},
},
{
newTestRuleWithLabels("multi-series-firing", "source", "vm"),
[]datasource.Metric{
{Values: []float64{1, 1}, Timestamps: []int64{1, 100}},
{Values: []float64{1, 1}, Timestamps: []int64{1, 5},
Labels: []datasource.Label{{Name: "foo", Value: "bar"}},
},
},
[]*notifier.Alert{
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
//
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
},
},
}
fakeGroup := Group{Name: "TestRule_ExecRange"}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
tc.rule.q = fq
tc.rule.GroupID = fakeGroup.ID()
fq.add(tc.data...)
gotTS, err := tc.rule.ExecRange(context.TODO(), time.Now(), time.Now())
if err != nil {
t.Fatalf("unexpected err: %s", err)
}
var expTS []prompbmarshal.TimeSeries
var j int
for _, series := range tc.data {
for _, timestamp := range series.Timestamps {
expTS = append(expTS, tc.rule.alertToTimeSeries(tc.expAlerts[j], timestamp)...)
j++
}
}
if len(gotTS) != len(expTS) {
t.Fatalf("expected %d time series; got %d", len(expTS), len(gotTS))
}
for i := range expTS {
got, exp := gotTS[i], expTS[i]
if !reflect.DeepEqual(got, exp) {
t.Fatalf("%d: expected \n%v but got \n%v", i, exp, got)
}
}
})
}
}
func TestAlertingRule_Restore(t *testing.T) {
testCases := []struct {
rule *AlertingRule
@@ -443,14 +607,14 @@ func TestAlertingRule_Exec_Negative(t *testing.T) {
// successful attempt
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "bar"))
_, err := ar.Exec(context.TODO(), false)
_, err := ar.Exec(context.TODO())
if err != nil {
t.Fatal(err)
}
// label `job` will collide with rule extra label and will make both time series equal
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "baz"))
_, err = ar.Exec(context.TODO(), false)
_, err = ar.Exec(context.TODO())
if !errors.Is(err, errDuplicate) {
t.Fatalf("expected to have %s error; got %s", errDuplicate, err)
}
@@ -459,7 +623,7 @@ func TestAlertingRule_Exec_Negative(t *testing.T) {
expErr := "connection reset by peer"
fq.setErr(errors.New(expErr))
_, err = ar.Exec(context.TODO(), false)
_, err = ar.Exec(context.TODO())
if err == nil {
t.Fatalf("expected to get err; got nil")
}
@@ -484,17 +648,15 @@ func TestAlertingRule_Template(t *testing.T) {
hash(metricWithLabels(t, "region", "east", "instance", "foo")): {
Annotations: map[string]string{},
Labels: map[string]string{
alertGroupNameLabel: "",
"region": "east",
"instance": "foo",
"region": "east",
"instance": "foo",
},
},
hash(metricWithLabels(t, "region", "east", "instance", "bar")): {
Annotations: map[string]string{},
Labels: map[string]string{
alertGroupNameLabel: "",
"region": "east",
"instance": "bar",
"region": "east",
"instance": "bar",
},
},
},
@@ -519,9 +681,8 @@ func TestAlertingRule_Template(t *testing.T) {
map[uint64]*notifier.Alert{
hash(metricWithLabels(t, "region", "east", "instance", "foo")): {
Labels: map[string]string{
alertGroupNameLabel: "",
"instance": "foo",
"region": "east",
"instance": "foo",
"region": "east",
},
Annotations: map[string]string{
"summary": `Too high connection number for "foo" for region east`,
@@ -530,9 +691,8 @@ func TestAlertingRule_Template(t *testing.T) {
},
hash(metricWithLabels(t, "region", "east", "instance", "bar")): {
Labels: map[string]string{
alertGroupNameLabel: "",
"instance": "bar",
"region": "east",
"instance": "bar",
"region": "east",
},
Annotations: map[string]string{
"summary": `Too high connection number for "bar" for region east`,
@@ -549,7 +709,7 @@ func TestAlertingRule_Template(t *testing.T) {
tc.rule.GroupID = fakeGroup.ID()
tc.rule.q = fq
fq.add(tc.metrics...)
if _, err := tc.rule.Exec(context.TODO(), false); err != nil {
if _, err := tc.rule.Exec(context.TODO()); err != nil {
t.Fatalf("unexpected err: %s", err)
}
for hash, expAlert := range tc.expAlerts {
@@ -579,5 +739,5 @@ func newTestRuleWithLabels(name string, labels ...string) *AlertingRule {
}
func newTestAlertingRule(name string, waitFor time.Duration) *AlertingRule {
return &AlertingRule{Name: name, alerts: make(map[uint64]*notifier.Alert), For: waitFor}
return &AlertingRule{Name: name, alerts: make(map[uint64]*notifier.Alert), For: waitFor, EvalInterval: waitFor}
}

View File

@@ -8,7 +8,6 @@ import (
"path/filepath"
"sort"
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
@@ -16,7 +15,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/metricsql"
"gopkg.in/yaml.v2"
)
@@ -25,10 +23,14 @@ import (
type Group struct {
Type datasource.Type `yaml:"type,omitempty"`
File string
Name string `yaml:"name"`
Interval time.Duration `yaml:"interval,omitempty"`
Rules []Rule `yaml:"rules"`
Concurrency int `yaml:"concurrency"`
Name string `yaml:"name"`
Interval utils.PromDuration `yaml:"interval,omitempty"`
Rules []Rule `yaml:"rules"`
Concurrency int `yaml:"concurrency"`
// ExtraFilterLabels is a list label filters applied to every rule
// request withing a group. Is compatible only with VM datasources.
// See https://docs.victoriametrics.com#prometheus-querying-api-enhancements
ExtraFilterLabels map[string]string `yaml:"extra_filter_labels"`
// Checksum stores the hash of yaml definition for this group.
// May be used to detect any changes like rules re-ordering etc.
Checksum string
@@ -115,54 +117,18 @@ func (g *Group) Validate(validateAnnotations, validateExpressions bool) error {
// recording rule or alerting rule.
type Rule struct {
ID uint64
Type datasource.Type `yaml:"type,omitempty"`
Record string `yaml:"record,omitempty"`
Alert string `yaml:"alert,omitempty"`
Expr string `yaml:"expr"`
For PromDuration `yaml:"for"`
Labels map[string]string `yaml:"labels,omitempty"`
Annotations map[string]string `yaml:"annotations,omitempty"`
Type datasource.Type `yaml:"type,omitempty"`
Record string `yaml:"record,omitempty"`
Alert string `yaml:"alert,omitempty"`
Expr string `yaml:"expr"`
For utils.PromDuration `yaml:"for"`
Labels map[string]string `yaml:"labels,omitempty"`
Annotations map[string]string `yaml:"annotations,omitempty"`
// Catches all undefined fields and must be empty after parsing.
XXX map[string]interface{} `yaml:",inline"`
}
// PromDuration is Prometheus duration.
type PromDuration struct {
milliseconds int64
}
// NewPromDuration returns PromDuration for given d.
func NewPromDuration(d time.Duration) PromDuration {
return PromDuration{
milliseconds: d.Milliseconds(),
}
}
// MarshalYAML implements yaml.Marshaler interface.
func (pd PromDuration) MarshalYAML() (interface{}, error) {
return pd.Duration().String(), nil
}
// UnmarshalYAML implements yaml.Unmarshaler interface.
func (pd *PromDuration) UnmarshalYAML(unmarshal func(interface{}) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
}
ms, err := metricsql.DurationValue(s, 0)
if err != nil {
return err
}
pd.milliseconds = ms
return nil
}
// Duration returns duration for pd.
func (pd *PromDuration) Duration() time.Duration {
return time.Duration(pd.milliseconds) * time.Millisecond
}
// UnmarshalYAML implements the yaml.Unmarshaler interface.
func (r *Rule) UnmarshalYAML(unmarshal func(interface{}) error) error {
type rule Rule

View File

@@ -8,10 +8,9 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"gopkg.in/yaml.v2"
)
func TestMain(m *testing.M) {
@@ -264,7 +263,7 @@ func TestGroup_Validate(t *testing.T) {
Rules: []Rule{
{
Expr: "sumSeries(time('foo.bar',10))",
For: PromDuration{milliseconds: 10},
For: utils.NewPromDuration(10 * time.Millisecond),
},
{
Expr: "sum(up == 0 ) by (host)",
@@ -280,7 +279,7 @@ func TestGroup_Validate(t *testing.T) {
Rules: []Rule{
{
Expr: "sum(up == 0 ) by (host)",
For: PromDuration{milliseconds: 10},
For: utils.NewPromDuration(10 * time.Millisecond),
},
{
Expr: "sumSeries(time('foo.bar',10))",
@@ -348,7 +347,7 @@ func TestHashRule(t *testing.T) {
true,
},
{
Rule{Alert: "alert", Expr: "up == 1", For: NewPromDuration(time.Minute)},
Rule{Alert: "alert", Expr: "up == 1", For: utils.NewPromDuration(time.Minute)},
Rule{Alert: "alert", Expr: "up == 1"},
true,
},

View File

@@ -0,0 +1,15 @@
groups:
- name: alertmanager.rules
rules:
- alert: AlertmanagerConfigInconsistent
annotations:
message: |
The configuration of the instances of the Alertmanager cluster `{{ $labels.namespace }}/{{ $labels.service }}` are out of sync.
{{ range printf "alertmanager_config_hash{namespace=\"%s\",service=\"%s\"}" $labels.namespace $labels.service | query }}
Configuration hash for pod {{ .Labels.pod }} is "{{ printf "%.f" .Value }}"
{{ end }}
expr: |
count by(namespace,service) (count_values by(namespace,service) ("config_hash", alertmanager_config_hash{job="alertmanager-main",namespace="openshift-monitoring"})) != 1
for: 5m
labels:
severity: critical

View File

@@ -0,0 +1,39 @@
groups:
- name: ReplayGroup
interval: 1m
concurrency: 1
rules:
- record: type:vm_cache_entries:rate5m
expr: sum(rate(vm_cache_entries[5m])) by (type)
labels:
recording: true
- record: go_cgo_calls_count:rate5m
expr: rate(go_cgo_calls_count{job="vmdb"}[5m])
labels:
recording: true
- name: vmsingleReplay
interval: 30s
concurrency: 2
rules:
- alert: RequestErrorsToAPI
expr: increase(vm_http_request_errors_total[5m]) > 0
for: 15m
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=35&var-instance={{ $labels.instance }}"
summary: "Too many errors served for path {{ $labels.path }} (instance {{ $labels.instance }})"
description: "Requests to path {{ $labels.path }} are receiving errors.
Please verify if clients are sending correct requests."
- alert: TooManyLogs
expr: sum(increase(vm_log_messages_total{level!="info"}[5m])) by (job, instance) > 0
for: 15m
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=67&var-instance={{ $labels.instance }}"
summary: "Too many logs printed for job \"{{ $labels.job }}\" ({{ $labels.instance }})"
description: "Logging rate for job \"{{ $labels.job }}\" ({{ $labels.instance }}) is {{ $value }} for last 15m.\n
Worth to check logs for specific error messages."

View File

@@ -2,6 +2,8 @@ groups:
- name: TestGroup
interval: 2s
concurrency: 2
extra_filter_labels:
job: victoriametrics
rules:
- alert: Conns
expr: sum(vm_tcplistener_conns) by(instance) > 1

View File

@@ -2,26 +2,33 @@ package datasource
import (
"context"
"time"
)
// Querier interface wraps Query and QueryRange methods
type Querier interface {
Query(ctx context.Context, query string) ([]Metric, error)
QueryRange(ctx context.Context, query string, from, to time.Time) ([]Metric, error)
}
// QuerierBuilder builds Querier with given params.
type QuerierBuilder interface {
BuildWithParams(params QuerierParams) Querier
}
// Querier interface wraps Query method which
// executes given query and returns list of Metrics
// as result
type Querier interface {
Query(ctx context.Context, query string) ([]Metric, error)
// QuerierParams params for Querier.
type QuerierParams struct {
DataSourceType *Type
EvaluationInterval time.Duration
// see https://docs.victoriametrics.com/#prometheus-querying-api-enhancements
ExtraLabels map[string]string
}
// Metric is the basic entity which should be return by datasource
// It represents single data point with full list of labels
type Metric struct {
Labels []Label
Timestamp int64
Value float64
Labels []Label
Timestamps []int64
Values []float64
}
// SetLabel adds or updates existing one label

View File

@@ -0,0 +1,18 @@
package datasource
import "testing"
func TestMetric_Label(t *testing.T) {
m := &Metric{}
m.AddLabel("foo", "bar")
checkEqualString(t, "bar", m.Label("foo"))
m.SetLabel("foo", "baz")
checkEqualString(t, "baz", m.Label("foo"))
m.SetLabel("qux", "quux")
checkEqualString(t, "quux", m.Label("qux"))
checkEqualString(t, "", m.Label("non-existing"))
}

View File

@@ -4,6 +4,7 @@ import (
"flag"
"fmt"
"net/http"
"strings"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
)
@@ -26,6 +27,8 @@ var (
"For example, if datasource.queryStep=15s then param \"step\" with value \"15s\" will be added to every query."+
"If queryStep isn't specified, rule's evaluationInterval will be used instead.")
maxIdleConnections = flag.Int("datasource.maxIdleConnections", 100, `Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state.`)
roundDigits = flag.Int("datasource.roundDigits", 0, `Adds "round_digits" GET param to datasource requests. `+
`In VM "round_digits" limits the number of digits after the decimal point in response values.`)
)
// Init creates a Querier from provided flag values.
@@ -33,11 +36,27 @@ func Init() (QuerierBuilder, error) {
if *addr == "" {
return nil, fmt.Errorf("datasource.url is empty")
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}
tr.MaxIdleConns = *maxIdleConnections
c := &http.Client{Transport: tr}
return NewVMStorage(*addr, *basicAuthUsername, *basicAuthPassword, *lookBack, *queryStep, *appendTypePrefix, c), nil
tr.MaxIdleConnsPerHost = *maxIdleConnections
var rd string
if *roundDigits > 0 {
rd = fmt.Sprintf("%d", *roundDigits)
}
return &VMStorage{
c: &http.Client{Transport: tr},
basicAuthUser: *basicAuthUsername,
basicAuthPass: *basicAuthPassword,
datasourceURL: strings.TrimSuffix(*addr, "/"),
appendTypePrefix: *appendTypePrefix,
lookBack: *lookBack,
queryStep: *queryStep,
roundDigits: rd,
dataSourceType: NewPrometheusType(),
}, nil
}

View File

@@ -2,76 +2,13 @@ package datasource
import (
"context"
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
"strconv"
"strings"
"time"
)
type response struct {
Status string `json:"status"`
Data struct {
ResultType string `json:"resultType"`
Result []struct {
Labels map[string]string `json:"metric"`
TV [2]interface{} `json:"value"`
} `json:"result"`
} `json:"data"`
ErrorType string `json:"errorType"`
Error string `json:"error"`
}
func (r response) metrics() ([]Metric, error) {
var ms []Metric
var m Metric
var f float64
var err error
for i, res := range r.Data.Result {
f, err = strconv.ParseFloat(res.TV[1].(string), 64)
if err != nil {
return nil, fmt.Errorf("metric %v, unable to parse float64 from %s: %w", res, res.TV[1], err)
}
m.Labels = nil
for k, v := range r.Data.Result[i].Labels {
m.AddLabel(k, v)
}
m.Timestamp = int64(res.TV[0].(float64))
m.Value = f
ms = append(ms, m)
}
return ms, nil
}
type graphiteResponse []graphiteResponseTarget
type graphiteResponseTarget struct {
Target string `json:"target"`
Tags map[string]string `json:"tags"`
DataPoints [][2]float64 `json:"datapoints"`
}
func (r graphiteResponse) metrics() []Metric {
var ms []Metric
for _, res := range r {
if len(res.DataPoints) < 1 {
continue
}
var m Metric
// add only last value to the result.
last := res.DataPoints[len(res.DataPoints)-1]
m.Value = last[0]
m.Timestamp = int64(last[1])
for k, v := range res.Tags {
m.AddLabel(k, v)
}
ms = append(ms, m)
}
return ms
}
// VMStorage represents vmstorage entity with ability to read and write metrics
type VMStorage struct {
c *http.Client
@@ -81,21 +18,11 @@ type VMStorage struct {
appendTypePrefix bool
lookBack time.Duration
queryStep time.Duration
roundDigits string
dataSourceType Type
evaluationInterval time.Duration
}
const queryPath = "/api/v1/query"
const graphitePath = "/render"
const prometheusPrefix = "/prometheus"
const graphitePrefix = "/graphite"
// QuerierParams params for Querier.
type QuerierParams struct {
DataSourceType *Type
EvaluationInterval time.Duration
extraLabels []string
}
// Clone makes clone of VMStorage, shares http client.
@@ -118,6 +45,9 @@ func (s *VMStorage) ApplyParams(params QuerierParams) *VMStorage {
s.dataSourceType = *params.DataSourceType
}
s.evaluationInterval = params.EvaluationInterval
for k, v := range params.ExtraLabels {
s.extraLabels = append(s.extraLabels, fmt.Sprintf("%s=%s", k, v))
}
return s
}
@@ -142,11 +72,21 @@ func NewVMStorage(baseURL, basicAuthUser, basicAuthPass string, lookBack time.Du
// Query executes the given query and returns parsed response
func (s *VMStorage) Query(ctx context.Context, query string) ([]Metric, error) {
req, err := s.prepareReq(query, time.Now())
req, err := s.newRequestPOST()
if err != nil {
return nil, err
}
ts := time.Now()
switch s.dataSourceType.name {
case "", prometheusType:
s.setPrometheusInstantReqParams(req, query, ts)
case graphiteType:
s.setGraphiteReqParams(req, query, ts)
default:
return nil, fmt.Errorf("engine not found: %q", s.dataSourceType.name)
}
resp, err := s.do(ctx, req)
if err != nil {
return nil, err
@@ -162,25 +102,32 @@ func (s *VMStorage) Query(ctx context.Context, query string) ([]Metric, error) {
return parseFn(req, resp)
}
func (s *VMStorage) prepareReq(query string, timestamp time.Time) (*http.Request, error) {
req, err := http.NewRequest("POST", s.datasourceURL, nil)
// QueryRange executes the given query on the given time range.
// For Prometheus type see https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries
// Graphite type isn't supported.
func (s *VMStorage) QueryRange(ctx context.Context, query string, start, end time.Time) ([]Metric, error) {
if s.dataSourceType.name != prometheusType {
return nil, fmt.Errorf("%q is not supported for QueryRange", s.dataSourceType.name)
}
req, err := s.newRequestPOST()
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json; charset=utf-8")
if s.basicAuthPass != "" {
req.SetBasicAuth(s.basicAuthUser, s.basicAuthPass)
if start.IsZero() {
return nil, fmt.Errorf("start param is missing")
}
switch s.dataSourceType.name {
case "", prometheusType:
s.setPrometheusReqParams(req, query, timestamp)
case graphiteType:
s.setGraphiteReqParams(req, query, timestamp)
default:
return nil, fmt.Errorf("engine not found: %q", s.dataSourceType.name)
if end.IsZero() {
return nil, fmt.Errorf("end param is missing")
}
return req, nil
s.setPrometheusRangeReqParams(req, query, start, end)
resp, err := s.do(ctx, req)
if err != nil {
return nil, err
}
defer func() {
_ = resp.Body.Close()
}()
return parsePrometheusResponse(req, resp)
}
func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response, error) {
@@ -196,74 +143,14 @@ func (s *VMStorage) do(ctx context.Context, req *http.Request) (*http.Response,
return resp, nil
}
func (s *VMStorage) setPrometheusReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += prometheusPrefix
func (s *VMStorage) newRequestPOST() (*http.Request, error) {
req, err := http.NewRequest("POST", s.datasourceURL, nil)
if err != nil {
return nil, err
}
r.URL.Path += queryPath
q := r.URL.Query()
q.Set("query", query)
if s.lookBack > 0 {
timestamp = timestamp.Add(-s.lookBack)
req.Header.Set("Content-Type", "application/json; charset=utf-8")
if s.basicAuthPass != "" {
req.SetBasicAuth(s.basicAuthUser, s.basicAuthPass)
}
if s.evaluationInterval > 0 {
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
timestamp = timestamp.Truncate(s.evaluationInterval)
// set step as evaluationInterval by default
q.Set("step", s.evaluationInterval.String())
}
q.Set("time", fmt.Sprintf("%d", timestamp.Unix()))
if s.queryStep > 0 {
// override step with user-specified value
q.Set("step", s.queryStep.String())
}
r.URL.RawQuery = q.Encode()
}
func (s *VMStorage) setGraphiteReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += graphitePrefix
}
r.URL.Path += graphitePath
q := r.URL.Query()
q.Set("format", "json")
q.Set("target", query)
from := "-5min"
if s.lookBack > 0 {
lookBack := timestamp.Add(-s.lookBack)
from = strconv.FormatInt(lookBack.Unix(), 10)
}
q.Set("from", from)
q.Set("until", "now")
r.URL.RawQuery = q.Encode()
}
const (
statusSuccess, statusError, rtVector = "success", "error", "vector"
)
func parsePrometheusResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &response{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing prometheus metrics for %s: %w", req.URL, err)
}
if r.Status == statusError {
return nil, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL, r.ErrorType, r.Error)
}
if r.Status != statusSuccess {
return nil, fmt.Errorf("unknown status: %s, Expected success or error ", r.Status)
}
if r.Data.ResultType != rtVector {
return nil, fmt.Errorf("unknown result type:%s. Expected vector", r.Data.ResultType)
}
return r.metrics()
}
func parseGraphiteResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &graphiteResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing graphite metrics for %s: %w", req.URL, err)
}
return r.metrics(), nil
return req, nil
}

View File

@@ -0,0 +1,67 @@
package datasource
import (
"encoding/json"
"fmt"
"net/http"
"strconv"
"time"
)
type graphiteResponse []graphiteResponseTarget
type graphiteResponseTarget struct {
Target string `json:"target"`
Tags map[string]string `json:"tags"`
DataPoints [][2]float64 `json:"datapoints"`
}
func (r graphiteResponse) metrics() []Metric {
var ms []Metric
for _, res := range r {
if len(res.DataPoints) < 1 {
continue
}
var m Metric
// add only last value to the result.
last := res.DataPoints[len(res.DataPoints)-1]
m.Values = append(m.Values, last[0])
m.Timestamps = append(m.Timestamps, int64(last[1]))
for k, v := range res.Tags {
m.AddLabel(k, v)
}
ms = append(ms, m)
}
return ms
}
func parseGraphiteResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &graphiteResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing graphite metrics for %s: %w", req.URL, err)
}
return r.metrics(), nil
}
const (
graphitePath = "/render"
graphitePrefix = "/graphite"
)
func (s *VMStorage) setGraphiteReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += graphitePrefix
}
r.URL.Path += graphitePath
q := r.URL.Query()
q.Set("format", "json")
q.Set("target", query)
from := "-5min"
if s.lookBack > 0 {
lookBack := timestamp.Add(-s.lookBack)
from = strconv.FormatInt(lookBack.Unix(), 10)
}
q.Set("from", from)
q.Set("until", "now")
r.URL.RawQuery = q.Encode()
}

View File

@@ -0,0 +1,165 @@
package datasource
import (
"encoding/json"
"fmt"
"net/http"
"strconv"
"time"
)
type promResponse struct {
Status string `json:"status"`
ErrorType string `json:"errorType"`
Error string `json:"error"`
Data struct {
ResultType string `json:"resultType"`
Result json.RawMessage `json:"result"`
} `json:"data"`
}
type promInstant struct {
Result []struct {
Labels map[string]string `json:"metric"`
TV [2]interface{} `json:"value"`
} `json:"result"`
}
type promRange struct {
Result []struct {
Labels map[string]string `json:"metric"`
TVs [][2]interface{} `json:"values"`
} `json:"result"`
}
func (r promInstant) metrics() ([]Metric, error) {
var result []Metric
for i, res := range r.Result {
f, err := strconv.ParseFloat(res.TV[1].(string), 64)
if err != nil {
return nil, fmt.Errorf("metric %v, unable to parse float64 from %s: %w", res, res.TV[1], err)
}
var m Metric
for k, v := range r.Result[i].Labels {
m.AddLabel(k, v)
}
m.Timestamps = append(m.Timestamps, int64(res.TV[0].(float64)))
m.Values = append(m.Values, f)
result = append(result, m)
}
return result, nil
}
func (r promRange) metrics() ([]Metric, error) {
var result []Metric
for i, res := range r.Result {
var m Metric
for _, tv := range res.TVs {
f, err := strconv.ParseFloat(tv[1].(string), 64)
if err != nil {
return nil, fmt.Errorf("metric %v, unable to parse float64 from %s: %w", res, tv[1], err)
}
m.Values = append(m.Values, f)
m.Timestamps = append(m.Timestamps, int64(tv[0].(float64)))
}
if len(m.Values) < 1 || len(m.Timestamps) < 1 {
return nil, fmt.Errorf("metric %v contains no values", res)
}
m.Labels = nil
for k, v := range r.Result[i].Labels {
m.AddLabel(k, v)
}
result = append(result, m)
}
return result, nil
}
const (
statusSuccess, statusError = "success", "error"
rtVector, rtMatrix = "vector", "matrix"
)
func parsePrometheusResponse(req *http.Request, resp *http.Response) ([]Metric, error) {
r := &promResponse{}
if err := json.NewDecoder(resp.Body).Decode(r); err != nil {
return nil, fmt.Errorf("error parsing prometheus metrics for %s: %w", req.URL, err)
}
if r.Status == statusError {
return nil, fmt.Errorf("response error, query: %s, errorType: %s, error: %s", req.URL, r.ErrorType, r.Error)
}
if r.Status != statusSuccess {
return nil, fmt.Errorf("unknown status: %s, Expected success or error ", r.Status)
}
switch r.Data.ResultType {
case rtVector:
var pi promInstant
if err := json.Unmarshal(r.Data.Result, &pi.Result); err != nil {
return nil, fmt.Errorf("umarshal err %s; \n %#v", err, string(r.Data.Result))
}
return pi.metrics()
case rtMatrix:
var pr promRange
if err := json.Unmarshal(r.Data.Result, &pr.Result); err != nil {
return nil, err
}
return pr.metrics()
default:
return nil, fmt.Errorf("unknown result type %q", r.Data.ResultType)
}
}
const (
prometheusInstantPath = "/api/v1/query"
prometheusRangePath = "/api/v1/query_range"
prometheusPrefix = "/prometheus"
)
func (s *VMStorage) setPrometheusInstantReqParams(r *http.Request, query string, timestamp time.Time) {
if s.appendTypePrefix {
r.URL.Path += prometheusPrefix
}
r.URL.Path += prometheusInstantPath
q := r.URL.Query()
if s.lookBack > 0 {
timestamp = timestamp.Add(-s.lookBack)
}
if s.evaluationInterval > 0 {
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232
timestamp = timestamp.Truncate(s.evaluationInterval)
}
q.Set("time", fmt.Sprintf("%d", timestamp.Unix()))
r.URL.RawQuery = q.Encode()
s.setPrometheusReqParams(r, query)
}
func (s *VMStorage) setPrometheusRangeReqParams(r *http.Request, query string, start, end time.Time) {
if s.appendTypePrefix {
r.URL.Path += prometheusPrefix
}
r.URL.Path += prometheusRangePath
q := r.URL.Query()
q.Add("start", fmt.Sprintf("%d", start.Unix()))
q.Add("end", fmt.Sprintf("%d", end.Unix()))
r.URL.RawQuery = q.Encode()
s.setPrometheusReqParams(r, query)
}
func (s *VMStorage) setPrometheusReqParams(r *http.Request, query string) {
q := r.URL.Query()
q.Set("query", query)
if s.evaluationInterval > 0 {
// set step as evaluationInterval by default
q.Set("step", s.evaluationInterval.String())
}
if s.queryStep > 0 {
// override step with user-specified value
q.Set("step", s.queryStep.String())
}
if s.roundDigits != "" {
q.Set("round_digits", s.roundDigits)
}
for _, l := range s.extraLabels {
q.Add("extra_label", l)
}
r.URL.RawQuery = q.Encode()
}

View File

@@ -7,6 +7,7 @@ import (
"net/http/httptest"
"reflect"
"strconv"
"strings"
"testing"
"time"
)
@@ -19,7 +20,7 @@ var (
queryRender = "constantLine(10)"
)
func TestVMSelectQuery(t *testing.T) {
func TestVMInstantQuery(t *testing.T) {
mux := http.NewServeMux()
mux.HandleFunc("/", func(_ http.ResponseWriter, _ *http.Request) {
t.Errorf("should not be called")
@@ -65,7 +66,7 @@ func TestVMSelectQuery(t *testing.T) {
case 5:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix"}}`))
case 6:
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"vm_rows"},"value":[1583786142,"13763"]}]}}`))
w.Write([]byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"vm_rows"},"value":[1583786142,"13763"]},{"metric":{"__name__":"vm_requests"},"value":[1583786140,"2000"]}]}}`))
}
})
@@ -99,16 +100,23 @@ func TestVMSelectQuery(t *testing.T) {
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
if len(m) != 2 {
t.Fatalf("expected 2 metrics got %d in %+v", len(m), m)
}
expected := Metric{
Labels: []Label{{Value: "vm_rows", Name: "__name__"}},
Timestamp: 1583786142,
Value: 13763,
expected := []Metric{
{
Labels: []Label{{Value: "vm_rows", Name: "__name__"}},
Timestamps: []int64{1583786142},
Values: []float64{13763},
},
{
Labels: []Label{{Value: "vm_requests", Name: "__name__"}},
Timestamps: []int64{1583786140},
Values: []float64{2000},
},
}
if !reflect.DeepEqual(m[0], expected) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
if !reflect.DeepEqual(m, expected) {
t.Fatalf("unexpected metric %+v want %+v", m, expected)
}
g := NewGraphiteType()
@@ -121,45 +129,146 @@ func TestVMSelectQuery(t *testing.T) {
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
}
expected = Metric{
Labels: []Label{{Value: "constantLine(10)", Name: "name"}},
Timestamp: 1611758403,
Value: 10,
exp := Metric{
Labels: []Label{{Value: "constantLine(10)", Name: "name"}},
Timestamps: []int64{1611758403},
Values: []float64{10},
}
if !reflect.DeepEqual(m[0], expected) {
if !reflect.DeepEqual(m[0], exp) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
}
}
func TestPrepareReq(t *testing.T) {
func TestVMRangeQuery(t *testing.T) {
mux := http.NewServeMux()
mux.HandleFunc("/", func(_ http.ResponseWriter, _ *http.Request) {
t.Errorf("should not be called")
})
c := -1
mux.HandleFunc("/api/v1/query_range", func(w http.ResponseWriter, r *http.Request) {
c++
if r.Method != http.MethodPost {
t.Errorf("expected POST method got %s", r.Method)
}
if name, pass, _ := r.BasicAuth(); name != basicAuthName || pass != basicAuthPass {
t.Errorf("expected %s:%s as basic auth got %s:%s", basicAuthName, basicAuthPass, name, pass)
}
if r.URL.Query().Get("query") != query {
t.Errorf("expected %s in query param, got %s", query, r.URL.Query().Get("query"))
}
startTS := r.URL.Query().Get("start")
if startTS == "" {
t.Errorf("expected 'start' in query param, got nil instead")
}
if _, err := strconv.ParseInt(startTS, 10, 64); err != nil {
t.Errorf("failed to parse 'start' query param: %s", err)
}
endTS := r.URL.Query().Get("end")
if endTS == "" {
t.Errorf("expected 'end' in query param, got nil instead")
}
if _, err := strconv.ParseInt(endTS, 10, 64); err != nil {
t.Errorf("failed to parse 'end' query param: %s", err)
}
switch c {
case 0:
w.Write([]byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"vm_rows"},"values":[[1583786142,"13763"]]}]}}`))
}
})
srv := httptest.NewServer(mux)
defer srv.Close()
s := NewVMStorage(srv.URL, basicAuthName, basicAuthPass, time.Minute, 0, false, srv.Client())
p := NewPrometheusType()
pq := s.BuildWithParams(QuerierParams{DataSourceType: &p, EvaluationInterval: 15 * time.Second})
_, err := pq.QueryRange(ctx, query, time.Now(), time.Time{})
expectError(t, err, "is missing")
_, err = pq.QueryRange(ctx, query, time.Time{}, time.Now())
expectError(t, err, "is missing")
start, end := time.Now().Add(-time.Minute), time.Now()
m, err := pq.QueryRange(ctx, query, start, end)
if err != nil {
t.Fatalf("unexpected %s", err)
}
if len(m) != 1 {
t.Fatalf("expected 1 metric got %d in %+v", len(m), m)
}
expected := Metric{
Labels: []Label{{Value: "vm_rows", Name: "__name__"}},
Timestamps: []int64{1583786142},
Values: []float64{13763},
}
if !reflect.DeepEqual(m[0], expected) {
t.Fatalf("unexpected metric %+v want %+v", m[0], expected)
}
g := NewGraphiteType()
gq := s.BuildWithParams(QuerierParams{DataSourceType: &g})
_, err = gq.QueryRange(ctx, queryRender, start, end)
expectError(t, err, "is not supported")
}
func TestRequestParams(t *testing.T) {
query := "up"
timestamp := time.Date(2001, 2, 3, 4, 5, 6, 0, time.UTC)
testCases := []struct {
name string
vm *VMStorage
checkFn func(t *testing.T, r *http.Request)
name string
queryRange bool
vm *VMStorage
checkFn func(t *testing.T, r *http.Request)
}{
{
"prometheus path",
false,
&VMStorage{
dataSourceType: NewPrometheusType(),
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, queryPath, r.URL.Path)
checkEqualString(t, prometheusInstantPath, r.URL.Path)
},
},
{
"prometheus prefix",
false,
&VMStorage{
dataSourceType: NewPrometheusType(),
appendTypePrefix: true,
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusPrefix+queryPath, r.URL.Path)
checkEqualString(t, prometheusPrefix+prometheusInstantPath, r.URL.Path)
},
},
{
"prometheus range path",
true,
&VMStorage{
dataSourceType: NewPrometheusType(),
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusRangePath, r.URL.Path)
},
},
{
"prometheus range prefix",
true,
&VMStorage{
dataSourceType: NewPrometheusType(),
appendTypePrefix: true,
},
func(t *testing.T, r *http.Request) {
checkEqualString(t, prometheusPrefix+prometheusRangePath, r.URL.Path)
},
},
{
"graphite path",
false,
&VMStorage{
dataSourceType: NewGraphiteType(),
},
@@ -169,6 +278,7 @@ func TestPrepareReq(t *testing.T) {
},
{
"graphite prefix",
false,
&VMStorage{
dataSourceType: NewGraphiteType(),
appendTypePrefix: true,
@@ -179,14 +289,38 @@ func TestPrepareReq(t *testing.T) {
},
{
"default params",
false,
&VMStorage{},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"default range params",
true,
&VMStorage{},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("end=%d&query=%s&start=%d", timestamp.Unix(), query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"basic auth",
false,
&VMStorage{
basicAuthUser: "foo",
basicAuthPass: "bar",
},
func(t *testing.T, r *http.Request) {
u, p, _ := r.BasicAuth()
checkEqualString(t, "foo", u)
checkEqualString(t, "bar", p)
},
},
{
"basic auth range",
true,
&VMStorage{
basicAuthUser: "foo",
basicAuthPass: "bar",
@@ -199,6 +333,7 @@ func TestPrepareReq(t *testing.T) {
},
{
"lookback",
false,
&VMStorage{
lookBack: time.Minute,
},
@@ -209,6 +344,7 @@ func TestPrepareReq(t *testing.T) {
},
{
"evaluation interval",
false,
&VMStorage{
evaluationInterval: 15 * time.Second,
},
@@ -221,6 +357,7 @@ func TestPrepareReq(t *testing.T) {
},
{
"lookback + evaluation interval",
false,
&VMStorage{
lookBack: time.Minute,
evaluationInterval: 15 * time.Second,
@@ -235,6 +372,7 @@ func TestPrepareReq(t *testing.T) {
},
{
"step override",
false,
&VMStorage{
queryStep: time.Minute,
},
@@ -243,14 +381,64 @@ func TestPrepareReq(t *testing.T) {
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"round digits",
false,
&VMStorage{
roundDigits: "10",
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("query=%s&round_digits=10&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"extra labels",
false,
&VMStorage{
extraLabels: []string{
"env=prod",
"query=es=cape",
},
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("extra_label=env%%3Dprod&extra_label=query%%3Des%%3Dcape&query=%s&time=%d", query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
{
"extra labels range",
true,
&VMStorage{
extraLabels: []string{
"env=prod",
"query=es=cape",
},
},
func(t *testing.T, r *http.Request) {
exp := fmt.Sprintf("end=%d&extra_label=env%%3Dprod&extra_label=query%%3Des%%3Dcape&query=%s&start=%d",
timestamp.Unix(), query, timestamp.Unix())
checkEqualString(t, exp, r.URL.RawQuery)
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
req, err := tc.vm.prepareReq(query, timestamp)
req, err := tc.vm.newRequestPOST()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
switch tc.vm.dataSourceType.name {
case "", prometheusType:
if tc.queryRange {
tc.vm.setPrometheusRangeReqParams(req, query, timestamp, timestamp)
} else {
tc.vm.setPrometheusInstantReqParams(req, query, timestamp)
}
case graphiteType:
tc.vm.setGraphiteReqParams(req, query, timestamp)
}
tc.checkFn(t, req)
})
}
@@ -262,3 +450,13 @@ func checkEqualString(t *testing.T, exp, got string) {
t.Errorf("expected to get %q; got %q", exp, got)
}
}
func expectError(t *testing.T, err error, exp string) {
t.Helper()
if err == nil {
t.Errorf("expected non-nil error")
}
if !strings.Contains(err.Error(), exp) {
t.Errorf("expected error %q to contain %q", err, exp)
}
}

View File

@@ -18,14 +18,15 @@ import (
// Group is an entity for grouping rules
type Group struct {
mu sync.RWMutex
Name string
File string
Rules []Rule
Type datasource.Type
Interval time.Duration
Concurrency int
Checksum string
mu sync.RWMutex
Name string
File string
Rules []Rule
Type datasource.Type
Interval time.Duration
Concurrency int
Checksum string
ExtraFilterLabels map[string]string
doneCh chan struct{}
finishedCh chan struct{}
@@ -51,15 +52,17 @@ func newGroupMetrics(name, file string) *groupMetrics {
func newGroup(cfg config.Group, qb datasource.QuerierBuilder, defaultInterval time.Duration, labels map[string]string) *Group {
g := &Group{
Type: cfg.Type,
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval,
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
doneCh: make(chan struct{}),
finishedCh: make(chan struct{}),
updateCh: make(chan *Group),
Type: cfg.Type,
Name: cfg.Name,
File: cfg.File,
Interval: cfg.Interval.Duration(),
Concurrency: cfg.Concurrency,
Checksum: cfg.Checksum,
ExtraFilterLabels: cfg.ExtraFilterLabels,
doneCh: make(chan struct{}),
finishedCh: make(chan struct{}),
updateCh: make(chan *Group),
}
g.metrics = newGroupMetrics(g.Name, g.File)
if g.Interval == 0 {
@@ -115,6 +118,8 @@ func (g *Group) Restore(ctx context.Context, qb datasource.QuerierBuilder, lookb
if rr.For < 1 {
continue
}
// ignore g.ExtraFilterLabels on purpose, so it
// won't affect the restore procedure.
q := qb.BuildWithParams(datasource.QuerierParams{})
if err := rr.Restore(ctx, q, lookback, labels); err != nil {
return fmt.Errorf("error while restoring rule %q: %w", rule, err)
@@ -163,6 +168,7 @@ func (g *Group) updateWith(newGroup *Group) error {
}
g.Type = newGroup.Type
g.Concurrency = newGroup.Concurrency
g.ExtraFilterLabels = newGroup.ExtraFilterLabels
g.Checksum = newGroup.Checksum
g.Rules = newRules
return nil
@@ -263,15 +269,10 @@ type executor struct {
func (e *executor) execConcurrently(ctx context.Context, rules []Rule, concurrency int, interval time.Duration) chan error {
res := make(chan error, len(rules))
var returnSeries bool
if e.rw != nil {
returnSeries = true
}
if concurrency == 1 {
// fast path
for _, rule := range rules {
res <- e.exec(ctx, rule, returnSeries, interval)
res <- e.exec(ctx, rule, interval)
}
close(res)
return res
@@ -284,7 +285,7 @@ func (e *executor) execConcurrently(ctx context.Context, rules []Rule, concurren
sem <- struct{}{}
wg.Add(1)
go func(r Rule) {
res <- e.exec(ctx, r, returnSeries, interval)
res <- e.exec(ctx, r, interval)
<-sem
wg.Done()
}(rule)
@@ -303,14 +304,14 @@ var (
remoteWriteErrors = metrics.NewCounter(`vmalert_remotewrite_errors_total`)
)
func (e *executor) exec(ctx context.Context, rule Rule, returnSeries bool, interval time.Duration) error {
func (e *executor) exec(ctx context.Context, rule Rule, interval time.Duration) error {
execTotal.Inc()
execStart := time.Now()
defer func() {
execDuration.UpdateDuration(execStart)
}()
tss, err := rule.Exec(ctx, returnSeries)
tss, err := rule.Exec(ctx)
if err != nil {
execErrors.Inc()
return fmt.Errorf("rule %q: failed to execute: %w", rule, err)

View File

@@ -6,10 +6,10 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
)
func init() {
@@ -34,7 +34,7 @@ func TestUpdateWith(t *testing.T) {
[]config.Rule{{
Alert: "foo",
Expr: "up > 0",
For: config.NewPromDuration(time.Second),
For: utils.NewPromDuration(time.Second),
Labels: map[string]string{
"bar": "baz",
},
@@ -46,7 +46,7 @@ func TestUpdateWith(t *testing.T) {
[]config.Rule{{
Alert: "foo",
Expr: "up > 10",
For: config.NewPromDuration(time.Second),
For: utils.NewPromDuration(time.Second),
Labels: map[string]string{
"baz": "bar",
},

View File

@@ -7,6 +7,7 @@ import (
"sort"
"sync"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
@@ -42,6 +43,10 @@ func (fq *fakeQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Qu
return fq
}
func (fq *fakeQuerier) QueryRange(ctx context.Context, q string, _, _ time.Time) ([]datasource.Metric, error) {
return fq.Query(ctx, q)
}
func (fq *fakeQuerier) Query(_ context.Context, _ string) ([]datasource.Metric, error) {
fq.Lock()
defer fq.Unlock()
@@ -72,9 +77,16 @@ func (fn *fakeNotifier) getAlerts() []notifier.Alert {
}
func metricWithValueAndLabels(t *testing.T, value float64, labels ...string) datasource.Metric {
return metricWithValuesAndLabels(t, []float64{value}, labels...)
}
func metricWithValuesAndLabels(t *testing.T, values []float64, labels ...string) datasource.Metric {
t.Helper()
m := metricWithLabels(t, labels...)
m.Value = value
m.Values = values
for i := range values {
m.Timestamps = append(m.Timestamps, int64(i))
}
return m
}
@@ -83,7 +95,7 @@ func metricWithLabels(t *testing.T, labels ...string) datasource.Metric {
if len(labels) == 0 || len(labels)%2 != 0 {
t.Fatalf("expected to get even number of labels")
}
m := datasource.Metric{}
m := datasource.Metric{Values: []float64{1}, Timestamps: []int64{1}}
for i := 0; i < len(labels); i += 2 {
m.Labels = append(m.Labels, datasource.Label{
Name: labels[i],

View File

@@ -34,6 +34,9 @@ Examples:
absolute path to all .yaml files in root.
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.`)
rulesCheckInterval = flag.Duration("rule.configCheckInterval", 0, "Interval for checking for changes in '-rule' files. "+
"By default the checking is disabled. Send SIGHUP signal in order to force config check for changes")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections")
evaluationInterval = flag.Duration("evaluationInterval", time.Minute, "How often to evaluate the rules")
@@ -47,6 +50,7 @@ eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{
remoteReadLookBack = flag.Duration("remoteRead.lookback", time.Hour, "Lookback defines how far to look into past for alerts timeseries."+
" For example, if lookback=1h then range from now() to now()-1h will be scanned.")
remoteReadIgnoreRestoreErrors = flag.Bool("remoteRead.ignoreRestoreErrors", true, "Whether to ignore errors from remote storage when restoring alerts state on startup.")
dryRun = flag.Bool("dryRun", false, "Whether to check only config files without running vmalert. The rules file are validated. The `-rule` flag must be specified.")
)
@@ -64,42 +68,54 @@ func main() {
notifier.InitTemplateFunc(u)
groups, err := config.Parse(*rulePath, true, true)
if err != nil {
logger.Fatalf(err.Error())
logger.Fatalf("failed to parse %q: %s", *rulePath, err)
}
if len(groups) == 0 {
logger.Fatalf("No rules for validation. Please specify path to file(s) with alerting and/or recording rules using `-rule` flag")
}
return
}
if *replayFrom != "" || *replayTo != "" {
rw, err := remotewrite.Init(context.Background())
if err != nil {
logger.Fatalf("failed to init remoteWrite: %s", err)
}
eu, err := getExternalURL(*externalURL, *httpListenAddr, httpserver.IsTLS())
if err != nil {
logger.Fatalf("failed to init `external.url`: %s", err)
}
notifier.InitTemplateFunc(eu)
groupsCfg, err := config.Parse(*rulePath, *validateTemplates, *validateExpressions)
if err != nil {
logger.Fatalf("cannot parse configuration file: %s", err)
}
q, err := datasource.Init()
if err != nil {
logger.Fatalf("failed to init datasource: %s", err)
}
if err := replay(groupsCfg, q, rw); err != nil {
logger.Fatalf("replay failed: %s", err)
}
return
}
ctx, cancel := context.WithCancel(context.Background())
manager, err := newManager(ctx)
if err != nil {
logger.Fatalf("failed to init: %s", err)
}
if err := manager.start(ctx, *rulePath, *validateTemplates, *validateExpressions); err != nil {
logger.Infof("reading rules configuration file from %q", strings.Join(*rulePath, ";"))
groupsCfg, err := config.Parse(*rulePath, *validateTemplates, *validateExpressions)
if err != nil {
logger.Fatalf("cannot parse configuration file: %s", err)
}
if err := manager.start(ctx, groupsCfg); err != nil {
logger.Fatalf("failed to start: %s", err)
}
go func() {
// init reload metrics with positive values to improve alerting conditions
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
sigHup := procutil.NewSighupChan()
for {
<-sigHup
configReloads.Inc()
logger.Infof("SIGHUP received. Going to reload rules %q ...", *rulePath)
if err := manager.update(ctx, *rulePath, *validateTemplates, *validateExpressions, false); err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)
logger.Errorf("error while reloading rules: %s", err)
continue
}
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
logger.Infof("Rules reloaded successfully from %q", *rulePath)
}
}()
go configReload(ctx, manager, groupsCfg)
rh := &requestHandler{m: manager}
go httpserver.Serve(*httpListenAddr, rh.handler)
@@ -222,3 +238,62 @@ See the docs at https://docs.victoriametrics.com/vmalert.html .
`
flagutil.Usage(s)
}
func configReload(ctx context.Context, m *manager, groupsCfg []config.Group) {
// Register SIGHUP handler for config re-read just before manager.start call.
// This guarantees that the config will be re-read if the signal arrives during manager.start call.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
var configCheckCh <-chan time.Time
if *rulesCheckInterval > 0 {
ticker := time.NewTicker(*rulesCheckInterval)
configCheckCh = ticker.C
defer ticker.Stop()
}
// init reload metrics with positive values to improve alerting conditions
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
for {
select {
case <-ctx.Done():
return
case <-sighupCh:
logger.Infof("SIGHUP received. Going to reload rules %q ...", *rulePath)
configReloads.Inc()
case <-configCheckCh:
}
newGroupsCfg, err := config.Parse(*rulePath, *validateTemplates, *validateExpressions)
if err != nil {
logger.Errorf("cannot parse configuration file: %s", err)
continue
}
if configsEqual(newGroupsCfg, groupsCfg) {
// config didn't change - skip it
continue
}
groupsCfg = newGroupsCfg
if err := m.update(ctx, groupsCfg, false); err != nil {
configReloadErrors.Inc()
configSuccess.Set(0)
logger.Errorf("error while reloading rules: %s", err)
continue
}
configSuccess.Set(1)
configTimestamp.Set(fasttime.UnixTimestamp())
logger.Infof("Rules reloaded successfully from %q", *rulePath)
}
}
func configsEqual(a, b []config.Group) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i].Checksum != b[i].Checksum {
return false
}
}
return true
}

View File

@@ -1,12 +1,16 @@
package main
import (
"context"
"fmt"
"io/ioutil"
"net/url"
"os"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
)
func TestGetExternalURL(t *testing.T) {
@@ -51,3 +55,95 @@ func TestGetAlertURLGenerator(t *testing.T) {
t.Errorf("unexpected url want %s, got %s", exp, fn(testAlert))
}
}
func TestConfigReload(t *testing.T) {
originalRulePath := *rulePath
defer func() {
*rulePath = originalRulePath
}()
const (
rules1 = `
groups:
- name: group-1
rules:
- alert: ExampleAlertAlwaysFiring
expr: sum by(job) (up == 1)
- record: handler:requests:rate5m
expr: sum(rate(prometheus_http_requests_total[5m])) by (handler)
`
rules2 = `
groups:
- name: group-1
rules:
- alert: ExampleAlertAlwaysFiring
expr: sum by(job) (up == 1)
- name: group-2
rules:
- record: handler:requests:rate5m
expr: sum(rate(prometheus_http_requests_total[5m])) by (handler)
`
)
f, err := ioutil.TempFile("", "")
if err != nil {
t.Fatal(err)
}
writeToFile(t, f.Name(), rules1)
*rulesCheckInterval = 200 * time.Millisecond
*rulePath = []string{f.Name()}
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
m := &manager{
querierBuilder: &fakeQuerier{},
groups: make(map[uint64]*Group),
labels: map[string]string{},
}
go configReload(ctx, m, nil)
lenLocked := func(m *manager) int {
m.groupsMu.RLock()
defer m.groupsMu.RUnlock()
return len(m.groups)
}
time.Sleep(*rulesCheckInterval * 2)
groupsLen := lenLocked(m)
if groupsLen != 1 {
t.Fatalf("expected to have exactly 1 group loaded; got %d", groupsLen)
}
writeToFile(t, f.Name(), rules2)
time.Sleep(*rulesCheckInterval * 2)
groupsLen = lenLocked(m)
if groupsLen != 2 {
fmt.Println(m.groups)
t.Fatalf("expected to have exactly 2 groups loaded; got %d", groupsLen)
}
writeToFile(t, f.Name(), rules1)
procutil.SelfSIGHUP()
time.Sleep(*rulesCheckInterval / 2)
groupsLen = lenLocked(m)
if groupsLen != 1 {
t.Fatalf("expected to have exactly 1 group loaded; got %d", groupsLen)
}
writeToFile(t, f.Name(), `corrupted`)
procutil.SelfSIGHUP()
time.Sleep(*rulesCheckInterval / 2)
groupsLen = lenLocked(m)
if groupsLen != 1 { // should remain unchanged
t.Fatalf("expected to have exactly 1 group loaded; got %d", groupsLen)
}
}
func writeToFile(t *testing.T, file, b string) {
t.Helper()
err := ioutil.WriteFile(file, []byte(b), 0644)
if err != nil {
t.Fatal(err)
}
}

View File

@@ -3,7 +3,6 @@ package main
import (
"context"
"fmt"
"strings"
"sync"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
@@ -50,8 +49,8 @@ func (m *manager) AlertAPI(gID, aID uint64) (*APIAlert, error) {
return nil, fmt.Errorf("can't find alert with id %q in group %q", aID, g.Name)
}
func (m *manager) start(ctx context.Context, path []string, validateTpl, validateExpr bool) error {
return m.update(ctx, path, validateTpl, validateExpr, true)
func (m *manager) start(ctx context.Context, groupsCfg []config.Group) error {
return m.update(ctx, groupsCfg, true)
}
func (m *manager) close() {
@@ -64,10 +63,13 @@ func (m *manager) close() {
m.wg.Wait()
}
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) {
func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) error {
if restore && m.rr != nil {
err := group.Restore(ctx, m.rr, *remoteReadLookBack, m.labels)
if err != nil {
if !*remoteReadIgnoreRestoreErrors {
return fmt.Errorf("failed to restore state for group %q: %w", group.Name, err)
}
logger.Errorf("error while restoring state for group %q: %s", group.Name, err)
}
}
@@ -79,15 +81,10 @@ func (m *manager) startGroup(ctx context.Context, group *Group, restore bool) {
m.wg.Done()
}()
m.groups[id] = group
return nil
}
func (m *manager) update(ctx context.Context, path []string, validateTpl, validateExpr, restore bool) error {
logger.Infof("reading rules configuration file from %q", strings.Join(path, ";"))
groupsCfg, err := config.Parse(path, validateTpl, validateExpr)
if err != nil {
return fmt.Errorf("cannot parse configuration file: %w", err)
}
func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore bool) error {
groupsRegistry := make(map[uint64]*Group)
for _, cfg := range groupsCfg {
ng := newGroup(cfg, m.querierBuilder, *evaluationInterval, m.labels)
@@ -117,7 +114,9 @@ func (m *manager) update(ctx context.Context, path []string, validateTpl, valida
}
}
for _, ng := range groupsRegistry {
m.startGroup(ctx, ng, restore)
if err := m.startGroup(ctx, ng, restore); err != nil {
return err
}
}
m.groupsMu.Unlock()
@@ -141,12 +140,14 @@ func (g *Group) toAPI() APIGroup {
ag := APIGroup{
// encode as string to avoid rounding
ID: fmt.Sprintf("%d", g.ID()),
Name: g.Name,
Type: g.Type.String(),
File: g.File,
Interval: g.Interval.String(),
Concurrency: g.Concurrency,
ID: fmt.Sprintf("%d", g.ID()),
Name: g.Name,
Type: g.Type.String(),
File: g.File,
Interval: g.Interval.String(),
Concurrency: g.Concurrency,
ExtraFilterLabels: g.ExtraFilterLabels,
}
for _, r := range g.Rules {
switch v := r.(type) {

View File

@@ -9,8 +9,8 @@ import (
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
)
@@ -25,9 +25,8 @@ func TestMain(m *testing.M) {
// starting with empty rules folder
func TestManagerEmptyRulesDir(t *testing.T) {
m := &manager{groups: make(map[uint64]*Group)}
path := []string{"foo/bar"}
err := m.update(context.Background(), path, true, true, false)
if err != nil {
cfg := loadCfg(t, []string{"foo/bar"}, true, true)
if err := m.update(context.Background(), cfg, false); err != nil {
t.Fatalf("expected to load succesfully with empty rules dir; got err instead: %v", err)
}
}
@@ -50,8 +49,11 @@ func TestManagerUpdateConcurrent(t *testing.T) {
"config/testdata/rules1-good.rules",
"config/testdata/rules2-good.rules",
}
evalInterval := *evaluationInterval
defer func() { *evaluationInterval = evalInterval }()
*evaluationInterval = time.Millisecond
if err := m.start(context.Background(), []string{paths[0]}, true, true); err != nil {
cfg := loadCfg(t, []string{paths[0]}, true, true)
if err := m.start(context.Background(), cfg); err != nil {
t.Fatalf("failed to start: %s", err)
}
@@ -64,8 +66,11 @@ func TestManagerUpdateConcurrent(t *testing.T) {
defer wg.Done()
for i := 0; i < iterations; i++ {
rnd := rand.Intn(len(paths))
path := []string{paths[rnd]}
_ = m.update(context.Background(), path, true, true, false)
cfg, err := config.Parse([]string{paths[rnd]}, true, true)
if err != nil { // update can fail and this is expected
continue
}
_ = m.update(context.Background(), cfg, false)
}
}()
}
@@ -243,13 +248,16 @@ func TestManagerUpdate(t *testing.T) {
t.Run(tc.name, func(t *testing.T) {
ctx, cancel := context.WithCancel(context.TODO())
m := &manager{groups: make(map[uint64]*Group), querierBuilder: &fakeQuerier{}}
path := []string{tc.initPath}
if err := m.update(ctx, path, true, true, false); err != nil {
cfgInit := loadCfg(t, []string{tc.initPath}, true, true)
if err := m.update(ctx, cfgInit, false); err != nil {
t.Fatalf("failed to complete initial rules update: %s", err)
}
path = []string{tc.updatePath}
_ = m.update(ctx, path, true, true, false)
cfgUpdate, err := config.Parse([]string{tc.updatePath}, true, true)
if err == nil { // update can fail and that's expected
_ = m.update(ctx, cfgUpdate, false)
}
if len(tc.want) != len(m.groups) {
t.Fatalf("\nwant number of groups: %d;\ngot: %d ", len(tc.want), len(m.groups))
}
@@ -267,3 +275,12 @@ func TestManagerUpdate(t *testing.T) {
})
}
}
func loadCfg(t *testing.T, path []string, validateAnnotations, validateExpressions bool) []config.Group {
t.Helper()
cfg, err := config.Parse(path, validateAnnotations, validateExpressions)
if err != nil {
t.Fatal(err)
}
return cfg
}

View File

@@ -83,14 +83,16 @@ func TestAlert_ExecTemplate(t *testing.T) {
{Name: "foo", Value: "bar"},
{Name: "baz", Value: "qux"},
},
Value: 1,
Values: []float64{1},
Timestamps: []int64{1},
},
{
Labels: []datasource.Label{
{Name: "foo", Value: "garply"},
{Name: "baz", Value: "fred"},
},
Value: 2,
Values: []float64{2},
Timestamps: []int64{1},
},
}, nil
}

View File

@@ -28,6 +28,31 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
)
// metric is private copy of datasource.Metric,
// it is used for templating annotations,
// Labels as map simplifies templates evaluation.
type metric struct {
Labels map[string]string
Timestamp int64
Value float64
}
// datasourceMetricsToTemplateMetrics converts Metrics from datasource package to private copy for templating.
func datasourceMetricsToTemplateMetrics(ms []datasource.Metric) []metric {
mss := make([]metric, 0, len(ms))
for _, m := range ms {
labelsMap := make(map[string]string, len(m.Labels))
for _, labelValue := range m.Labels {
labelsMap[labelValue.Name] = labelValue.Value
}
mss = append(mss, metric{
Labels: labelsMap,
Timestamp: m.Timestamps[0],
Value: m.Values[0]})
}
return mss
}
// QueryFn is used to wrap a call to datasource into simple-to-use function
// for templating functions.
type QueryFn func(query string) ([]datasource.Metric, error)
@@ -211,34 +236,34 @@ func InitTemplateFunc(externalURL *url.URL) {
// For example, {{ query "foo" | first | value }} will
// execute "/api/v1/query?query=foo" request and will return
// the first value in response.
"query": func(q string) ([]datasource.Metric, error) {
"query": func(q string) ([]metric, error) {
// query function supposed to be substituted at funcsWithQuery().
// it is present here only for validation purposes, when there is no
// provided datasource.
//
// return non-empty slice to pass validation with chained functions in template
// see issue #989 for details
return []datasource.Metric{{}}, nil
return []metric{{}}, nil
},
// first returns the first by order element from the given metrics list.
// usually used alongside with `query` template function.
"first": func(metrics []datasource.Metric) (datasource.Metric, error) {
"first": func(metrics []metric) (metric, error) {
if len(metrics) > 0 {
return metrics[0], nil
}
return datasource.Metric{}, errors.New("first() called on vector with no elements")
return metric{}, errors.New("first() called on vector with no elements")
},
// label returns the value of the given label name for the given metric.
// usually used alongside with `query` template function.
"label": func(label string, m datasource.Metric) string {
return m.Label(label)
"label": func(label string, m metric) string {
return m.Labels[label]
},
// value returns the value of the given metric.
// usually used alongside with `query` template function.
"value": func(m datasource.Metric) float64 {
"value": func(m metric) float64 {
return m.Value
},
@@ -266,8 +291,12 @@ func funcsWithQuery(query QueryFn) textTpl.FuncMap {
for k, fn := range tmplFunc {
fm[k] = fn
}
fm["query"] = func(q string) ([]datasource.Metric, error) {
return query(q)
fm["query"] = func(q string) ([]metric, error) {
result, err := query(q)
if err != nil {
return nil, err
}
return datasourceMetricsToTemplateMetrics(result), nil
}
return fm
}

View File

@@ -3,8 +3,8 @@ package main
import (
"context"
"fmt"
"hash/fnv"
"sort"
"strings"
"sync"
"time"
@@ -66,6 +66,7 @@ func newRecordingRule(qb datasource.QuerierBuilder, group *Group, cfg config.Rul
q: qb.BuildWithParams(datasource.QuerierParams{
DataSourceType: &cfg.Type,
EvaluationInterval: group.Interval,
ExtraLabels: group.ExtraFilterLabels,
}),
}
@@ -87,12 +88,30 @@ func (rr *RecordingRule) Close() {
metrics.UnregisterMetric(rr.metrics.errors.name)
}
// Exec executes RecordingRule expression via the given Querier.
func (rr *RecordingRule) Exec(ctx context.Context, series bool) ([]prompbmarshal.TimeSeries, error) {
if !series {
return nil, nil
// ExecRange executes recording rule on the given time range similarly to Exec.
// It doesn't update internal states of the Rule and meant to be used just
// to get time series for backfilling.
func (rr *RecordingRule) ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error) {
series, err := rr.q.QueryRange(ctx, rr.Expr, start, end)
if err != nil {
return nil, err
}
duplicates := make(map[string]struct{}, len(series))
var tss []prompbmarshal.TimeSeries
for _, s := range series {
ts := rr.toTimeSeries(s)
key := stringifyLabels(ts)
if _, ok := duplicates[key]; ok {
return nil, fmt.Errorf("original metric %v; resulting labels %q: %w", s.Labels, key, errDuplicate)
}
duplicates[key] = struct{}{}
tss = append(tss, ts)
}
return tss, nil
}
// Exec executes RecordingRule expression via the given Querier.
func (rr *RecordingRule) Exec(ctx context.Context) ([]prompbmarshal.TimeSeries, error) {
qMetrics, err := rr.q.Query(ctx, rr.Expr)
rr.mu.Lock()
defer rr.mu.Unlock()
@@ -103,36 +122,41 @@ func (rr *RecordingRule) Exec(ctx context.Context, series bool) ([]prompbmarshal
return nil, fmt.Errorf("failed to execute query %q: %w", rr.Expr, err)
}
duplicates := make(map[uint64]prompbmarshal.TimeSeries, len(qMetrics))
duplicates := make(map[string]struct{}, len(qMetrics))
var tss []prompbmarshal.TimeSeries
for _, r := range qMetrics {
ts := rr.toTimeSeries(r, time.Unix(r.Timestamp, 0))
h := hashTimeSeries(ts)
if _, ok := duplicates[h]; ok {
ts := rr.toTimeSeries(r)
key := stringifyLabels(ts)
if _, ok := duplicates[key]; ok {
rr.lastExecError = errDuplicate
return nil, errDuplicate
return nil, fmt.Errorf("original metric %v; resulting labels %q: %w", r, key, errDuplicate)
}
duplicates[h] = ts
duplicates[key] = struct{}{}
tss = append(tss, ts)
}
return tss, nil
}
func hashTimeSeries(ts prompbmarshal.TimeSeries) uint64 {
hash := fnv.New64a()
func stringifyLabels(ts prompbmarshal.TimeSeries) string {
labels := ts.Labels
sort.Slice(labels, func(i, j int) bool {
return labels[i].Name < labels[j].Name
})
for _, l := range labels {
hash.Write([]byte(l.Name))
hash.Write([]byte(l.Value))
hash.Write([]byte("\xff"))
if len(labels) > 1 {
sort.Slice(labels, func(i, j int) bool {
return labels[i].Name < labels[j].Name
})
}
return hash.Sum64()
b := strings.Builder{}
for i, l := range labels {
b.WriteString(l.Name)
b.WriteString("=")
b.WriteString(l.Value)
if i != len(labels)-1 {
b.WriteString(",")
}
}
return b.String()
}
func (rr *RecordingRule) toTimeSeries(m datasource.Metric, timestamp time.Time) prompbmarshal.TimeSeries {
func (rr *RecordingRule) toTimeSeries(m datasource.Metric) prompbmarshal.TimeSeries {
labels := make(map[string]string)
for _, l := range m.Labels {
labels[l.Name] = l.Value
@@ -142,12 +166,10 @@ func (rr *RecordingRule) toTimeSeries(m datasource.Metric, timestamp time.Time)
for k, v := range rr.Labels {
labels[k] = v
}
return newTimeSeries(m.Value, labels, timestamp)
return newTimeSeries(m.Values, m.Timestamps, labels)
}
// UpdateWith copies all significant fields.
// alerts state isn't copied since
// it should be updated in next 2 Execs
func (rr *RecordingRule) UpdateWith(r Rule) error {
nr, ok := r.(*RecordingRule)
if !ok {
@@ -155,6 +177,7 @@ func (rr *RecordingRule) UpdateWith(r Rule) error {
}
rr.Expr = nr.Expr
rr.Labels = nr.Labels
rr.q = nr.q
return nil
}

View File

@@ -11,7 +11,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
func TestRecoridngRule_ToTimeSeries(t *testing.T) {
func TestRecoridngRule_Exec(t *testing.T) {
timestamp := time.Now()
testCases := []struct {
rule *RecordingRule
@@ -24,9 +24,9 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
"__name__", "bar",
)},
[]prompbmarshal.TimeSeries{
newTimeSeries(10, map[string]string{
newTimeSeries([]float64{10}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foo",
}, timestamp),
}),
},
},
{
@@ -37,18 +37,18 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
metricWithValueAndLabels(t, 3, "__name__", "baz", "job", "baz"),
},
[]prompbmarshal.TimeSeries{
newTimeSeries(1, map[string]string{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "foo",
}, timestamp),
newTimeSeries(2, map[string]string{
}),
newTimeSeries([]float64{2}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "bar",
}, timestamp),
newTimeSeries(3, map[string]string{
}),
newTimeSeries([]float64{3}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "baz",
}, timestamp),
}),
},
},
{
@@ -59,16 +59,16 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
metricWithValueAndLabels(t, 2, "__name__", "foo", "job", "foo"),
metricWithValueAndLabels(t, 1, "__name__", "bar", "job", "bar")},
[]prompbmarshal.TimeSeries{
newTimeSeries(2, map[string]string{
newTimeSeries([]float64{2}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "foo",
"source": "test",
}, timestamp),
newTimeSeries(1, map[string]string{
}),
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "bar",
"source": "test",
}, timestamp),
}),
},
},
}
@@ -77,7 +77,7 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
fq := &fakeQuerier{}
fq.add(tc.metrics...)
tc.rule.q = fq
tss, err := tc.rule.Exec(context.TODO(), true)
tss, err := tc.rule.Exec(context.TODO())
if err != nil {
t.Fatalf("unexpected Exec err: %s", err)
}
@@ -88,7 +88,88 @@ func TestRecoridngRule_ToTimeSeries(t *testing.T) {
}
}
func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
func TestRecoridngRule_ExecRange(t *testing.T) {
timestamp := time.Now()
testCases := []struct {
rule *RecordingRule
metrics []datasource.Metric
expTS []prompbmarshal.TimeSeries
}{
{
&RecordingRule{Name: "foo"},
[]datasource.Metric{metricWithValuesAndLabels(t, []float64{10, 20, 30},
"__name__", "bar",
)},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{10, 20, 30},
[]int64{timestamp.UnixNano(), timestamp.UnixNano(), timestamp.UnixNano()},
map[string]string{
"__name__": "foo",
}),
},
},
{
&RecordingRule{Name: "foobarbaz"},
[]datasource.Metric{
metricWithValuesAndLabels(t, []float64{1}, "__name__", "foo", "job", "foo"),
metricWithValuesAndLabels(t, []float64{2, 3}, "__name__", "bar", "job", "bar"),
metricWithValuesAndLabels(t, []float64{4, 5, 6}, "__name__", "baz", "job", "baz"),
},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "foo",
}),
newTimeSeries([]float64{2, 3}, []int64{timestamp.UnixNano(), timestamp.UnixNano()}, map[string]string{
"__name__": "foobarbaz",
"job": "bar",
}),
newTimeSeries([]float64{4, 5, 6},
[]int64{timestamp.UnixNano(), timestamp.UnixNano(), timestamp.UnixNano()},
map[string]string{
"__name__": "foobarbaz",
"job": "baz",
}),
},
},
{
&RecordingRule{Name: "job:foo", Labels: map[string]string{
"source": "test",
}},
[]datasource.Metric{
metricWithValueAndLabels(t, 2, "__name__", "foo", "job", "foo"),
metricWithValueAndLabels(t, 1, "__name__", "bar", "job", "bar")},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{2}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "foo",
"source": "test",
}),
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": "job:foo",
"job": "bar",
"source": "test",
}),
},
},
}
for _, tc := range testCases {
t.Run(tc.rule.Name, func(t *testing.T) {
fq := &fakeQuerier{}
fq.add(tc.metrics...)
tc.rule.q = fq
tss, err := tc.rule.ExecRange(context.TODO(), time.Now(), time.Now())
if err != nil {
t.Fatalf("unexpected Exec err: %s", err)
}
if err := compareTimeSeries(t, tc.expTS, tss); err != nil {
t.Fatalf("timeseries missmatch: %s", err)
}
})
}
}
func TestRecoridngRule_ExecNegative(t *testing.T) {
rr := &RecordingRule{Name: "job:foo", Labels: map[string]string{
"job": "test",
}}
@@ -97,7 +178,7 @@ func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
expErr := "connection reset by peer"
fq.setErr(errors.New(expErr))
rr.q = fq
_, err := rr.Exec(context.TODO(), true)
_, err := rr.Exec(context.TODO())
if err == nil {
t.Fatalf("expected to get err; got nil")
}
@@ -112,7 +193,7 @@ func TestRecoridngRule_ToTimeSeriesNegative(t *testing.T) {
fq.add(metricWithValueAndLabels(t, 1, "__name__", "foo", "job", "foo"))
fq.add(metricWithValueAndLabels(t, 2, "__name__", "foo", "job", "bar"))
_, err = rr.Exec(context.TODO(), true)
_, err = rr.Exec(context.TODO())
if err == nil {
t.Fatalf("expected to get err; got nil")
}

160
app/vmalert/replay.go Normal file
View File

@@ -0,0 +1,160 @@
package main
import (
"context"
"flag"
"fmt"
"strings"
"time"
"github.com/cheggaaa/pb/v3"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
var (
replayFrom = flag.String("replay.timeFrom", "",
"The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'")
replayTo = flag.String("replay.timeTo", "",
"The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'")
replayRulesDelay = flag.Duration("replay.rulesDelay", time.Second,
"Delay between rules evaluation within the group. Could be important if there are chained rules inside of the group"+
"and processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule."+
"Keep it equal or bigger than -remoteWrite.flushInterval.")
replayMaxDatapoints = flag.Int("replay.maxDatapointsPerQuery", 1e3,
"Max number of data points expected in one request. The higher the value, the less requests will be made during replay.")
replayRuleRetryAttempts = flag.Int("replay.ruleRetryAttempts", 5,
"Defines how many retries to make before giving up on rule if request for it returns an error.")
)
func replay(groupsCfg []config.Group, qb datasource.QuerierBuilder, rw *remotewrite.Client) error {
if *replayMaxDatapoints < 1 {
return fmt.Errorf("replay.maxDatapointsPerQuery can't be lower than 1")
}
tFrom, err := time.Parse(time.RFC3339, *replayFrom)
if err != nil {
return fmt.Errorf("failed to parse %q: %s", *replayFrom, err)
}
tTo, err := time.Parse(time.RFC3339, *replayTo)
if err != nil {
return fmt.Errorf("failed to parse %q: %s", *replayTo, err)
}
if !tTo.After(tFrom) {
return fmt.Errorf("replay.timeTo must be bigger than replay.timeFrom")
}
labels := make(map[string]string)
for _, s := range *externalLabels {
if len(s) == 0 {
continue
}
n := strings.IndexByte(s, '=')
if n < 0 {
return fmt.Errorf("missing '=' in `-label`. It must contain label in the form `name=value`; got %q", s)
}
labels[s[:n]] = s[n+1:]
}
fmt.Printf("Replay mode:"+
"\nfrom: \t%v "+
"\nto: \t%v "+
"\nmax data points per request: %d\n",
tFrom, tTo, *replayMaxDatapoints)
var total int
for _, cfg := range groupsCfg {
ng := newGroup(cfg, qb, *evaluationInterval, labels)
total += ng.replay(tFrom, tTo, rw)
}
logger.Infof("replay finished! Imported %d samples", total)
if rw != nil {
return rw.Close()
}
return nil
}
func (g *Group) replay(start, end time.Time, rw *remotewrite.Client) int {
var total int
step := g.Interval * time.Duration(*replayMaxDatapoints)
ri := rangeIterator{start: start, end: end, step: step}
iterations := int(end.Sub(start)/step) + 1
fmt.Printf("\nGroup %q"+
"\ninterval: \t%v"+
"\nrequests to make: \t%d"+
"\nmax range per request: \t%v\n",
g.Name, g.Interval, iterations, step)
for _, rule := range g.Rules {
fmt.Printf("> Rule %q (ID: %d)\n", rule, rule.ID())
bar := pb.StartNew(iterations)
ri.reset()
for ri.next() {
n, err := replayRule(rule, ri.s, ri.e, rw)
if err != nil {
logger.Fatalf("rule %q: %s", rule, err)
}
total += n
bar.Increment()
}
bar.Finish()
// sleep to let remote storage to flush data on-disk
// so chained rules could be calculated correctly
time.Sleep(*replayRulesDelay)
}
return total
}
func replayRule(rule Rule, start, end time.Time, rw *remotewrite.Client) (int, error) {
var err error
var tss []prompbmarshal.TimeSeries
for i := 0; i < *replayRuleRetryAttempts; i++ {
tss, err = rule.ExecRange(context.Background(), start, end)
if err == nil {
break
}
logger.Errorf("attempt %d to execute rule %q failed: %s", i+1, rule, err)
time.Sleep(time.Second)
}
if err != nil { // means all attempts failed
return 0, err
}
if len(tss) < 1 {
return 0, nil
}
var n int
for _, ts := range tss {
if err := rw.Push(ts); err != nil {
return n, fmt.Errorf("remote write failure: %s", err)
}
n += len(ts.Samples)
}
return n, nil
}
type rangeIterator struct {
step time.Duration
start, end time.Time
iter int
s, e time.Time
}
func (ri *rangeIterator) reset() {
ri.iter = 0
ri.s, ri.e = time.Time{}, time.Time{}
}
func (ri *rangeIterator) next() bool {
ri.s = ri.start.Add(ri.step * time.Duration(ri.iter))
if !ri.end.After(ri.s) {
return false
}
ri.e = ri.s.Add(ri.step)
if ri.e.After(ri.end) {
ri.e = ri.end
}
ri.iter++
return true
}

250
app/vmalert/replay_test.go Normal file
View File

@@ -0,0 +1,250 @@
package main
import (
"context"
"fmt"
"testing"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/config"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
)
type fakeReplayQuerier struct {
fakeQuerier
registry map[string]map[string]struct{}
}
func (fr *fakeReplayQuerier) BuildWithParams(_ datasource.QuerierParams) datasource.Querier {
return fr
}
func (fr *fakeReplayQuerier) QueryRange(_ context.Context, q string, from, to time.Time) ([]datasource.Metric, error) {
key := fmt.Sprintf("%s+%s", from.Format("15:04:05"), to.Format("15:04:05"))
dps, ok := fr.registry[q]
if !ok {
return nil, fmt.Errorf("unexpected query received: %q", q)
}
_, ok = dps[key]
if !ok {
return nil, fmt.Errorf("unexpected time range received: %q", key)
}
delete(dps, key)
if len(fr.registry[q]) < 1 {
delete(fr.registry, q)
}
return nil, nil
}
func TestReplay(t *testing.T) {
testCases := []struct {
name string
from, to string
maxDP int
cfg []config.Group
qb *fakeReplayQuerier
}{
{
name: "one rule + one response",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:00.000Z",
maxDP: 10,
cfg: []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {"12:00:00+12:02:00": {}},
},
},
},
{
name: "one rule + multiple responses",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:30.000Z",
maxDP: 1,
cfg: []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
},
},
{
name: "datapoints per step",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T15:02:30.000Z",
maxDP: 60,
cfg: []config.Group{
{Interval: utils.NewPromDuration(time.Minute), Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {
"12:00:00+13:00:00": {},
"13:00:00+14:00:00": {},
"14:00:00+15:00:00": {},
"15:00:00+15:02:30": {},
},
},
},
},
{
name: "multiple recording rules + multiple responses",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:30.000Z",
maxDP: 1,
cfg: []config.Group{
{Rules: []config.Rule{{Record: "foo", Expr: "sum(up)"}}},
{Rules: []config.Rule{{Record: "bar", Expr: "max(up)"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up)": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
},
},
{
name: "multiple alerting rules + multiple responses",
from: "2021-01-01T12:00:00.000Z",
to: "2021-01-01T12:02:30.000Z",
maxDP: 1,
cfg: []config.Group{
{Rules: []config.Rule{{Alert: "foo", Expr: "sum(up) > 1"}}},
{Rules: []config.Rule{{Alert: "bar", Expr: "max(up) < 1"}}},
},
qb: &fakeReplayQuerier{
registry: map[string]map[string]struct{}{
"sum(up) > 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
"max(up) < 1": {
"12:00:00+12:01:00": {},
"12:01:00+12:02:00": {},
"12:02:00+12:02:30": {},
},
},
},
},
}
from, to, maxDP := *replayFrom, *replayTo, *replayMaxDatapoints
retries, delay := *replayRuleRetryAttempts, *replayRulesDelay
defer func() {
*replayFrom, *replayTo = from, to
*replayMaxDatapoints, *replayRuleRetryAttempts = maxDP, retries
*replayRulesDelay = delay
}()
*replayRuleRetryAttempts = 1
*replayRulesDelay = time.Millisecond
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
*replayFrom = tc.from
*replayTo = tc.to
*replayMaxDatapoints = tc.maxDP
if err := replay(tc.cfg, tc.qb, nil); err != nil {
t.Fatalf("replay failed: %s", err)
}
if len(tc.qb.registry) > 0 {
t.Fatalf("not all requests were sent: %#v", tc.qb.registry)
}
})
}
}
func TestRangeIterator(t *testing.T) {
testCases := []struct {
ri rangeIterator
result [][2]time.Time
}{
{
ri: rangeIterator{
start: parseTime(t, "2021-01-01T12:00:00.000Z"),
end: parseTime(t, "2021-01-01T12:30:00.000Z"),
step: 5 * time.Minute,
},
result: [][2]time.Time{
{parseTime(t, "2021-01-01T12:00:00.000Z"), parseTime(t, "2021-01-01T12:05:00.000Z")},
{parseTime(t, "2021-01-01T12:05:00.000Z"), parseTime(t, "2021-01-01T12:10:00.000Z")},
{parseTime(t, "2021-01-01T12:10:00.000Z"), parseTime(t, "2021-01-01T12:15:00.000Z")},
{parseTime(t, "2021-01-01T12:15:00.000Z"), parseTime(t, "2021-01-01T12:20:00.000Z")},
{parseTime(t, "2021-01-01T12:20:00.000Z"), parseTime(t, "2021-01-01T12:25:00.000Z")},
{parseTime(t, "2021-01-01T12:25:00.000Z"), parseTime(t, "2021-01-01T12:30:00.000Z")},
},
},
{
ri: rangeIterator{
start: parseTime(t, "2021-01-01T12:00:00.000Z"),
end: parseTime(t, "2021-01-01T12:30:00.000Z"),
step: 45 * time.Minute,
},
result: [][2]time.Time{
{parseTime(t, "2021-01-01T12:00:00.000Z"), parseTime(t, "2021-01-01T12:30:00.000Z")},
{parseTime(t, "2021-01-01T12:30:00.000Z"), parseTime(t, "2021-01-01T12:30:00.000Z")},
},
},
{
ri: rangeIterator{
start: parseTime(t, "2021-01-01T12:00:12.000Z"),
end: parseTime(t, "2021-01-01T12:00:17.000Z"),
step: time.Second,
},
result: [][2]time.Time{
{parseTime(t, "2021-01-01T12:00:12.000Z"), parseTime(t, "2021-01-01T12:00:13.000Z")},
{parseTime(t, "2021-01-01T12:00:13.000Z"), parseTime(t, "2021-01-01T12:00:14.000Z")},
{parseTime(t, "2021-01-01T12:00:14.000Z"), parseTime(t, "2021-01-01T12:00:15.000Z")},
{parseTime(t, "2021-01-01T12:00:15.000Z"), parseTime(t, "2021-01-01T12:00:16.000Z")},
{parseTime(t, "2021-01-01T12:00:16.000Z"), parseTime(t, "2021-01-01T12:00:17.000Z")},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("case %d", i), func(t *testing.T) {
var j int
for tc.ri.next() {
if len(tc.result) < j+1 {
t.Fatalf("unexpected result for iterator on step %d: %v - %v",
j, tc.ri.s, tc.ri.e)
}
s, e := tc.ri.s, tc.ri.e
expS, expE := tc.result[j][0], tc.result[j][1]
if s != expS {
t.Fatalf("expected to get start=%v; got %v", expS, s)
}
if e != expE {
t.Fatalf("expected to get end=%v; got %v", expE, e)
}
j++
}
})
}
}
func parseTime(t *testing.T, s string) time.Time {
t.Helper()
tt, err := time.Parse("2006-01-02T15:04:05.000Z", s)
if err != nil {
t.Fatal(err)
}
return tt
}

View File

@@ -3,21 +3,21 @@ package main
import (
"context"
"errors"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"time"
)
// Rule represents alerting or recording rule
// that has unique ID, can be Executed and
// updated with other Rule.
type Rule interface {
// Returns unique ID that may be used for
// ID returns unique ID that may be used for
// identifying this Rule among others.
ID() uint64
// Exec executes the rule with given context
// and Querier. If returnSeries is true, Exec
// may return TimeSeries as result of execution
Exec(ctx context.Context, returnSeries bool) ([]prompbmarshal.TimeSeries, error)
Exec(ctx context.Context) ([]prompbmarshal.TimeSeries, error)
// ExecRange executes the rule on the given time range
ExecRange(ctx context.Context, start, end time.Time) ([]prompbmarshal.TimeSeries, error)
// UpdateWith performs modification of current Rule
// with fields of the given Rule.
UpdateWith(Rule) error

View File

@@ -7,17 +7,21 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
func newTimeSeries(value float64, labels map[string]string, timestamp time.Time) prompbmarshal.TimeSeries {
ts := prompbmarshal.TimeSeries{}
ts.Samples = append(ts.Samples, prompbmarshal.Sample{
Value: value,
Timestamp: timestamp.UnixNano() / 1e6,
})
func newTimeSeries(values []float64, timestamps []int64, labels map[string]string) prompbmarshal.TimeSeries {
ts := prompbmarshal.TimeSeries{
Samples: make([]prompbmarshal.Sample, len(values)),
}
for i := range values {
ts.Samples[i] = prompbmarshal.Sample{
Value: values[i],
Timestamp: time.Unix(timestamps[i], 0).UnixNano() / 1e6,
}
}
keys := make([]string, 0, len(labels))
for k := range labels {
keys = append(keys, k)
}
sort.Strings(keys)
sort.Strings(keys) // make order deterministic
for _, key := range keys {
ts.Labels = append(ts.Labels, prompbmarshal.Label{
Name: key,

View File

@@ -0,0 +1,43 @@
package utils
import (
"time"
"github.com/VictoriaMetrics/metricsql"
)
// PromDuration is Prometheus duration.
type PromDuration struct {
milliseconds int64
}
// NewPromDuration returns PromDuration for given d.
func NewPromDuration(d time.Duration) PromDuration {
return PromDuration{
milliseconds: d.Milliseconds(),
}
}
// MarshalYAML implements yaml.Marshaler interface.
func (pd PromDuration) MarshalYAML() (interface{}, error) {
return pd.Duration().String(), nil
}
// UnmarshalYAML implements yaml.Unmarshaler interface.
func (pd *PromDuration) UnmarshalYAML(unmarshal func(interface{}) error) error {
var s string
if err := unmarshal(&s); err != nil {
return err
}
ms, err := metricsql.DurationValue(s, 0)
if err != nil {
return err
}
pd.milliseconds = ms
return nil
}
// Duration returns duration for pd.
func (pd *PromDuration) Duration() time.Duration {
return time.Duration(pd.milliseconds) * time.Millisecond
}

View File

@@ -13,6 +13,7 @@ func TestTLSConfig(t *testing.T) {
}
if tlsCfg == nil {
t.Errorf("expected tlsConfig to be set, got nil")
return
}
if tlsCfg.ServerName != serverName {
t.Errorf("unexpected ServerName, want %s, got %s", serverName, tlsCfg.ServerName)

View File

@@ -34,7 +34,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/api/v1/groups":
data, err := rh.listGroups()
if err != nil {
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
@@ -43,7 +43,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/api/v1/alerts":
data, err := rh.listAlerts()
if err != nil {
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
@@ -61,7 +61,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
// /api/v1/<groupName>/<alertID>/status
data, err := rh.alert(r.URL.Path)
if err != nil {
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")

View File

@@ -20,14 +20,15 @@ type APIAlert struct {
// APIGroup represents Group for WEB view
type APIGroup struct {
Name string `json:"name"`
Type string `json:"type"`
ID string `json:"id"`
File string `json:"file"`
Interval string `json:"interval"`
Concurrency int `json:"concurrency"`
AlertingRules []APIAlertingRule `json:"alerting_rules"`
RecordingRules []APIRecordingRule `json:"recording_rules"`
Name string `json:"name"`
Type string `json:"type"`
ID string `json:"id"`
File string `json:"file"`
Interval string `json:"interval"`
Concurrency int `json:"concurrency"`
ExtraFilterLabels map[string]string `json:"extra_filter_labels"`
AlertingRules []APIAlertingRule `json:"alerting_rules"`
RecordingRules []APIRecordingRule `json:"recording_rules"`
}
// APIAlertingRule represents AlertingRule for WEB view

View File

@@ -1,8 +1,8 @@
# vmauth
`vmauth` is a simple auth proxy and router for [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics).
It reads username and password from [Basic Auth headers](https://en.wikipedia.org/wiki/Basic_access_authentication),
matches them against configs pointed by `-auth.config` command-line flag and proxies incoming HTTP requests to the configured per-user `url_prefix` on successful match.
`vmauth` is a simple auth proxy, router and [load balancer](#load-balancing) for [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics).
It reads auth credentials from `Authorization` http header ([Basic Auth](https://en.wikipedia.org/wiki/Basic_access_authentication) and `Bearer token` is supported),
matches them against configs pointed by [-auth.config](#auth-config) command-line flag and proxies incoming HTTP requests to the configured per-user `url_prefix` on successful match.
## Quick start
@@ -17,7 +17,7 @@ and pass the following flag to `vmauth` binary in order to start authorizing and
After that `vmauth` starts accepting HTTP requests on port `8427` and routing them according to the provided [-auth.config](#auth-config).
The port can be modified via `-httpListenAddr` command-line flag.
The auth config can be reloaded by passing `SIGHUP` signal to `vmauth`.
The auth config can be reloaded either by passing `SIGHUP` signal to `vmauth` or by querying `/-/reload` http endpoint.
Docker images for `vmauth` are available [here](https://hub.docker.com/r/victoriametrics/vmauth/tags).
@@ -27,9 +27,14 @@ Feel free [contacting us](mailto:info@victoriametrics.com) if you need customize
accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.com/vmgateway.html).
## Load balancing
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
## Auth config
Auth config is represented in the following simple `yml` format:
`-auth.config` is represented in the following simple `yml` format:
```yml
@@ -61,31 +66,47 @@ users:
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8481/select/123/prometheus/api/v1/select
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
- username: "cluster-select-account-123"
password: "***"
url_prefix: "http://vmselect:8481/select/123/prometheus"
url_prefix:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
- username: "cluster-insert-account-42"
password: "***"
url_prefix: "http://vminsert:8480/insert/42/prometheus"
url_prefix:
- "http://vminsert1:8480/insert/42/prometheus"
- "http://vminsert2:8480/insert/42/prometheus"
# A single user for querying and inserting data:
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to http://vmselect:8481/select/42/prometheus.
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8480/select/42/prometheus/api/v1/query
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/42/prometheus
# - http://vmselect2:8481/select/42/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect1:8480/select/42/prometheus/api/v1/query
# or to http://vmselect2:8480/select/42/prometheus/api/v1/query .
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "foobar"
url_map:
- src_paths: ["/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^/]+/values"]
url_prefix: "http://vmselect:8481/select/42/prometheus"
- src_paths:
- "/api/v1/query"
- "/api/v1/query_range"
- "/api/v1/label/[^/]+/values"
url_prefix:
- "http://vmselect1:8481/select/42/prometheus"
- "http://vmselect2:8481/select/42/prometheus"
- src_paths: ["/api/v1/write"]
url_prefix: "http://vminsert:8480/insert/42/prometheus"
```
@@ -109,6 +130,8 @@ Do not transfer Basic Auth headers in plaintext over untrusted networks. Enable
Alternatively, [https termination proxy](https://en.wikipedia.org/wiki/TLS_termination_proxy) may be put in front of `vmauth`.
It is recommended protecting `/-/reload` endpoint with `-reloadAuthKey` command-line flag, so external users couldn't trigger config reload.
## Monitoring
@@ -123,7 +146,7 @@ It is recommended using [binary releases](https://github.com/VictoriaMetrics/Vic
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmauth` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmauth` binary and puts it into the `bin` folder.
@@ -200,7 +223,7 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealay, the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
@@ -221,6 +244,8 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
-loggerWarnsPerSecondLimit int
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
-maxIdleConnsPerBackend int
The maximum number of idle connections vmauth can open per each backend host (default 100)
-memory.allowedBytes size
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
@@ -230,6 +255,8 @@ See the docs at https://docs.victoriametrics.com/vmauth.html .
Auth key for /metrics. It overrides httpAuth settings
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-reloadAuthKey string
Auth key for /-/reload http endpoint. It must be passed as authKey=...
-tls
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
-tlsCertFile string

View File

@@ -6,7 +6,9 @@ import (
"fmt"
"io/ioutil"
"net/url"
"os"
"regexp"
"strconv"
"strings"
"sync"
"sync/atomic"
@@ -30,11 +32,11 @@ type AuthConfig struct {
// UserInfo is user information read from authConfigPath
type UserInfo struct {
BearerToken string `yaml:"bearer_token"`
Username string `yaml:"username"`
Password string `yaml:"password"`
URLPrefix *yamlURL `yaml:"url_prefix"`
URLMap []URLMap `yaml:"url_map"`
BearerToken string `yaml:"bearer_token"`
Username string `yaml:"username"`
Password string `yaml:"password"`
URLPrefix *URLPrefix `yaml:"url_prefix"`
URLMap []URLMap `yaml:"url_map"`
requests *metrics.Counter
}
@@ -42,7 +44,7 @@ type UserInfo struct {
// URLMap is a mapping from source paths to target urls.
type URLMap struct {
SrcPaths []*SrcPath `yaml:"src_paths"`
URLPrefix *yamlURL `yaml:"url_prefix"`
URLPrefix *URLPrefix `yaml:"url_prefix"`
}
// SrcPath represents an src path
@@ -51,25 +53,74 @@ type SrcPath struct {
re *regexp.Regexp
}
type yamlURL struct {
u *url.URL
// URLPrefix represents pased `url_prefix`
type URLPrefix struct {
n uint32
urls []*url.URL
}
func (yu *yamlURL) UnmarshalYAML(f func(interface{}) error) error {
var s string
if err := f(&s); err != nil {
func (up *URLPrefix) getNextURL() *url.URL {
n := atomic.AddUint32(&up.n, 1)
idx := n % uint32(len(up.urls))
return up.urls[idx]
}
// UnmarshalYAML unmarshals up from yaml.
func (up *URLPrefix) UnmarshalYAML(f func(interface{}) error) error {
var v interface{}
if err := f(&v); err != nil {
return err
}
u, err := url.Parse(s)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", s, err)
var urls []string
switch x := v.(type) {
case string:
urls = []string{x}
case []interface{}:
if len(x) == 0 {
return fmt.Errorf("`url_prefix` must contain at least a single url")
}
us := make([]string, len(x))
for i, xx := range x {
s, ok := xx.(string)
if !ok {
return fmt.Errorf("`url_prefix` must contain array of strings; got %T", xx)
}
us[i] = s
}
urls = us
default:
return fmt.Errorf("unexpected type for `url_prefix`: %T; want string or []string", v)
}
yu.u = u
pus := make([]*url.URL, len(urls))
for i, u := range urls {
pu, err := url.Parse(u)
if err != nil {
return fmt.Errorf("cannot unmarshal %q into url: %w", u, err)
}
pus[i] = pu
}
up.urls = pus
return nil
}
func (yu *yamlURL) MarshalYAML() (interface{}, error) {
return yu.u.String(), nil
// MarshalYAML marshals up to yaml.
func (up *URLPrefix) MarshalYAML() (interface{}, error) {
var b []byte
if len(up.urls) == 1 {
u := up.urls[0].String()
b = strconv.AppendQuote(b, u)
return string(b), nil
}
b = append(b, '[')
for i, pu := range up.urls {
u := pu.String()
b = strconv.AppendQuote(b, u)
if i+1 < len(up.urls) {
b = append(b, ',')
}
}
b = append(b, ']')
return string(b), nil
}
func (sp *SrcPath) match(s string) bool {
@@ -109,6 +160,12 @@ func initAuthConfig() {
if len(*authConfigPath) == 0 {
logger.Fatalf("missing required `-auth.config` command-line flag")
}
// Register SIGHUP handler for config re-read just before readAuthConfig call.
// This guarantees that the config will be re-read if the signal arrives during readAuthConfig call.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
m, err := readAuthConfig(*authConfigPath)
if err != nil {
logger.Fatalf("cannot load auth config from `-auth.config=%s`: %s", *authConfigPath, err)
@@ -118,7 +175,7 @@ func initAuthConfig() {
authConfigWG.Add(1)
go func() {
defer authConfigWG.Done()
authConfigReloader()
authConfigReloader(sighupCh)
}()
}
@@ -127,8 +184,7 @@ func stopAuthConfig() {
authConfigWG.Wait()
}
func authConfigReloader() {
sighupCh := procutil.NewSighupChan()
func authConfigReloader(sighupCh <-chan os.Signal) {
for {
select {
case <-stopCh:
@@ -195,11 +251,9 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
return nil, fmt.Errorf("duplicate auth token found for bearer_token=%q, username=%q: %q", authToken, ui.BearerToken, ui.Username)
}
if ui.URLPrefix != nil {
urlPrefix, err := sanitizeURLPrefix(ui.URLPrefix.u)
if err != nil {
if err := ui.URLPrefix.sanitize(); err != nil {
return nil, err
}
ui.URLPrefix.u = urlPrefix
}
for _, e := range ui.URLMap {
if len(e.SrcPaths) == 0 {
@@ -208,11 +262,9 @@ func parseAuthConfig(data []byte) (map[string]*UserInfo, error) {
if e.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix` in `url_map`")
}
urlPrefix, err := sanitizeURLPrefix(e.URLPrefix.u)
if err != nil {
if err := e.URLPrefix.sanitize(); err != nil {
return nil, err
}
e.URLPrefix.u = urlPrefix
}
if len(ui.URLMap) == 0 && ui.URLPrefix == nil {
return nil, fmt.Errorf("missing `url_prefix`")
@@ -242,6 +294,17 @@ func getAuthToken(bearerToken, username, password string) string {
return "Basic " + token64
}
func (up *URLPrefix) sanitize() error {
for i, pu := range up.urls {
puNew, err := sanitizeURLPrefix(pu)
if err != nil {
return err
}
up.urls[i] = puNew
}
return nil
}
func sanitizeURLPrefix(urlPrefix *url.URL) (*url.URL, error) {
// Remove trailing '/' from urlPrefix
for strings.HasSuffix(urlPrefix.Path, "/") {

View File

@@ -59,7 +59,21 @@ users:
f(`
users:
- username: foo
url_prefix: [bar]
url_prefix:
bar: baz
`)
f(`
users:
- username: foo
url_prefix:
- [foo]
`)
// empty url_prefix
f(`
users:
- username: foo
url_prefix: []
`)
// Username and bearer_token in a single config
@@ -117,6 +131,15 @@ users:
url_prefix: foo.bar
`)
// empty url_prefix in url_map
f(`
users:
- username: a
url_map:
- src_paths: ['/foo/bar']
url_prefix: []
`)
// Missing src_paths in url_map
f(`
users:
@@ -162,6 +185,25 @@ users:
},
})
// Multiple url_prefix entries
f(`
users:
- username: foo
password: bar
url_prefix:
- http://node1:343/bbb
- http://node2:343/bbb
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURLs([]string{
"http://node1:343/bbb",
"http://node2:343/bbb",
}),
},
})
// Multiple users
f(`
users:
@@ -188,7 +230,7 @@ users:
- src_paths: ["/api/v1/query","/api/v1/query_range","/api/v1/label/[^./]+/.+"]
url_prefix: http://vmselect/select/0/prometheus
- src_paths: ["/api/v1/write"]
url_prefix: http://vminsert/insert/0/prometheus
url_prefix: ["http://vminsert1/insert/0/prometheus","http://vminsert2/insert/0/prometheus"]
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
BearerToken: "foo",
@@ -198,8 +240,11 @@ users:
URLPrefix: mustParseURL("http://vmselect/select/0/prometheus"),
},
{
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURL("http://vminsert/insert/0/prometheus"),
SrcPaths: getSrcPaths([]string{"/api/v1/write"}),
URLPrefix: mustParseURLs([]string{
"http://vminsert1/insert/0/prometheus",
"http://vminsert2/insert/0/prometheus",
}),
},
},
},
@@ -238,12 +283,20 @@ func areEqualConfigs(a, b map[string]*UserInfo) error {
return nil
}
func mustParseURL(u string) *yamlURL {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
func mustParseURL(u string) *URLPrefix {
return mustParseURLs([]string{u})
}
func mustParseURLs(us []string) *URLPrefix {
pus := make([]*url.URL, len(us))
for i, u := range us {
pu, err := url.Parse(u)
if err != nil {
panic(fmt.Errorf("BUG: cannot parse %q: %w", u, err))
}
pus[i] = pu
}
return &yamlURL{
u: pu,
return &URLPrefix{
urls: pus,
}
}

View File

@@ -26,30 +26,46 @@ users:
# The user for querying account 123 in VictoriaMetrics cluster
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vmselect:8481/select/123/prometheus .
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8481/select/123/prometheus/api/v1/select
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
- username: "cluster-select-account-123"
password: "***"
url_prefix: "http://vmselect:8481/select/123/prometheus"
url_prefix:
- "http://vmselect1:8481/select/123/prometheus"
- "http://vmselect2:8481/select/123/prometheus"
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
# will be proxied to http://vminsert:8480/insert/42/prometheus .
# For example, http://vmauth:8427/api/v1/write is proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
- username: "cluster-insert-account-42"
password: "***"
url_prefix: "http://vminsert:8480/insert/42/prometheus"
url_prefix:
- "http://vminsert1:8480/insert/42/prometheus"
- "http://vminsert2:8480/insert/42/prometheus"
# A single user for querying and inserting data:
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to http://vmselect:8481/select/42/prometheus.
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect:8480/select/42/prometheus/api/v1/query
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to the following urls in a round-robin manner:
# - http://vmselect1:8481/select/42/prometheus
# - http://vmselect2:8481/select/42/prometheus
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect1:8480/select/42/prometheus/api/v1/query
# or to http://vmselect2:8480/select/42/prometheus/api/v1/query .
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write
- username: "foobar"
url_map:
- src_paths: ["/api/v1/query", "/api/v1/query_range", "/api/v1/label/[^/]+/values"]
url_prefix: "http://vmselect:8481/select/42/prometheus"
- src_paths:
- "/api/v1/query"
- "/api/v1/query_range"
- "/api/v1/label/[^/]+/values"
url_prefix:
- "http://vmselect1:8481/select/42/prometheus"
- "http://vmselect2:8481/select/42/prometheus"
- src_paths: ["/api/v1/write"]
url_prefix: "http://vminsert:8480/insert/42/prometheus"

View File

@@ -14,10 +14,13 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
"github.com/VictoriaMetrics/metrics"
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
)
func main() {
@@ -47,6 +50,18 @@ func main() {
}
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/-/reload":
authKey := r.FormValue("authKey")
if authKey != *reloadAuthKey {
httpserver.Errorf(w, r, "invalid authKey %q. It must match the value from -reloadAuthKey command line flag", authKey)
return true
}
configReloadRequests.Inc()
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusOK)
return true
}
authToken := r.Header.Get("Authorization")
if authToken == "" {
w.Header().Set("WWW-Authenticate", `Basic realm="Restricted"`)
@@ -66,10 +81,26 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
r.Header.Set("vm-target-url", targetURL.String())
reverseProxy.ServeHTTP(w, r)
proxyRequest(w, r)
return true
}
func proxyRequest(w http.ResponseWriter, r *http.Request) {
defer func() {
err := recover()
if err == nil || err == http.ErrAbortHandler {
// Suppress http.ErrAbortHandler panic.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353
return
}
// Forward other panics to the caller.
panic(err)
}()
reverseProxy.ServeHTTP(w, r)
}
var configReloadRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/-/reload"}`)
var reverseProxy = &httputil.ReverseProxy{
Director: func(r *http.Request) {
targetURL := r.Header.Get("vm-target-url")
@@ -85,6 +116,7 @@ var reverseProxy = &httputil.ReverseProxy{
tr.DisableCompression = true
// Disable HTTP/2.0, since VictoriaMetrics components don't support HTTP/2.0 (because there is no sense in this).
tr.ForceAttemptHTTP2 = false
tr.MaxIdleConnsPerHost = *maxIdleConnsPerBackend
return tr
}(),
FlushInterval: time.Second,

View File

@@ -7,6 +7,11 @@ import (
"strings"
)
func (up *URLPrefix) mergeURLs(requestURI *url.URL) *url.URL {
pu := up.getNextURL()
return mergeURLs(pu, requestURI)
}
func mergeURLs(uiURL, requestURI *url.URL) *url.URL {
targetURL := *uiURL
targetURL.Path += requestURI.Path
@@ -40,12 +45,12 @@ func createTargetURL(ui *UserInfo, uOrig *url.URL) (*url.URL, error) {
for _, e := range ui.URLMap {
for _, sp := range e.SrcPaths {
if sp.match(u.Path) {
return mergeURLs(e.URLPrefix.u, &u), nil
return e.URLPrefix.mergeURLs(&u), nil
}
}
}
if ui.URLPrefix != nil {
return mergeURLs(ui.URLPrefix.u, &u), nil
return ui.URLPrefix.mergeURLs(&u), nil
}
return nil, fmt.Errorf("missing route for %q", u.String())
}

View File

@@ -35,7 +35,7 @@ vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-
* `</path/to/victoria-metrics-data>` - path to VictoriaMetrics data pointed by `-storageDataPath` command-line flag in single-node VictoriaMetrics or in cluster `vmstorage`.
There is no need to stop VictoriaMetrics for creating backups, since they are performed from immutable [instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
* `<local-snapshot>` is the snapshot to back up. See [how to create instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
* `<local-snapshot>` is the snapshot to back up. See [how to create instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots). `vmbackup` can create the snapshot on itself if `-snapshot.createURL` command-line flag is set to an url for creating snapshots. In this case `-snapshotName` flag isn't needed.
* `<bucket>` is an already existing name for [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets).
* `<path/to/new/backup>` is the destination path where new backup will be placed.
@@ -216,11 +216,11 @@ See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-
-origin string
Optional origin directory on the remote storage with old backup for server-side copying when performing full backup. This speeds up full backups
-snapshot.createURL string
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set
-snapshot.deleteURL string
VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snaphosts will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete
VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete
-snapshotName string
Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots
Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set
-storageDataPath string
Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage (default "victoria-metrics-data")
-version
@@ -235,7 +235,7 @@ It is recommended using [binary releases](https://github.com/VictoriaMetrics/Vic
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmbackup` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmbackup` binary and puts it into the `bin` folder.

View File

@@ -19,11 +19,11 @@ import (
var (
storageDataPath = flag.String("storageDataPath", "victoria-metrics-data", "Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage")
snapshotName = flag.String("snapshotName", "", "Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots")
snapshotName = flag.String("snapshotName", "", "Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set")
snapshotCreateURL = flag.String("snapshot.createURL", "", "VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. "+
"Example: http://victoriametrics:8428/snapshot/create")
"Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set")
snapshotDeleteURL = flag.String("snapshot.deleteURL", "", "VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. "+
"All created snaphosts will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete")
"All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete")
dst = flag.String("dst", "", "Where to put the backup on the remote storage. "+
"Example: gcs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir\n"+
"-dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded")
@@ -41,7 +41,9 @@ func main() {
logger.Init()
if len(*snapshotCreateURL) > 0 {
logger.Infof("Snapshots enabled")
if len(*snapshotName) > 0 {
logger.Fatalf("-snapshotName shouldn't be set if -snapshot.createURL is set, since snapshots are created automatically in this case")
}
logger.Infof("Snapshot create url %s", *snapshotCreateURL)
if len(*snapshotDeleteURL) <= 0 {
err := flag.Set("snapshot.deleteURL", strings.Replace(*snapshotCreateURL, "/create", "/delete", 1))
@@ -99,7 +101,7 @@ func usage() {
vmbackup performs backups for VictoriaMetrics data from instant snapshots to gcs, s3
or local filesystem. Backed up data can be restored with vmrestore.
See the docs at https://docs.victoriametrics.com/vbackup.html .
See the docs at https://docs.victoriametrics.com/vmbackup.html .
`
flagutil.Usage(s)
}

View File

@@ -1,8 +1,8 @@
## vmbackupmanager
VictoriaMetrics backup manager
***vmbackupmanager is a part of [enterprise package](https://victoriametrics.com/enterprise.html)***
This service automates regular backup procedures. It supports the following backup intervals: **hourly**, **daily**, **weekly** and **monthly**. Multiple backup intervals may be configured simultaneously. I.e. the backup manager creates hourly backups every hour, while it creates daily backups every day, etc. Backup manager must have read access to the storage data, so best practice is to install it on the same machine (or as a sidecar) where the storage node is installed.
The VictoriaMetrics backup manager automates regular backup procedures. It supports the following backup intervals: **hourly**, **daily**, **weekly** and **monthly**. Multiple backup intervals may be configured simultaneously. I.e. the backup manager creates hourly backups every hour, while it creates daily backups every day, etc. Backup manager must have read access to the storage data, so best practice is to install it on the same machine (or as a sidecar) where the storage node is installed.
The backup service makes a backup every hour and puts it to the latest folder and then copies data to the folders which represent the backup intervals (hourly, daily, weekly and monthly)
The required flags for running the service are as follows:
@@ -49,7 +49,7 @@ There are two flags which could help with performance tuning:
* -concurrency - The number of concurrent workers. Higher concurrency may improve upload speed (default 10)
### Example of Usage
## Example of Usage
GCS and cluster version. You need to have a credentials file in json format with following structure

View File

@@ -22,7 +22,7 @@ It is recommended using [binary releases](https://github.com/VictoriaMetrics/Vic
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmctl` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl` binary and puts it into the `bin` folder.
@@ -51,7 +51,7 @@ ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://b
#### Development ARM build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmctl-arm` or `make vmctl-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmctl-arm` or `vmctl-arm64` binary respectively and puts it into the `bin` folder.

View File

@@ -316,7 +316,7 @@ var (
Name: vmNativeSrcAddr,
Usage: "VictoriaMetrics address to perform export from. \n" +
" Should be the same as --httpListenAddr value for single-node version or vmselect component." +
" If exporting from cluster version - include the tenet token in address.",
" If exporting from cluster version see https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format",
Required: true,
},
&cli.StringFlag{
@@ -333,7 +333,7 @@ var (
Name: vmNativeDstAddr,
Usage: "VictoriaMetrics address to perform import to. \n" +
" Should be the same as --httpListenAddr value for single-node version or vminsert component." +
" If importing into cluster version - include the tenet token in address.",
" If importing into cluster version see https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format",
Required: true,
},
&cli.StringFlag{

View File

@@ -61,7 +61,7 @@ func (s Series) fetchQuery(timeFilter string) string {
}
for i, pair := range s.LabelPairs {
pairV := valueEscaper.Replace(pair.Value)
fmt.Fprintf(f, " %q='%s'", pair.Name, pairV)
fmt.Fprintf(f, " %q::tag='%s'", pair.Name, pairV)
if i != len(s.LabelPairs)-1 {
f.WriteString(" and")
}

View File

@@ -19,7 +19,7 @@ func TestFetchQuery(t *testing.T) {
},
},
},
expected: `select "value" from "cpu" where "foo"='bar'`,
expected: `select "value" from "cpu" where "foo"::tag='bar'`,
},
{
s: Series{
@@ -36,7 +36,7 @@ func TestFetchQuery(t *testing.T) {
},
},
},
expected: `select "value" from "cpu" where "foo"='bar' and "baz"='qux'`,
expected: `select "value" from "cpu" where "foo"::tag='bar' and "baz"::tag='qux'`,
},
{
s: Series{
@@ -50,7 +50,7 @@ func TestFetchQuery(t *testing.T) {
},
},
timeFilter: "time >= now()",
expected: `select "value" from "cpu" where "foo"='b\'ar' and time >= now()`,
expected: `select "value" from "cpu" where "foo"::tag='b\'ar' and time >= now()`,
},
{
s: Series{
@@ -68,7 +68,7 @@ func TestFetchQuery(t *testing.T) {
},
},
timeFilter: "time >= now()",
expected: `select "value" from "cpu" where "name"='dev-mapper-centos\\x2dswap.swap' and "state"='dev-mapp\'er-c\'en\'tos' and time >= now()`,
expected: `select "value" from "cpu" where "name"::tag='dev-mapper-centos\\x2dswap.swap' and "state"::tag='dev-mapp\'er-c\'en\'tos' and time >= now()`,
},
{
s: Series{

View File

@@ -56,27 +56,37 @@ func (cw *cWriter) printf(format string, args ...interface{}) {
//"{"metric":{"__name__":"cpu_usage_guest","arch":"x64","hostname":"host_19",},"timestamps":[1567296000000,1567296010000],"values":[1567296000000,66]}
func (ts *TimeSeries) write(w io.Writer) (int, error) {
pointsCount := len(ts.Timestamps)
if pointsCount == 0 {
return 0, nil
}
timestamps := ts.Timestamps
values := ts.Values
cw := &cWriter{w: w}
cw.printf(`{"metric":{"__name__":%q`, ts.Name)
if len(ts.LabelPairs) > 0 {
for len(timestamps) > 0 {
// Split long lines with more than 10K samples into multiple JSON lines.
// This should limit memory usage at VictoriaMetrics during data ingestion,
// since it allocates memory for the whole JSON line and processes it in one go.
batchSize := 10000
if batchSize > len(timestamps) {
batchSize = len(timestamps)
}
timestampsBatch := timestamps[:batchSize]
valuesBatch := values[:batchSize]
timestamps = timestamps[batchSize:]
values = values[batchSize:]
cw.printf(`{"metric":{"__name__":%q`, ts.Name)
for _, lp := range ts.LabelPairs {
cw.printf(",%q:%q", lp.Name, lp.Value)
}
}
cw.printf(`},"timestamps":[`)
for i := 0; i < pointsCount-1; i++ {
cw.printf(`%d,`, ts.Timestamps[i])
pointsCount := len(timestampsBatch)
cw.printf(`},"timestamps":[`)
for i := 0; i < pointsCount-1; i++ {
cw.printf(`%d,`, timestampsBatch[i])
}
cw.printf(`%d],"values":[`, timestampsBatch[pointsCount-1])
for i := 0; i < pointsCount-1; i++ {
cw.printf(`%v,`, valuesBatch[i])
}
cw.printf("%v]}\n", valuesBatch[pointsCount-1])
}
cw.printf(`%d],"values":[`, ts.Timestamps[pointsCount-1])
for i := 0; i < pointsCount-1; i++ {
cw.printf(`%v,`, ts.Values[i])
}
cw.printf("%v]}\n", ts.Values[pointsCount-1])
return cw.n, cw.err
}

View File

@@ -1,5 +1,7 @@
# vmgateway
***vmgateway is a part of [enterprise package](https://victoriametrics.com/enterprise.html)***
<img alt="vmgateway" src="vmgateway-overview.jpeg">
@@ -217,7 +219,7 @@ The shortlist of configuration flags include the following:
-http.pathPrefix string
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
-http.shutdownDelay duration
Optional delay before http server shutdown. During this dealay, the servier returns non-OK responses from /health page, so load balancers can route new requests to other servers
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string

View File

@@ -6,6 +6,7 @@ import (
"net/http"
"strings"
"sync/atomic"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/graphite"
@@ -44,8 +45,8 @@ var (
)
var (
influxServer *influxserver.Server
graphiteServer *graphiteserver.Server
influxServer *influxserver.Server
opentsdbServer *opentsdbserver.Server
opentsdbhttpServer *opentsdbhttpserver.Server
)
@@ -56,12 +57,12 @@ func Init() {
storage.SetMaxLabelsPerTimeseries(*maxLabelsPerTimeseries)
common.StartUnmarshalWorkers()
writeconcurrencylimiter.Init()
if len(*influxListenAddr) > 0 {
influxServer = influxserver.MustStart(*influxListenAddr, influx.InsertHandlerForReader)
}
if len(*graphiteListenAddr) > 0 {
graphiteServer = graphiteserver.MustStart(*graphiteListenAddr, graphite.InsertHandler)
}
if len(*influxListenAddr) > 0 {
influxServer = influxserver.MustStart(*influxListenAddr, influx.InsertHandlerForReader)
}
if len(*opentsdbListenAddr) > 0 {
opentsdbServer = opentsdbserver.MustStart(*opentsdbListenAddr, opentsdb.InsertHandler, opentsdbhttp.InsertHandler)
}
@@ -74,12 +75,12 @@ func Init() {
// Stop stops vminsert.
func Stop() {
promscrape.Stop()
if len(*influxListenAddr) > 0 {
influxServer.MustStop()
}
if len(*graphiteListenAddr) > 0 {
graphiteServer.MustStop()
}
if len(*influxListenAddr) > 0 {
influxServer.MustStop()
}
if len(*opentsdbListenAddr) > 0 {
opentsdbServer.MustStop()
}
@@ -91,13 +92,16 @@ func Stop() {
// RequestHandler is a handler for Prometheus remote storage write API
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
startTime := time.Now()
defer requestDuration.UpdateDuration(startTime)
path := strings.Replace(r.URL.Path, "//", "/", -1)
switch path {
case "/prometheus/api/v1/write", "/api/v1/write":
prometheusWriteRequests.Inc()
if err := promremotewrite.InsertHandler(r); err != nil {
prometheusWriteErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -106,7 +110,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
vmimportRequests.Inc()
if err := vmimport.InsertHandler(r); err != nil {
vmimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -115,7 +119,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
csvimportRequests.Inc()
if err := csvimport.InsertHandler(r); err != nil {
csvimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -124,7 +128,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
prometheusimportRequests.Inc()
if err := prometheusimport.InsertHandler(r); err != nil {
prometheusimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -133,7 +137,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
nativeimportRequests.Inc()
if err := native.InsertHandler(r); err != nil {
nativeimportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -142,7 +146,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
influxWriteRequests.Inc()
if err := influx.InsertHandlerForHTTP(r); err != nil {
influxWriteErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -183,6 +187,8 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
var (
requestDuration = metrics.NewHistogram(`vminsert_request_duration_seconds`)
prometheusWriteRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/write", protocol="promremotewrite"}`)
prometheusWriteErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/api/v1/write", protocol="promremotewrite"}`)

View File

@@ -14,11 +14,20 @@ import (
"github.com/VictoriaMetrics/metrics"
)
var relabelConfig = flag.String("relabelConfig", "", "Optional path to a file with relabeling rules, which are applied to all the ingested metrics. "+
"See https://docs.victoriametrics.com/#relabeling for details")
var (
relabelConfig = flag.String("relabelConfig", "", "Optional path to a file with relabeling rules, which are applied to all the ingested metrics. "+
"See https://docs.victoriametrics.com/#relabeling for details")
relabelDebug = flag.Bool("relabelDebug", false, "Whether to log metrics before and after relabeling with -relabelConfig. If the -relabelDebug is enabled, "+
"then the metrics aren't sent to storage. This is useful for debugging the relabeling configs")
)
// Init must be called after flag.Parse and before using the relabel package.
func Init() {
// Register SIGHUP handler for config re-read just before loadRelabelConfig call.
// This guarantees that the config will be re-read if the signal arrives during loadRelabelConfig call.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240
sighupCh := procutil.NewSighupChan()
pcs, err := loadRelabelConfig()
if err != nil {
logger.Fatalf("cannot load relabelConfig: %s", err)
@@ -27,7 +36,6 @@ func Init() {
if len(*relabelConfig) == 0 {
return
}
sighupCh := procutil.NewSighupChan()
go func() {
for range sighupCh {
logger.Infof("received SIGHUP; reloading -relabelConfig=%q...", *relabelConfig)
@@ -48,7 +56,7 @@ func loadRelabelConfig() (*promrelabel.ParsedConfigs, error) {
if len(*relabelConfig) == 0 {
return nil, nil
}
pcs, err := promrelabel.LoadRelabelConfigs(*relabelConfig)
pcs, err := promrelabel.LoadRelabelConfigs(*relabelConfig, *relabelDebug)
if err != nil {
return nil, fmt.Errorf("error when reading -relabelConfig=%q: %w", *relabelConfig, err)
}

View File

@@ -1,6 +1,6 @@
# vmrestore
`vmrestore` restores data from backups created by [vmbackup](https://docs.victoriametrics.com/vbackup.html).
`vmrestore` restores data from backups created by [vmbackup](https://docs.victoriametrics.com/vmbackup.html).
VictoriaMetrics `v1.29.0` and newer versions must be used for working with the restored data.
Restore process can be interrupted at any time. It is automatically resumed from the interruption point
@@ -17,7 +17,7 @@ vmrestore -src=gcs://<bucket>/<path/to/backup> -storageDataPath=<local/path/to/r
```
* `<bucket>` is [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets) name.
* `<path/to/backup>` is the path to backup made with [vmbackup](https://docs.victoriametrics.com/vbackup.html) on GCS bucket.
* `<path/to/backup>` is the path to backup made with [vmbackup](https://docs.victoriametrics.com/vmbackup.html) on GCS bucket.
* `<local/path/to/restore>` is the path to folder where data will be restored. This folder must be passed
to VictoriaMetrics in `-storageDataPath` command-line flag after the restore process is complete.
@@ -131,7 +131,7 @@ It is recommended using [binary releases](https://github.com/VictoriaMetrics/Vic
### Development build
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.15.
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.16.
2. Run `make vmrestore` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
It builds `vmrestore` binary and puts it into the `bin` folder.

View File

@@ -1,2 +1,4 @@
`vmselect` performs the incoming queries and fetches the required data
from `vmstorage`.
The `vmui` directory contains static contents built from [app/vmui](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmui) package with `make vmui-update` command. The `vmui` page is available at `http://<victoria-metrics>:8428/vmui/`.

View File

@@ -1,6 +1,7 @@
package vmselect
import (
"embed"
"errors"
"flag"
"fmt"
@@ -28,9 +29,12 @@ var (
"It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration")
maxQueueDuration = flag.Duration("search.maxQueueDuration", 10*time.Second, "The maximum time the request waits for execution when -search.maxConcurrentRequests "+
"limit is reached; see also -search.maxQueryDuration")
resetCacheAuthKey = flag.String("search.resetCacheAuthKey", "", "Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call")
resetCacheAuthKey = flag.String("search.resetCacheAuthKey", "", "Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call")
logSlowQueryDuration = flag.Duration("search.logSlowQueryDuration", 5*time.Second, "Log queries with execution time exceeding this value. Zero disables slow query logging")
)
var slowQueries = metrics.NewCounter(`vm_slow_queries_total`)
func getDefaultMaxConcurrentRequests() int {
n := cgroup.AvailableCPUs()
if n <= 4 {
@@ -74,9 +78,22 @@ var (
})
)
// RequestHandler handles remote read API requests for Prometheus
//go:embed vmui
var vmuiFiles embed.FS
var vmuiFileServer = http.FileServer(http.FS(vmuiFiles))
// RequestHandler handles remote read API requests
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
// vmui access.
if strings.HasPrefix(r.URL.Path, "/vmui") {
vmuiFileServer.ServeHTTP(w, r)
return true
}
startTime := time.Now()
defer requestDuration.UpdateDuration(startTime)
// Limit the number of concurrent queries.
select {
case concurrencyCh <- struct{}{}:
@@ -108,6 +125,20 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
}
if *logSlowQueryDuration > 0 {
actualStartTime := time.Now()
defer func() {
d := time.Since(actualStartTime)
if d >= *logSlowQueryDuration {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("slow query according to -search.logSlowQueryDuration=%s: remoteAddr=%s, duration=%.3f seconds; requestURI: %q",
*logSlowQueryDuration, remoteAddr, d.Seconds(), requestURI)
slowQueries.Inc()
}
}()
}
path := strings.Replace(r.URL.Path, "//", "/", -1)
if path == "/internal/resetRollupResultCache" {
if len(*resetCacheAuthKey) > 0 && r.FormValue("authKey") != *resetCacheAuthKey {
@@ -147,7 +178,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
graphiteTagValuesRequests.Inc()
if err := graphite.TagValuesHandler(startTime, tagName, w, r); err != nil {
graphiteTagValuesErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -232,7 +263,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
exportRequests.Inc()
if err := prometheus.ExportHandler(startTime, w, r); err != nil {
exportErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -240,7 +271,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
exportCSVRequests.Inc()
if err := prometheus.ExportCSVHandler(startTime, w, r); err != nil {
exportCSVErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -248,7 +279,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
exportNativeRequests.Inc()
if err := prometheus.ExportNativeHandler(startTime, w, r); err != nil {
exportNativeErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -256,7 +287,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
federateRequests.Inc()
if err := prometheus.FederateHandler(startTime, w, r); err != nil {
federateErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -265,7 +296,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.EnableCORS(w, r)
if err := graphite.MetricsFindHandler(startTime, w, r); err != nil {
graphiteMetricsFindErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -274,7 +305,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.EnableCORS(w, r)
if err := graphite.MetricsExpandHandler(startTime, w, r); err != nil {
graphiteMetricsExpandErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -283,7 +314,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.EnableCORS(w, r)
if err := graphite.MetricsIndexHandler(startTime, w, r); err != nil {
graphiteMetricsIndexErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -291,7 +322,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
graphiteTagsTagSeriesRequests.Inc()
if err := graphite.TagsTagSeriesHandler(startTime, w, r); err != nil {
graphiteTagsTagSeriesErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -299,7 +330,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
graphiteTagsTagMultiSeriesRequests.Inc()
if err := graphite.TagsTagMultiSeriesHandler(startTime, w, r); err != nil {
graphiteTagsTagMultiSeriesErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -307,7 +338,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
graphiteTagsRequests.Inc()
if err := graphite.TagsHandler(startTime, w, r); err != nil {
graphiteTagsErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -315,7 +346,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
graphiteTagsFindSeriesRequests.Inc()
if err := graphite.TagsFindSeriesHandler(startTime, w, r); err != nil {
graphiteTagsFindSeriesErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -324,7 +355,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.EnableCORS(w, r)
if err := graphite.TagsAutoCompleteTagsHandler(startTime, w, r); err != nil {
graphiteTagsAutoCompleteTagsErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -333,7 +364,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
httpserver.EnableCORS(w, r)
if err := graphite.TagsAutoCompleteValuesHandler(startTime, w, r); err != nil {
graphiteTagsAutoCompleteValuesErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -346,7 +377,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
if err := graphite.TagsDelSeriesHandler(startTime, w, r); err != nil {
graphiteTagsDelSeriesErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
@@ -383,7 +414,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
if err := prometheus.DeleteHandler(startTime, r); err != nil {
deleteErrors.Inc()
httpserver.Errorf(w, r, "error in %q: %s", r.URL.Path, err)
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(http.StatusNoContent)
@@ -406,7 +437,7 @@ func isGraphiteTagsPath(path string) bool {
}
func sendPrometheusError(w http.ResponseWriter, r *http.Request, err error) {
logger.Warnf("error in %q: %s", r.RequestURI, err)
logger.Warnf("error in %q: %s", httpserver.GetRequestURI(r), err)
w.Header().Set("Content-Type", "application/json; charset=utf-8")
statusCode := http.StatusUnprocessableEntity
@@ -419,6 +450,8 @@ func sendPrometheusError(w http.ResponseWriter, r *http.Request, err error) {
}
var (
requestDuration = metrics.NewHistogram(`vmselect_request_duration_seconds`)
labelValuesRequests = metrics.NewCounter(`vm_http_requests_total{path="/api/v1/label/{}/values"}`)
labelValuesErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/api/v1/label/{}/values"}`)

View File

@@ -695,6 +695,29 @@ func GetTSDBStatusForDate(deadline searchutils.Deadline, date uint64, topN int)
return status, nil
}
// GetTSDBStatusWithFilters returns tsdb status according to https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats
//
// It accepts aribtrary filters on time series in sq.
func GetTSDBStatusWithFilters(deadline searchutils.Deadline, sq *storage.SearchQuery, topN int) (*storage.TSDBStatus, error) {
if deadline.Exceeded() {
return nil, fmt.Errorf("timeout exceeded before starting the query processing: %s", deadline.String())
}
tr := storage.TimeRange{
MinTimestamp: sq.MinTimestamp,
MaxTimestamp: sq.MaxTimestamp,
}
tfss, err := setupTfss(tr, sq.TagFilterss, deadline)
if err != nil {
return nil, err
}
date := uint64(tr.MinTimestamp) / (3600 * 24 * 1000)
status, err := vmstorage.GetTSDBStatusWithFiltersForDate(tfss, date, topN, deadline.Deadline())
if err != nil {
return nil, fmt.Errorf("error during tsdb status with filters request: %w", err)
}
return status, nil
}
// GetSeriesCount returns the number of unique series.
func GetSeriesCount(deadline searchutils.Deadline) (uint64, error) {
if deadline.Exceeded() {

View File

@@ -25,7 +25,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/metricsql"
"github.com/valyala/fastjson/fastfloat"
"github.com/valyala/quicktemplate"
)
@@ -34,7 +33,7 @@ var (
latencyOffset = flag.Duration("search.latencyOffset", time.Second*30, "The time when data points become visible in query results after the collection. "+
"Too small value can result in incomplete last points for query results")
maxQueryLen = flagutil.NewBytes("search.maxQueryLen", 16*1024, "The maximum search query length in bytes")
maxLookback = flag.Duration("search.maxLookback", 0, "Synonim to -search.lookback-delta from Prometheus. "+
maxLookback = flag.Duration("search.maxLookback", 0, "Synonym to -search.lookback-delta from Prometheus. "+
"The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. "+
"See also '-search.maxStalenessInterval' flag, which has the same meaining due to historical reasons")
maxStalenessInterval = flag.Duration("search.maxStalenessInterval", 0, "The maximum interval for staleness calculations. "+
@@ -633,11 +632,19 @@ const secsPerDay = 3600 * 24
// TSDBStatusHandler processes /api/v1/status/tsdb request.
//
// See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats
//
// It can accept `match[]` filters in order to narrow down the search.
func TSDBStatusHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) error {
deadline := searchutils.GetDeadlineForStatusRequest(r, startTime)
if err := r.ParseForm(); err != nil {
return fmt.Errorf("cannot parse form values: %w", err)
}
etf, err := searchutils.GetEnforcedTagFiltersFromRequest(r)
if err != nil {
return err
}
matches := getMatchesFromRequest(r)
date := fasttime.UnixDate()
dateStr := r.FormValue("date")
if len(dateStr) > 0 {
@@ -662,9 +669,17 @@ func TSDBStatusHandler(startTime time.Time, w http.ResponseWriter, r *http.Reque
}
topN = n
}
status, err := netstorage.GetTSDBStatusForDate(deadline, date, topN)
if err != nil {
return fmt.Errorf(`cannot obtain tsdb status for date=%d, topN=%d: %w`, date, topN, err)
var status *storage.TSDBStatus
if len(matches) == 0 && len(etf) == 0 {
status, err = netstorage.GetTSDBStatusForDate(deadline, date, topN)
if err != nil {
return fmt.Errorf(`cannot obtain tsdb status for date=%d, topN=%d: %w`, date, topN, err)
}
} else {
status, err = tsdbStatusWithMatches(matches, etf, date, topN, deadline)
if err != nil {
return fmt.Errorf("cannot obtain tsdb status with matches for date=%d, topN=%d: %w", date, topN, err)
}
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
bw := bufferedwriter.Get(w)
@@ -677,6 +692,25 @@ func TSDBStatusHandler(startTime time.Time, w http.ResponseWriter, r *http.Reque
return nil
}
func tsdbStatusWithMatches(matches []string, etf []storage.TagFilter, date uint64, topN int, deadline searchutils.Deadline) (*storage.TSDBStatus, error) {
tagFilterss, err := getTagFilterssFromMatches(matches)
if err != nil {
return nil, err
}
tagFilterss = addEnforcedFiltersToTagFilterss(tagFilterss, etf)
if len(tagFilterss) == 0 {
logger.Panicf("BUG: tagFilterss must be non-empty")
}
start := int64(date*secsPerDay) * 1000
end := int64(date*secsPerDay+secsPerDay) * 1000
sq := storage.NewSearchQuery(start, end, tagFilterss)
status, err := netstorage.GetTSDBStatusWithFilters(deadline, sq, topN)
if err != nil {
return nil, err
}
return status, nil
}
var tsdbStatusDuration = metrics.NewSummary(`vm_request_duration_seconds{path="/api/v1/status/tsdb"}`)
// LabelsHandler processes /api/v1/labels request.
@@ -956,15 +990,9 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
if err != nil {
return err
}
if childQuery, windowStr, offsetStr := promql.IsMetricSelectorWithRollup(query); childQuery != "" {
window, err := parsePositiveDuration(windowStr, step)
if err != nil {
return fmt.Errorf("cannot parse window: %w", err)
}
offset, err := parseDuration(offsetStr, step)
if err != nil {
return fmt.Errorf("cannot parse offset: %w", err)
}
if childQuery, windowExpr, offsetExpr := promql.IsMetricSelectorWithRollup(query); childQuery != "" {
window := windowExpr.Duration(step)
offset := offsetExpr.Duration(step)
start -= offset
end := start
start = end - window
@@ -979,22 +1007,13 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
queryDuration.UpdateDuration(startTime)
return nil
}
if childQuery, windowStr, stepStr, offsetStr := promql.IsRollup(query); childQuery != "" {
newStep, err := parsePositiveDuration(stepStr, step)
if err != nil {
return fmt.Errorf("cannot parse step: %w", err)
}
if childQuery, windowExpr, stepExpr, offsetExpr := promql.IsRollup(query); childQuery != "" {
newStep := stepExpr.Duration(step)
if newStep > 0 {
step = newStep
}
window, err := parsePositiveDuration(windowStr, step)
if err != nil {
return fmt.Errorf("cannot parse window: %w", err)
}
offset, err := parseDuration(offsetStr, step)
if err != nil {
return fmt.Errorf("cannot parse offset: %w", err)
}
window := windowExpr.Duration(step)
offset := offsetExpr.Duration(step)
start -= offset
end := start
start = end - window
@@ -1051,20 +1070,6 @@ func QueryHandler(startTime time.Time, w http.ResponseWriter, r *http.Request) e
var queryDuration = metrics.NewSummary(`vm_request_duration_seconds{path="/api/v1/query"}`)
func parseDuration(s string, step int64) (int64, error) {
if len(s) == 0 {
return 0, nil
}
return metricsql.DurationValue(s, step)
}
func parsePositiveDuration(s string, step int64) (int64, error) {
if len(s) == 0 {
return 0, nil
}
return metricsql.PositiveDurationValue(s, step)
}
// QueryRangeHandler processes /api/v1/query_range request.
//
// See https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries

View File

@@ -336,6 +336,12 @@ func evalExpr(ec *EvalConfig, e metricsql.Expr) ([]*timeseries, error) {
rv := evalString(ec, se.S)
return rv, nil
}
if de, ok := e.(*metricsql.DurationExpr); ok {
d := de.Duration(ec.Step)
dSec := float64(d) / 1000
rv := evalNumber(ec, dSec)
return rv, nil
}
return nil, fmt.Errorf("unexpected expression %q", e.AppendString(nil))
}
@@ -473,12 +479,8 @@ func getRollupExprArg(arg metricsql.Expr) *metricsql.RollupExpr {
func evalRollupFunc(ec *EvalConfig, name string, rf rollupFunc, expr metricsql.Expr, re *metricsql.RollupExpr, iafc *incrementalAggrFuncContext) ([]*timeseries, error) {
ecNew := ec
var offset int64
if len(re.Offset) > 0 {
var err error
offset, err = metricsql.DurationValue(re.Offset, ec.Step)
if err != nil {
return nil, err
}
if re.Offset != nil {
offset = re.Offset.Duration(ec.Step)
ecNew = newEvalConfig(ecNew)
ecNew.Start -= offset
ecNew.End -= offset
@@ -526,24 +528,11 @@ func evalRollupFunc(ec *EvalConfig, name string, rf rollupFunc, expr metricsql.E
func evalRollupFuncWithSubquery(ec *EvalConfig, name string, rf rollupFunc, expr metricsql.Expr, re *metricsql.RollupExpr) ([]*timeseries, error) {
// TODO: determine whether to use rollupResultCacheV here.
var step int64
if len(re.Step) > 0 {
var err error
step, err = metricsql.PositiveDurationValue(re.Step, ec.Step)
if err != nil {
return nil, err
}
} else {
step := re.Step.Duration(ec.Step)
if step == 0 {
step = ec.Step
}
var window int64
if len(re.Window) > 0 {
var err error
window, err = metricsql.PositiveDurationValue(re.Window, ec.Step)
if err != nil {
return nil, err
}
}
window := re.Window.Duration(ec.Step)
ecSQ := newEvalConfig(ec)
ecSQ.Start -= window + maxSilenceInterval + step
@@ -652,18 +641,11 @@ var (
)
func evalRollupFuncWithMetricExpr(ec *EvalConfig, name string, rf rollupFunc,
expr metricsql.Expr, me *metricsql.MetricExpr, iafc *incrementalAggrFuncContext, windowStr string) ([]*timeseries, error) {
expr metricsql.Expr, me *metricsql.MetricExpr, iafc *incrementalAggrFuncContext, windowExpr *metricsql.DurationExpr) ([]*timeseries, error) {
if me.IsEmpty() {
return evalNumber(ec, nan), nil
}
var window int64
if len(windowStr) > 0 {
var err error
window, err = metricsql.PositiveDurationValue(windowStr, ec.Step)
if err != nil {
return nil, err
}
}
window := windowExpr.Duration(ec.Step)
// Search for partial results in cache.
tssCached, start := rollupResultCacheV.Get(ec, expr, window)

View File

@@ -13,34 +13,19 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/querystats"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/metricsql"
)
var (
logSlowQueryDuration = flag.Duration("search.logSlowQueryDuration", 5*time.Second, "Log queries with execution time exceeding this value. Zero disables slow query logging")
treatDotsAsIsInRegexps = flag.Bool("search.treatDotsAsIsInRegexps", false, "Whether to treat dots as is in regexp label filters used in queries. "+
`For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped `+
`in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. `+
`This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter`)
)
var slowQueries = metrics.NewCounter(`vm_slow_queries_total`)
// Exec executes q for the given ec.
func Exec(ec *EvalConfig, q string, isFirstPointOnly bool) ([]netstorage.Result, error) {
if *logSlowQueryDuration > 0 {
startTime := time.Now()
defer func() {
d := time.Since(startTime)
if d >= *logSlowQueryDuration {
logger.Warnf("slow query according to -search.logSlowQueryDuration=%s: remoteAddr=%s, duration=%.3f seconds, start=%d, end=%d, step=%d, query=%q",
*logSlowQueryDuration, ec.QuotedRemoteAddr, d.Seconds(), ec.Start/1000, ec.End/1000, ec.Step/1000, q)
slowQueries.Inc()
}
}()
}
if querystats.Enabled() {
startTime := time.Now()
defer querystats.RegisterQuery(q, ec.End-ec.Start, startTime)

View File

@@ -132,6 +132,49 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("timezone_offset(UTC)", func(t *testing.T) {
t.Parallel()
q := `timezone_offset("UTC")`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{0, 0, 0, 0, 0, 0},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("timezone_offset(America/New_York)", func(t *testing.T) {
t.Parallel()
q := `timezone_offset("America/New_York")`
offset, err := getTimezoneOffset("America/New_York")
if err != nil {
t.Fatalf("cannot obtain timezone: %s", err)
}
off := float64(offset)
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{off, off, off, off, off, off},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("timezone_offset(Local)", func(t *testing.T) {
t.Parallel()
q := `timezone_offset("Local")`
offset, err := getTimezoneOffset("Local")
if err != nil {
t.Fatalf("cannot obtain timezone: %s", err)
}
off := float64(offset)
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{off, off, off, off, off, off},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time()", func(t *testing.T) {
t.Parallel()
q := `time()`
@@ -189,6 +232,17 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time()[:100] offset 0", func(t *testing.T) {
t.Parallel()
q := `time()[:100] offset 0`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{1000, 1200, 1400, 1600, 1800, 2000},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time() offset 1h40s0ms", func(t *testing.T) {
t.Parallel()
q := `time() offset 1h40s0ms`
@@ -200,6 +254,17 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time() offset 3640", func(t *testing.T) {
t.Parallel()
q := `time() offset 3640`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{-2800, -2600, -2400, -2200, -2000, -1800},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time() offset -1h40s0ms", func(t *testing.T) {
t.Parallel()
q := `time() offset -1h40s0ms`
@@ -318,6 +383,28 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r1, r2}
f(q, resultExpected)
})
t.Run("1h", func(t *testing.T) {
t.Parallel()
q := `1h`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{3600, 3600, 3600, 3600, 3600, 3600},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("sum_over_time(time()[1h]) / 1h", func(t *testing.T) {
t.Parallel()
q := `sum_over_time(time()[1h]) / 1h`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{-3.5, -2.5, -1.5, -0.5, 0.5, 1.5},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time()[:100s] offset 100s", func(t *testing.T) {
t.Parallel()
q := `time()[:100s] offset 100s`
@@ -340,6 +427,17 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time()[300:100] offset 100", func(t *testing.T) {
t.Parallel()
q := `time()[300:100] offset 100`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{900, 1100, 1300, 1500, 1700, 1900},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run("time()[1.5i:0.5i] offset 0.5i", func(t *testing.T) {
t.Parallel()
q := `time()[1.5i:0.5i] offset 0.5i`

View File

@@ -10,13 +10,13 @@ import (
// IsRollup verifies whether s is a rollup with non-empty window.
//
// It returns the wrapped query with the corresponding window, step and offset.
func IsRollup(s string) (childQuery string, window, step, offset string) {
func IsRollup(s string) (childQuery string, window, step, offset *metricsql.DurationExpr) {
expr, err := parsePromQLWithCache(s)
if err != nil {
return
}
re, ok := expr.(*metricsql.RollupExpr)
if !ok || len(re.Window) == 0 {
if !ok || re.Window == nil {
return
}
wrappedQuery := re.Expr.AppendString(nil)
@@ -27,13 +27,13 @@ func IsRollup(s string) (childQuery string, window, step, offset string) {
// wrapped into rollup.
//
// It returns the wrapped query with the corresponding window with offset.
func IsMetricSelectorWithRollup(s string) (childQuery string, window, offset string) {
func IsMetricSelectorWithRollup(s string) (childQuery string, window, offset *metricsql.DurationExpr) {
expr, err := parsePromQLWithCache(s)
if err != nil {
return
}
re, ok := expr.(*metricsql.RollupExpr)
if !ok || len(re.Window) == 0 || len(re.Step) > 0 {
if !ok || re.Window == nil || re.Step != nil {
return
}
me, ok := re.Expr.(*metricsql.MetricExpr)

View File

@@ -517,7 +517,7 @@ func (rc *rollupConfig) doInternal(dstValues []float64, tsm *timeseriesMap, valu
if window <= 0 {
window = rc.Step
if rc.CanDropLastSample && rc.LookbackDelta > 0 && window > rc.LookbackDelta {
// Implicitly window exceeds -search.maxStalenessInterval, so limit it to -search.maxStalenessInterval
// Implicit window exceeds -search.maxStalenessInterval, so limit it to -search.maxStalenessInterval
// according to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/784
window = rc.LookbackDelta
}
@@ -1149,7 +1149,8 @@ func rollupTmin(rfa *rollupFuncArg) float64 {
minValue := values[0]
minTimestamp := timestamps[0]
for i, v := range values {
if v < minValue {
// Get the last timestamp for the minimum value as most users expect.
if v <= minValue {
minValue = v
minTimestamp = timestamps[i]
}
@@ -1168,7 +1169,8 @@ func rollupTmax(rfa *rollupFuncArg) float64 {
maxValue := values[0]
maxTimestamp := timestamps[0]
for i, v := range values {
if v > maxValue {
// Get the last timestamp for the maximum value as most users expect.
if v >= maxValue {
maxValue = v
maxTimestamp = timestamps[i]
}
@@ -1332,7 +1334,8 @@ func rollupIncreasePure(rfa *rollupFuncArg) float64 {
// There is no need in handling NaNs here, since they must be cleaned up
// before calling rollup funcs.
values := rfa.values
prevValue := rfa.prevValue
// restore to the real value because of potential staleness reset
prevValue := rfa.realPrevValue
if math.IsNaN(prevValue) {
if len(values) == 0 {
return nan

View File

@@ -87,7 +87,7 @@ var transformFuncs = map[string]transformFunc{
"label_match": transformLabelMatch,
"label_mismatch": transformLabelMismatch,
"union": transformUnion,
"": transformUnion, // empty func is a synonim to union
"": transformUnion, // empty func is a synonym to union
"keep_last_value": transformKeepLastValue,
"keep_next_value": transformKeepNextValue,
"interpolate": transformInterpolate,
@@ -123,6 +123,7 @@ var transformFuncs = map[string]transformFunc{
"histogram_stddev": transformHistogramStddev,
"sort_by_label": newTransformFuncSortByLabel(false),
"sort_by_label_desc": newTransformFuncSortByLabel(true),
"timezone_offset": transformTimezoneOffset,
}
func getTransformFunc(s string) transformFunc {
@@ -1906,6 +1907,32 @@ func transformPi(tfa *transformFuncArg) ([]*timeseries, error) {
return evalNumber(tfa.ec, math.Pi), nil
}
func transformTimezoneOffset(tfa *transformFuncArg) ([]*timeseries, error) {
args := tfa.args
if err := expectTransformArgsNum(args, 1); err != nil {
return nil, err
}
tzString, err := getString(args[0], 0)
if err != nil {
return nil, fmt.Errorf("cannot get timezone name: %w", err)
}
tzOffset, err := getTimezoneOffset(tzString)
if err != nil {
return nil, fmt.Errorf("cannot get timezone offset for %q: %w", tzString, err)
}
rv := evalNumber(tfa.ec, float64(tzOffset))
return rv, nil
}
func getTimezoneOffset(tzString string) (int, error) {
loc, err := time.LoadLocation(tzString)
if err != nil {
return 0, fmt.Errorf("cannot load timezone %q: %w", tzString, err)
}
_, tzOffset := time.Now().In(loc).Zone()
return tzOffset, nil
}
func transformTime(tfa *transformFuncArg) ([]*timeseries, error) {
if err := expectTransformArgsNum(tfa.args, 0); err != nil {
return nil, err

View File

@@ -0,0 +1,9 @@
// +build go1.15
package promql
import (
// This is needed for embedding tzdata into binary, so `timezone_offset` could work in an app running on a scratch base Docker image.
// The "time/tzdata" package has been appeared starting from Go1.15 - see https://golang.org/doc/go1.15#time/tzdata
_ "time/tzdata"
)

Binary file not shown.

After

Width:  |  Height:  |  Size: 14 KiB

View File

@@ -0,0 +1,17 @@
{
"files": {
"main.css": "./static/css/main.0ba440d3.chunk.css",
"main.js": "./static/js/main.ffd27a2f.chunk.js",
"runtime-main.js": "./static/js/runtime-main.50ad8b45.js",
"static/js/2.3cdac8ea.chunk.js": "./static/js/2.3cdac8ea.chunk.js",
"static/js/3.d52da3ae.chunk.js": "./static/js/3.d52da3ae.chunk.js",
"index.html": "./index.html",
"static/js/2.3cdac8ea.chunk.js.LICENSE.txt": "./static/js/2.3cdac8ea.chunk.js.LICENSE.txt"
},
"entrypoints": [
"static/js/runtime-main.50ad8b45.js",
"static/js/2.3cdac8ea.chunk.js",
"static/css/main.0ba440d3.chunk.css",
"static/js/main.ffd27a2f.chunk.js"
]
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.6 KiB

View File

@@ -0,0 +1 @@
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="theme-color" content="#000000"/><meta name="description" content="VM-UI is a metric explorer for Victoria Metrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700&display=swap"/><link href="./static/css/main.0ba440d3.chunk.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div><script>!function(e){function r(r){for(var n,i,a=r[0],c=r[1],l=r[2],s=0,p=[];s<a.length;s++)i=a[s],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&p.push(o[i][0]),o[i]=0;for(n in c)Object.prototype.hasOwnProperty.call(c,n)&&(e[n]=c[n]);for(f&&f(r);p.length;)p.shift()();return u.push.apply(u,l||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,a=1;a<t.length;a++){var c=t[a];0!==o[c]&&(n=!1)}n&&(u.splice(r--,1),e=i(i.s=t[0]))}return e}var n={},o={1:0},u=[];function i(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,i),t.l=!0,t.exports}i.e=function(e){var r=[],t=o[e];if(0!==t)if(t)r.push(t[2]);else{var n=new Promise((function(r,n){t=o[e]=[r,n]}));r.push(t[2]=n);var u,a=document.createElement("script");a.charset="utf-8",a.timeout=120,i.nc&&a.setAttribute("nonce",i.nc),a.src=function(e){return i.p+"static/js/"+({}[e]||e)+"."+{3:"d52da3ae"}[e]+".chunk.js"}(e);var c=new Error;u=function(r){a.onerror=a.onload=null,clearTimeout(l);var t=o[e];if(0!==t){if(t){var n=r&&("load"===r.type?"missing":r.type),u=r&&r.target&&r.target.src;c.message="Loading chunk "+e+" failed.\n("+n+": "+u+")",c.name="ChunkLoadError",c.type=n,c.request=u,t[1](c)}o[e]=void 0}};var l=setTimeout((function(){u({type:"timeout",target:a})}),12e4);a.onerror=a.onload=u,document.head.appendChild(a)}return Promise.all(r)},i.m=e,i.c=n,i.d=function(e,r,t){i.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},i.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,r){if(1&r&&(e=i(e)),8&r)return e;if(4&r&&"object"==typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(i.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)i.d(t,n,function(r){return e[r]}.bind(null,n));return t},i.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(r,"a",r),r},i.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},i.p="./",i.oe=function(e){throw console.error(e),e};var a=this.webpackJsonpvmui=this.webpackJsonpvmui||[],c=a.push.bind(a);a.push=r,a=a.slice();for(var l=0;l<a.length;l++)r(a[l]);var f=c;t()}([])</script><script src="./static/js/2.3cdac8ea.chunk.js"></script><script src="./static/js/main.ffd27a2f.chunk.js"></script></body></html>

View File

@@ -0,0 +1,20 @@
{
"short_name": "Victoria Metrics UI",
"name": "Victoria Metrics UI is a metric explorer for Victoria Metrics",
"icons": [
{
"src": "favicon-32x32.png",
"sizes": "32x32",
"type": "image/png"
},
{
"src": "apple-touch-icon.png",
"type": "image/png",
"sizes": "192x192"
}
],
"start_url": ".",
"display": "standalone",
"theme_color": "#000000",
"background_color": "#ffffff"
}

View File

@@ -0,0 +1,3 @@
# https://www.robotstxt.org/robotstxt.html
User-agent: *
Disallow:

View File

@@ -0,0 +1 @@
body{font-family:-apple-system,BlinkMacSystemFont,"Segoe UI","Roboto","Oxygen","Ubuntu","Cantarell","Fira Sans","Droid Sans","Helvetica Neue",sans-serif;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}code{font-family:source-code-pro,Menlo,Monaco,Consolas,"Courier New",monospace}.MuiAccordionSummary-content{margin:10px 0!important}.cm-activeLine{background-color:inherit!important}.cm-wrap{border-radius:4px;border:1px solid #b9b9b9;font-size:10px}.one-line-scroll .cm-wrap{height:24px}.line{fill:none;stroke-width:2}.overlay{fill:none;pointer-events:all}.dot{fill:#621773;stroke:#fff}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,98 @@
/*
object-assign
(c) Sindre Sorhus
@license MIT
*/
/*! *****************************************************************************
Copyright (c) Microsoft Corporation.
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
***************************************************************************** */
/**
* A better abstraction over CSS.
*
* @copyright Oleg Isonen (Slobodskoi) / Isonen 2014-present
* @website https://github.com/cssinjs/jss
* @license MIT
*/
/** @license React v0.20.1
* scheduler.production.min.js
*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
/** @license React v16.13.1
* react-is.production.min.js
*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
/** @license React v17.0.1
* react-dom.production.min.js
*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
/** @license React v17.0.1
* react-jsx-runtime.production.min.js
*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
/** @license React v17.0.1
* react.production.min.js
*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
/**!
* @fileOverview Kickass library to create and place poppers near their reference elements.
* @version 1.16.1-lts
* @license
* Copyright (c) 2016 Federico Zivolo and contributors
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/

View File

@@ -0,0 +1 @@
(this.webpackJsonpvmui=this.webpackJsonpvmui||[]).push([[3],{430:function(t,n,e){"use strict";e.r(n),e.d(n,"getCLS",(function(){return l})),e.d(n,"getFCP",(function(){return g})),e.d(n,"getFID",(function(){return h})),e.d(n,"getLCP",(function(){return y})),e.d(n,"getTTFB",(function(){return F}));var i,a,r=function(){return"".concat(Date.now(),"-").concat(Math.floor(8999999999999*Math.random())+1e12)},o=function(t){var n=arguments.length>1&&void 0!==arguments[1]?arguments[1]:-1;return{name:t,value:n,delta:0,entries:[],id:r(),isFinal:!1}},u=function(t,n){try{if(PerformanceObserver.supportedEntryTypes.includes(t)){var e=new PerformanceObserver((function(t){return t.getEntries().map(n)}));return e.observe({type:t,buffered:!0}),e}}catch(t){}},s=!1,c=!1,d=function(t){s=!t.persisted},f=function(){addEventListener("pagehide",d),addEventListener("beforeunload",(function(){}))},p=function(t){var n=arguments.length>1&&void 0!==arguments[1]&&arguments[1];c||(f(),c=!0),addEventListener("visibilitychange",(function(n){var e=n.timeStamp;"hidden"===document.visibilityState&&t({timeStamp:e,isUnloading:s})}),{capture:!0,once:n})},v=function(t,n,e,i){var a;return function(){e&&n.isFinal&&e.disconnect(),n.value>=0&&(i||n.isFinal||"hidden"===document.visibilityState)&&(n.delta=n.value-(a||0),(n.delta||n.isFinal||void 0===a)&&(t(n),a=n.value))}},l=function(t){var n,e=arguments.length>1&&void 0!==arguments[1]&&arguments[1],i=o("CLS",0),a=function(t){t.hadRecentInput||(i.value+=t.value,i.entries.push(t),n())},r=u("layout-shift",a);r&&(n=v(t,i,r,e),p((function(t){var e=t.isUnloading;r.takeRecords().map(a),e&&(i.isFinal=!0),n()})))},m=function(){return void 0===i&&(i="hidden"===document.visibilityState?0:1/0,p((function(t){var n=t.timeStamp;return i=n}),!0)),{get timeStamp(){return i}}},g=function(t){var n,e=o("FCP"),i=m(),a=u("paint",(function(t){"first-contentful-paint"===t.name&&t.startTime<i.timeStamp&&(e.value=t.startTime,e.isFinal=!0,e.entries.push(t),n())}));a&&(n=v(t,e,a))},h=function(t){var n=o("FID"),e=m(),i=function(t){t.startTime<e.timeStamp&&(n.value=t.processingStart-t.startTime,n.entries.push(t),n.isFinal=!0,r())},a=u("first-input",i),r=v(t,n,a);a?p((function(){a.takeRecords().map(i),a.disconnect()}),!0):window.perfMetrics&&window.perfMetrics.onFirstInputDelay&&window.perfMetrics.onFirstInputDelay((function(t,i){i.timeStamp<e.timeStamp&&(n.value=t,n.isFinal=!0,n.entries=[{entryType:"first-input",name:i.type,target:i.target,cancelable:i.cancelable,startTime:i.timeStamp,processingStart:i.timeStamp+t}],r())}))},S=function(){return a||(a=new Promise((function(t){return["scroll","keydown","pointerdown"].map((function(n){addEventListener(n,t,{once:!0,passive:!0,capture:!0})}))}))),a},y=function(t){var n,e=arguments.length>1&&void 0!==arguments[1]&&arguments[1],i=o("LCP"),a=m(),r=function(t){var e=t.startTime;e<a.timeStamp?(i.value=e,i.entries.push(t)):i.isFinal=!0,n()},s=u("largest-contentful-paint",r);if(s){n=v(t,i,s,e);var c=function(){i.isFinal||(s.takeRecords().map(r),i.isFinal=!0,n())};S().then(c),p(c,!0)}},F=function(t){var n,e=o("TTFB");n=function(){try{var n=performance.getEntriesByType("navigation")[0]||function(){var t=performance.timing,n={entryType:"navigation",startTime:0};for(var e in t)"navigationStart"!==e&&"toJSON"!==e&&(n[e]=Math.max(t[e]-t.navigationStart,0));return n}();e.value=e.delta=n.responseStart,e.entries=[n],e.isFinal=!0,t(e)}catch(t){}},"complete"===document.readyState?setTimeout(n,0):addEventListener("pageshow",n)}}}]);

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1 @@
!function(e){function r(r){for(var n,i,a=r[0],c=r[1],l=r[2],s=0,p=[];s<a.length;s++)i=a[s],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&p.push(o[i][0]),o[i]=0;for(n in c)Object.prototype.hasOwnProperty.call(c,n)&&(e[n]=c[n]);for(f&&f(r);p.length;)p.shift()();return u.push.apply(u,l||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,a=1;a<t.length;a++){var c=t[a];0!==o[c]&&(n=!1)}n&&(u.splice(r--,1),e=i(i.s=t[0]))}return e}var n={},o={1:0},u=[];function i(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,i),t.l=!0,t.exports}i.e=function(e){var r=[],t=o[e];if(0!==t)if(t)r.push(t[2]);else{var n=new Promise((function(r,n){t=o[e]=[r,n]}));r.push(t[2]=n);var u,a=document.createElement("script");a.charset="utf-8",a.timeout=120,i.nc&&a.setAttribute("nonce",i.nc),a.src=function(e){return i.p+"static/js/"+({}[e]||e)+"."+{3:"d52da3ae"}[e]+".chunk.js"}(e);var c=new Error;u=function(r){a.onerror=a.onload=null,clearTimeout(l);var t=o[e];if(0!==t){if(t){var n=r&&("load"===r.type?"missing":r.type),u=r&&r.target&&r.target.src;c.message="Loading chunk "+e+" failed.\n("+n+": "+u+")",c.name="ChunkLoadError",c.type=n,c.request=u,t[1](c)}o[e]=void 0}};var l=setTimeout((function(){u({type:"timeout",target:a})}),12e4);a.onerror=a.onload=u,document.head.appendChild(a)}return Promise.all(r)},i.m=e,i.c=n,i.d=function(e,r,t){i.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},i.r=function(e){"undefined"!==typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,r){if(1&r&&(e=i(e)),8&r)return e;if(4&r&&"object"===typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(i.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)i.d(t,n,function(r){return e[r]}.bind(null,n));return t},i.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(r,"a",r),r},i.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},i.p="./",i.oe=function(e){throw console.error(e),e};var a=this.webpackJsonpvmui=this.webpackJsonpvmui||[],c=a.push.bind(a);a.push=r,a=a.slice();for(var l=0;l<a.length;l++)r(a[l]);var f=c;t()}([]);

View File

@@ -41,6 +41,10 @@ var (
denyQueriesOutsideRetention = flag.Bool("denyQueriesOutsideRetention", false, "Whether to deny queries outside of the configured -retentionPeriod. "+
"When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. "+
"This may be useful when multiple data sources with distinct retentions are hidden behind query-tee")
maxHourlySeries = flag.Int("storage.maxHourlySeries", 0, "The maximum number of unique series can be added to the storage during the last hour. "+
"Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -storage.maxDailySeries")
maxDailySeries = flag.Int("storage.maxDailySeries", 0, "The maximum number of unique series can be added to the storage during the last 24 hours. "+
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -storage.maxHourlySeries")
)
// CheckTimeRange returns true if the given tr is denied for querying.
@@ -81,7 +85,7 @@ func InitWithoutMetrics(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
logger.Infof("opening storage at %q with -retentionPeriod=%s", *DataPath, retentionPeriod)
startTime := time.Now()
WG = syncwg.WaitGroup{}
strg, err := storage.OpenStorage(*DataPath, retentionPeriod.Msecs)
strg, err := storage.OpenStorage(*DataPath, retentionPeriod.Msecs, *maxHourlySeries, *maxDailySeries)
if err != nil {
logger.Fatalf("cannot open a storage at %s with -retentionPeriod=%s: %s", *DataPath, retentionPeriod, err)
}
@@ -208,7 +212,15 @@ func SearchTagEntries(maxTagKeys, maxTagValues int, deadline uint64) ([]storage.
// GetTSDBStatusForDate returns TSDB status for the given date.
func GetTSDBStatusForDate(date uint64, topN int, deadline uint64) (*storage.TSDBStatus, error) {
WG.Add(1)
status, err := Storage.GetTSDBStatusForDate(date, topN, deadline)
status, err := Storage.GetTSDBStatusWithFiltersForDate(nil, date, topN, deadline)
WG.Done()
return status, err
}
// GetTSDBStatusWithFiltersForDate returns TSDB status for given filters on the given date.
func GetTSDBStatusWithFiltersForDate(tfss []*storage.TagFilters, date uint64, topN int, deadline uint64) (*storage.TSDBStatus, error) {
WG.Add(1)
status, err := Storage.GetTSDBStatusWithFiltersForDate(tfss, date, topN, deadline)
WG.Done()
return status, err
}
@@ -567,6 +579,13 @@ func registerStorageMetrics() {
return float64(m().SlowMetricNameLoads)
})
metrics.NewGauge(`vm_hourly_series_limit_rows_dropped_total`, func() float64 {
return float64(m().HourlySeriesLimitRowsDropped)
})
metrics.NewGauge(`vm_daily_series_limit_rows_dropped_total`, func() float64 {
return float64(m().DailySeriesLimitRowsDropped)
})
metrics.NewGauge(`vm_timestamps_blocks_merged_total`, func() float64 {
return float64(m().TimestampsBlocksMerged)
})
@@ -633,7 +652,7 @@ func registerStorageMetrics() {
return float64(idbm().IndexBlocksCacheSize)
})
metrics.NewGauge(`vm_cache_entries{type="indexdb/tagFilters"}`, func() float64 {
return float64(idbm().TagCacheSize)
return float64(idbm().TagFiltersCacheSize)
})
metrics.NewGauge(`vm_cache_entries{type="indexdb/uselessTagFilters"}`, func() float64 {
return float64(idbm().UselessTagFiltersCacheSize)
@@ -676,7 +695,7 @@ func registerStorageMetrics() {
return float64(m().NextDayMetricIDCacheSizeBytes)
})
metrics.NewGauge(`vm_cache_size_bytes{type="indexdb/tagFilters"}`, func() float64 {
return float64(idbm().TagCacheSizeBytes)
return float64(idbm().TagFiltersCacheSizeBytes)
})
metrics.NewGauge(`vm_cache_size_bytes{type="indexdb/uselessTagFilters"}`, func() float64 {
return float64(idbm().UselessTagFiltersCacheSizeBytes)
@@ -707,7 +726,7 @@ func registerStorageMetrics() {
return float64(idbm().IndexBlocksCacheRequests)
})
metrics.NewGauge(`vm_cache_requests_total{type="indexdb/tagFilters"}`, func() float64 {
return float64(idbm().TagCacheRequests)
return float64(idbm().TagFiltersCacheRequests)
})
metrics.NewGauge(`vm_cache_requests_total{type="indexdb/uselessTagFilters"}`, func() float64 {
return float64(idbm().UselessTagFiltersCacheRequests)
@@ -738,7 +757,7 @@ func registerStorageMetrics() {
return float64(idbm().IndexBlocksCacheMisses)
})
metrics.NewGauge(`vm_cache_misses_total{type="indexdb/tagFilters"}`, func() float64 {
return float64(idbm().TagCacheMisses)
return float64(idbm().TagFiltersCacheMisses)
})
metrics.NewGauge(`vm_cache_misses_total{type="indexdb/uselessTagFilters"}`, func() float64 {
return float64(idbm().UselessTagFiltersCacheMisses)

107
app/vmui/.gitignore vendored Normal file
View File

@@ -0,0 +1,107 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
# Runtime data
pids
*.pid
*.seed
*.pid.lock
# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov
# Coverage directory used by tools like istanbul
coverage
*.lcov
# nyc test coverage
.nyc_output
# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
.grunt
# Bower dependency directory (https://bower.io/)
bower_components
# node-waf configuration
.lock-wscript
# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release
# Dependency directories
node_modules/
jspm_packages/
# TypeScript v1 declaration files
typings/
# TypeScript cache
*.tsbuildinfo
# Optional npm cache directory
.npm
# Optional eslint cache
.eslintcache
# Microbundle cache
.rpt2_cache/
.rts2_cache_cjs/
.rts2_cache_es/
.rts2_cache_umd/
# Optional REPL history
.node_repl_history
# Output of 'npm pack'
*.tgz
# Yarn Integrity file
.yarn-integrity
# dotenv environment variables file
.env
.env.test
# parcel-bundler cache (https://parceljs.org/)
.cache
# Next.js build output
.next
# Nuxt.js build / generate output
.nuxt
dist
# Gatsby files
.cache/
# Comment in the public line in if your project uses Gatsby and *not* Next.js
# https://nextjs.org/blog/next-9-1#public-directory-support
# public
# vuepress build output
.vuepress/dist
# Serverless directories
.serverless/
# FuseBox cache
.fusebox/
# DynamoDB Local files
.dynamodb/
# TernJS port file
.tern-port
# WebStorm etc
.idea/

26
app/vmui/Makefile Normal file
View File

@@ -0,0 +1,26 @@
# All these commands must run from repository root.
vmui-package-base-image:
(docker image ls --format '{{.Repository}}:{{.Tag}}' | grep -q vmui-builder-image) \
|| docker build -t vmui-builder-image -f app/vmui/packages/vmui/Docker-build ./app/vmui
vmui-build: vmui-package-base-image
docker run --rm \
--user $(shell id -u):$(shell id -g) \
--mount type=bind,src="$(shell pwd)/app/vmui",dst=/build \
-w /build/packages/vmui \
--entrypoint=/bin/bash \
vmui-builder-image -c "npm install && npm run build"
vmui-release: vmui-build
docker build -t ${DOCKER_NAMESPACE}/vmui:latest -f app/vmui/packages/vmui/Dockerfile-web ./app/vmui/packages/vmui
docker tag ${DOCKER_NAMESPACE}/vmui:latest ${DOCKER_NAMESPACE}/vmui:${PKG_TAG}
vmui-publish-latest: vmui-release
docker push ${DOCKER_NAMESPACE}/vmui
vmui-publish-release: vmui-release
docker push ${DOCKER_NAMESPACE}/vmui:${PKG_TAG}
vmui-update: vmui-build
rm -rf app/vmselect/vmui/* && mv app/vmui/packages/vmui/build/* app/vmselect/vmui

68
app/vmui/README.md Normal file
View File

@@ -0,0 +1,68 @@
# vmui
Web UI for VictoriaMetrics
Features:
- configurable Server URL
- configurable time range - every variant have own resolution to show around 30 data points
- query editor has basic highlighting and can be multi-line
- chart is responsive by width
- color assignment for series is automatic
- legend with reduced naming
- tooltips for closest data point
- auto-refresh mode with several time interval presets
- table and raw JSON Query viewer
## Docker image build
Run the following command from the root of VictoriaMetrics repository in order to build `victoriametrics/vmui` Docker image:
```
make vmui-release
```
Then run the built image with:
```
docker run --rm --name vmui -p 8080:8080 victoriametrics/vmui
```
Then naviate to `http://localhost:8080` in order to see the web UI.
## Static build
Run the following command from the root of VictoriaMetrics repository for building `vmui` static contents:
```
make vmui-build
```
The built static contents is put into `app/vmui/packages/vmui/` directory.
## Updating vmui embedded into VictoriaMetrics
Run the following command from the root of VictoriaMetrics repository for updating `vmui` embedded into VictoriaMetrics:
```
make vmui-update
```
This command should update `vmui` static files at `app/vmselect/vmui` directory. Commit changes to these files if needed.
Then build VictoriaMetrics with the following command:
```
make victoria-metrics
```
Then run the built binary with the following command:
```
bin/victoria-metrics -selfScrapeInterval=5s
```
Then navigate to `http://localhost:8428/vmui/`

View File

@@ -0,0 +1,11 @@
# Items that don't need to be in a Docker image.
# Anything not used by the build system should go here.
Dockerfile
.dockerignore
.gitignore
README.md
# Artifacts that will be built during image creation.
# This should contain all files created during `npm run build`.
#build
node_modules

View File

@@ -0,0 +1,58 @@
module.exports = {
"env": {
"browser": true,
"es2021": true
},
"extends": [
"eslint:recommended",
"plugin:react/recommended",
"plugin:@typescript-eslint/recommended"
],
"parser": "@typescript-eslint/parser",
"parserOptions": {
"ecmaFeatures": {
"jsx": true
},
"ecmaVersion": 12,
"sourceType": "module"
},
"plugins": [
"react",
"@typescript-eslint"
],
"rules": {
"indent": [
"error",
2,
{ "SwitchCase": 1 }
],
"linebreak-style": [
"error",
"unix"
],
"quotes": [
"error",
"double"
],
"semi": [
"error",
"always"
],
"react/prop-types": 0,
"max-lines": [
"error",
{"max": 150}
]
},
"settings": {
"react": {
"pragma": "React", // Pragma to use, default to "React"
"version": "detect"
},
"linkComponents": [
// Components used as alternatives to <a> for linking, eg. <Link to={ url } />
"Hyperlink",
{"name": "Link", "linkAttribute": "to"}
]
}
};

Some files were not shown because too many files have changed in this diff Show More