update wiki pages
104
Articles.md
Normal file
@@ -0,0 +1,104 @@
|
||||
---
|
||||
sort: 16
|
||||
---
|
||||
|
||||
# Articles
|
||||
|
||||
See also [case studies](https://docs.victoriametrics.com/CaseStudies.html).
|
||||
|
||||
## Third-party articles and slides about VictoriaMetrics
|
||||
|
||||
* [Choosing a Time Series Database for High Cardinality Aggregations](https://abiosgaming.com/press/high-cardinality-aggregations/)
|
||||
* [Scaling to trillions of metric data points](https://engineering.razorpay.com/scaling-to-trillions-of-metric-data-points-f569a5b654f2)
|
||||
* [VictoriaMetrics vs. OpenTSDB](https://blg.robot-house.us/posts/tsdbs-grow/)
|
||||
* [Monitoring of multiple OpenShift clusters with VictoriaMetrics](https://medium.com/ibm-garage/monitoring-of-multiple-openshift-clusters-with-victoriametrics-d4f0979e2544)
|
||||
* [Fly's Prometheus Metrics](https://fly.io/blog/measuring-fly/)
|
||||
* [Infrastructure monitoring with Prometheus at Zerodha](https://zerodha.tech/blog/infra-monitoring-at-zerodha/)
|
||||
* [Sismology: Iguana Solutions’ Monitoring System](https://medium.com/nerd-for-tech/sismology-iguana-solutions-monitoring-system-f46e4170447f)
|
||||
* [Prometheus High Availability and Fault Tolerance strategy, long term storage with VictoriaMetrics](https://medium.com/miro-engineering/prometheus-high-availability-and-fault-tolerance-strategy-long-term-storage-with-victoriametrics-82f6f3f0409e)
|
||||
* [How we improved our Kubernetes monitoring at Smarkets, and how you could too](https://smarketshq.com/monitoring-kubernetes-clusters-41a4b24c19e3)
|
||||
* [Kubernetes and VictoriaMetrics in Mist v4.6](https://mist.io/blog/2021-11-26-kubernetes-and-victoriametrics-in-Mist-v4-6)
|
||||
* [Foiled by the Firewall: A Tale of Transition From Prometheus to VictoriaMetrics](https://www.percona.com/blog/2020/12/01/foiled-by-the-firewall-a-tale-of-transition-from-prometheus-to-victoriametrics/)
|
||||
* [Observations on Better Resource Usage with Percona Monitoring and Management v2.12.0](https://www.percona.com/blog/2020/12/23/observations-on-better-resource-usage-with-percona-monitoring-and-management-v2-12-0/)
|
||||
* [Better Prometheus rate() function with VictoriaMetrics](https://www.percona.com/blog/2020/02/28/better-prometheus-rate-function-with-victoriametrics/)
|
||||
* [Percona monitoring and management migration from Prometheus to VictoriaMetrics FAQ](https://www.percona.com/blog/2020/12/16/percona-monitoring-and-management-migration-from-prometheus-to-victoriametrics-faq/)
|
||||
* [Compiling a Percona Monitoring and Management v2 Client in ARM: Raspberry Pi 3 Reprise](https://www.percona.com/blog/2021/05/26/compiling-a-percona-monitoring-and-management-v2-client-in-arm-raspberry-pi-3/)
|
||||
* [Making peace with Prometheus rate()](https://blog.doit-intl.com/making-peace-with-prometheus-rate-43a3ea75c4cf)
|
||||
* [Monitoring K8S with VictoriaMetrics](https://docs.google.com/presentation/d/1g7yUyVEaAp4tPuRy-MZbPXKqJ1z78_5VKuV841aQfsg/edit)
|
||||
* [CMS monitoring R&D: Real-time monitoring and alerts](https://indico.cern.ch/event/877333/contributions/3696707/attachments/1972189/3281133/CMS_mon_RD_for_opInt.pdf)
|
||||
* [The CMS monitoring infrastructure and applications](https://arxiv.org/pdf/2007.03630.pdf)
|
||||
* [Disk usage: VictoriaMetrics vs Prometheus](https://stas.starikevich.com/posts/disk-usage-for-vm-versus-prometheus/)
|
||||
* [Benchmarking time series workloads on Apache Kudu using TSBS](https://blog.cloudera.com/benchmarking-time-series-workloads-on-apache-kudu-using-tsbs/)
|
||||
* [What are Open Source Time Series Databases?](https://www.iunera.com/kraken/fabric/time-series-database/)
|
||||
* [Evaluating performance and correctness](https://www.robustperception.io/evaluating-performance-and-correctness)
|
||||
* [Running VictoriaMetrics on Raspberry PI](https://stas.starikevich.com/posts/raspberry-pi-4-prometheus/)
|
||||
* [Calculating the Error of Quantile Estimation with Histograms](https://linuxczar.net/blog/2020/08/13/histogram-error/)
|
||||
* [Monitoring private clouds with VictoriaMetrics at LeroyMerlin](https://www.youtube.com/watch?v=74swsWqf0Uc)
|
||||
* [Monitoring Kubernetes with VictoriaMetrics+Prometheus](https://speakerdeck.com/bo0km4n/victoriametrics-plus-prometheusdegou-zhu-surufu-shu-kubernetesfalsejian-shi-ji-pan)
|
||||
* [High-performance Graphite storage solution on top of VictoriaMetrics](https://golangexample.com/a-high-performance-graphite-storage-solution/)
|
||||
* [Cloud Native Model Driven Telemetry Stack on OpenShift](https://cer6erus.medium.com/cloud-native-model-driven-telemetry-stack-on-openshift-80712621f5bc)
|
||||
* [Observability, Availability & DORA’s Research Program](https://medium.com/alteos-tech-blog/observability-availability-and-doras-research-program-85deb6680e78)
|
||||
* [Tame Kubernetes Costs with Percona Monitoring and Management and Prometheus Operator](https://www.percona.com/blog/2021/02/12/tame-kubernetes-costs-with-percona-monitoring-and-management-and-prometheus-operator/)
|
||||
* [Prometheus VictoriaMetrics On AWS ECS](https://dalefro.medium.com/prometheus-victoria-metrics-on-aws-ecs-62448e266090)
|
||||
* [API Monitoring With Prometheus, Grafana, AlertManager and VictoriaMetrics](https://nordicapis.com/api-monitoring-with-prometheus-grafana-alertmanager-and-victoriametrics/)
|
||||
* [Solving Metrics at scale with VictoriaMetrics](https://www.youtube.com/watch?v=QgLMztnj7-8)
|
||||
* [Monitoring Kubernetes clusters with VictoriaMetrics and Grafana](https://blog.cybozu.io/entry/2021/03/18/115743)
|
||||
* [Multi-tenancy monitoring system for Kubernetes cluster using VictoriaMetrics and operators](https://blog.kintone.io/entry/2021/03/31/175256)
|
||||
* [Monitoring as Code на базе VictoriaMetrics и Grafana](https://habr.com/ru/post/568090/)
|
||||
* [Push Prometheus metrics to VictoriaMetrics or other exporters](https://pythonawesome.com/push-prometheus-metrics-to-victoriametrics-or-other-exporters/)
|
||||
* [Install and configure VictoriaMetrics on Debian](https://www.vultr.com/docs/install-and-configure-victoriametrics-on-debian)
|
||||
* [Superset BI with Victoria Metrics](https://cer6erus.medium.com/superset-bi-with-victoria-metrics-a109d3e91bc6)
|
||||
|
||||
## Our articles
|
||||
|
||||
### Announcements
|
||||
|
||||
* [Open-source strategy at VictoriaMetrics](https://www.youtube.com/watch?v=-DbbIZzFHIY)
|
||||
* [Open-sourcing VictoriaMetrics](https://blog.usejournal.com/open-sourcing-victoriametrics-f31e34485c2b)
|
||||
* [VictoriaMetrics — creating the best remote storage for Prometheus](https://faun.pub/victoriametrics-creating-the-best-remote-storage-for-prometheus-5d92d66787ac)
|
||||
* [Anomaly Detection in VictoriaMetrics](https://victoriametrics.medium.com/anomaly-detection-in-victoriametrics-9528538786a7)
|
||||
|
||||
|
||||
### Benchmarks
|
||||
|
||||
* [When size matters — benchmarking VictoriaMetrics vs Timescale and InfluxDB](https://valyala.medium.com/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4)
|
||||
* [High-cardinality TSDB benchmarks: VictoriaMetrics vs TimescaleDB vs InfluxDB](https://valyala.medium.com/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b)
|
||||
* [Insert benchmarks with inch: InfluxDB vs VictoriaMetrics](https://valyala.medium.com/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893)
|
||||
* [Measuring vertical scalability for time series databases in Google Cloud](https://valyala.medium.com/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae)
|
||||
* [Billy: how VictoriaMetrics deals with more than 500 billion rows](https://valyala.medium.com/billy-how-victoriametrics-deals-with-more-than-500-billion-rows-e82ff8f725da)
|
||||
* [First look at performance comparison between InfluxDB IOx and VictoriaMetrics](https://victoriametrics.medium.com/first-look-at-perfomance-comparassion-between-influxdb-iox-and-victoriametrics-e590f847935b)
|
||||
* [Prometheus vs VictoriaMetrics benchmark on node-exporter metrics](https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f)
|
||||
* [Promscale vs VictoriaMetrics: resource usage on production workload](https://valyala.medium.com/promscale-vs-victoriametrics-resource-usage-on-production-workload-91c8e3786c03)
|
||||
|
||||
|
||||
### Technical articles
|
||||
|
||||
* [How VictoriaMetrics makes instant snapshots for multi-terabyte time series data](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
|
||||
* [WAL Usage Looks Broken in Modern TSDBs](https://valyala.medium.com/wal-usage-looks-broken-in-modern-time-series-databases-b62a627ab704)
|
||||
* [mmap may slow down your Go app](https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d)
|
||||
* [VictoriaMetrics: achieving better compression than Gorilla for time series data](https://faun.pub/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932)
|
||||
* [Stripping dependency bloat in VictoriaMetrics Docker image](https://valyala.medium.com/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d)
|
||||
* [Speeding up backups for big time series databases](https://valyala.medium.com/speeding-up-backups-for-big-time-series-databases-533c1a927883)
|
||||
* [Improving histogram usability for Prometheus and Grafana](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350)
|
||||
* [Why irate from Prometheus doesn't capture spikes](https://valyala.medium.com/why-irate-from-prometheus-doesnt-capture-spikes-45f9896d7832)
|
||||
* [VictoriaMetrics: PromQL compliance](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e)
|
||||
|
||||
|
||||
### Tutorials, guides and how-to articles
|
||||
|
||||
* [PromQL tutorial for beginners and humans](https://valyala.medium.com/promql-tutorial-for-beginners-9ab455142085)
|
||||
* [How to optimize PromQL and MetricsQL queries](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986)
|
||||
* [Analyzing Prometheus data with external tools](https://valyala.medium.com/analyzing-prometheus-data-with-external-tools-5f3e5e147639)
|
||||
* [Prometheus Subqueries in VictoriaMetrics](https://valyala.medium.com/prometheus-subqueries-in-victoriametrics-9b1492b720b3)
|
||||
* [How to migrate data from Prometheus to VictoriaMetrics](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043)
|
||||
* [VictoriaMetrics: how to migrate data from Prometheus. Filtering and modifying time series.](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-filtering-and-modifying-time-series-6d40cea4bf21)
|
||||
* [How to use relabeling in Prometheus and VictoriaMetrics](https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2)
|
||||
* [How to monitor Go applications with VictoriaMetrics](https://victoriametrics.medium.com/how-to-monitor-go-applications-with-victoriametrics-c04703110870)
|
||||
* [Prometheus storage: tech terms for humans](https://valyala.medium.com/prometheus-storage-technical-terms-for-humans-4ab4de6c3d48)
|
||||
|
||||
|
||||
### Other articles
|
||||
|
||||
* [How ClickHouse inspired us to build a high performance time series database](https://www.youtube.com/watch?v=p9qjb_yoBro). See also [slides](https://docs.google.com/presentation/d/1SdFrwsyR-HMXfbzrY8xfDZH_Dg6E7E5NJ84tQozMn3w/edit?usp=sharing).
|
||||
* [Comparing Thanos to VictoriaMetrics cluster](https://faun.pub/comparing-thanos-to-victoriametrics-cluster-b193bea1683)
|
||||
* [Evaluation performance and correctness: VictoriaMetrics response](https://valyala.medium.com/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87)
|
||||
65
BestPractices.md
Normal file
@@ -0,0 +1,65 @@
|
||||
---
|
||||
sort: 19
|
||||
---
|
||||
|
||||
# VictoriaMetrics best practices
|
||||
|
||||
|
||||
## Install Recommendation
|
||||
|
||||
It is recommended running the latest available release of VictoriaMetrics from [this page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases), since it contains all the bugfixes and enhancements.
|
||||
|
||||
There is no need to tune VictoriaMetrics because it uses reasonable defaults for command-line flags. These flags are automatically adjusted for the available CPU and RAM resources. There is no need in Operating System tuning because VictoriaMetrics is optimized for default OS settings. The only option is to increase the limit on the [number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a), so VictoriaMetrics could accept more incoming connections and could keep open more data files.
|
||||
|
||||
|
||||
## Filesystem
|
||||
|
||||
The recommended filesystem for VictoriaMetrics is [ext4](https://en.wikipedia.org/wiki/Ext4). If you plan to store more than 1TB of data on ext4 partition or plan to extend it to more than 16TB, then the following options are recommended to pass to mkfs.ext4:
|
||||
|
||||
```
|
||||
mkfs.ext4 ... -O 64bit,huge_file,extent -T huge
|
||||
```
|
||||
|
||||
VictoriaMetrics should work OK with other filesystems, including network filesystems such as [NFS](https://en.wikipedia.org/wiki/Network_File_System), [Amazon EFS](https://aws.amazon.com/efs/) and [Google Filestore](https://cloud.google.com/filestore).
|
||||
|
||||
|
||||
## Operation System
|
||||
|
||||
VictoriaMetrics is production-ready for the following operating systems:
|
||||
|
||||
* Linux (Alpine, Ubuntu, Debian, RedHat, etc.)
|
||||
* FreeBSD
|
||||
* OpenBSD
|
||||
|
||||
Some VictoriaMetrics components ([vmagent](https://docs.victoriametrics.com/vmagent.html), [vmalert](https://docs.victoriametrics.com/vmalert.html) and [vmauth](https://docs.victoriametrics.com/vmauth.html)) can run on Windows.
|
||||
|
||||
VictoriaMetrics can run also on MacOS for testing and development purposes.
|
||||
|
||||
|
||||
## Upgrade procedure
|
||||
|
||||
It is safe to upgrade VictoriaMetrics to new versions unless the [release notes](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) say otherwise. It is safe to skip multiple versions during the upgrade unless release notes say otherwise. It is recommended to perform regular upgrades to the latest version, since it may contain important bug fixes, performance optimizations or new features.
|
||||
|
||||
It is also safe to downgrade to the previous version unless release notes say otherwise.
|
||||
|
||||
The following steps must be performed during the upgrade / downgrade procedure:
|
||||
|
||||
* Send SIGINT signal to VictoriaMetrics process so that it is stopped gracefully.
|
||||
* Wait until the process stops. This can take a few seconds.
|
||||
* Start the upgraded VictoriaMetrics.
|
||||
|
||||
|
||||
## Backup Recommendations
|
||||
|
||||
VictoriaMetrics supports backups via [vmbackup](https://docs.victoriametrics.com/vmbackup.html) and [vmrestore](https://docs.victoriametrics.com/vmrestore.html) tools. There is also [vmbackupmanager](https://docs.victoriametrics.com/vmbackupmanager.html), which simplifies backup automation.
|
||||
|
||||
|
||||
## Technical Support and Services
|
||||
|
||||
There are the following channels for providing technical support for VictoriaMetrics:
|
||||
|
||||
* [Github issues](https://github.com/VictoriaMetrics/VictoriaMetrics/issues)
|
||||
* [Slack channel](https://slack.victoriametrics.com/)
|
||||
* [Telegram channel](https://t.me/VictoriaMetrics_en)
|
||||
|
||||
We also provide [Enterprise support](https://victoriametrics.com/products/enterprise/).
|
||||
986
CHANGELOG.md
Normal file
@@ -0,0 +1,986 @@
|
||||
---
|
||||
sort: 15
|
||||
---
|
||||
|
||||
# CHANGELOG
|
||||
|
||||
## tip
|
||||
|
||||
* FEATURE: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): cover more cases with the [label filters' propagation optimization](https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusLabelNonOptimization). This should improve the average performance for practical queries. The following cases are additionally covered:
|
||||
* Multi-level [transform functions](https://docs.victoriametrics.com/MetricsQL.html#transform-functions). For example, `abs(round(foo{a="b"})) + bar{x="y"}` is now optimized to `abs(round(foo{a="b",x="y"})) + bar{a="b",x="y"}`
|
||||
* Binary operations with `on()`, `without()`, `group_left()` and `group_right()` modifiers. For example, `foo{a="b"} on (a) + bar` is now optimized to `foo{a="b"} on (a) + bar{a="b"}`
|
||||
* Multi-level binary operations. For example, `foo{a="b"} + bar{x="y"} + baz{z="q"}` is now optimized to `foo{a="b",x="y",z="q"} + bar{a="b",x="y",z="q"} + baz{a="b",x="y",z="q"}`
|
||||
* Aggregate functions. For example, `sum(foo{a="b"}) by (c) + bar{c="d"}` is now optimized to `sum(foo{a="b",c="d"}) by (c) + bar{c="d"}`
|
||||
|
||||
* BUGFIX: return proper results from `highestMax()` function at [Graphite render API](https://docs.victoriametrics.com/#graphite-render-api-usage). Previously it was incorrectly returning timeseries with min peaks instead of max peaks.
|
||||
* BUGFIX: properly limit indexdb cache sizes. Previously they could exceed values set via `-memory.allowedPercent` and/or `-memory.allowedBytes` when `indexdb` contained many data parts. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2007).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix a bug, which could break time range picker when editing `From` or `To` input fields. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2080).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix a bug, which could break switching between `graph`, `json` and `table` views. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2084).
|
||||
* BUGFIX: show the original location of the warning or error message when logging throttled messages. Previously the location inside `lib/logger/throttler.go` was shown. This could increase the complexity of debugging.
|
||||
|
||||
|
||||
## [v1.72.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.72.0)
|
||||
|
||||
Released at 18-01-2022
|
||||
|
||||
* FEATURE: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): add support for `@` modifier, which is enabled by default in Prometheus starting from [Prometheus v2.33.0](https://github.com/prometheus/prometheus/pull/10121). See [these docs](https://prometheus.io/docs/prometheus/latest/querying/basics/#modifier) and [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1348). VictoriaMetrics extends `@` modifier with the following additional features:
|
||||
* It can contain arbitrary expression. For example, `foo @ (end() - 1h)` would return `foo` value at `end - 1 hour` timestamp on the selected time range `[start ... end]`. Another example: `foo @ (now() - 10m)` would return `foo` value 10 minutes ago from the current time.
|
||||
* It can be put everywhere in the query. For example, `sum(foo) @ start()` would calculate `sum(foo)` at `start` timestamp on the selected time range `[start ... end]`.
|
||||
* FEATURE: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): add support for optional `keep_metric_names` modifier, which can be applied to all the [rollup functions](https://docs.victoriametrics.com/MetricsQL.html#rollup-functions) and [transform functions](https://docs.victoriametrics.com/MetricsQL.html#transform-functions). This modifier prevents from deleting metric names from function results. For example, `rate({__name__=~"foo|bar"}[5m]) keep_metric_names` leaves `foo` and `bar` metric names in `rate()` results. This feature provides an additional workaround for [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/949).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add support for Kubernetes service discovery in the current namespace in the same way as [Prometheus does](https://github.com/prometheus/prometheus/pull/9881). For example, the following config limits pod discovery to the namespace where vmagent runs:
|
||||
|
||||
```yaml
|
||||
scrape_configs:
|
||||
- job_name: 'kubernetes-pods'
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
namespaces:
|
||||
own_namespace: true
|
||||
```
|
||||
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): add `__meta_kubernetes_node_provider_id` label for discovered Kubernetes nodes in the same way as [Prometheus does](https://github.com/prometheus/prometheus/pull/9603).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): log error message when remote storage returns 400 or 409 http errors. This should simplify detection and debugging of this case. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1911).
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): expose `promscrape_stale_samples_created_total` metric for monitoring the total number of created stale samples when scraping Prometheus targets. See [these docs](https://docs.victoriametrics.com/vmagent.html#prometheus-staleness-markers) for the information on when stale samples (aka staleness markers) can be created.
|
||||
* FEATURE: [vmrestore](https://docs.victoriametrics.com/vmrestore.html): store `restore-in-progress` file in `-dst` directory while `vmrestore` is running. This file is automatically deleted when `vmrestore` is successfully finished. This helps detecting incompletely restored data on VictoriaMetrics start. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1958).
|
||||
* FEATURE: [vmctl](https://docs.victoriametrics.com/vmctl.html): print the last sample timestamp when the data migration is interrupted either by user or by error. This helps continuing the data migration from the interruption moment. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1236).
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): expose `vmalert_remotewrite_total` metric at `/metrics` page. This makes possible calculating SLOs for error rate during writing recording rules and alert state to `-remoteWrite.url` with the query `vmalert_remotewrite_errors_total / vmalert_remotewrite_total`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2040). Thanks to @afoninsky .
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add `stripPort` template function in the same way as [Prometheus does](https://github.com/prometheus/prometheus/pull/10002).
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add `parseDuration` template function in the same way as [Prometheus does](https://github.com/prometheus/prometheus/pull/8817).
|
||||
* FEATURE: [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html): add `stale_samples_over_time(m[d])` function for calculating the number of [staleness marks](https://docs.victoriametrics.com/vmagent.html#prometheus-staleness-markers) for time series `m` over the duration `d`. This function may be useful for detecting flapping metrics at scrape targets, which periodically disappear and then appear again.
|
||||
* FEATURE: [vmgateway](https://docs.victoriametrics.com/vmgateway.html): add support for `extra_filters` option. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1863).
|
||||
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): improve UX according to [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1960). Thanks to @Loori-R .
|
||||
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): limit the number of requests sent to VictoriaMetrics during zooming / scrolling. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2064).
|
||||
|
||||
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): make sure that `vmagent` replicas scrape the same targets at different time offsets when [replication is enabled in vmagent clustering mode](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets). This guarantees that the [deduplication](https://docs.victoriametrics.com/#deduplication) consistently leaves samples from the same `vmagent` replica.
|
||||
* BUGFIX: return the proper response stub from `/api/v1/query_exemplars` handler, which is needed for Grafana v8+. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1999).
|
||||
* BUGFIX: [vmctl](https://docs.victoriametrics.com/vmctl.html): fix a few edge cases and improve migration speed for OpenTSDB importer. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2019).
|
||||
* BUGFIX: fix possible data race when searching for time series matching `{key=~"value|"}` filter over time range covering multipe days. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2032). Thanks to @waldoweng for the provided fix.
|
||||
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): do not send staleness markers on graceful shutdown. This follows Prometheus behavior. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2013#issuecomment-1006994079).
|
||||
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): properly set `__address__` label in `dockerswarm_sd_config`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2038). Thanks to @ashtuchkin for the fix.
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix incorrect calculations for graph limits on y axis. This could result in incorrect graph rendering in some cases. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2037).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix handling for multi-line queries. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2039).
|
||||
|
||||
|
||||
## [v1.71.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.71.0)
|
||||
|
||||
Released at 20-12-2021
|
||||
|
||||
* FEATURE: [VictoriaMetrics enterprise](https://victoriametrics.com/products/enterprise/): add multi-level downsampling support. See [these docs](https://docs.victoriametrics.com/#downsampling) and [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/36).
|
||||
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add ability to analyze the correlation between two queries on a single graph. Just click `+Query` button, enter the second query in the newly appeared input field and press `Ctrl+Enter`. Results for both queries should be displayed simultaneously on the same graph. Every query has its own vertical scale, which is displayed on the left and the right side of the graph. Lines for the second query are dashed. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1916).
|
||||
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): add ability to override the interval between returned datapoints. By default it is automatically calculated depending on the selected time range and horizontal resolution of the graph. Now it is possible to override it with custom values. This may be useful during data exploration and debugging.
|
||||
* FEATURE: accept optional `extra_filters[]=series_selector` query args at Prometheus query APIs additionally to `extra_label` query args. This allows enforcing additional filters for all the Prometheus query APIs by using [vmgateway](https://docs.victoriametrics.com/vmgateway.html) or [vmauth](https://docs.victoriametrics.com/vmauth.html). See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1863).
|
||||
* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): allow specifying `http` and `https` urls in `-auth.config` command-line flag. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1898). Thanks for @TFM93 .
|
||||
* FEATURE: [vmagent](https://docs.victoriametrics.com/vmagent.html): allow specifying `http` and `https` urls in the following command-line flags: `-promscrape.config`, `-remoteWrite.relabelConfig` and `-remoteWrite.urlRelabelConfig`.
|
||||
* FEATURE: vminsert: allow specifying `http` and `https` urls in `-relabelConfig` command-line flag.
|
||||
* FEATURE: vminsert: add `-maxLabelValueLen` command-line flag for the ability to configure the maximum length of label value. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1908).
|
||||
* FEATURE: preserve the order of time series passed to [limit_offset](https://docs.victoriametrics.com/MetricsQL.html#limit_offset) function. This allows implementing series paging via `limit_offset(limit, offset, sort_by_label(...))`. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1920) and [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/951) issues.
|
||||
* FEATURE: automaticall convert `(value1|...|valueN)` into `{value1,...,valueN}` inside `__graphite__` pseudo-label. This allows using [Grafana multi-value template variables](https://grafana.com/docs/grafana/latest/variables/formatting-multi-value-variables/) inside `__graphite__` pseudo-label. For example, `{__graphite__=~"foo.($bar)"}` is expanded to `{__graphite__=~"foo.{x,y}"}` if both `x` and `y` are selected for `$bar` template variable. See [these docs](https://docs.victoriametrics.com/#selecting-graphite-metrics) for details.
|
||||
* FEATURE: add [timestamp_with_name](https://docs.victoriametrics.com/MetricsQL.html#timestamp_with_name) function. It works the same as [timestamp](https://docs.victoriametrics.com/MetricsQL.html#timestamp), but leaves the original time series names, so it can be used in queries, which match multiple time series names: `timestamp_with_name({foo="bar"}[1h])`. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/949#issuecomment-995222388) for more context.
|
||||
* FEATURE: add [changes_prometheus](https://docs.victoriametrics.com/MetricsQL.html#changes_prometheus), [increase_prometheus](https://docs.victoriametrics.com/MetricsQL.html#increase_prometheus) and [delta_prometheus](https://docs.victoriametrics.com/MetricsQL.html#delta_prometheus) functions, which don't take into account the previous sample before the given lookbehind window specified in square brackets. These functions may be used when the Prometheus behaviour for `changes()`, `increase()` and `delta()` functions is needed to be preserved. VictoriaMetrics uses slightly different behaviour for `changes()`, `increase()` and `delta()` functions by default - see [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1962).
|
||||
|
||||
* BUGFIX: fix `unaligned 64-bit atomic operation` panic on 32-bit architectures, which has been introduced in v1.70.0. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1944).
|
||||
* BUGFIX: [vmalert](https://docs.victoriametrics.com/vmalert.html): restore the ability to use `$labels.alertname` in labels templating. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1921).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): add missing `query` caption to the input field for the query. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1900).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix navigation over query history with `Ctrl+up/down` and fix zoom relatively to the cursor position. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1936).
|
||||
* BUGFIX: deduplicate samples more thoroughly if [deduplication](https://docs.victoriametrics.com/#deduplication) is enabled. Previously some duplicate samples may be left on disk for time series with high churn rate. This may result in bigger storage space requirements.
|
||||
* BUGFIX: [vmagent](https://docs.victoriametrics.com/vmagent.html): follow up to 5 redirects when `follow_redirects: true` is set for a particular scrape config. Previously only a single redirect was performed in this case. It is expected these redirects are performed to the original hostname. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945).
|
||||
* BUGFIX: de-duplicate data exported via [/api/v1/export/csv](https://docs.victoriametrics.com/#how-to-export-csv-data) by default if [deduplication](https://docs.victoriametrics.com/#deduplication) is enabled. The de-duplication can be disabled by passing `reduce_mem_usage=1` query arg to `/api/v1/export/csv`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1837).
|
||||
* BUGFIX: [vmalert](https://docs.victoriametrics.com/vmalert.html): properly store [historical data](https://docs.victoriametrics.com/vmalert.html#rules-backfilling) to old Prometheus versions. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1943).
|
||||
|
||||
|
||||
## [v1.70.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.70.0)
|
||||
|
||||
Released at 02-12-2021
|
||||
|
||||
* FEATURE: [vmalert](https://docs.victoriametrics.com/vmalert.html): add ability to pass arbitrary query args to `-datasource.url` on a per-group basis via `params` option. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1892).
|
||||
* FEATURE: add `now()` function to MetricsQL. This function returns the current timestamp in seconds. See [these docs](https://docs.victoriametrics.com/MetricsQL.html#now).
|
||||
* FEATURE: vmauth: allow using optional `name` field in configs. This field is then used as `username` label value for `vmauth_user_requests_total` metric. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1805).
|
||||
* FEATURE: vmagent: export `vm_persistentqueue_read_duration_seconds_total` and `vm_persistentqueue_write_duration_seconds_total` metrics, which can be used for detecting persistent queue saturation with `rate(vm_persistentqueue_write_duration_seconds_total) > 0.9` alerting rule.
|
||||
* FEATURE: export `vm_filestream_read_duration_seconds_total` and `vm_filestream_write_duration_seconds_total` metrics, which can be used for detecting persistent disk saturation with `rate(vm_filestream_read_duration_seconds_total) > 0.9` alerting rule.
|
||||
* FEATURE: export `vm_cache_size_max_bytes` metrics, which show capacity for various caches. These metrics can be used for determining caches with reach its capacity with `vm_cache_size_bytes / vm_cache_size_max_bytes > 0.9` query.
|
||||
* FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html), [vmrestore](https://docs.victoriametrics.com/vmrestore.html): add `-s3ForcePathStyle` command-line flag, which can be used for making backups to [Aliyun OSS](https://www.aliyun.com/product/oss). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1802).
|
||||
* FEATURE: [vmctl](https://docs.victoriametrics.com/vmctl.html): improve data migration from OpenTSDB. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1809). Thanks to @johnseekins .
|
||||
* FEATURE: suppress `connection reset by peer` errors when remote client resets TCP connection to VictoriaMetrics / vmagent while ingesting the data via InfluxDB line protocol, Graphite protocol or OpenTSDB protocol. This error is expected, so there is no need in logging it.
|
||||
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): store the display type in URL, so it isn't lost when copy-pasting the URL. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1804).
|
||||
* FEATURE: vmalert: make `-notifier.url` command-line flag optional. This flag can be omitted if `vmalert` is used solely for recording rules and doesn't evaluate alerting rules. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1870).
|
||||
* FEATURE: [vmbackup](https://docs.victoriametrics.com/vmbackup.html), [vmrestore](https://docs.victoriametrics.com/vmrestore.html): export internal metrics at `http://vmbackup:8420/metrics` and `http://vmrestore:8421/metrics` for better visibility of the backup/restore process.
|
||||
* FEATURE: allow trailing whitespace after the timestamp when [parsing Graphite plaintext lines](https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1865).
|
||||
* FEATURE: expose `/-/healthy` and `/-/ready` endpoints as Prometheus does. This is needed for improving integration with third-party solutions, which rely on these endpoints. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1833).
|
||||
|
||||
* BUGFIX: vmagent: prevent from scraping duplicate targets if `-promscrape.dropOriginalLabels` command-line flag is set. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1830). Thanks to @guidao for the fix.
|
||||
* BUGFIX: vmstorage [enterprise](https://victoriametrics.com/products/enterprise/): added missing `vm_tenant_used_tenant_bytes` metric, which shows the approximate per-tenant disk usage. See [these docs](https://docs.victoriametrics.com/PerTenantStatistic.html) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1605).
|
||||
* BUGFIX: vmauth: properly take into account the value passed to `-maxIdleConnsPerBackend` command-line flag. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300).
|
||||
* BUGFIX: vmagent: fix [reading data from Kafka](https://docs.victoriametrics.com/vmagent.html#reading-metrics-from-kafka).
|
||||
* BUGFIX: vmalert: fix [replay mode](https://docs.victoriametrics.com/vmalert.html#rules-backfilling) in enterprise version.
|
||||
* BUGFIX: consistently return zero from [deriv()](https://docs.victoriametrics.com/MetricsQL.html#deriv) function applied to a constant time series. Previously it could return small non-zero values in this case.
|
||||
* BUGFIX: [vmrestore](https://docs.victoriametrics.com/vmrestore.html): properly resume downloading for partially downloaded big files. Previously such files were re-downloaded from the beginning after the interrupt. Now only the remaining parts of the file are downloaded. This allows saving network bandwidth. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/487).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): do not store the last query across vmui page reloads. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1694).
|
||||
* BUGFIX: [vmui](https://docs.victoriametrics.com/#vmui): fix `Cannot read properties of undefined` error at table view. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1797).
|
||||
|
||||
|
||||
## [v1.69.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.69.0)
|
||||
|
||||
Released at 08-11-2021
|
||||
|
||||
* FEATURE: vmalert: allow groups with empty rules list like Prometheus does. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1742).
|
||||
* FEATURE: vmalert: allow groups with default `tenant` in `-clusterMode`. Default `tenant` values can be specified via `-defaultTenant.prometheus` and `-defaultTenant.graphite`. See [these docs](https://docs.victoriametrics.com/vmalert.html#multitenancy).
|
||||
* FEATURE: vmagent: add `collapse` and `expand` buttons per each group of targets with the same `job_name` at `http://vmagent:8429/targets` page.
|
||||
* FEATURE: automatically detect timestamp precision (ns, us, ms or s) for the data ingested into VictoriaMetrics via [InfluxDB line protocol](https://docs.victoriametrics.com/#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf).
|
||||
* FEATURE: vmagent: add ability to protect `/config` page with auth key via `-configAuthKey` command-line flag. This page may contain sensitive config information, so it may be good to restrict access to this page. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764).
|
||||
* FEATURE: vmagent: hide passwords and auth tokens at `/config` page like Prometheus does. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1764).
|
||||
* FEATURE: vmagent: add `-promscrape.maxResponseHeadersSize` command-line flag for tuning the maximum HTTP response headers size for Prometheus scrape targets.
|
||||
* FEATURE: vmagent: send data to multiple configured remote storage systems in parallel (e.g. when multiple `-remoteWrite.url` flag values are specified). This should improve data ingestion speed.
|
||||
* FEATURE: vmagent: add `-remoteWrite.maxRowsPerBlock` command-line flag for tuning the number of samples to send to remote storage per each block. Bigger values may improve data ingestion performance at the cost of higher memory usage.
|
||||
* FEATURE: vmagent: distribute Kafka messages among all the partitions when [writing data to Kafka](https://docs.victoriametrics.com/vmagent.html#writing-metrics-to-kafka).
|
||||
* FEATURE: add [label_graphite_group](https://docs.victoriametrics.com/MetricsQL.html#label_graphite_group) function for extracting the given groups from Graphite metric names.
|
||||
* FEATURE: add [duration_over_time](https://docs.victoriametrics.com/MetricsQL.html#duration_over_time) function for calculating the actual lifetime of the time series with possible gaps. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1780).
|
||||
* FEATURE: add [limit_offset](https://docs.victoriametrics.com/MetricsQL.html#limit_offset) function, which can be used for implementing simple paging over big number of time series. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1778).
|
||||
|
||||
* BUGFIX: vmagent: reduce the increased memory usage when scraping targets with big number of metrics which periodically change. The memory usage has been increased in v1.68.0 after vmagent started generating staleness markers in [stream parse mode](https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1745).
|
||||
* BUGFIX: vmagent: properly display `proxy_url` config option at `http://vmagent:8429/config` page. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1755).
|
||||
* BUGFIX: fix tests for Apple M1. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1653).
|
||||
|
||||
|
||||
## [v1.68.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.68.0)
|
||||
|
||||
Released at 22-10-2021
|
||||
|
||||
* FEATURE: vmagent: expose `-promscrape.config` contents at `/config` page as Prometheus does. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1695).
|
||||
* FEATURE: vmagent: add `show original labels` button per each scrape target displayed at `http://vmagent:8429/targets` page. This should improve debuggability for service discovery and relabeling issues similar to [this one](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1664). See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1698).
|
||||
* FEATURE: vmagent: shard targets among cluster nodes after the relabeling is applied. This should guarantee that targets with the same set of labels go to the same `vmagent` node in the cluster. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1687).
|
||||
* FEATURE: vmagent: atomatically switch to [stream parsing mode](https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode) if the response from the given target exceeds the command-line flag value `-promscrape.minResponseSizeForStreamParse`. This should reduce memory usage when `vmagent` scrapes targets with non-uniform response sizes (this is the case in Kubernetes monitoring).
|
||||
* FEATURE: vmagent: send Prometheus-like staleness marks in [stream parsing mode](https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode). Previously staleness marks wern't sent in stream parsing mode. See [these docs](https://docs.victoriametrics.com/vmagent.html#prometheus-staleness-markers) for details.
|
||||
* FEATURE: vmagent: properly calculate `scrape_series_added` metric for targets in [stream parsing mode](https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode). Previously it was set to 0 in stream parsing mode. See [more details about this metric](https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series).
|
||||
* FEATURE: vmagent: expose `promscrape_series_limit_max_series` and `promscrape_series_limit_current_series` metrics at `http://vmagent:8429/metrics` for scrape targets with the [enabled series limiter](https://docs.victoriametrics.com/vmagent.html#cardinality-limiter).
|
||||
* FEATURE: vmagent: return error if `sample_limit` or `series_limit` options are set when [stream parsing mode](https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode) is enabled, since these limits cannot be applied in stream parsing mode.
|
||||
* FEATURE: vmalert: add `-remoteRead.disablePathAppend` command-line flag, which allows specifying the full `-remoteRead.url`. If `-remoteRead.disablePathAppend` is set, then `vmalert` doesn't add `/api/v1/query` suffix to `-remoteRead.url`.
|
||||
* FEATURE: add trigonometric functions, which are going to be added in [Prometheus 2.31](https://github.com/prometheus/prometheus/pull/9239): [acosh](https://docs.victoriametrics.com/MetricsQL.html#acosh), [asinh](https://docs.victoriametrics.com/MetricsQL.html#asinh), [atan](https://docs.victoriametrics.com/MetricsQL.html#atan), [atanh](https://docs.victoriametrics.com/MetricsQL.html#atanh), [cosh](https://docs.victoriametrics.com/MetricsQL.html#cosh), [deg](https://docs.victoriametrics.com/MetricsQL.html#deg), [rad](https://docs.victoriametrics.com/MetricsQL.html#rad), [sinh](https://docs.victoriametrics.com/MetricsQL.html#sinh), [tan](https://docs.victoriametrics.com/MetricsQL.html#tan), [tanh](https://docs.victoriametrics.com/MetricsQL.html#tanh). Also add `atan2` binary operator. See [this pull request](https://github.com/prometheus/prometheus/pull/9248).
|
||||
* FEATURE: consistently return the same set of time series from [limitk](https://docs.victoriametrics.com/MetricsQL.html#limitk) function. This improves the usability of periodically refreshed graphs.
|
||||
* FEATURE: [vmui](https://docs.victoriametrics.com/#vmui): varios UX improvements. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1711) and [these docs](https://docs.victoriametrics.com/#vmui).
|
||||
* FEATURE: [vmauth](https://docs.victoriametrics.com/vmauth.html): add ability to specify HTTP headers, which will be sent in requests to backends. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1736).
|
||||
* FEATURE: add `/flags` page to all the VictoriaMetrics components. This page contains command-line flags passed to the component.
|
||||
* FEATURE: allow using tab separators additionally to whitespace separators when [ingesting data in Graphite plaintext protocol](https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd). Such separators are [supported by Carbon-c-relay](https://github.com/grobian/carbon-c-relay/commit/f3ffe6cc2b52b07d14acbda649ad3fd6babdd528).
|
||||
|
||||
* BUGFIX: vmstorage: fix `unaligned 64-bit atomic operation` panic on 32-bit architectures (arm and 386). The panic has been introduced in v1.67.0.
|
||||
* BUGFIX: vmalert, vmauth: prevent from frequent closing of TCP connections established to backends under high load. This should reduce the number of TCP sockets in `TIME_WAIT` state at `vmalert` and `vmauth` under high load. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1704).
|
||||
* BUGFIX: vmalert: correctly calculate alert ID including extra labels. Previously, ID for alert entity was generated without alertname or groupname. This led to collision, when multiple alerting rules within the same group producing same labelsets. E.g. expr: `sum(metric1) by (job) > 0` and expr: `sum(metric2) by (job) > 0` could result into same labelset `job: "job"`. The bugfix adds all extra labels right after receiving response from the datasource. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1734).
|
||||
* BUGFIX: vmalert: fix links in [Web UI](https://docs.victoriametrics.com/vmalert.html#web). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1717).
|
||||
* BUGFIX: vmagent: set `honor_timestamps: true` by default in [scrape configs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) if this options isn't set explicitly. This aligns the behaviour with Prometheus.
|
||||
* BUGFIX: vmagent: group scrape targets by the original job names at `http://vmagent:8429/targets` page like Prometheus does. Previously they were grouped by the job name after relabeling, which may result in unexpected empty target groups. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1707).
|
||||
* BUGFIX: [vmctl](https://docs.victoriametrics.com/vmctl.html): fix importing boolean fields from InfluxDB line protocol. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1709).
|
||||
|
||||
|
||||
## [v1.67.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.67.0)
|
||||
|
||||
Released at 08-10-2021
|
||||
|
||||
* FEATURE: add ability to accept metrics from [DataDog agent](https://docs.datadoghq.com/agent/) and [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/). See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-datadog-agent). This option simplifies the migration path from DataDog to VictoriaMetrics. See also [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/206).
|
||||
* FEATURE: vmagent [enterprise](https://victoriametrics.com/products/enterprise/): add support for data reading and writing from/to [Apache Kafka](https://kafka.apache.org/). See [these docs](https://docs.victoriametrics.com/vmagent.html#kafka-integration).
|
||||
* FEATURE: vmui: switch to [μPlot](https://github.com/leeoniya/uPlot) and add ability to naturally scroll and zoom graphs. See [these docs](https://docs.victoriametrics.com/#vmui). Thanks to @Loori-R.
|
||||
* FEATURE: vmstorage: stop accepting new data if `-storageDataPath` directory contains less than `-storage.minFreeDiskSpaceBytes` of free space. This should prevent from `out of disk space` crashes. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/269).
|
||||
* FEATURE: calculate quantiles in the same way as Prometheus does in such functions as [quantile_over_time](https://docs.victoriametrics.com/MetricsQL.html#quantile_over_time) and [quantile](https://docs.victoriametrics.com/MetricsQL.html#quantile). Previously results from VictoriaMetrics could be slightly different than results from Prometheus. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1625) and [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1612) issues.
|
||||
* FEATURE: add `rollup_scrape_interval(m[d])` function to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html), which returns `min`, `max` and `avg` values for the interval between samples for `m` on the given lookbehind window `d`.
|
||||
* FEATURE: add `topk_last(k, q)` and `bottomk_last(k, q)` functions to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html), which return up to `k` time series from `q` with the maximum / minimum last value on the graph.
|
||||
|
||||
* BUGFIX: align behavior of the queries `a or on (labels) b`, `a and on (labels) b` and `a unless on (labels) b` where `b` has multiple time series with the given `labels` to Prometheus behavior. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1643).
|
||||
* BUGFIX: vmagent: fix `openstack_sd_config` service discovery when both `domain_name` and `project_id` config options are set. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1655).
|
||||
* BUGFIX: return proper values (zeroes) from [stddev_over_time](https://docs.victoriametrics.com/MetricsQL.html#stddev_over_time) and [stdvar_over_time](https://docs.victoriametrics.com/MetricsQL.html#stdvar_over_time) functions when the lookbehind window in square brackets contains only a single sample. Previously the sample value was incorrectly returned in this case.
|
||||
* BUGFIX: vminsert: fix uneven distribution of time series among storage nodes in [multi-level cluster setup](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multi-level-cluster-setup). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1672).
|
||||
|
||||
|
||||
## [v1.66.2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.66.2)
|
||||
|
||||
Released at 23-09-2021
|
||||
|
||||
* FEATURE: vmagent: add `vm_promscrape_max_scrape_size_exceeded_errors_total` metric for counting of the failed scrapes due to the exceeded response size (the response size limit can be configured via `-promscrape.maxScrapeSize` command-line flag). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1639).
|
||||
|
||||
* BUGFIX: vmalert: properly reload rule groups if only the `interval` config option is changed. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1641).
|
||||
* BUGFIX: properly handle `{__name__=~"prefix(suffix1|suffix2)",other_label="..."}` queries. They may return unexpected empty responses since v1.66.0. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1644).
|
||||
|
||||
|
||||
## [v1.66.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.66.1)
|
||||
|
||||
Released at 22-09-2021
|
||||
|
||||
* FEATURE: add `-cluster` and/or `-enterprise` suffixes to `short_version` label at `vm_app_version` metric exposed at `/metrics` page of every VictoriaMetrics component. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1635).
|
||||
|
||||
* BUGFIX: vmselect: fix accessing [Graphite APIs](https://docs.victoriametrics.com/#graphite-api-usage). The access has been broken in v1.66.0, because `/graphite/*` path prefix accidentally clashed with `/graph*` path prefix used for VictoriaMetrics UI (aka `vmui`).
|
||||
* BUGFIX: fix parsing `regex: <bool_or_number>` in relabeling rules (for example, `regex: true` or `regex: 123`). The bug has been introduced in v1.66.0.
|
||||
|
||||
|
||||
## [v1.66.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.66.0)
|
||||
|
||||
Released at 20-09-2021
|
||||
|
||||
* FEATURE: vmalert: add web UI with the list of alerting groups, alerts and alert statuses. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1602).
|
||||
* FEATURE: vmalert: add `-rule.maxResolveDuration` command-line flag, which could be used for limiting the auto-resolve duration for the alerting rule. By default it is limited to 3x evaluation interval. This could be too high for big evaluation intervals. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1586).
|
||||
* FEATURE: vmalert: add support for Bearer token authorization for `-datasource.url`, `-remoteRead.url` and `-remoteWrite.url`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1608).
|
||||
* FEATURE: vmagent: send stale markers for disappeared metrics like Prometheus does. Previously stale markers were sent only when the scrape target disappears or when it becomes temporarily unavailable. See [these docs](https://docs.victoriametrics.com/vmagent.html#prometheus-staleness-markers) for details.
|
||||
* FEATURE: vmagent: add ability to set `series_limit` option for a particular scrape target via `__series_limit__` label. This allows setting the limit on the number of time series on a per-target basis. See [these docs](https://docs.victoriametrics.com/vmagent.html#cardinality-limiter) for details.
|
||||
* FEATURE: vmagent: add ability to set `stream_parse` option for a particular scrape target via `__stream_parse__` label. This allows managing the stream parsing mode on a per-target basis. See [these docs](https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode) for details.
|
||||
* FEATURE: vmagent: add ability to set `scrape_interval` and `scrape_timeout` options for a particular target via `__scrape_interval__` and `__scrape_timeout__` labels in the same way as Prometheus 2.30 does. See [this pull request](https://github.com/prometheus/prometheus/pull/8911).
|
||||
* FEATURE: vmagent: generate `scrape_timeout_seconds` metric per each scrape target, so the target saturation could be calculated with `scrape_duration_seconds / scrape_timeout_seconds`. See the corresponding [pull request from Prometheus 2.30](https://github.com/prometheus/prometheus/pull/9247).
|
||||
* FEATURE: vmagent: reduce CPU usage when calculating the number of newly added series per scrape (this number is sent to remote storage in `scrape_series_added` metric).
|
||||
* FEATURE: vmagent: reduce CPU usage when applying `series_limit` to scrape targets with constant set of metrics. See more information about `series_limit` [here](https://docs.victoriametrics.com/vmagent.html#cardinality-limiter).
|
||||
* FEATURE: vminsert: disable rerouting by default when a few of `vmstorage` nodes start accepting data at lower speed than the rest of `vmstorage` nodes. This should improve VictoriaMetrics cluster stability during rolling restarts and during spikes in [time series churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate). The rerouting can be enabled by passing `-disableRerouting=false` command-line flag to `vminsert`.
|
||||
* FEATURE: vmauth: do not put invalid auth tokens into log by default due to security reasons. The logging can be returned back by passing `-logInvalidAuthTokens` command-line flag to `vmauth`. Requests with invalid auth tokens are counted at `vmagent_http_request_errors_total{reason="invalid_auth_token"}` metric exposed by `vmauth` at `/metrics` page.
|
||||
* FEATURE: add new relabeling actions: `keep_metrics` and `drop_metrics`. This simplifies metrics filtering by metric names. See [these docs](https://docs.victoriametrics.com/vmagent.html#relabeling) for more details.
|
||||
* FEATURE: allow splitting long `regex` in relabeling filters into an array of shorter regexps, which can be put into multiple lines for better readability and maintainability. See [these docs](https://docs.victoriametrics.com/vmagent.html#relabeling) for more details.
|
||||
* FEATURE: optimize performance for queries with regexp filters on metric name like `{__name__=~"metric1|...|metricN"}`. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1610) from @faceair.
|
||||
* FEATURE: vmui: use Prometheus-compatible query args, so `vmui` could be accessed from graph editor in Grafana. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1619). Thanks to @Loori-R.
|
||||
* FEATURE: vmselect: automatically add missing port to `-storageNode` hostnames. For example, `-storageNode=vmstorage1,vmstorage2` is automatically translated to `-storageNode=vmstorage1:8401,vmstorage2:8401`. This simplifies [manual setup of VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#cluster-setup).
|
||||
* FEATURE: vminsert: automatically add missing port to `-storageNode` hostnames. For example, `-storageNode=vmstorage1,vmstorage2` is automatically translated to `-storageNode=vmstorage1:8400,vmstorage2:8400`. This simplifies [manual setup of VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#cluster-setup).
|
||||
* FEATURE: add [mad(q)](https://docs.victoriametrics.com/MetricsQL.html#mad) function to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). It calculates [Median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) for groups of points with identical timestamps across multiple time series.
|
||||
* FEATURE: add [outliers_mad(tolerance, q)](https://docs.victoriametrics.com/MetricsQL.html#outliers_mad) function to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). It returns time series with peaks outside the [Median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) multiplied by `tolerance`.
|
||||
* FEATURE: add `histogram_quantiles("phiLabel", phi1, ..., phiN, buckets)` function to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). It calculates the given `phi*`-quantiles over the given `buckets` and returns time series per each quantile with the corresponding `{phiLabel="phi*"}` label.
|
||||
* FEATURE: add `quantiles_over_time("phiLabel", phi1, ..., phiN, series_selector[d])` function to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). It calculates the given `phi*`-quantiles over raw samples selected by `series_selector` on the given lookbehind window `d`. It returns time series per each quantile with the corresponding `{phiLabel="phi*"}` label.
|
||||
* FEATURE: [enterprise](https://victoriametrics.com/products/enterprise/): do not ask for `-eula` flag if `-version` flag is passed to enteprise app. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1621).
|
||||
|
||||
* BUGFIX: properly handle queries with multiple filters matching empty labels such as `metric{label1=~"foo|",label2="bar|"}`. This filter must match the following series: `metric`, `metric{label1="foo"}`, `metric{label2="bar"}` and `metric{label1="foo",label2="bar"}`. Previously it was matching only `metric{label1="foo",label2="bar"}`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1601).
|
||||
* BUGFIX: vmselect: reset connection timeouts after each request to `vmstorage`. This should prevent from `cannot read data in 0.000 seconds: unexpected EOF` warning in logs. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1562). Thanks to @mxlxm .
|
||||
* BUGFIX: keep metric name for time series returned from [rollup_candlestick](https://docs.victoriametrics.com/MetricsQL.html#rollup_candlestick) function, since the returned series don't change the meaning of the original series. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1600).
|
||||
* BUGFIX: use Prometheus-compatible label value formatting for [count_values](https://docs.victoriametrics.com/MetricsQL.html#count_values) function. Previously big values could be improperly formatted, which could break query results, which rely on label value such as `... on(label) count_values("label", ...)`.
|
||||
* BUGFIX: vmagent: properly use `https` scheme for wildcard TLS certificates for `role: ingress` targets in Kubernetes service discovery. See [this issue](https://github.com/prometheus/prometheus/issues/8902).
|
||||
* BUGFIX: vmagent: support host networking mode for `docker_sd_config`. See [this issue](https://github.com/prometheus/prometheus/issues/9116).
|
||||
* BUGFIX: fix non-repeatable results from `quantile_over_time()` function when the number of input samples exceeds 1000. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1612).
|
||||
* BUGFIX: vmagent: fix EC2 zone discovery when `filters` are specified in [ec2_sc_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1626).
|
||||
|
||||
|
||||
## [v1.65.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.65.0)
|
||||
|
||||
Released at 01-09-2021
|
||||
|
||||
* FEATURE: vmagent: add ability to read scrape configs from multiple files specified in `scrape_config_files` section. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1559).
|
||||
* FEATURE: vmagent: reduce memory usage and CPU usage when [Prometheus staleness tracking](https://docs.victoriametrics.com/vmagent.html#prometheus-staleness-markers) is enabled for metrics exported from the deleted or disappeared scrape targets.
|
||||
* FEATURE: vmagent: add the ability to limit the number of unique time series scraped per each target. This can be done either globally via `-promscrape.seriesLimitPerTarget` command-line option or on per-target basis via `series_limit` option at `scrape_config` section. See [the updated docs on cardinality limiter](https://docs.victoriametrics.com/vmagent.html#cardinality-limiter) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1561).
|
||||
* FEATURE: vmagent: discover `role: ingress` and `role: endpointslice` in [kubernetes_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) via v1 API instead of v1beta1 API if Kubernetes supports it. This fixes service discovery in Kubernetes v1.22 and newer versions. See [these docs](https://kubernetes.io/docs/reference/using-api/deprecation-guide/#ingress-v122).
|
||||
* FEATURE: take into account failed queries in `vm_request_duration_seconds` summary at `/metrics`. Previously only successful queries were taken into account. This could result in skewed summary. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1537).
|
||||
* FEATURE: vmalert: add an official dashboard for vmalert. See [these docs](https://docs.victoriametrics.com/vmalert.html#monitoring).
|
||||
* FEATURE: vmalert: add ability to set additional labels per group via `labels` config section. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1471).
|
||||
* FEATURE: vmalert: add `-disableAlertgroupLabel` command-line flag for disabling the label with alert group name. This may be needed for proper deduplication in Alertmanager. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1532).
|
||||
* FEATURE: update Go builder from v1.16.7 to v1.17.0. This improves data ingestion and query performance by up to 5% according to benchmarks. See [the release post for Go1.17](https://go.dev/blog/go1.17).
|
||||
* FEATURE: vmagent: expose `promscrape_discovery_http_errors_total` metric, which can be used for monitoring the number of failed discovery attempts per each `http_sd` config.
|
||||
* FEATURE: do not reset response cache when a sample with old timestamp is ingested into VictoriaMetrics if `-search.disableAutoCacheReset` command-line option is set. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1570).
|
||||
* FEATURE: add `quantiles("phiLabel", phi1, ..., phiN, q)` aggregate function to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html), which calculates the given `phi*` quantiles over time series returned by `q`. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1573).
|
||||
|
||||
* BUGFIX: rename `sign` function to `sgn` in order to be consistent with PromQL. See [this pull request from Prometheus](https://github.com/prometheus/prometheus/pull/8457).
|
||||
* BUGFIX: vmagent: add `role: endpointslice` in [kubernetes_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) in order to be consistent with Prometheus. Previously this role was supported with incorrect name: `role: endpointslices`. Now both `endpointslice` and `endpointslices` are supported. See [the corresponding code in Prometheus](https://github.com/prometheus/prometheus/blob/2ec6c7dbb82b72834021e01f1773eb90a67a371f/discovery/kubernetes/kubernetes.go#L99).
|
||||
* BUGFIX: improve the detection of the needed free space for background merge operation. This should prevent from possible out of disk space crashes during big merges. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1560).
|
||||
* BUGFIX: vmauth: remove trailing slash from the full url before requesting it from the backend. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1554).
|
||||
* BUGFIX: [vmbackupmanager](https://docs.victoriametrics.com/vmbackupmanager.html): fix timeout error when snapshot takes longer than 10 seconds. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1571).
|
||||
* BUGFIX: properly parse OpenTSDB `put` messages with multiple spaces between message elements. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1574). Thanks to @envzhu for the fix.
|
||||
|
||||
|
||||
## [v1.64.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.64.1)
|
||||
|
||||
Released at 19-08-2021
|
||||
|
||||
* FEATURE: add `bitmap_and(q, mask)`, `bitmap_or(q, mask)` and `bitmak_xor(q, mask)` functions to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). These functions allow performing bitwise operations over data points in time series. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1541).
|
||||
* FEATURE: vmalert: add `-remoteWrite.disablePathAppend` command-line flag, which can be used when custom `-remoteWrite.url` must be specified. For example, `./vmalert -disablePathAppend -remoteWrite.url='http://foo.bar/a/b/c?d=e'` would write data to `http://foo.bar/a/b/c?d=e` instead of `http://foo.bar/a/b/c?d=e/api/v1/write`. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1536).
|
||||
* FEATURE: vmagent: add `-promscrape.noStaleMarkers` command-line flag for disabling sending Prometheus stale markers for metrics from disappeared scrape targets. This option may be used for reducing memory usage when scraping big number of metrics with big number of labels and when stale markers aren't needed.
|
||||
* FEATURE: vmselect: add `-search.noStaleMarkers` command-line flag for disabling stale markers handling in queries. This may save some CPU time when the queried data doesn't contain stale markers.
|
||||
|
||||
* BUGFIX: vmagent: stop scrapers for deleted targets before starting scrapers for added targets. This should prevent from possible time series overlap when old targets are substituted by new targets (for example, during new deployment in Kubernetes). The overlap could lead to incorrect query results. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1509).
|
||||
* BUGFIX: vmagent: send Prometheus stale markers for the previously scraped metrics on failed scrapes like Prometheus does. See [this article](https://www.robustperception.io/staleness-and-promql).
|
||||
* BUGFIX: upgrade base Docker image from Alpine 3.14.0 to Alpine 3.14.1 . This fixes potential security issues - see [Alpine 3.14.1 release notes](https://www.alpinelinux.org/posts/Alpine-3.14.1-released.html).
|
||||
* BUGFIX: disable overriding the lookbehind window `[d]` at `last_over_time(m[d])` if `d` is smaller than the interval between samples, since users don't expect implicit overriding of explicitly set `[d]` in `last_over_time(m[d])`.
|
||||
|
||||
|
||||
## [v1.64.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.64.0)
|
||||
|
||||
Released at 15-08-2021
|
||||
|
||||
* FEATURE: add support for Prometheus staleness markers. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526).
|
||||
* FEATURE: vmagent: automatically generate Prometheus staleness markers for the scraped metrics when scrape targets disappear in the same way as Prometheus does. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526).
|
||||
* FEATURE: add `present_over_time(m[d])` function, which returns 1 if `m` has a least a single sample over the previous duration `d`. This function has been added also to [Prometheus 2.29](https://github.com/prometheus/prometheus/releases/tag/v2.29.0).
|
||||
* FEATURE: vmagent: support multitenant writes according to [these docs](https://docs.victoriametrics.com/vmagent.html#multitenancy). This allows using a single `vmagent` instance in front of VictoriaMetrics cluster for all the tenants. Thanks to @omarghader for [the pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1505). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1491).
|
||||
* FEATURE: vmagent: add `__meta_ec2_availability_zone_id` label to discovered Amazon EC2 targets. This label is available in Prometheus [starting from v2.29](https://github.com/prometheus/prometheus/releases/tag/v2.29.0).
|
||||
* FAETURE: vmagent: add `__meta_gce_interface_ipv4_<name>` labels to discovered GCE targets. These labels are available in Prometheus [starting from v2.29](https://github.com/prometheus/prometheus/releases/tag/v2.29.0).
|
||||
* FEATURE: add `-search.maxSamplesPerSeries` command-line flag for limiting the number of raw samples a single query can process per each time series. This option can protect from out of memory errors when a query processes tens of millions of raw samples per series. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1067).
|
||||
* FEATURE: add `-search.maxSamplesPerQuery` command-line flag for limiting the number of raw samples a single query can process across all the time series. This option can protect from heavy queries, which select too big number of raw samples. Thanks to @jiangxinlingdu for [the initial pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1478).
|
||||
* FEATURE: improve performance for queries that process big number of time series and/or samples on systems with big number of CPU cores.
|
||||
* FEATURE: vmalert: expose `vmalert_alerting_rules_last_evaluation_samples` and `vmalert_recording_rules_last_evaluation_samples` metrics. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1494).
|
||||
* FEATURE: vminsert: expose `vm_rpc_send_duration_seconds_total` counter, which can be used for determining high saturation of every `vminsert -> vmstorage` link with an alerting query `rate(vm_rpc_send_duration_seconds_total) > 0.9s`. This query triggers when the link is saturated by more than 90%. This usually means that more `vminsert` or `vmstorage` nodes must be added to the cluster in order to increase the total number of `vminsert -> vmstorage` links.
|
||||
* FEATURE: vmagent: expose `vmagent_remotewrite_send_duration_seconds_total` counter, which can be used for determining high saturation of every connection to remote storage with an alerting query `rate(vmagent_remotewrite_send_duration_seconds_total) > 0.9s`. This query triggers when a connection is saturated by more than 90%. This usually means that `-remoteWrite.queues` command-line flag must be increased in order to increase the number of connections per each remote storage.
|
||||
* FEATURE: vmui: automatically fill Server URL field. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1506).
|
||||
|
||||
* BUGFIX: fix corner cases for queries on time ranges exceeding 40 days. Previously some series can be missing in query results. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1486).
|
||||
* BUGFIX: vmselect: return dummy response at `/rules` page in the same way as for `/api/v1/rules` page. The `/rules` page is requested by Grafana 8. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1493) for details.
|
||||
* BUGFIX: vmbackup: automatically set default `us-east-1` S3 region if it is missing. This should simplify using S3-compatible services such as MinIO for backups. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1449).
|
||||
* BUGFIX: vmselect: prevent from possible deadlock when multiple `target` query args are passed to [Graphite Render API](https://docs.victoriametrics.com/#graphite-render-api-usage).
|
||||
* BUGFIX: return series with `a op b` labels and `N` values for `(a op b) default N` if `(a op b)` returns series with all NaN values. Previously such series were removed.
|
||||
* BUGFIX: vmui: fix layout when the query selects more than 27 time series. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1497).
|
||||
* BUGFIX: vmagent: restore highlighting in red for DOWN targets at `/targets` page. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1461).
|
||||
|
||||
|
||||
## [v1.63.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.63.0)
|
||||
|
||||
Released at 15-07-2021
|
||||
|
||||
* FEATURE: reduce memory usage by up to 30% on production workloads.
|
||||
* FEATURE: vmselect: embed [vmui](https://github.com/VictoriaMetrics/vmui) into a single-node VictoriaMetrics and into `vmselect` component of cluster version. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1413). The web interface is available at the following paths:
|
||||
* `/vmui/` for a single-node VictoriaMetrics
|
||||
* `/select/<accountID>/vmui/` for `vmselect` at cluster version of VictoriaMetrics
|
||||
* FEATURE: support durations anywhere in [MetricsQL queries](https://docs.victoriametrics.com/MetricsQL.html). For example, `sum_over_time(m[1h]) / 1h` is a valid query, which is equivalent to `sum_over_time(m[1h]) / 3600`.
|
||||
* FEATURE: support durations without suffixes in [MetricsQL queries](https://docs.victoriametrics.com/MetricsQL.html). For example, `rate(m[3600])` is a valid query, which is equivalent to `rate(m[1h])`.
|
||||
* FEATURE: export `vmselect_request_duration_seconds` and `vminsert_request_duration_seconds` [VictoriaMetrics histograms](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) at `/metrics` page. These histograms can be used for determining latency distribution and SLI/SLO for the served requests. For example, the following query would return the percent of queries that took less than 500ms during the last hour: `histogram_share(500ms, sum(rate(vmselect_request_duration_seconds_bucket[1h])) by (vmrange))`.
|
||||
* FEATURE: vmagent: dynamically reload client TLS certificates from disk on every [mTLS connection](https://developers.cloudflare.com/cloudflare-one/identity/devices/mutual-tls-authentication). This should allow using `vmagent` with [Istio service mesh](https://istio.io/latest/about/service-mesh/). See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1420).
|
||||
* FEATURE: log http request path plus all the query args on errors during request processing. Previously only http request path was logged without query args, so it could be hard debugging such errors.
|
||||
* FEATURE: add `is_set` label to `flag` metrics. This allows determining explicitly set command-line flags with the query `flag{is_set="true"}`.
|
||||
* FEATURE: add ability to remove caches stored inside `<-storageDataPath>/cache` on startup if `reset_cache_on_startup` file is present there. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1447).
|
||||
|
||||
* BUGFIX: vmagent: remove `{ %space %}` typo in `/targets` output. The typo has been introduced in v1.62.0. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1408).
|
||||
* BUGFIX: vmagent: fix CSS styles on `/targets` page. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1422).
|
||||
* BUGFIX: vmalert: accept Prometheus-like durations in `interval` config option inside `group` section. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1444).
|
||||
* BUGFIX: properly update `vm_merge_need_free_disk_space` metric at `/metrics` page when there is no enough free disk space for performing optimal merges. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1373).
|
||||
|
||||
|
||||
## [v1.62.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.62.0)
|
||||
|
||||
Released at 25-06-2021
|
||||
|
||||
* FEATURE: vmagent: add service discovery for Docker (aka [docker_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config)). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1402).
|
||||
* FEATURE: vmagent: add service discovery for DigitalOcean (aka [digitalocean_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config)). See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1367).
|
||||
* FEATURE: vmagent: change the default value for `-remoteWrite.queues` from 4 to `2 * numCPUs`. This should reduce scrape duration for highly loaded vmagent, which scrapes tens of thousands of targets. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1385).
|
||||
* FEATURE: vmagent: show the number of samples the target returns during the last scrape on `/targets` and `/api/v1/targets` pages. This should simplify debugging targets, which may return too big or too low number of samples. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1377).
|
||||
* FEATURE: vmagent: show jobs with zero discovered targets on `/targets` page. This should help debugging improperly configured scrape configs.
|
||||
* FEATURE: vmagent: support for http-based service discovery (aka [http_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config)), which has been added since Prometheus 2.28. See [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1392).
|
||||
* FEATURE: vmagent: support namespace in Consul serive discovery in the same way as Prometheus 2.28 does. See [this issue](https://github.com/prometheus/prometheus/issues/8894) for details.
|
||||
* FEATURE: vmagent: support generic auth configs in `consul_sd_configs` in the same way as Prometheus 2.28 does. See [this issue](https://github.com/prometheus/prometheus/issues/8924) for details.
|
||||
* FEATURE: [vmctl](https://docs.victoriametrics.com/vmctl.html): limit the number of samples per each imported JSON line. This should limit the memory usage at VictoriaMetrics side when importing time series with big number of samples.
|
||||
* FEATURE: vmselect: log slow queries across all the `/api/v1/*` handlers (aka [Prometheus query API](https://prometheus.io/docs/prometheus/latest/querying/api)) if their execution duration exceeds `-search.logSlowQueryDuration`. This should simplify debugging slow requests to such handlers as `/api/v1/labels` or `/api/v1/series` additionally to `/api/v1/query` and `/api/v1/query_range`, which were logged in the previous releases.
|
||||
* FEATURE: vminsert: sort the `-storageNode` list in order to guarantee the identical `series -> vmstorage` mapping across all the `vminsert` nodes. This should reduce resource usage (RAM, CPU and disk IO) at `vmstorage` nodes if `vmstorage` addresses are passed in random order to `vminsert` nodes.
|
||||
* FEATURE: vmstorage: reduce memory usage on a system with many CPU cores under high ingestion rate.
|
||||
|
||||
* BUGFIX: prevent from adding new samples to deleted time series after the rotation of the inverted index (the rotation is performed once per `-retentionPeriod`). See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347#issuecomment-861232136) for details.
|
||||
* BUGFIX: vmstorage: reduce high disk write IO usage on systems with big number of CPU cores. The issue has been introduced in the release [v1.59.0](#v1590). See [this commit](aa9b56a046b6ae8083fa659df35dd5e994bf9115) and [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338#issuecomment-863046999) for details.
|
||||
* BUGFIX: vmstorage: prevent from incorrect stats collection when multiple concurrent queries execute the same tag filter. This may help reducing CPU usage under certain workloads. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1338).
|
||||
* BUGFIX: vmselect: return the last timestamp for the max / min value from `tmax_over_time(m[d])` and `tmin_over_time(m[d])` [MetricsQL functions](https://docs.victoriametrics.com/MetricsQL.html) as most users expect. See also [this issue](https://github.com/prometheus/prometheus/issues/8966).
|
||||
* BUGFIX: vmselect: return the expected value for `increase_pure()` [MetricsQL function](https://docs.victoriametrics.com/MetricsQL.html) after a gap in a time series. Previously incorrect too big value could be returned after the gap from `increase_pure()`.
|
||||
|
||||
|
||||
## [v1.61.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.61.1)
|
||||
|
||||
Released at 11-06-2021
|
||||
|
||||
* BUGFIX: vmalert: fix recording rules, which were broken in v1.61.0. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1369).
|
||||
* BUGFIX: reset the on-disk cache for mapping from the full metric name to an internal metric id (e.g. `metric_name{labels} -> internal_metric_id`) after deleting metrics via [delete API](https://docs.victoriametrics.com/#how-to-delete-time-series). This should prevent from possible inconsistent state after unclean shutdown. This [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1347).
|
||||
|
||||
|
||||
## [v1.61.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.61.0)
|
||||
|
||||
Released at 09-06-2021
|
||||
|
||||
* FEATURE: vmalert: add support for backfilling (aka replay) of recording and alerting rules. See [these docs](https://docs.victoriametrics.com/vmalert.html#rules-backfilling) and [this feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/836).
|
||||
* FEATURE: vmalert: add a command-line flag `-rule.configCheckInterval` for automatic re-reading of `-rule` files without the need to send SIGHUP signal. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/512).
|
||||
* FEATURE: vmagent: respect the `sample_limit` and `-promscrape.maxScrapeSize` values when scraping targets in [stream parsing mode](https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode). See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1331).
|
||||
* FEATURE: vmauth: add ability to specify mutliple `url_prefix` entries for balancing the load among multiple `vmselect` and/or `vminsert` nodes in a cluster. See [these docs](https://docs.victoriametrics.com/vmauth.html#load-balancing).
|
||||
* FEATURE: vminsert: add `-disableRerouting` command-line flag for forcibly disabling the rerouting. This should help resolving [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791) and [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054) issues.
|
||||
* FEATURE: vminsert: reduce the probability of global re-routing storm if all the vmstorage nodes cannot keep up with the given ingestion rate for some time. This should improve cluster stability in such cases. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791) and [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1054) issues.
|
||||
* FEATURE: allow building VictoriaMetrics components for Solaris / SmartOS. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1322).
|
||||
* FEATURE: vmagent: add ability to debug relabeling rules. See [these docs](https://docs.victoriametrics.com/vmagent.html#relabeling) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1343).
|
||||
|
||||
* BUGFIX: reduce CPU usage by up to 2x during querying a database with big number of active daily time series. The issue has been introduced in `v1.59.0`.
|
||||
* BUGFIX: vmagent: properly apply auth and tls configs in `eureka_sd_configs`. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1350).
|
||||
* BUGFIX: vmauth: do not panic on aborted http requests. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1353).
|
||||
* BUGFIX: properly generate `target` property for `*Series(foo.*.bar)` responses returned from [Graphite Render API](https://docs.victoriametrics.com/#graphite-render-api-usage). Previously the `target` contained the expanded list of series for `foo.*.bar`, e.g. `sumSeries(foo.a.bar,foo.b.bar,...foo.z.bar)`. Now VictoriaMetrics returns `sumSeries(foo.*.bar)` as a target in the same way as Graphite does.
|
||||
|
||||
|
||||
## [v1.60.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.60.0)
|
||||
|
||||
Released at 24-05-2021
|
||||
|
||||
* FEATURE: add ability to limit the number of unique time series, which can be added to storage per hour and per day. This can help dealing with high cardinality and high churn rate issues. See [these docs](https://docs.victoriametrics.com/#cardinality-limiter).
|
||||
* FEATURE: vmagent: add ability to limit the number of unique time series, which can be sent to remote storage systems per hour and per day. This can help dealing with high cardinality and high churn rate issues. See [these docs](https://docs.victoriametrics.com/vmagent.html#cardinality-limiter).
|
||||
* FEATURE: vmalert: add ability to run alerting and recording rules for multiple tenants. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/740) and [these docs](https://docs.victoriametrics.com/vmalert.html#multitenancy).
|
||||
* FEATURE: vminsert: add support for data ingestion via other `vminsert` nodes. This allows building multi-level data ingestion paths in VictoriaMetrics cluster by writing data from one level of `vminsert` nodes to another level of `vminsert` nodes. See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multi-level-cluster-setup) and [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/541#issuecomment-835487858) for details.
|
||||
* FEATURE: vmagent: reload `bearer_token_file`, `credentials_file` and `password_file` contents every second. This allows dynamically changing the contents of these files during target scraping and service discovery without the need to restart `vmagent`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1297).
|
||||
* FEATURE: vmalert: add a flag to control behaviour on startup for state restore errors. Such errors were returned and logged before as well. Now user can specify whether to just log these errors (`-remoteRead.ignoreRestoreErrors=true`) or to stop the process (`-remoteRead.ignoreRestoreErrors=false`). The latter is important when VM isn't ready yet to serve queries from vmalert and it needs to wait. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1252).
|
||||
* FEATURE: vmalert: add ability to pass `round_digits` query arg to datasource via `-datasource.roundDigits` command-line flag. This can be used for limiting the number of decimal digits after the point in recording rule results. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/525).
|
||||
* FEATURE: return `X-Server-Hostname` header in http responses of all the VictoriaMetrics components. This should simplify tracing the origin server behind a load balancer or behind auth proxy during troubleshooting.
|
||||
* FEATURE: vmselect: allow to use 2x more memory for query processing at `vmselect` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html). This should allow processing heavy queries without the need to increase RAM size at `vmselect` nodes.
|
||||
* FEATURE: add ability to filter `/api/v1/status/tsdb` output with arbitrary [time series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) passed via `match[]` query args. See [these docs](https://docs.victoriametrics.com/#tsdb-stats) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1168) for details.
|
||||
* FEATURE: automatically detect memory and cpu limits for VictoriaMetrics components running under [cgroup v2](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html) environments such as [HashiCorp Nomad](https://www.nomadproject.io/). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1269).
|
||||
* FEATURE: vmauth: allow `-auth.config` reloading via `/-/reload` http endpoint. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1194).
|
||||
* FEATURE: add `timezone_offset(tz)` function. It returns offset in seconds for the given timezone `tz` relative to UTC. This can be useful when combining with datetime-related functions. For example, `day_of_week(time()+timezone_offset("America/Los_Angeles"))` would return weekdays for `America/Los_Angeles` time zone. Special `Local` time zone can be used for returning an offset for the time zone set on the host where VictoriaMetrics runs. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1306) and [MetricsQL docs](https://docs.victoriametrics.com/MetricsQL.html) for more details.
|
||||
* FEATURE: vmagent: add support for OAuth2 authorization for scrape targets and service discovery in the same way as Prometheus does. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#oauth2).
|
||||
* FEATURE: vmagent: add support for OAuth2 authorization when writing data to `-remoteWrite.url`. See `-remoteWrite.oauth2.*` config params in `/path/to/vmagent -help` output.
|
||||
* FEATURE: vmalert: add ability to set `extra_filter_labels` at alerting and recording group configs. See [these docs](https://docs.victoriametrics.com/vmalert.html#groups).
|
||||
* FEATURE: vmstorage: reduce memory usage by up to 30% when ingesting big number of active time series.
|
||||
|
||||
* BUGFIX: vmagent: do not retry scraping targets, which don't support HTTP. This should reduce CPU load and network usage at `vmagent` and at scrape target. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1289).
|
||||
* BUGFIX: vmagent: fix possible race when refreshing `role: endpoints` and `role: endpointslices` scrape targets in `kubernetes_sd_config`. Prevoiusly `pod` objects could be updated after the related `endpoints` object update. This could lead to missing scrape targets. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240).
|
||||
* BUGFIX: vmagent: properly spread scrape targets among `vmagent` replicas if `-promscrape.cluster.replicationFactor` exceeds 1. See [this pull request](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1292).
|
||||
* BUGFIX: vmagent: limit `scrape_timeout` by `scrape_interval`. This guarantees that only a single sample is lost during the configured `scrape_interval` when scrape target responds slowly. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1281#issuecomment-840538907) for details.
|
||||
* BUGFIX: properly remove stale parts outside the configured retention if `-retentionPeriod` is smaller than one month. Previously stale parts could remain active for up to a month after they go outside the retention.
|
||||
* BUGFIX: stop the process on panic errors, since such errors may leave the process in inconsistent state. Previously panics could be recovered, which could result in unexpected hard-to-debug further behavior of running process.
|
||||
* BUGFIX: vminsert, vmagent: make sure data ingestion connections are closed before completing graceful shutdown. Previously the connection may remain open, which could result in trailing samples loss.
|
||||
* BUGFIX: vmauth, vmalert: properly re-use HTTP keep-alive connections to backends and datasources. Previously only 2 keep-alive connections per backend could be re-used. Other connections were closed after the first request. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1300) for details.
|
||||
* BUGFIX: vmalert: fix false positive error `result contains metrics with the same labelset after applying rule labels`, which could be triggered when recording rules generate unique metrics. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1293).
|
||||
* BUGFIX: vmctl: properly import InfluxDB rows if they have a field and a tag with identical names. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1299).
|
||||
* BUGFIX: properly reload configs if `SIGHUP` signal arrives during service initialization. Previously such `SIGHUP` signal could be ingonred and configs weren't reloaded.
|
||||
* BUGFIX: vmalert: properly import default rules from OpenShift. See [this issue](https://github.com/VictoriaMetrics/operator/issues/243).
|
||||
* BUGFIX: reduce the probability of `the removal queue is full` panic when highly loaded VictoriaMetrics stores data on NFS. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1313).
|
||||
|
||||
|
||||
## [v1.59.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.59.0)
|
||||
|
||||
Released at 01-05-2021
|
||||
|
||||
* FEATURE: improved new time series registration speed on systems with many CPU cores. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1244). Thanks to @waldoweng for the idea and [draft implementation](https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1243).
|
||||
* FEATURE: vmalert: use the same technique as Grafana for determining evaluation timestamps for recording rules. This should make consistent graphs for series generated by recording rules compared to graphs generated for queries from recording rules in Grafana. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1232).
|
||||
* FEATURE: vmauth: add ability to set madatory query args in `url_prefix`. For example, `url_prefix: http://vm:8428/?extra_label=team=dev` would add `extra_label=team=dev` query arg to all the incoming requests. See [the example](https://docs.victoriametrics.com/vmauth.html#auth-config) for more details.
|
||||
* FEATURE: vmctl: add OpenTSDB migration option. See more details [here](https://docs.victoriametrics.com/vmctl#migrating-data-from-opentsdb).
|
||||
Thanks to @johnseekins!
|
||||
* FEATURE: log metrics with dropped labels if the number of labels in the ingested metric exceeds `-maxLabelsPerTimeseries`. This should simplify debugging for this case.
|
||||
* FEATURE: vmagent: list user-visible endpoints at `http://vmagent:8429/`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1251).
|
||||
|
||||
* BUGFIX: vmagent: properly update `role: endpoints` and `role: endpointslices` scrape targets if the underlying service objects are updated in `kubernetes_sd_config`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1240).
|
||||
* BUGFIX: vmagent: apply `scrape_timeout` on receiving the first response byte from `stream_parse: true` scrape targets. Previously it was applied to receiving and *processing* the full response stream. This could result in false timeout errors when scrape target exposes millions of metrics as described [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017#issuecomment-767235047).
|
||||
* BUGFIX: vmagent: eliminate possible data race when obtaining value for the metric `vm_persistentqueue_bytes_pending`. The data race could result in incorrect value for this metric.
|
||||
* BUGFIX: vmstorage: remove empty directories on startup. Such directories can be left after unclean shutdown on NFS storage. Previously such directories could lead to crashloop until manually removed. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1142).
|
||||
|
||||
|
||||
## [v1.58.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.58.0)
|
||||
|
||||
Released at 08-04-2021
|
||||
|
||||
* FEATURE: vminsert and vmagent: add `-sortLabels` command-line flag for sorting metric labels before pushing them to `vmstorage`. This should reduce the size of `MetricName -> internal_series_id` cache (aka `vm_cache_size_bytes{type="storage/tsid"}`) when ingesting samples for the same time series with distinct order of labels. For example, `foo{k1="v1",k2="v2"}` and `foo{k2="v2",k1="v1"}` represent a single time series. Labels sorting is disabled by default, since the majority of established exporters preserve the order of labels for the exported metrics.
|
||||
* FEATURE: allow specifying label value alongside label name for the `others sum` time series returned from `topk_*` and `bottomk_*` functions from [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html). For example, `topk_avg(3, max(process_resident_memory_bytes) by (instance), "instance=other_sum")` would return top 3 series from `max(process_resident_memory_bytes) by (instance)` plus a series containing the sum of other series. The `others sum` series will have `{instance="other_sum"}` label.
|
||||
* FEATURE: do not delete `dst_label` when applying `label_copy(q, "src_label", "dst_label")` and `label_move(q, "src_label", "dst_label")` to series without `src_label` and with non-empty `dst_label`. See more details at [MetricsQL docs](https://docs.victoriametrics.com/MetricsQL.html).
|
||||
* FEATURE: update Go builder from `v1.16.2` to `v1.16.3`. This should fix [these issues](https://github.com/golang/go/issues?q=milestone%3AGo1.16.3+label%3ACherryPickApproved).
|
||||
* FEATURE: vmagent: add support for `follow_redirects` option to `scrape_configs` section in the same way as [Prometheus 2.26 does](https://github.com/prometheus/prometheus/pull/8546).
|
||||
* FEATURE: vmagent: add support for `authorization` section in `-promscrape.config` in the same way as [Prometheus 2.26 does](https://github.com/prometheus/prometheus/pull/8512).
|
||||
* FEATURE: vmagent: add support for socks5 proxy in `proxy_url` config option. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1177).
|
||||
* FEATURE: vmagent: add support for `socks5 over tls` proxy in `proxy_url` config option. It can be set up with the following config: `proxy_url: "tls+socks5://proxy-addr:port"`.
|
||||
* FEATURE: vmagent: reduce memory usage when `-remoteWrite.queues` is set to a big value. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1167).
|
||||
* FEATURE: vmagent: add AWS IAM roles for tasks support for EC2 service discovery according to [these docs](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html).
|
||||
* FEATURE: vmagent: add support for `proxy_tls_config`, `proxy_authorization`, `proxy_basic_auth`, `proxy_bearer_token` and `proxy_bearer_token_file` options in `consul_sd_config`, `dockerswarm_sd_config` and `eureka_sd_config` sections.
|
||||
* FEATURE: vmagent: pass `X-Prometheus-Scrape-Timeout-Seconds` header to scrape targets as Prometheus does. In this case scrape targets can limit the time needed for performing the scrape. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179#issuecomment-813118733) for details.
|
||||
* FEATURE: vmagent: drop corrupted persistent queue files at `-remoteWrite.tmpDataPath` instead of throwing a fatal error. Corrupted files can appear after unclean shutdown of `vmagent` such as OOM kill or hardware reset. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1030).
|
||||
* FEATURE: vmauth: add support for authorization via [bearer token](https://swagger.io/docs/specification/authentication/bearer-authentication/). See [the docs](https://docs.victoriametrics.com/vmauth.html#auth-config) for details.
|
||||
* FEATURE: publish `arm64` and `amd64` binaries for cluster version of VictoriaMetrics at [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
|
||||
|
||||
* BUGFIX: properly handle `/api/v1/labels` and `/api/v1/label/<label_name>/values` queries on big `start ... end` time range. This should fix big resource usage when VictoriaMetrics is queried with [Promxy](https://github.com/jacksontj/promxy) v0.0.62 or newer versions.
|
||||
* BUGFIX: do not break sort order for series returned from `topk*`, `bottomk*` and `outliersk` [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) functions. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1189).
|
||||
* BUGFIX: vmagent: properly work with simple HTTP proxies which don't support `CONNECT` method. For example, [PushProx](https://github.com/prometheus-community/PushProx). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1179).
|
||||
* BUGFIX: vmagent: properly discover targets if multiple namespace selectors are put inside `kubernetes_sd_config`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1170).
|
||||
* BUGFIX: vmagent: properly discover `role: endpoints` and `role: endpointslices` targets in `kubernetes_sd_config`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1182).
|
||||
* BUGFIX: properly generate filename for `*.tar.gz` archive inside `_checksums.txt` file posted at [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1171).
|
||||
|
||||
|
||||
## [v1.57.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.57.1)
|
||||
|
||||
Released at 30-03-2021
|
||||
|
||||
* FEATURE: publish vmutils for `GOOS=arm` on [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
|
||||
|
||||
* BUGFIX: prevent from possible incomplete query results after timed out query.
|
||||
* BUGFIX: vmselect: remove `-search.storageTimeout` command-line flag, since it has the same meaning as `-search.maxQueryDuration`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711#issuecomment-808884995).
|
||||
* BUGFIX: vminsert: return back `type` label to per-tenant metric `vm_tenant_inserted_rows_total`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/932).
|
||||
|
||||
|
||||
## [v1.57.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.57.0)
|
||||
|
||||
Released at 29-03-2021
|
||||
|
||||
* FEATURE: optimize query performance by up to 10x on systems with many CPU cores. See [this tweet](https://twitter.com/MetricsVictoria/status/1375064484860067840).
|
||||
* FEATURE: add the following metrics at `/metrics` page for every VictoraMetrics app:
|
||||
* `process_resident_memory_anon_bytes` - RSS share for memory allocated by the process itself. This share cannot be freed by the OS, so it must be taken into account by OOM killer.
|
||||
* `process_resident_memory_file_bytes` - RSS share for page cache memory (aka memory-mapped files). This share can be freed by the OS at any time, so it must be ignored by OOM killer.
|
||||
* `process_resident_memory_shared_bytes` - RSS share for memory shared with other processes (aka shared memory). This share can be freed by the OS at any time, so it must be ignored by OOM killer.
|
||||
* `process_resident_memory_peak_bytes` - peak RSS usage for the process.
|
||||
* `process_virtual_memory_peak_bytes` - peak virtual memory usage for the process.
|
||||
* FEATURE: accept and enforce `extra_label=<label_name>=<label_value>` query arg at [Graphite APIs](https://docs.victoriametrics.com/#graphite-api-usage).
|
||||
* FEATURE: use InfluxDB field as metric name if measurement is empty and `-influxSkipSingleField` command-line is set. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1139).
|
||||
* FEATURE: vmagent: add `-promscrape.consul.waitTime` command-line flag for tuning the maximum wait time for Consul service discovery. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1144).
|
||||
* FEATURE: vmagent: add `vm_promscrape_discovery_kubernetes_stale_resource_versions_total` metric for monitoring the frequency of `too old resource version` errors during Kubernetes service discovery.
|
||||
* FEATURE: single-node VictoriaMetrics: log metrics with timestamps older than `-search.cacheTimestampOffset` compared to the current time. See [these docs](https://docs.victoriametrics.com/#backfilling) for details.
|
||||
|
||||
* BUGFIX: prevent from infinite loop on `{__graphite__="..."}` filters when a metric name contains `*`, `{` or `[` chars.
|
||||
* BUGFIX: prevent from infinite loop in `/metrics/find` and `/metrics/expand` [Graphite Metrics API handlers](https://docs.victoriametrics.com/#graphite-metrics-api-usage) when they match metric names or labels with `*`, `{` or `[` chars.
|
||||
* BUGFIX: do not merge duplicate time series during requests to `/api/v1/query`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1141
|
||||
* BUGFIX: vmagent: properly handle `too old resource version` error messages from Kubernetes watch API. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1150
|
||||
* BUGFIX: vmagent: do not retry sending data blocks if remote storage returns `400 Bad Request` error. The number of dropped blocks due to such errors can be monitored with `vmagent_remotewrite_packets_dropped_total` metrics. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1149
|
||||
* BUGFIX: properly calculate `summarize` and `*Series` functions in [Graphite Render API](https://docs.victoriametrics.com/#graphite-render-api-usage).
|
||||
|
||||
|
||||
## [v1.56.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.56.0)
|
||||
|
||||
Released at 17-03-2021
|
||||
|
||||
* FEATURE: add the following functions to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html):
|
||||
- `histogram_avg(buckets)` - returns the average value for the given buckets.
|
||||
- `histogram_stdvar(buckets)` - returns standard variance for the given buckets.
|
||||
- `histogram_stddev(buckets)` - returns standard deviation for the given buckets.
|
||||
* FEATURE: export `vm_available_memory_bytes` and `vm_available_cpu_cores` metrics, which show the number of available RAM and available CPU cores for VictoriaMetrics apps.
|
||||
* FEATURE: export `vm_index_search_duration_seconds` histogram, which can be used for troubleshooting time series search performance.
|
||||
* FEATURE: vmagent: add ability to replicate scrape targets among `vmagent` instances in the cluster with `-promscrape.cluster.replicationFactor` command-line flag. See [these docs](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
|
||||
* FEATURE: vmagent: accept `scrape_offset` option at `scrape_config`. This option may be useful when scrapes must start at the specified offset of every scrape interval. See [these docs](https://docs.victoriametrics.com/vmagent.html#troubleshooting) for details.
|
||||
* FEATURE: vmagent: support `proxy_tls_config`, `proxy_basic_auth`, `proxy_bearer_token` and `proxy_bearer_token_file` options at `scrape_config` section for configuring proxies specified via `proxy_url`. See [these docs](https://docs.victoriametrics.com/vmagent.html#scraping-targets-via-a-proxy).
|
||||
* FEATURE: vmauth: allow using regexp paths in `url_map`. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1112) for details.
|
||||
* FEATURE: accept `round_digits` query arg at `/api/v1/query` and `/api/v1/query_range` handlers. This option can be set at Prometheus datasource in Grafana for limiting the number of digits after the decimal point in response values.
|
||||
* FEATURE: add `-influx.databaseNames` command-line flag, which can be used for accepting data from some Telegraf plugins such as [fluentd plugin](https://github.com/fangli/fluent-plugin-influxdb). See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1124).
|
||||
* FEATURE: add `-logNewSeries` command-line flag, which can be used for debugging the source of time series churn rate.
|
||||
* FEATURE: publish Windows builds for [vmagent](https://docs.victoriametrics.com/vmagent.html), [vmalert](https://docs.victoriametrics.com/vmalert.html), [vmauth](https://docs.victoriametrics.com/vmauth.html) and [vmctl](https://docs.victoriametrics.com/vmctl.html) at `vmutils-windows-*.zip` archives at [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
|
||||
* FEATURE: listen for IPv6 UDP if `-enableTCP6` command-line flag is passed to VictoriaMetrics. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1131).
|
||||
|
||||
* BUGFIX: vmagent: prevent from high CPU usage bug during failing scrapes with small `scrape_timeout` (less than a few seconds).
|
||||
* BUGFIX: vmagent: reduce memory usage when Kubernetes service discovery is used in big number of distinct scrape config jobs by sharing Kubernetes object cache. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1113
|
||||
* BUGFIX: vmagent: apply `sample_limit` only after `metric_relabel_configs` are applied as Prometheus does. Previously the `sample_limit` was applied before metrics relabeling.
|
||||
* BUGFIX: vmagent: properly apply `tls_config`, `basic_auth` and `bearer_token` to proxy connections if `proxy_url` option is set. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
|
||||
* BUGFIX: vmagent: properly scrape targets via https proxy specified in `proxy_url` if `insecure_skip_verify` flag isn't set in `tls_config` section. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1116
|
||||
* BUGFUX: avoid `duplicate time series` error if `prometheus_buckets()` covers a time range with distinct set of buckets.
|
||||
* BUGFIX: prevent exponent overflow when processing extremely small values close to zero such as `2.964393875E-314`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1114
|
||||
* BUGFIX: do not include datapoints with a timestamp `t-d` when returning results from `/api/v1/query?query=m[d]&time=t` as Prometheus does.
|
||||
* BUGFIX: do not crash if a query contains `histogram_over_time()` function name with uppercase chars. For example, `Histogram_Over_Time(m[5m])`.
|
||||
|
||||
|
||||
## [v1.55.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.55.1)
|
||||
|
||||
Released at 03-03-2021
|
||||
|
||||
* BUGFIX: vmagent: fix a panic in Kubernetes service discovery when a target is filtered out with relabeling. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1107
|
||||
* BUGFIX: vmagent: fix Kubernetes service discovery for `role: ingress`. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1110
|
||||
|
||||
|
||||
## [v1.55.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.55.0)
|
||||
|
||||
Released at 02-03-2021
|
||||
|
||||
* FEATURE: add `sign(q)` and `clamp(q, min, max)` functions, which are planned to be added in [the upcoming Prometheus release](https://twitter.com/roidelapluie/status/1363428376162295811) . The `last_over_time(m[d])` function is already supported in [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html).
|
||||
* FEATURE: vmagent: add `scrape_align_interval` config option, which can be used for aligning scrapes to the beginning of the configured interval. See [these docs](https://docs.victoriametrics.com/vmagent.html#troubleshooting) for details.
|
||||
* FEATURE: expose io-related metrics at `/metrics` page for every VictoriaMetrics component:
|
||||
* `process_io_read_bytes_total` - the number of bytes read via io syscalls such as read and pread
|
||||
* `process_io_written_bytes_total` - the number of bytes written via io syscalls such as write and pwrite
|
||||
* `process_io_read_syscalls_total` - the number of read syscalls such as read and pread
|
||||
* `process_io_write_syscalls_total` - the number of write syscalls such as write and pwrite
|
||||
* `process_io_storage_read_bytes_total` - the number of bytes read from storage layer
|
||||
* `process_io_storage_written_bytes_total` - the number of bytes written to storage layer
|
||||
* FEATURE: vmagent: add ability to spread scrape targets among multiple `vmagent` instances. See [these docs](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1084) for details.
|
||||
* FEATURE: vmagent: use watch API for Kuberntes service discovery. This should reduce load on Kuberntes API server when it tracks big number of objects (for example, 10K pods). This should also reduce the time needed for k8s targets discovery. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1057) for details.
|
||||
* FEATURE: vmagent: export `vm_promscrape_target_relabel_duration_seconds` metric, which can be used for monitoring the time spend on relabeling for discovered targets.
|
||||
* FEATURE: vmagent: optimize [relabeling](https://docs.victoriametrics.com/vmagent.html#relabeling) performance for common cases.
|
||||
* FEATURE: add `increase_pure(m[d])` function to MetricsQL. It works the same as `increase(m[d])` except of various edge cases. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/962) for details.
|
||||
* FEATURE: increase accuracy for `buckets_limit(limit, buckets)` results for small `limit` values. See [MetricsQL docs](https://docs.victoriametrics.com/MetricsQL.html) for details.
|
||||
* FEATURE: vmagent: initial support for Windows build with `CGO_ENABLED=0 GOOS=windows go build -mod=vendor ./app/vmagent`. See [this](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70) and [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1036).
|
||||
* FEATURE: vmagent: support WebIdentityToken auth in EC2 service discovery. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1080) for details.
|
||||
* FEATURE: vmalert: properly process query params in `-datasource.url` and `-remoteRead.url` command-line flags. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1087) for details.
|
||||
|
||||
* BUGFIX: vmagent: properly apply `-remoteWrite.rateLimit` when `-remoteWrite.queues` is greater than 1. Previously there was a data race, which could prevent from proper rate limiting.
|
||||
* BUGFIX: vmagent: properly perform graceful shutdown on `SIGINT` and `SIGTERM` signals. The graceful shutdown has been broken in `v1.54.0`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1065
|
||||
* BUGFIX: reduce the probability of `duplicate time series` errors when querying Kubernetes metrics.
|
||||
* BUGFIX: properly calculate `histogram_quantile()` over time series with only a single non-zero bucket with `{le="+Inf"}`. Previously `NaN` was returned, now the value for the last bucket before `{le="+Inf"}` is returned like Prometheus does.
|
||||
* BUGFIX: vmselect: do not cache partial query results on timeout when receiving data from `vmstorage` nodes. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1085
|
||||
* BUGFIX: properly handle `stale NFS file handle` error.
|
||||
* BUGFIX: properly cache query results when `extra_label` query arg is used. Previously the cached results could clash for different `extra_label` values. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1095
|
||||
* BUGFIX: fix `http: superfluous response.WriteHeader call` issue. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1078
|
||||
* BUGFIX: fix arm64 builds due to the issue in `github.com/golang/snappy`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1074
|
||||
* BUGFIX: fix `index out of range [1024819115206086200] with length 27` panic, which could occur when `1e-9` value is passed to VictoriaMetrics histogram. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1096
|
||||
* BUGFIX: fix parsing for Graphite line with empty tags such as `foo; 123 456`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1100
|
||||
* BUGFIX: unescape only `\\`, `\n` and `\"` in label names when parsing Prometheus text exposition format as Prometheus does. Previously other escape sequences could be improperly unescaped.
|
||||
|
||||
|
||||
## [v1.54.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.54.1)
|
||||
|
||||
Released at 18-02-2021
|
||||
|
||||
* BUGFIX: properly handle queries containing a filter on metric name plus any number of negative filters and zero non-negative filters. For example, `node_cpu_seconds_total{mode!="idle"}`. The bug was introduced in [v1.54.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.54.0).
|
||||
|
||||
|
||||
## [v1.54.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.54.0)
|
||||
|
||||
Released at 18-02-2021
|
||||
|
||||
* FEATURE: optimize searching for matching metrics for `metric{<label_filters>}` queries if `<label_filters>` contains at least a single filter. For example, the query `up{job="foobar"}` should find the matching time series much faster than previously.
|
||||
* FEATURE: reduce execution times for `q1 <binary_op> q2` queries by executing `q1` and `q2` in parallel.
|
||||
* FEATURE: switch from Go1.15 to [Go1.16](https://golang.org/doc/go1.16) for building prod binaries.
|
||||
* FEATURE: single-node VictoriaMetrics now accepts requests to handlers with `/prometheus` and `/graphite` prefixes such as `/prometheus/api/v1/query`. This improves compatibility with [handlers from VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format).
|
||||
* FEATURE: expose `process_open_fds` and `process_max_fds` metrics. These metrics can be used for alerting when `process_open_fds` reaches `process_max_fds`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/402 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1037
|
||||
* FEATURE: vmalert: add `-datasource.appendTypePrefix` command-line option for querying both Prometheus and Graphite datasource in cluster version of VictoriaMetrics. See [these docs](https://docs.victoriametrics.com/vmalert.html#graphite) for details.
|
||||
* FEATURE: vmauth: add ability to route requests from a single user to multiple destinations depending on the requested paths. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1064
|
||||
* FEATURE: remove dependency on external programs such as `cat`, `grep` and `cut` when detecting cpu and memory limits inside Docker or LXC container.
|
||||
* FEATURE: vmagent: add `__meta_kubernetes_endpoints_label_*`, `__meta_kubernetes_endpoints_labelpresent_*`, `__meta_kubernetes_endpoints_annotation_*` and `__meta_kubernetes_endpoints_annotationpresent_*` labels for `role: endpoints` in Kubernetes service discovery. These labels where added in Prometheus 2.25.
|
||||
* FEATURE: reduce the minimum supported retention period for inverted index (aka `indexdb`) from one month to one day. This should reduce disk space usage for `<-storageDataPath>/indexdb` folder if `-retentionPeriod` is set to values smaller than one month.
|
||||
* FEATURE: vmselect: export per-tenant metrics `vm_vmselect_http_requests_total` and `vm_vmselect_http_requests_duration_ms_total` . Other per-tenant metrics are available as a part of [enterprise package](https://victoriametrics.com/products/enterprise/). See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/932 for details.
|
||||
|
||||
* BUGFIX: properly convert regexp tag filters containing escaped dots to non-regexp tag filters. For example, `{foo=~"bar\.baz"}` should be converted to `{foo="bar.baz"}`. Previously it was incorrectly converted to `{foo="bar\.baz"}`, which could result in missing time series for this tag filter.
|
||||
* BUGFIX: do not spam error logs when discovering Docker Swarm targets without dedicated IP. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1028 .
|
||||
* BUGFIX: properly embed timezone data into VictoriaMetrics apps. This should fix `-loggerTimezone` usage inside Docker containers.
|
||||
* BUGFIX: properly build Docker images for non-amd64 architectures (arm, arm64, ppc64le, 386) on [Docker hub](https://hub.docker.com/u/victoriametrics/). Previously these images were incorrectly based on amd64 base image, so they didn't work.
|
||||
* BUGFIX: vmagent: return back unsent block to the queue during graceful shutdown. Previously this block could be dropped if remote storage is unavailable during vmagent shutdown. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1065 .
|
||||
|
||||
|
||||
## [v1.53.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.53.1)
|
||||
|
||||
Released at 03-02-2021
|
||||
|
||||
* BUGFIX: vmselect: fix the bug peventing from proper searching by Graphite filter with wildcards such as `{__graphite__="foo.*.bar"}`.
|
||||
|
||||
|
||||
## [v1.53.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.53.0)
|
||||
|
||||
Released at 03-02-2021
|
||||
|
||||
* FEATURE: added [vmctl tool](https://docs.victoriametrics.com/vmctl.html) to VictoriaMetrics release process. Now it is packaged in `vmutils-*.tar.gz` archive on [the releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases). Source code for `vmctl` tool has been moved from [github.com/VictoriaMetrics/vmctl](https://github.com/VictoriaMetrics/vmctl) to [github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmctl).
|
||||
* FEATURE: added `-loggerTimezone` command-line flag for adjusting time zone for timestamps in log messages. By default UTC is used.
|
||||
* FEATURE: added `-search.maxStepForPointsAdjustment` command-line flag, which can be used for disabling adjustment for points returned by `/api/v1/query_range` handler if such points have timestamps closer than `-search.latencyOffset` to the current time. Such points may contain incomplete data, so they are substituted by the previous values for `step` query args smaller than one minute by default.
|
||||
* FEATURE: vmselect: added ability to use Graphite-compatible filters in MetricsQL via `{__graphite__="foo.*.bar"}` syntax. This expression is equivalent to `{__name__=~"foo[.][^.]*[.]bar"}`, but it works faster and it is easier to use when migrating from Graphite to VictoriaMetrics. This feature deprecates the usage of `-search.treatDotsAsIsInRegexps` command-line flag.
|
||||
* FEATURE: vmselect: added ability to set additional label filters, which must be applied during queries. Such label filters can be set via optional `extra_label` query arg, which is accepted by [querying API](https://docs.victoriametrics.com/#prometheus-querying-api-usage) handlers. For example, the request to `/api/v1/query_range?extra_label=tenant_id=123&query=<query>` adds `{tenant_id="123"}` label filter to the given `<query>`. It is expected that the `extra_label` query arg is automatically set by auth proxy sitting
|
||||
in front of VictoriaMetrics. [Contact us](mailto:sales@victoriametrics.com) if you need assistance with such a proxy. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1021 .
|
||||
* FEATURE: vmalert: added `-datasource.queryStep` command-line flag for passing optional `step` query arg to `/api/v1/query` endpoint. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1025
|
||||
* FEATURE: vmalert: added ability to query Graphite datasource when evaluating alerting and recording rules. See [these docs](https://docs.victoriametrics.com/vmalert.html#graphite) for details.
|
||||
* FEATURE: vmagent: added `-remoteWrite.roundDigits` command-line option for rounding metric values to the given number of decimal digits after the point before sending the metric to the corresponding `-remoteWrite.url`. This option can be used for improving data compression on the remote storage, because values with lower number of decimal digits can be compressed better than values with bigger number of decimal digits.
|
||||
* FEATURE: vmagent: added `-remoteWrite.rateLimit` command-line flag for limiting data transfer rate to `-remoteWrite.url`. This may be useful when big amounts of buffered data is sent after temporarily unavailability of the remote storage. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1035
|
||||
* FEATURE: vmagent: export the following additional metrics, which may be useful during troubleshooting:
|
||||
- `vm_promscrape_scrapes_failed_per_url_total`
|
||||
- `vm_promscrape_scrapes_skipped_by_sample_limit_per_url_total`
|
||||
- `vm_promscrape_discovery_requests_total`
|
||||
- `vm_promscrape_discovery_retries_total`
|
||||
- `vm_promscrape_scrape_retries_total`
|
||||
- `vm_promscrape_service_discovery_duration_seconds`
|
||||
* FEATURE: vmselect: initial implementation for [Graphite Render API](https://docs.victoriametrics.com/#graphite-render-api-usage).
|
||||
|
||||
* BUGFIX: vmagent: reduce HTTP reconnection rate for scrape targets. Previously vmagent could errorneusly close HTTP keep-alive connections more frequently than needed.
|
||||
* BUGFIX: vmagent: retry scrape and service discovery requests when the remote server closes HTTP keep-alive connection. Previously `disable_keepalive: true` option could be used under `scrape_configs` section when working with such servers.
|
||||
|
||||
|
||||
## [v1.52.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.52.0)
|
||||
|
||||
Released at 13-01-2021
|
||||
|
||||
* FEATURE: provide a sample list of alerting rules for VictoriaMetrics components. It is available [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/alerts.yml).
|
||||
* FEATURE: disable final merge for data for the previous month at the beginning of new month, since it may result in high disk IO and CPU usage. Final merge can be enabled by setting `-finalMergeDelay` command-line flag to positive duration.
|
||||
* FEATURE: add `tfirst_over_time(m[d])` and `tlast_over_time(m[d])` functions to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) for returning timestamps for the first and the last data point in `m` over `d` duration.
|
||||
* FEATURE: add ability to pass multiple labels to `sort_by_label()` and `sort_by_label_desc()` functions. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/992 .
|
||||
* FEATURE: enforce at least TLS v1.2 when accepting HTTPS requests if `-tls`, `-tlsCertFile` and `-tlsKeyFile` command-line flags are set, because older TLS protocols such as v1.0 and v1.1 have been deprecated due to security vulnerabilities.
|
||||
* FEATURE: support `extra_label` query arg for all HTTP-based [data ingestion protocols](https://docs.victoriametrics.com/#how-to-import-time-series-data). This query arg can be used for specifying extra labels which should be added for the ingested data.
|
||||
* FEATURE: vmbackup: increase backup chunk size from 128MB to 1GB. This should reduce the number of Object storage API calls during backups by 8x. This may also reduce costs, since object storage API calls usually have non-zero costs. See https://aws.amazon.com/s3/pricing/ and https://cloud.google.com/storage/pricing#operations-pricing .
|
||||
|
||||
* BUGFIX: properly parse escaped unicode chars in MetricsQL metric names, label names and function names. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/990
|
||||
* BUGFIX: override user-provided labels with labels set in `extra_label` query args during data ingestion over HTTP-based protocols.
|
||||
* BUGFIX: vmagent: prevent from `dialing to the given TCP address time out` error when scraping big number of unavailable targets. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/987
|
||||
* BUGFIX: vmagent: properly show scrape duration on `/targets` page. Previously it was incorrectly shown as 0.000s.
|
||||
* BUGFIX: vmagent: properly log errors when `-promscrape.streamParse` command-line flag is set. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1009
|
||||
* BUGFIX: vmagent: properly suppress errors when both `-promscrape.suppressScrapeErrors` and `-promscrape.streamParse` command-line flags are set. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/1009 .
|
||||
* BUGFIX: vmalert: return non-empty result in template func `query` stub to pass validation. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/989 .
|
||||
* BUGFIX: upgrade base image for Docker packages from Alpine 3.12.1 to Alpine 3.12.3 in order to fix potential security issues. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1010
|
||||
|
||||
|
||||
## [v1.51.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.51.0)
|
||||
|
||||
Released at 27-12-2020
|
||||
|
||||
* FEATURE: add `/api/v1/status/top_queries` handler, which returns the most frequently executed queries and queries that took the most time for execution. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/907
|
||||
* FEATURE: vmagent: add support for `proxy_url` config option in Prometheus scrape configs. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/503
|
||||
* FEATURE: remove parts with stale data as soon as they go outside the configured `-retentionPeriod`. Previously such parts may remain active for long periods of time. This should help reducing disk usage for `-retentionPeriod` smaller than one month.
|
||||
* FEATURE: vmalert: allow setting multiple values for `-notifier.tlsInsecureSkipVerify` command-line flag per each `-notifier.url`.
|
||||
|
||||
* BUGFIX: vmalert: properly escape multiline queries when passing them to Grafana. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/890
|
||||
* BUGFIX: vmagent: set missing `__meta_kubernetes_service_*` labels in `kubernetes_sd_config` for `endpoints` and `endpointslices` roles. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/982
|
||||
* BUGFIX: do not adjust `offset` value provided in MetricsQL query. Previously it could be modified in order to improve response cache hit ratio. This is unneeded, since cache hit ratio should remain good because the query time range should be already aligned to multiple of `step` values. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/976
|
||||
|
||||
|
||||
## [v1.50.2](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.50.2)
|
||||
|
||||
Released at 19-12-2020
|
||||
|
||||
* FEATURE: do not publish duplicate Docker images with `-cluster` tag suffix for [vmagent](https://docs.victoriametrics.com/vmagent.html), [vmalert](https://docs.victoriametrics.com/vmalert.html), [vmauth](https://docs.victoriametrics.com/vmauth.html), [vmbackup](https://docs.victoriametrics.com/vmbackup.html) and [vmrestore](https://docs.victoriametrics.com/vmrestore.html), since they are identical to images without `-cluster` tag suffix.
|
||||
|
||||
* BUGFIX: vmalert: properly populate template variables. This has been broken in v1.50.0. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/974
|
||||
* BUGFIX: properly parse negative combined duration in MetricsQL such as `-1h3m4s`. It must be parsed as `-(1h + 3m + 4s)`. Prevsiously it was parsed as `-1h + 3m + 4s`.
|
||||
* BUGFIX: properly parse lines in [Prometheus exposition format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md) and in [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md) with whitespace after the timestamp. For example, `foo 123 456 ## some comment here`. See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/970
|
||||
|
||||
|
||||
## [v1.50.1](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.50.1)
|
||||
|
||||
Released at 15-12-2020
|
||||
|
||||
* FEATURE: vmagent: export `vmagent_remotewrite_blocks_sent_total` and `vmagent_remotewrite_blocks_sent_total` metrics for each `-remoteWrite.url`.
|
||||
|
||||
* BUGFIX: vmagent: properly delete unregistered scrape targets from `/targets` and `/api/v1/targets` pages. They weren't deleted due to the bug in `v1.50.0`.
|
||||
|
||||
|
||||
## [v1.50.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.50.0)
|
||||
|
||||
Released at 15-12-2020
|
||||
|
||||
* FEATURE: automatically reset response cache when samples with timestamps older than `now - search.cacheTimestampOffset` are ingested to VictoriaMetrics. This makes unnecessary disabling response cache during data backfilling or resetting it after backfilling is complete as described [in these docs](https://docs.victoriametrics.com/#backfilling). This feature applies only to single-node VictoriaMetrics. It doesn't apply to cluster version of VictoriaMetrics because `vminsert` nodes don't know about `vmselect` nodes where the response cache must be reset.
|
||||
* FEATURE: vmalert: add `query`, `first` and `value` functions to alert templates. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/539
|
||||
* FEATURE: vmagent: return user-friendly HTML page when requesting `/targets` page from web browser. The page is returned in the old plaintext format when requesting via curl or similar tool.
|
||||
* FEATURE: allow multiple whitespace chars between measurements, fields and timestamp when parsing InfluxDB line protocol.
|
||||
Though [InfluxDB line protocol](https://docs.influxdata.com/influxdb/v1.8/write_protocols/line_protocol_tutorial/) denies multiple whitespace chars between these entities,
|
||||
some apps improperly put multiple whitespace chars. This workaround allows accepting data from such apps.
|
||||
* FEATURE: export `vm_promscrape_active_scrapers{type="<sd_type>"}` metric for tracking the number of active scrapers per each service discovery type.
|
||||
* FEATURE: export `vm_promscrape_scrapers_started_total{type="<sd_type>"}` and `vm_promscrape_scrapers_stopped_total{type="<sd_type>"}` metrics for tracking churn rate for scrapers
|
||||
per each service discovery type.
|
||||
* FEATURE: vmagent: allow setting per-`-remoteWrite.url` command-line flags for `-remoteWrite.sendTimeout` and `-remoteWrite.tlsInsecureSkipVerify`.
|
||||
|
||||
* BUGFIX: properly handle `*` and `[...]` inside curly braces in query passed to Graphite Metrics API. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/952
|
||||
* BUGFIX: vmagent: fix memory leak when big number of targets is discovered via service discovery.
|
||||
* BUGFIX: vmagent: properly pass `datacenter` filter to Consul API server. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/574#issuecomment-740454170
|
||||
* BUGFIX: properly handle CPU limits set on the host system or host container. The bugfix may result in lower memory usage on systems with CPU limits. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/946
|
||||
* BUGFIX: prevent from duplicate `name` tag returned from `/tags/autoComplete/tags` handler. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/942
|
||||
* BUGFIX: do not enable strict parsing for `-promscrape.config` if `-promscrape.config.dryRun` comand-line flag is set. Strict parsing can be enabled with `-promscrape.config.strictParse` command-line flag. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/944
|
||||
* BUGFIX: vminsert: properly update `vm_rpc_rerouted_rows_processed_total` metric. Previously it wasn't updated. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/955
|
||||
* BUGFIX: vmagent: properly recover when opening incorrectly stored persistent queue. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/964
|
||||
* BUGFIX: vmagent: properly handle scrape errors when stream parsing is enabled with `-promscrape.streamParse` command-line flag or with `stream_parse: true` per-target config option. Previously such errors weren't reported at `/targets` page. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/967
|
||||
* BUGFIX: assume the previous value is 0 when calculating `increase()` for the first point on the graph if its value doesn't exceed 100 and the delta between two first points equals to 0. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/962
|
||||
|
||||
|
||||
## [v1.49.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.49.0)
|
||||
|
||||
Released at 05-12-2020
|
||||
|
||||
* FEATURE: optimize Consul service discovery speed when discovering big number of services. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/574
|
||||
* FEATURE: add `label_uppercase(q, label1, ... labelN)` and `label_lowercase(q, label1, ... labelN)` function to [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html)
|
||||
for uppercasing and lowercasing values for the given labels. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/936
|
||||
* FEATURE: add `count_eq_over_time(m[d], N)` and `count_ne_over_time(m[d], N)` for counting the number of samples for `m` over `d` that (equal / not equal) to `N`.
|
||||
* FEATURE: do not print usage info for all the command-line flags when incorrect command-line flag is passed. Previously it could be hard reading the error message
|
||||
about incorrect command-line flag because of too big usage info for all the flags.
|
||||
* FEATURE: upgrade Go builder from v1.15.5 to v1.15.6 . This fixes [issues found in Go since v1.15.5](https://github.com/golang/go/issues?q=milestone%3AGo1.15.6+label%3ACherryPickApproved).
|
||||
|
||||
* BUGFIX: properly parse timestamps in OpenMetrics format - they are exposed as floating-point number in seconds instead of integer milliseconds
|
||||
unlike in Prometheus exposition format. See [the docs](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md#timestamps).
|
||||
* BUGFIX: return `nan` for `a >bool b` query when `a` equals to `nan` like Prometheus does. Previously `0` was returned in this case. This applies to any comparison operation
|
||||
with `bool` modifier. See [these docs](https://prometheus.io/docs/prometheus/latest/querying/operators/#comparison-binary-operators) for details.
|
||||
* BUGFIX: properly parse hex numbers in MetricsQL. Previously hex numbers with non-decimal digits such as `0x3b` couldn't be parsed.
|
||||
* BUGFIX: handle `time() cmp_op metric` like Prometheus does - i.e. return `metric` value if `cmp_op` comparison is true. Previously `time()` value was returned.
|
||||
* BUGFIX: return `nan` for `minute(m)` query when `m` equals to `nan` like Prometheus does. This applies to all the time-related functions such as `day_of_month`, `day_of_week`,
|
||||
`days_in_month`, `hour`, `month` and `year`.
|
||||
|
||||
|
||||
## [v1.48.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.48.0)
|
||||
|
||||
Released at 26-11-2020
|
||||
|
||||
* FEATURE: added [Snap package for single-node VictoriaMetrics](https://snapcraft.io/victoriametrics). This simplifies installation under Ubuntu to a single command:
|
||||
```bash
|
||||
snap install victoriametrics
|
||||
```
|
||||
* FEATURE: vmselect: add `-replicationFactor` command-line flag for reducing query duration when replication is enabled and a part of vmstorage nodes
|
||||
are temporarily slow and/or temporarily unavailable. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/711
|
||||
* FEATURE: vminsert: export `vm_rpc_vmstorage_is_reachable` metric, which can be used for monitoring reachability of vmstorage nodes from vminsert nodes.
|
||||
* FEATURE: vmagent: add [Netflix Eureka](https://github.com/Netflix/eureka) service discovery (aka [eureka_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config)). See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/851
|
||||
* FEATURE: add `filters` option to `dockerswarm_sd_config` like Prometheus did in v2.23.0 - see https://github.com/prometheus/prometheus/pull/8074
|
||||
* FEATURE: expose `__meta_ec2_ipv6_addresses` label for `ec2_sd_config` like Prometheus will do in the next release.
|
||||
* FEATURE: add `-loggerWarnsPerSecondLimit` command-line flag for rate limiting of WARN messages in logs. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/905
|
||||
* FEATURE: apply `loggerErrorsPerSecondLimit` and `-loggerWarnsPerSecondLimit` rate limit per caller. I.e. log messages are suppressed if the same caller logs the same message
|
||||
at the rate exceeding the given limit. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/905#issuecomment-729395855
|
||||
* FEATURE: add remoteAddr to slow query log in order to simplify identifying the client that sends slow queries to VictoriaMetrics.
|
||||
Slow query logging is controlled with `-search.logSlowQueryDuration` command-line flag.
|
||||
* FEATURE: add `/tags/delSeries` handler from Graphite Tags API. See https://docs.victoriametrics.com/#graphite-tags-api-usage
|
||||
* FEATURE: log metric name plus all its labels when the metric timestamp is out of the configured retention. This should simplify detecting the source of metrics with unexpected timestamps.
|
||||
* FEATURE: add `-dryRun` command-line flag to single-node VictoriaMetrics in order to check config file pointed by `-promscrape.config`.
|
||||
|
||||
* BUGFIX: properly parse Prometheus metrics with [exemplars](https://github.com/OpenObservability/OpenMetrics/blob/master/OpenMetrics.md#exemplars-1) such as `foo 123 ## {bar="baz"} 1`.
|
||||
* BUGFIX: properly parse "infinity" values in [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/OpenMetrics.md#abnf).
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/924
|
||||
|
||||
|
||||
## [v1.47.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.47.0)
|
||||
|
||||
Released at 16-11-2020
|
||||
|
||||
* FEATURE: vmselect: return the original error from `vmstorage` node in query response if `-search.denyPartialResponse` is set.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/891
|
||||
* FEATURE: vmselect: add `"isPartial":{true|false}` field in JSON output for `/api/v1/*` functions
|
||||
from [Prometheus querying API](https://prometheus.io/docs/prometheus/latest/querying/api/). `"isPartial":true` is set if the response contains partial data
|
||||
because of a part of `vmstorage` nodes were unavailable during query processing.
|
||||
* FEATURE: improve performance for `/api/v1/series`, `/api/v1/labels` and `/api/v1/label/<labelName>/values` on time ranges exceeding one day.
|
||||
* FEATURE: vmagent: reduce memory usage when service discovery detects big number of scrape targets and the set of discovered targets changes over time.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825
|
||||
* FEATURE: vmagent: add `-promscrape.dropOriginalLabels` command-line option, which can be used for reducing memory usage when scraping big number of targets.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825#issuecomment-724308361 for details.
|
||||
* FEATURE: vmalert: explicitly set extra labels to alert entities. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
|
||||
* FEATURE: add `-search.treatDotsAsIsInRegexps` command-line flag, which can be used for automatic escaping of dots in regexp label filters used in queries.
|
||||
For example, if `-search.treatDotsAsIsInRegexps` is set, then the query `foo{bar=~"aaa.bb.cc|dd.eee"}` is automatically converted to `foo{bar=~"aaa\\.bb\\.cc|dd\\.eee"}`.
|
||||
This may be useful for querying Graphite data.
|
||||
* FEATURE: consistently return text-based HTTP responses such as `plain/text` and `application/json` with `charset=utf-8`.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/897
|
||||
* FEATURE: update Go builder from v1.15.4 to v1.15.5. This should fix [these issues in Go](https://github.com/golang/go/issues?q=milestone%3AGo1.15.5+label%3ACherryPickApproved).
|
||||
* FEATURE: added `/internal/force_flush` http handler for flushing recently ingested data from in-memory buffers to persistent storage.
|
||||
See [troubleshooting docs](https://docs.victoriametrics.com/#troubleshooting) for more details.
|
||||
* FEATURE: added [Graphite Tags API](https://graphite.readthedocs.io/en/stable/tags.html) support.
|
||||
See [these docs](https://docs.victoriametrics.com/#graphite-tags-api-usage) for details.
|
||||
|
||||
* BUGFIX: do not return data points in the end of the selected time range for time series ending in the middle of the selected time range.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/887 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
|
||||
* BUGFIX: remove spikes at the end of time series gaps for `increase()` or `delta()` functions. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/894
|
||||
* BUGFIX: vminsert: properly return HTTP 503 status code when all the vmstorage nodes are unavailable. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/896
|
||||
|
||||
|
||||
## [v1.46.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.46.0)
|
||||
|
||||
Released at 07-11-2020
|
||||
|
||||
* FEATURE: optimize requests to `/api/v1/labels` and `/api/v1/label/<name>/values` when `start` and `end` args are set.
|
||||
* FEATURE: reduce memory usage when query touches big number of time series.
|
||||
* FEATURE: vmagent: reduce memory usage when `kubernetes_sd_config` discovers big number of scrape targets (e.g. hundreds of thousands) and the majority of these targets (99%)
|
||||
are dropped during relabeling. Previously labels for all the dropped targets were displayed at `/api/v1/targets` page. Now only up to `-promscrape.maxDroppedTargets` such
|
||||
targets are displayed. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/878 for details.
|
||||
* FEATURE: vmagent: reduce memory usage when scraping big number of targets with big number of temporary labels starting with `__`.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825
|
||||
* FEATURE: vmagent: add `/ready` HTTP endpoint, which returns 200 OK status code when all the service discovery has been initialized.
|
||||
This may be useful during rolling upgrades. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/875
|
||||
|
||||
* BUGFIX: vmagent: eliminate data race when `-promscrape.streamParse` command-line is set. Previously this mode could result in scraped metrics with garbage labels.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825#issuecomment-723198247 for details.
|
||||
* BUGFIX: properly calculate `topk_*` and `bottomk_*` functions from [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) for time series with gaps.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/pull/883
|
||||
|
||||
|
||||
## [v1.45.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.45.0)
|
||||
|
||||
Released at 02-11-2020
|
||||
|
||||
* FEATURE: allow setting `-retentionPeriod` smaller than one month. I.e. `-retentionPeriod=3d`, `-retentionPeriod=2w`, etc. is supported now.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/173
|
||||
* FEATURE: optimize more cases according to https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusLabelNonOptimization . Now the following cases are optimized too:
|
||||
* `rollup_func(foo{filters}[d]) op bar` -> `rollup_func(foo{filters}[d]) op bar{filters}`
|
||||
* `transform_func(foo{filters}) op bar` -> `transform_func(foo{filters}) op bar{filters}`
|
||||
* `num_or_scalar op foo{filters} op bar` -> `num_or_scalar op foo{filters} op bar{filters}`
|
||||
* FEATURE: improve time series search for queries with multiple label filters. I.e. `foo{label1="value", label2=~"regexp"}`.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/781
|
||||
* FEATURE: vmagent: add `stream parse` mode. This mode allows reducing memory usage when individual scrape targets expose tens of millions of metrics.
|
||||
For example, during scraping Prometheus in [federation](https://prometheus.io/docs/prometheus/latest/federation/) mode.
|
||||
See `-promscrape.streamParse` command-line option and `stream_parse: true` config option for `scrape_config` section in `-promscrape.config`.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825 and [troubleshooting docs for vmagent](https://docs.victoriametrics.com/vmagent.html#troubleshooting).
|
||||
* FEATURE: vmalert: add `-dryRun` command-line option for validating the provided config files without the need to start `vmalert` service.
|
||||
* FEATURE: accept optional third argument of string type at `topk_*` and `bottomk_*` functions. This is label name for additional time series to return with the sum of time series outside top/bottom K. See [MetricsQL docs](https://docs.victoriametrics.com/MetricsQL.html) for more details.
|
||||
* FEATURE: vmagent: expose `/api/v1/targets` page according to [the corresponding Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/643
|
||||
|
||||
* BUGFIX: vmagent: properly handle OpenStack endpoint ending with `v3.0` such as `https://ostack.example.com:5000/v3.0`
|
||||
in the same way as Prometheus does. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/728#issuecomment-709914803
|
||||
* BUGFIX: drop trailing data points for time series with a single raw sample. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
|
||||
* BUGFIX: do not drop trailing data points for instant queries to `/api/v1/query`. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845
|
||||
* BUGFIX: vmbackup: fix panic when `-origin` isn't specified. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/856
|
||||
* BUGFIX: vmalert: skip automatically added labels on alerts restore. Label `alertgroup` was introduced in [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/611)
|
||||
and automatically added to generated time series. By mistake, this new label wasn't correctly purged on restore event and affected alert's ID uniqueness.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/870
|
||||
* BUGFIX: vmagent: fix panic at scrape error body formating. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/864
|
||||
* BUGFIX: vmagent: add leading missing slash to metrics path like Prometheus does. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/835
|
||||
* BUGFIX: vmagent: drop packet if remote storage returns 4xx status code. This make the behaviour consistent with Prometheus.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/873
|
||||
* BUGFIX: vmagent: properly handle 301 redirects. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/869
|
||||
|
||||
|
||||
## [v1.44.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.44.0)
|
||||
|
||||
Released at 13-10-2020
|
||||
|
||||
* FEATURE: automatically add missing label filters to binary operands as described at https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusLabelNonOptimization .
|
||||
This should improve performance for queries with missing label filters in binary operands. For example, the following query should work faster now, because it shouldn't
|
||||
fetch and discard time series for `node_filesystem_files_free` metric without matching labels for the left side of the expression:
|
||||
```
|
||||
node_filesystem_files{ host="$host", mountpoint="/" } - node_filesystem_files_free
|
||||
```
|
||||
* FEATURE: vmagent: add Docker Swarm service discovery (aka [dockerswarm_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config)).
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/656
|
||||
* FEATURE: add ability to export data in CSV format. See [these docs](https://docs.victoriametrics.com/#how-to-export-csv-data) for details.
|
||||
* FEATURE: vmagent: add `-promscrape.suppressDuplicateScrapeTargetErrors` command-line flag for suppressing `duplicate scrape target` errors.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/651 and https://docs.victoriametrics.com/vmagent.html#troubleshooting .
|
||||
* FEATURE: vmagent: show original labels before relabeling is applied on `duplicate scrape target` errors. This should simplify debugging for incorrect relabeling.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/651
|
||||
* FEATURE: vmagent: `/targets` page now accepts optional `show_original_labels=1` query arg for displaying original labels for each target before relabeling is applied.
|
||||
This should simplify debugging for target relabeling configs. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/651
|
||||
* FEATURE: add `-finalMergeDelay` command-line flag for configuring the delay before final merge for per-month partitions.
|
||||
The final merge is started after no new data is ingested into per-month partition during `-finalMergeDelay`.
|
||||
* FEATURE: add `vm_rows_added_to_storage_total` metric, which shows the total number of rows added to storage since app start.
|
||||
The `sum(rate(vm_rows_added_to_storage_total))` can be smaller than `sum(rate(vm_rows_inserted_total))` if certain metrics are dropped
|
||||
due to [relabeling](https://docs.victoriametrics.com/#relabeling). The `sum(rate(vm_rows_added_to_storage_total))` can be bigger
|
||||
than `sum(rate(vm_rows_inserted_total))` if [replication](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety) is enabled.
|
||||
* FEATURE: keep metric name after applying [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) functions, which don't change time series meaning.
|
||||
The list of such functions:
|
||||
* `keep_last_value`
|
||||
* `keep_next_value`
|
||||
* `interpolate`
|
||||
* `running_min`
|
||||
* `running_max`
|
||||
* `running_avg`
|
||||
* `range_min`
|
||||
* `range_max`
|
||||
* `range_avg`
|
||||
* `range_first`
|
||||
* `range_last`
|
||||
* `range_quantile`
|
||||
* `smooth_exponential`
|
||||
* `ceil`
|
||||
* `floor`
|
||||
* `round`
|
||||
* `clamp_min`
|
||||
* `clamp_max`
|
||||
* `max_over_time`
|
||||
* `min_over_time`
|
||||
* `avg_over_time`
|
||||
* `quantile_over_time`
|
||||
* `mode_over_time`
|
||||
* `geomean_over_time`
|
||||
* `holt_winters`
|
||||
* `predict_linear`
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/674
|
||||
|
||||
* BUGFIX: properly handle stale time series after K8S deployment. Previously such time series could be double-counted.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/748
|
||||
* BUGFIX: return a single time series at max from `absent()` function like Prometheus does.
|
||||
* BUGFIX: vmalert: accept days, weeks and years in `for: ` part of config like Prometheus does. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/817
|
||||
* BUGFIX: fix `mode_over_time(m[d])` calculations. Previously the function could return incorrect results.
|
||||
|
||||
|
||||
## [v1.43.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.43.0)
|
||||
|
||||
Released at 06-10-2020
|
||||
|
||||
* FEATURE: reduce CPU usage for repeated queries over sliding time window when no new time series are added to the database.
|
||||
Typical use cases: repeated evaluation of alerting rules in [vmalert](https://docs.victoriametrics.com/vmalert.html) or dashboard auto-refresh in Grafana.
|
||||
* FEATURE: vmagent: add OpenStack service discovery aka [openstack_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config).
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/728 .
|
||||
* FEATURE: vmalert: make `-maxIdleConnections` configurable for datasource HTTP client. This option can be used for minimizing connection churn.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/795 .
|
||||
* FEATURE: add `-influx.maxLineSize` command-line flag for configuring the maximum size for a single InfluxDB line during parsing.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/807
|
||||
|
||||
* BUGFIX: properly handle `inf` values during [background merge of LSM parts](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
|
||||
Previously `Inf` values could result in `NaN` values for adjacent samples in time series. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/805 .
|
||||
* BUGFIX: fill gaps on graphs for `range_*` and `running_*` functions. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/806 .
|
||||
* BUGFIX: make a copy of label with new name during relabeling with `action: labelmap` in the same way as Prometheus does.
|
||||
Previously the original label name has been replaced. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/812 .
|
||||
* BUGFIX: support parsing floating-point timestamp like Graphite Carbon does. Such timestmaps are truncated to seconds.
|
||||
|
||||
|
||||
## [v1.42.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.42.0)
|
||||
|
||||
Released at 30-09-2020
|
||||
|
||||
* FEATURE: use all the available CPU cores when accepting data via a single TCP connection
|
||||
for [all the supported protocols](https://docs.victoriametrics.com/#how-to-import-time-series-data).
|
||||
Previously data ingested via a single TCP connection could use only a single CPU core. This could limit data ingestion performance.
|
||||
The main benefit of this feature is that data can be imported at max speed via a single connection - there is no need to open multiple concurrent
|
||||
connections to VictoriaMetrics or [vmagent](https://docs.victoriametrics.com/vmagent.html) in order to achieve the maximum data ingestion speed.
|
||||
* FEATURE: cluster: improve performance for data ingestion path from `vminsert` to `vmstorage` nodes. The maximum data ingestion performance
|
||||
for a single connection between `vminsert` and `vmstorage` node scales with the number of available CPU cores on `vmstorage` side.
|
||||
This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/791 .
|
||||
* FEATURE: add ability to export / import data in native format via `/api/v1/export/native` and `/api/v1/import/native`.
|
||||
This is the most optimized approach for data migration between VictoriaMetrics instances. Both single-node and cluster instances are supported.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/787#issuecomment-700632551 .
|
||||
* FEATURE: add `reduce_mem_usage` query option to `/api/v1/export` in order to reduce memory usage during data export / import.
|
||||
See [these docs](https://docs.victoriametrics.com/#how-to-export-data-in-json-line-format) for details.
|
||||
* FEATURE: improve performance for `/api/v1/series` handler when it returns big number of time series.
|
||||
* FEATURE: add `vm_merge_need_free_disk_space` metric, which can be used for estimating the number of deferred background data merges due to the lack of free disk space.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/686 .
|
||||
* FEATURE: add OpenBSD support. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/785 .
|
||||
|
||||
* BUGFIX: properly apply `-search.maxStalenessInterval` command-line flag value. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/784 .
|
||||
* BUGFIX: fix displaying data in Grafana tables. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/720 .
|
||||
* BUGFIX: do not adjust the number of detected CPU cores found at `/sys/devices/system/cpu/online`.
|
||||
The adjustment was increasing the resulting GOMAXPROC by 1, which looked confusing to users.
|
||||
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/685#issuecomment-698595309 .
|
||||
* BUGFIX: vmagent: do not show `-remoteWrite.url` in initial logs if `-remoteWrite.showURL` isn't set. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/773 .
|
||||
* BUGFIX: properly handle case when [/metrics/find](https://docs.victoriametrics.com/#graphite-metrics-api-usage) finds both a leaf and a node for the given `query=prefix.*`.
|
||||
In this case only the node must be returned with stripped dot in the end of id as carbonapi does.
|
||||
|
||||
|
||||
## Previous releases
|
||||
|
||||
See [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
|
||||
1
CNAME
Normal file
@@ -0,0 +1 @@
|
||||
docs.victoriametrics.com
|
||||
554
CaseStudies.md
Normal file
@@ -0,0 +1,554 @@
|
||||
---
|
||||
sort: 11
|
||||
---
|
||||
|
||||
# Case studies and talks
|
||||
|
||||
Below please find public case studies and talks from VictoriaMetrics users. You can also join our [community Slack channel](https://slack.victoriametrics.com/)
|
||||
where you can chat with VictoriaMetrics users to get additional references, reviews and case studies.
|
||||
|
||||
* [AbiosGaming](#abiosgaming)
|
||||
* [adidas](#adidas)
|
||||
* [Adsterra](#adsterra)
|
||||
* [ARNES](#arnes)
|
||||
* [Brandwatch](#brandwatch)
|
||||
* [CERN](#cern)
|
||||
* [COLOPL](#colopl)
|
||||
* [Dreamteam](#dreamteam)
|
||||
* [Fly.io](#flyio)
|
||||
* [German Research Center for Artificial Intelligence](#german-research-center-for-artificial-intelligence)
|
||||
* [Grammarly](#grammarly)
|
||||
* [Groove X](#groove-x)
|
||||
* [Idealo.de](#idealode)
|
||||
* [MHI Vestas Offshore Wind](#mhi-vestas-offshore-wind)
|
||||
* [Percona](#percona)
|
||||
* [Razorpay](#razorpay)
|
||||
* [Sensedia](#sensedia)
|
||||
* [Smarkets](#smarkets)
|
||||
* [Synthesio](#synthesio)
|
||||
* [Wedos.com](#wedoscom)
|
||||
* [Wix.com](#wixcom)
|
||||
* [Zerodha](#zerodha)
|
||||
* [zhihu](#zhihu)
|
||||
|
||||
You can also read [articles about VictoriaMetrics from our users](https://docs.victoriametrics.com/Articles.html#third-party-articles-and-slides-about-victoriametrics).
|
||||
|
||||
|
||||
## AbiosGaming
|
||||
|
||||
[AbiosGaming](https://abiosgaming.com/) provides industry leading esports data and technology across the globe.
|
||||
|
||||
> At Abios, we are running Grafana and Prometheus for our operational insights. We are collecting all sorts of operational metrics such as request latency, active WebSocket connections, and cache statistics to determine if things are working as we expect them to.
|
||||
|
||||
> Prometheus explicitly recommends their users not to use high cardinality labels for their time-series data, which is exactly what we want to do. Prometheus is thus a poor solution to keep using. However, since we were already using Prometheus, we needed an alternative solution to be fully compatible with the Prometheus query language.
|
||||
|
||||
> The options we decided to try were TimescaleDB together with Promscale to act as a remote write intermediary and VictoriaMetrics. In both cases we still used Prometheus Operator to launch Prometheus instances to scrape metrics and send them to the respective storage layers.
|
||||
|
||||
> The biggest difference for our day-to-day operation is perhaps that VictoriaMetrics does not have a Write-Ahead log. The WAL has caused us trouble when Prometheus has experienced issues and starts to run out of RAM when replaying the WAL, thus entering a crash-loop.
|
||||
|
||||
> All in all, we are quite impressed with VictoriaMetrics. Not only is the core time-series database well designed, easy to deploy and operate, and performant but the entire ecosystem around it seems to have been given an equal amount of love. There are utilities for things such as taking snapshots (backups) and storing to S3 (and reloading from S3), a Kubernetes Operator, and authentication proxies. It also provides a cluster deployment option if we were to scale up to those numbers.
|
||||
|
||||
> From a usability point of view, VictoriaMetrics is the clear winner. Neither Prometheus nor TimescaleDB managed to do any kind of aggregations on our high cardinality metrics, whereas VictoriaMetrics does.
|
||||
|
||||
See [the full article](https://abiosgaming.com/press/high-cardinality-aggregations/).
|
||||
|
||||
|
||||
## adidas
|
||||
|
||||
See our [slides](https://promcon.io/2019-munich/slides/remote-write-storage-wars.pdf) and [video](https://youtu.be/OsH6gPdxR4s)
|
||||
from [Remote Write Storage Wars](https://promcon.io/2019-munich/talks/remote-write-storage-wars/) talk at [PromCon 2019](https://promcon.io/2019-munich/).
|
||||
VictoriaMetrics is compared to Thanos, Cortex and M3DB in the talk.
|
||||
|
||||
## Adsterra
|
||||
|
||||
[Adsterra Network](https://adsterra.com) is a leading digital advertising agency that offers
|
||||
performance-based solutions for advertisers and media partners worldwide.
|
||||
|
||||
We used to collect and store our metrics with Prometheus. Over time, the data volume on our servers
|
||||
and metrics increased to the point that we were forced to gradually reduce what we were retaining. When our retention got as low as 7 days
|
||||
we looked for alternative solutions. We chose between Thanos, VictoriaMetrics and Prometheus federation.
|
||||
|
||||
We ended up with the following configuration:
|
||||
|
||||
- Local instances of Prometheus with VictoriaMetrics as the remote storage on our backend servers.
|
||||
- A single Prometheus on our monitoring server scrapes metrics from other servers and writes to VictoriaMetrics.
|
||||
- A separate Prometheus that federates from other instances of Prometheus and processes alerts.
|
||||
|
||||
We learned that remote write protocol generated too much traffic and connections so after 8 months we started looking for alternatives.
|
||||
|
||||
Around the same time, VictoriaMetrics released [vmagent](https://docs.victoriametrics.com/vmagent.html).
|
||||
We tried to scrape all the metrics via a single instance of vmagent but it that didn't work because vmgent wasn't able to catch up with writes
|
||||
into VictoriaMetrics. We tested different options and end up with the following scheme:
|
||||
|
||||
- We removed Prometheus from our setup.
|
||||
- VictoriaMetrics [can scrape targets](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-scrape-prometheus-exporters-such-as-node-exporter) as well
|
||||
so we removed vmagent. Now, VictoriaMetrics scrapes all the metrics from 110 jobs and 5531 targets.
|
||||
- We use [Promxy](https://github.com/jacksontj/promxy) for alerting.
|
||||
|
||||
Such a scheme has generated the following benefits compared with Prometheus:
|
||||
|
||||
- We can store more metrics.
|
||||
- We need less RAM and CPU for the same workload.
|
||||
|
||||
Cons are the following:
|
||||
|
||||
- VictoriaMetrics didn't support replication (it [supports replication now](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety)) - we run an extra instance of VictoriaMetrics and Promxy in front of a VictoriaMetrics pair for high availability.
|
||||
- VictoriaMetrics stores 1 extra month for defined retention (if retention is set to N months, then VM stores N+1 months of data), but this is still better than other solutions.
|
||||
|
||||
Here are some numbers from our single-node VictoriaMetrics setup:
|
||||
|
||||
- active time series: 10M
|
||||
- ingestion rate: 800K samples/sec
|
||||
- total number of datapoints: more than 2 trillion
|
||||
- total number of entries in inverted index: more than 1 billion
|
||||
- daily time series churn rate: 2.6M
|
||||
- data size on disk: 1.5 TB
|
||||
- index size on disk: 27 GB
|
||||
- average datapoint size on disk: 0.75 bytes
|
||||
- range query rate: 16 rps
|
||||
- instant query rate: 25 rps
|
||||
- range query duration: max: 0.5s; median: 0.05s; 97th percentile: 0.29s
|
||||
- instant query duration: max: 2.1s; median: 0.04s; 97th percentile: 0.15s
|
||||
|
||||
VictoriaMetrics consumes about 50GB of RAM.
|
||||
|
||||
Setup:
|
||||
|
||||
We have 2 single-node instances of VictoriaMetrics. The first instance collects and stores high-resolution metrics (10s scrape interval) for a month.
|
||||
The second instance collects and stores low-resolution metrics (300s scrape interval) for a month.
|
||||
We use Promxy + Alertmanager for global view and alerts evaluation.
|
||||
|
||||
|
||||
## ARNES
|
||||
|
||||
[The Academic and Research Network of Slovenia](https://www.arnes.si/en/) (ARNES) is a public institute that provides network services to research,
|
||||
educational and cultural organizations enabling connections and cooperation with each other and with related organizations worldwide.
|
||||
|
||||
After using Cacti, Graphite and StatsD for years, we wanted to upgrade our monitoring stack to something that:
|
||||
|
||||
- has native alerting support
|
||||
- can be run on-prem
|
||||
- has multi-dimensional metrics
|
||||
- has lower hardware requirements
|
||||
- is scalable
|
||||
- has a simple client that allows for provisioning and discovery with Puppet
|
||||
|
||||
We had been running Prometheus for about a year in a test environment and it was working well but there was a need/wish for a few more years of retention than the old system provided. We tested Thanos which was a bit resource hungry but worked great for about half a year.
|
||||
Then we discovered VictoriaMetrics. Our scale isn't that big so we don't have on-prem S3 and no Kubernetes. VM's single node instance provided
|
||||
the same result with far less maintenance overhead and lower hardware requirements.
|
||||
|
||||
After testing it a few months and with great support from the maintainers on [Slack](https://slack.victoriametrics.com/),
|
||||
we decided to go with it. VM's support for the ingestion of InfluxDB metrics was an additional bonus as our hardware team uses
|
||||
SNMPCollector to collect metrics from network devices and switching from InfluxDB to VictoriaMetrics required just a simple change in the config file.
|
||||
|
||||
Numbers:
|
||||
|
||||
- 2 single node instances per DC (one for Prometheus and one for InfluxDB metrics)
|
||||
- Active time series per VictoriaMetrics instance: ~500k (Prometheus) + ~320k (InfluxDB)
|
||||
- Ingestion rate per VictoriaMetrics instance: 45k/s (Prometheus) / 30k/s (InfluxDB)
|
||||
- Query duration: median ~5ms, 99th percentile ~45ms
|
||||
- Total number of datapoints per instance: 390B (Prometheus), 110B (InfluxDB)
|
||||
- Average datapoint size on drive: 0.4 bytes
|
||||
- Disk usage per VictoriaMetrics instance: 125GB (Prometheus), 185GB (InfluxDB)
|
||||
- Index size per VictoriaMetrics instance: 1.6GB (Prometheus), 1.2GB (InfluxDB)
|
||||
|
||||
We are running 1 Prometheus, 1 VictoriaMetrics and 1 Grafana server in each datacenter on baremetal servers, scraping 350+ targets
|
||||
(and 3k+ devices collected via SNMPCollector sending metrics directly to VM). Each Prometheus is scraping all targets
|
||||
so we have all metrics in both VictoriaMetrics instances. We are using [Promxy](https://github.com/jacksontj/promxy) to deduplicate metrics from both instances.
|
||||
Grafana has an LB infront so if one DC has problems we can still view all metrics from both DCs on the other Grafana instance.
|
||||
|
||||
We are still in the process of migration, but we are really happy with the whole stack. It has proven to be an essential tool
|
||||
for gathering insights into our services during COVID-19 and has enabled us to provide better service and identify problems faster.
|
||||
|
||||
## Brandwatch
|
||||
|
||||
[Brandwatch](https://www.brandwatch.com/) is the world's pioneering digital consumer intelligence suite,
|
||||
helping over 2,000 of the world's most admired brands and agencies to make insightful, data-driven business decisions.
|
||||
|
||||
The engineering department at Brandwatch has been using InfluxDB to store application metrics for many years
|
||||
but when End-of-Life of InfluxDB version 1.x was announced we decided to re-evaluate our entire metrics collection and storage stack.
|
||||
|
||||
The main goals for the new metrics stack were:
|
||||
- improved performance
|
||||
- lower maintenance
|
||||
- support for native clustering in open source version
|
||||
- the less metrics shipment had to change, the better
|
||||
- longer data retention time period would be great but not critical
|
||||
|
||||
We initially tested CrateDB and TimescaleDB wand found that both had limitations or requirements in their open source versions
|
||||
that made them unfit for our use case. Prometheus was also considered but it's push vs. pull metrics was a big change we did not want
|
||||
to include in the already significant change.
|
||||
|
||||
Once we found VictoriaMetrics it solved the following problems:
|
||||
- it is very lightweight and we can now run virtual machines instead of dedicated hardware machines for metrics storage
|
||||
- very short startup time and any possible gaps in data can easily be filled in using Promxy
|
||||
- we could continue using Telegraf as our metrics agent and ship identical metrics to both InfluxDB and VictoriaMetrics during the migration period (migration just about to start)
|
||||
- compression im VM is really good. We can store more metrics and we can easily spin up new VictoriaMetrics instances
|
||||
for new data and keep read-only nodes with older data if we need to extend our retention period further
|
||||
than single virtual machine disks allow and we can aggregate all the data from VictoriaMetrics with Promxy
|
||||
|
||||
High availability is done the same way we did with InfluxDB by running parallel single nodes of VictoriaMetrics.
|
||||
|
||||
Numbers:
|
||||
|
||||
- active time series: up to 25 million
|
||||
- ingestion rate: ~300 000
|
||||
- total number of datapoints: 380 billion and growing
|
||||
- total number of entries in inverted index: 575 million and growing
|
||||
- daily time series churn rate: ~550 000
|
||||
- data size on disk: ~660GB and growing
|
||||
- index size on disk: ~9,3GB and growing
|
||||
- average datapoint size on disk: ~1.75 bytes
|
||||
|
||||
Query rates are insignificant as we have concentrated on data ingestion so far.
|
||||
|
||||
Anders Bomberg, Monitoring and Infrastructure Team Lead, brandwatch.com
|
||||
|
||||
## CERN
|
||||
|
||||
The European Organization for Nuclear Research better known as [CERN](https://home.cern/) uses VictoriaMetrics for real-time monitoring
|
||||
of the [CMS](https://home.cern/science/experiments/cms) detector system.
|
||||
According to [published talk](https://indico.cern.ch/event/877333/contributions/3696707/attachments/1972189/3281133/CMS_mon_RD_for_opInt.pdf)
|
||||
VictoriaMetrics is used for the following purposes as a part of the "CMS Monitoring cluster":
|
||||
|
||||
* As a long-term storage for messages ingested from the [NATS messaging system](https://nats.io/). Ingested messages are pushed directly to VictoriaMetrics via HTTP protocol
|
||||
* As a long-term storage for Prometheus monitoring system (30 days retention policy. There are plans to increase it up to ½ year)
|
||||
* As a data source for visualizing metrics in Grafana.
|
||||
|
||||
R&D topic: Evaluate VictoraMetrics vs InfluxDB for large cardinality data.
|
||||
|
||||
Please also see [The CMS monitoring infrastructure and applications](https://arxiv.org/pdf/2007.03630.pdf) publication from CERN with details about their VictoriaMetrics usage.
|
||||
|
||||
|
||||
## COLOPL
|
||||
|
||||
[COLOPL](http://www.colopl.co.jp/en/) is Japanese game development company. It started using VictoriaMetrics
|
||||
after evaulating the following remote storage solutions for Prometheus:
|
||||
|
||||
* Cortex
|
||||
* Thanos
|
||||
* M3DB
|
||||
* VictoriaMetrics
|
||||
|
||||
See [slides](https://speakerdeck.com/inletorder/monitoring-platform-with-victoria-metrics) and [video](https://www.youtube.com/watch?v=hUpHIluxw80)
|
||||
from `Large-scale, super-load system monitoring platform built with VictoriaMetrics` talk at [Prometheus Meetup Tokyo #3](https://prometheus.connpass.com/event/157721/).
|
||||
|
||||
## Dreamteam
|
||||
|
||||
[Dreamteam](https://dreamteam.gg/) successfully uses single-node VictoriaMetrics in multiple environments.
|
||||
|
||||
Numbers:
|
||||
|
||||
* Active time series: from 350K to 725K
|
||||
* Total number of time series: from 100M to 320M
|
||||
* Total number of datapoints: from 120 billions to 155 billions
|
||||
* Retention period: 3 months
|
||||
|
||||
VictoriaMetrics in production environment runs on 2 M5 EC2 instances in "HA" mode, managed by Terraform and Ansible TF module.
|
||||
2 Prometheus instances are writing to both VMs, with 2 [Promxy](https://github.com/jacksontj/promxy) replicas
|
||||
as the load balancer for reads.
|
||||
|
||||
## Fly.io
|
||||
|
||||
[Fly.io](https://fly.io/about/) is a platform for running full stack apps and databases close to your users.
|
||||
|
||||
> Victoria Metrics (“Vicky”), in a clustered configuration, is our metrics database. We run a cluster of fairly big Vicky hosts.
|
||||
|
||||
> Like everyone else, we started with a simple Prometheus server. That worked until it didn’t. We spent some time scaling it with Thanos, and Thanos was a lot, as far as ops hassle goes. We’d dabbled with Vicky just as a long-term storage engine for vanilla Prometheus, with promxy set up to deduplicate metrics.
|
||||
|
||||
> Vicky grew into a more ambitious offering, and added its own Prometheus scraper; we adopted it and scaled it as far as we reasonably could in a single-node configuration. Scaling requirements ultimately pushed us into a clustered deployment; we run an HA cluster (fronted by haproxy). Current Vicky has a really straightforward multi-tenant API — it’s easy to namespace metrics for customers — and it chugs along for us without too much minding.
|
||||
|
||||
See [the full post](https://fly.io/blog/measuring-fly/).
|
||||
|
||||
|
||||
## German Research Center for Artificial Intelligence
|
||||
|
||||
[German Research Center for Artificial Intelligence](https://en.wikipedia.org/wiki/German_Research_Centre_for_Artificial_Intelligence) (DFKI) is one of the world's largest nonprofit contract research institutes for software technology based on artificial intelligence (AI) methods. DFKI was founded in 1988, and has facilities in the German cities of Kaiserslautern, Saarbrücken, Bremen and Berlin.
|
||||
|
||||
> Traditionally research groups in DFKI each used their own hardware. In mid 2020 we started an initiative to consolidate existing (and future) hardware into a central Slurm cluster to enable our researchers and students to run more and larger experiments. Based on the Nvidia deepops stack this included Prometheus for short-term metric storage. Our users liked the level of detail they got from our custom dashboards compared to our previous Zabbix-based solution, so we decided to extend the retention period to several years. Ideally we wanted PhD students to be able to recall even their earliest experiments by the time they finished their thesis. Since we do everything on-premise we needed a solution that is primarily space-efficient.
|
||||
|
||||
> We initially considered simply extending the retention period of the Prometheus instances included with deepops, since this would be the “batteries included” solution and appeared to be what everyone else was doing. We naively also liked the concept behind TimescaleDB, since it relies on Postgres for storage that has had decades of development. Turns out relational databases are not good at storing time-series and integration with existing exporters and Grafana would have been more difficult.
|
||||
|
||||
> VictoriaMetrics kept showing up in searches and benchmarks on time-series DB performance and consistently came out on top when it came to required storage. Quite frankly, the presented numbers looked like magic, so we decided to put this to the test. First impressions upon trial were excellent. Download the binary and point it at a storage location. Almost no configuration required. Apart from minor tweaks to the command line (turning on deduplication) and running it as a systemd unit we still use the same instance from the first tests today. It was further superior to Prometheus in every measurable way. It used considerably less CPU time and RAM than Prometheus and a third of the storage.
|
||||
|
||||
> While initially storage efficiency was the primary driver, the simplicity of setting up a testbed definitely helped. Seeing how effortlessly the single-node instance deals with our current setup gives us confidence that it will keep up with our growth for quite a while. And when the time comes that we outgrow it there is always the robust cluster variant of VictoriaMetrics that we can turn to.
|
||||
|
||||
> We like hassle-free experience with VictoriaMetrics. And at least for our use case a straight upgrade compared to Prometheus, while fully compatible with that ecosystem. While it can use cloud storage, there appears to be no downsides to using the filesystem instead, so it fits very well into our on-premise culture. It even comes with an excellent official Grafana dashboard to monitor performance.
|
||||
|
||||
Joachim Folz, Researcher, German Research Center for Artificial Intelligence (DFKI)
|
||||
|
||||
Numbers:
|
||||
|
||||
- Single-node mode
|
||||
- Active time series: 130K
|
||||
- Ingestion rate: 24K new samples per second
|
||||
- Total number of datapoints: 160 billions
|
||||
- Churn rate: 20K new time series per day
|
||||
- Data size on disk: 82 GB
|
||||
- Index size on disk: 300 MB
|
||||
- Query rate:
|
||||
- `/api/v1/query_range`: 2 queries per second
|
||||
- `/api/v1/query`: 1.2 queries per second
|
||||
- Query duration:
|
||||
- 99th percentile: 6.5 milliseconds
|
||||
- 90th percentile: 4 milliseconds
|
||||
- median: 1 millisecond
|
||||
- CPU usage: 0.1 CPU cores
|
||||
- RAM usage: 2.8 GB
|
||||
|
||||
|
||||
## Grammarly
|
||||
|
||||
[Grammarly](https://www.grammarly.com/) provides digital writing assistant that helps 30 million people and 30 thousand teams write more clearly and effectively every day. In building a product that scales across multiple platforms and devices, Grammarly works to empower users whenever and wherever they communicate.
|
||||
|
||||
> Maintenance and scaling for our previous on-premise monitoring system was hard and required a lot of effort from our side. The previous system was not optimized for storing frequently changing metrics (moderate [churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate) was a concern). The costs of the previous solution were not optimal.
|
||||
|
||||
> We evaluated various cloud-based and on-premise monitoring solutions: Sumo Logic, DataDog, SignalFX, Amazon CloudWatch, Prometheus, M3DB, Thanos, Graphite, etc. PoC results were sufficient for us to move forward with VictoriaMetrics due to the following reasons:
|
||||
|
||||
- High performance
|
||||
- Support for Graphite and OpenMetrics data ingestion types
|
||||
- Good documentation and easy bootstrap
|
||||
- Responsiveness of VictoriaMetrics support team during research and afterward
|
||||
|
||||
> Switching from our previous on-premise monitoring system to VictoriaMetrics allowed reducing infrastructure costs by an order of magnitude while improving DevOps experience and developer experience.
|
||||
|
||||
Numbers:
|
||||
|
||||
- [Cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) of VictoriaMetrics
|
||||
- Active time series: 35M
|
||||
- Ingestion rate: 950K new samples per second
|
||||
- Total number of datapoints: 44 trillions
|
||||
- Churn rate: 27M new time series per day
|
||||
- Data size on disk: 23 TB
|
||||
- Index size on disk: 700 GB
|
||||
- The average datapoint size on disk: 0.5 bytes
|
||||
- Query rate:
|
||||
- `/api/v1/query_range`: 350 queries per second
|
||||
- `/api/v1/query`: 24 queries per second
|
||||
- Query duration:
|
||||
- 99th percentile: 500 milliseconds
|
||||
- 90th percentile: 70 milliseconds
|
||||
- median: 2 milliseconds
|
||||
- CPU usage: 12 CPU cores
|
||||
- RAM usage: 250 GB
|
||||
|
||||
|
||||
## Groove X
|
||||
|
||||
[Groove X](https://groove-x.com/en/) designs and produces robotics solutions. Its mission is to bring out humanity’s full potential through robotics.
|
||||
|
||||
> We need monitoring solution for Device (Robot and Charge Station) health monitoring. At first, we used the Prometheus server, and then migrated to Thanos. But it was difficult to manage Thanos cluster and also we had a performance issue (long latency on request). Colopl, Inc. used VictoriaMetrics and we got interested in it. We built another k8s cluster besides our original Thanos cluster, and tried VictoriaMetrics in parallel for a while. It worked better and finally we decided to switch to VictoriaMetrics, because it provides low latency, it is in active development and it is easy to maintain.
|
||||
|
||||
> We like performance and scalability provided by VictoriaMetrics. We use metrics in our daily work, and long latency would be a big problem. Also, metrics correctness is important. We reported some inconsistencies with Prometheus during the evaluation period and received quick feedback from VictoriaMetrics developers.
|
||||
|
||||
Junya Hayashi, Senior Software Engineer, Groove X
|
||||
|
||||
Numbers:
|
||||
|
||||
- Active time series: 14 millions
|
||||
- Ingestion rate: 235K samples per second
|
||||
- Total number of datapoints: 3.2 trillions
|
||||
- Churn rate: 420K new time series per day
|
||||
- Data size on disk: 2 TB
|
||||
- Index size on disk: 52 GB
|
||||
- Query duration:
|
||||
- 99th percentile: 2.6 seconds
|
||||
- 90th percentile: 0.4 seconds
|
||||
- median: 0.006 seconds
|
||||
|
||||
## Idealo.de
|
||||
|
||||
[idealo.de](https://www.idealo.de/) is the leading price comparison website in Germany. We use Prometheus for metrics on our container platform.
|
||||
When we introduced Prometheus at idealo we started with m3db as our longterm storage. In our setup, m3db was quite unstable and consumed a lot of resources.
|
||||
|
||||
VictoriaMetrics in production is very stable for us and uses only a fraction of the resources even though we also increased our retention period from 1 month to 13 months.
|
||||
|
||||
Numbers:
|
||||
|
||||
- The number of active time series per VictoriaMetrics instance is 21M
|
||||
- Total ingestion rate 120k metrics per second
|
||||
- The total number of datapoints 3.1 trillion
|
||||
- The average time series churn rate is ~9M per day
|
||||
- The average query rate is ~20 per second. Response time for 99th quantile is 120ms
|
||||
- Retention: 13 months
|
||||
- Size of all datapoints: 3.5 TB
|
||||
|
||||
|
||||
## MHI Vestas Offshore Wind
|
||||
|
||||
The mission of [MHI Vestas Offshore Wind](http://www.mhivestasoffshore.com) is to co-develop offshore wind as an economically viable and sustainable energy resource to benefit future generations.
|
||||
|
||||
MHI Vestas Offshore Wind is using VictoriaMetrics to ingest and visualize sensor data from offshore wind turbines. The very efficient storage and ability to backfill was key in choosing VictoriaMetrics. MHI Vestas Offshore Wind is running the cluster version of VictoriaMetrics on Kubernetes using the Helm charts for deployment to be able to scale up capacity as the solution is rolled out.
|
||||
|
||||
Numbers with current, limited roll out:
|
||||
|
||||
- Active time series: 270K
|
||||
- Ingestion rate: 70K samples per second
|
||||
- Total number of datapoints: 850 billions
|
||||
- Data size on disk: 800 GiB
|
||||
- Retention period: 3 years
|
||||
|
||||
|
||||
## Percona
|
||||
|
||||
[Percona](https://www.percona.com/) is a leader in providing best-of-breed enterprise-class support, consulting, managed services, training and software for MySQL®, MariaDB®, MongoDB®, PostgreSQL® and other open source databases in on-premises and cloud environments.
|
||||
|
||||
Percona migrated from Prometheus to VictoriaMetrics in the [Percona Monitoring and Management](https://www.percona.com/software/database-tools/percona-monitoring-and-management) product. This allowed [reducing resource usage](https://www.percona.com/blog/2020/12/23/observations-on-better-resource-usage-with-percona-monitoring-and-management-v2-12-0/) and [getting rid of complex firewall setup](https://www.percona.com/blog/2020/12/01/foiled-by-the-firewall-a-tale-of-transition-from-prometheus-to-victoriametrics/), while [improving user experience](https://www.percona.com/blog/2020/02/28/better-prometheus-rate-function-with-victoriametrics/).
|
||||
|
||||
|
||||
## Razorpay
|
||||
|
||||
[Razorpay](https://razorpay.com/) aims to revolutionize money management for online businesses by providing clean, developer-friendly APIs and hassle-free integration.
|
||||
|
||||
> As a fintech organization, we move billions of dollars every month. Our customers and merchants have entrusted us with a paramount responsibility. To handle our ever-growing business, building a robust observability stack is not just “nice to have”, but absolutely essential. And all of this starts with better monitoring and metrics.
|
||||
|
||||
> We executed a variety of POCs on various solutions and finally arrived at the following technologies: M3DB, Thanos, Cortex and VictoriaMetrics. The clear winner was VictoriaMetrics.
|
||||
|
||||
> The following are some of the basic observations we derived from Victoria Metrics:
|
||||
> * Simple components, each horizontally scalable.
|
||||
> * Clear separation between writes and reads.
|
||||
> * Runs from default configurations, with no extra frills.
|
||||
> * Default retention starts with 1 month
|
||||
> * Storage, ingestion, and reads can be easily scaled.
|
||||
> * High Compression store ~ 70% more compression.
|
||||
> * Currently running in production with commodity hardware with a good mix of spot instances.
|
||||
> * Successfully ran some of the worst Grafana dashboards/queries that have historically failed to run.
|
||||
|
||||
See [the full article](https://engineering.razorpay.com/scaling-to-trillions-of-metric-data-points-f569a5b654f2).
|
||||
|
||||
|
||||
## Sensedia
|
||||
|
||||
[Sensedia](https://www.sensedia.com) is a leading integration solutions provider with more than 120 enterprise clients across a range of sectors. Its world-class portfolio includes: an API Management Platform, Adaptive Governance, Events Hub, Service Mesh, Cloud Connectors and Strategic Professional Services' teams.
|
||||
|
||||
> Our initial requirements for monitoring solution: the metrics must be stored for 15 days, the solution must be scalable and must offer high availability of the metrics. It must being integrated into Grafana and allowing the use of PromQL when creating/editing dashboards in Grafana to obtain metrics from the Prometheus datasource. The solution also needs to receive data from Prometheus using HTTPS and needs to request a login and password to write/read the metrics. Details are available [in this article](https://nordicapis.com/api-monitoring-with-prometheus-grafana-alertmanager-and-victoriametrics/).
|
||||
|
||||
> We evaluated VictoriaMetrics, InfluxDB OpenSource and Enterprise, ElasticSearch, Thanos, Cortex, TimescaleDB/PostgreSQL and M3DB. We selected VictoriaMetrics because it has [good community support](https://slack.victoriametrics.com/), [good documentation](https://docs.victoriametrics.com/) and it just works.
|
||||
|
||||
> We started using VictoriaMetrics in the production environment days before the start of BlackFriday in 2020, the period of greatest use of the Sensedia API-Platform by customers. There was a record in the generation of metrics and there was no instability with the monitoring stack.
|
||||
|
||||
> We use VictoriaMetrics in cluster mode for centralized storage of metrics collected by several Prometheus servers installed in Kubernetes clusters from two different cloud providers. VictoriaMetrics has also been integrated with Grafana to view metrics.
|
||||
|
||||
[Aecio dos Santos Pires](http://aeciopires.com), Cloud Architect, Sensedia.
|
||||
|
||||
Numbers:
|
||||
- Cluster mode
|
||||
- Active time series: 700K
|
||||
- Ingestion rate: 70K datapoints per second
|
||||
- Datapoints: 112 billions
|
||||
- Data size on disk: 82 GB
|
||||
- Index size on disk: 30 GB
|
||||
- Churn rate: 3 million of new time series per day
|
||||
- Query response time (99th percentile): 500ms
|
||||
|
||||
|
||||
## Smarkets
|
||||
|
||||
[Smarkets](https://smarkets.com/) simplifies peer-to-peer trading on sporting and political events.
|
||||
|
||||
> We always wanted our developers to have out-of-the-box monitoring available for any application or service. Before we adopted Kubernetes this was achieved either with Prometheus metrics, or with statsd being sent over to the underlying host and then converted into Prometheus metrics. As we expanded our Kubernetes adoption and started to split clusters, we also wanted developers to be able to expose metrics directly to Prometheus by annotating services. Those metrics were then only available inside the cluster so they couldn’t be scraped globally.
|
||||
|
||||
> We considered three different solutions to improve our architecture:
|
||||
> * Prometheus + Cortex
|
||||
> * Prometheus + Thanos Receive
|
||||
> * Prometheus + Victoria Metrics
|
||||
|
||||
> We selected Victoria Metrics. Our new architecture has been very stable since it was put into production. With the previous setup we would have had two or three cardinality explosions in a two-week period, with this new one we have none.
|
||||
|
||||
See [the full article](https://smarketshq.com/monitoring-kubernetes-clusters-41a4b24c19e3).
|
||||
|
||||
|
||||
## Synthesio
|
||||
|
||||
[Synthesio](https://www.synthesio.com/) is the leading social intelligence tool for social media monitoring and analytics.
|
||||
|
||||
> We fully migrated from [Metrictank](https://grafana.com/oss/metrictank/) to VictoriaMetrics
|
||||
|
||||
Numbers:
|
||||
- Single node
|
||||
- Active time series: 5 millions
|
||||
- Datapoints: 1.25 trillions
|
||||
- Ingestion rate: 550K datapoints per second
|
||||
- Disk usage: 150 GB
|
||||
- Index size: 3 GB
|
||||
- Query duration 99th percentile: 147ms
|
||||
- Churn rate: 2400 new time series per day
|
||||
|
||||
## Wedos.com
|
||||
|
||||
> [Wedos](https://www.wedos.com/) is the biggest hosting provider in the Czech Republic. We have our own private data center that holds our servers and technologies. We are in the process of building a second, stae of the art data center where the servers will be cooled in an oil bath. We started using [cluster VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) to store Prometheus metrics from all our infrastructure after receiving positive references from people who had successfully used VictoriaMetrics.
|
||||
|
||||
Numbers:
|
||||
|
||||
* The number of acitve time series: 5M.
|
||||
* Ingestion rate: 170K data points per second.
|
||||
* Query duration: median is ~2ms, 99th percentile is ~50ms.
|
||||
|
||||
> We like that VictoriaMetrics is simple to configuree and requires zero maintenance. It works right out of the box and once it's set up you can just forget about it.
|
||||
|
||||
## Wix.com
|
||||
|
||||
[Wix.com](https://en.wikipedia.org/wiki/Wix.com) is the leading web development platform.
|
||||
|
||||
> We needed to redesign our metrics infrastructure from the ground up after the move to Kubernetes. We had tried out a few different options before landing on this solution which is working great. We have a Prometheus instance in every datacenter with 2 hours retention for local storage and remote write into [HA pair of single-node VictoriaMetrics instances](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#high-availability).
|
||||
|
||||
Numbers:
|
||||
|
||||
* The number of active time series per VictoriaMetrics instance is 50 millions.
|
||||
* The total number of time series per VictoriaMetrics instance is 5000 million.
|
||||
* Ingestion rate per VictoriaMetrics instance is 1.1 millions data points per second.
|
||||
* The total number of datapoints per VictoriaMetrics instance is 8.5 trillion.
|
||||
* The average churn rate is 150 millions new time series per day.
|
||||
* The average query rate is ~150 per second (mostly alert queries).
|
||||
* Query duration: median is ~1ms, 99th percentile is ~1sec.
|
||||
* Retention period: 3 months.
|
||||
|
||||
> The alternatives that we tested prior to choosing VictoriaMetrics were: Prometheus federated, Cortex, IronDB and Thanos.
|
||||
> The items that were critical to us central tsdb, in order of importance were as follows:
|
||||
|
||||
* At least 3 month worth of retention.
|
||||
* Raw data, no aggregation, no sampling.
|
||||
* High query speed.
|
||||
* Clean fail state for HA (multi-node clusters may return partial data resulting in false alerts).
|
||||
* Enough headroom/scaling capacity for future growth which is planned to be up to 100M active time series.
|
||||
* Ability to split DB replicas per workload. Alert queries go to one replica and user queries go to another (speed for users, effective cache).
|
||||
|
||||
> Optimizing for those points and our specific workload, VictoriaMetrics proved to be the best option. As icing on the cake we’ve got [PromQL extensions](https://docs.victoriametrics.com/MetricsQL.html) - `default 0` and `histogram` are my favorite ones. We really like having a lot of tsdb params easily available via config options which makes tsdb easy to tune for each specific use case. We've also found a great community in [Slack channel](https://slack.victoriametrics.com/) and responsive and helpful maintainer support.
|
||||
|
||||
Alex Ulstein, Head of Monitoring, Wix.com
|
||||
|
||||
## Zerodha
|
||||
|
||||
[Zerodha](https://zerodha.com/) is India's largest stock broker. The monitoring team at Zerodha had the following requirements:
|
||||
|
||||
* Multiple K8s clusters to monitor
|
||||
* Consistent monitoring infra for each cluster across the fleet
|
||||
* The ability to handle billions of timeseries events at any point of time
|
||||
* Easy to operate and cost effective
|
||||
|
||||
Thanos, Cortex and VictoriaMetrics were evaluated as a long-term storage for Prometheus. VictoriaMetrics has been selected for the following reasons:
|
||||
|
||||
* Blazingly fast benchmarks for a single node setup.
|
||||
* Single binary mode. Easy to scale vertically with far fewer operational headaches.
|
||||
* Considerable [improvements on creating Histograms](https://medium.com/@valyala/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350).
|
||||
* [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) gives us the ability to extend PromQL with more aggregation operators.
|
||||
* The API is compatible with Prometheus and nearly all standard PromQL queries work well out of the box.
|
||||
* Handles storage well, with periodic compaction which makes it easy to take snapshots.
|
||||
|
||||
Please see [Monitoring K8S with VictoriaMetrics](https://docs.google.com/presentation/d/1g7yUyVEaAp4tPuRy-MZbPXKqJ1z78_5VKuV841aQfsg/edit) slides,
|
||||
[video](https://youtu.be/ZJQYW-cFOms) and [Infrastructure monitoring with Prometheus at Zerodha](https://zerodha.tech/blog/infra-monitoring-at-zerodha/) blog post for more details.
|
||||
|
||||
|
||||
## zhihu
|
||||
|
||||
[zhihu](https://www.zhihu.com) is the largest Chinese question-and-answer website. We use VictoriaMetrics to store and use Graphite metrics. We shared the [promate](https://github.com/zhihu/promate) solution in our [单机 20 亿指标,知乎 Graphite 极致优化!](https://qcon.infoq.cn/2020/shenzhen/presentation/2881)([slides](https://static001.geekbang.org/con/76/pdf/828698018/file/%E5%8D%95%E6%9C%BA%2020%20%E4%BA%BF%E6%8C%87%E6%A0%87%EF%BC%8C%E7%9F%A5%E4%B9%8E%20Graphite%20%E6%9E%81%E8%87%B4%E4%BC%98%E5%8C%96%EF%BC%81-%E7%86%8A%E8%B1%B9.pdf)) talk at [QCon 2020](https://qcon.infoq.cn/2020/shenzhen/).
|
||||
|
||||
Numbers:
|
||||
|
||||
- Active time series: ~25 Million
|
||||
- Datapoints: ~20 Trillion
|
||||
- Ingestion rate: ~1800k/s
|
||||
- Disk usage: ~20 TB
|
||||
- Index size: ~600 GB
|
||||
- The average query rate is ~3k per second (mostly alert queries).
|
||||
- Query duration: median is ~40ms, 99th percentile is ~100ms.
|
||||
860
Cluster-VictoriaMetrics.md
Normal file
@@ -0,0 +1,860 @@
|
||||
---
|
||||
sort: 2
|
||||
---
|
||||
|
||||
# Cluster version
|
||||
|
||||
<img alt="VictoriaMetrics" src="logo.png">
|
||||
|
||||
VictoriaMetrics is a fast, cost-effective and scalable time series database. It can be used as a long-term remote storage for Prometheus.
|
||||
|
||||
It is recommended using [single-node version](https://github.com/VictoriaMetrics/VictoriaMetrics) instead of cluster version
|
||||
for ingestion rates lower than a million of data points per second.
|
||||
Single-node version [scales perfectly](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae)
|
||||
with the number of CPU cores, RAM and available storage space.
|
||||
Single-node version is easier to configure and operate comparing to cluster version, so think twice before sticking to cluster version.
|
||||
|
||||
Join [our Slack](https://slack.victoriametrics.com/) or [contact us](mailto:info@victoriametrics.com) with consulting and support questions.
|
||||
|
||||
|
||||
## Prominent features
|
||||
|
||||
- Supports all the features of [single-node version](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
- Performance and capacity scales horizontally. See [these docs for details](#cluster-resizing-and-scalability).
|
||||
- Supports multiple independent namespaces for time series data (aka multi-tenancy). See [these docs for details](#multitenancy).
|
||||
- Supports replication. See [these docs for details](#replication-and-data-safety).
|
||||
|
||||
|
||||
## Architecture overview
|
||||
|
||||
VictoriaMetrics cluster consists of the following services:
|
||||
|
||||
- `vmstorage` - stores the raw data and returns the queried data on the given time range for the given label filters
|
||||
- `vminsert` - accepts the ingested data and spreads it among `vmstorage` nodes according to consistent hashing over metric name and all its labels
|
||||
- `vmselect` - performs incoming queries by fetching the needed data from all the configured `vmstorage` nodes
|
||||
|
||||
Each service may scale independently and may run on the most suitable hardware.
|
||||
`vmstorage` nodes don't know about each other, don't communicate with each other and don't share any data.
|
||||
This is [shared nothing architecture](https://en.wikipedia.org/wiki/Shared-nothing_architecture).
|
||||
It increases cluster availability, simplifies cluster maintenance and cluster scaling.
|
||||
|
||||
<img src="https://docs.google.com/drawings/d/e/2PACX-1vTvk2raU9kFgZ84oF-OKolrGwHaePhHRsZEcfQ1I_EC5AB_XPWwB392XshxPramLJ8E4bqptTnFn5LL/pub?w=1104&h=746">
|
||||
|
||||
|
||||
## Multitenancy
|
||||
|
||||
VictoriaMetrics cluster supports multiple isolated tenants (aka namespaces).
|
||||
Tenants are identified by `accountID` or `accountID:projectID`, which are put inside request urls.
|
||||
See [these docs](#url-format) for details. Some facts about tenants in VictoriaMetrics:
|
||||
|
||||
* Each `accountID` and `projectID` is identified by an arbitrary 32-bit integer in the range `[0 .. 2^32)`.
|
||||
If `projectID` is missing, then it is automatically assigned to `0`. It is expected that other information about tenants
|
||||
such as auth tokens, tenant names, limits, accounting, etc. is stored in a separate relational database. This database must be managed
|
||||
by a separate service sitting in front of VictoriaMetrics cluster such as [vmauth](https://docs.victoriametrics.com/vmauth.html) or [vmgateway](https://docs.victoriametrics.com/vmgateway.html). [Contact us](mailto:info@victoriametrics.com) if you need assistance with such service.
|
||||
|
||||
* Tenants are automatically created when the first data point is written into the given tenant.
|
||||
|
||||
* Data for all the tenants is evenly spread among available `vmstorage` nodes. This guarantees even load among `vmstorage` nodes
|
||||
when different tenants have different amounts of data and different query load.
|
||||
|
||||
* The database performance and resource usage doesn't depend on the number of tenants. It depends mostly on the total number of active time series in all the tenants. A time series is considered active if it received at least a single sample during the last hour or it has been touched by queries during the last hour.
|
||||
|
||||
* VictoriaMetrics doesn't support querying multiple tenants in a single request.
|
||||
|
||||
|
||||
## Binaries
|
||||
|
||||
Compiled binaries for cluster version are available in the `assets` section of [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases).
|
||||
See archives containing `cluster` word.
|
||||
|
||||
Docker images for cluster version are available here:
|
||||
|
||||
- `vminsert` - https://hub.docker.com/r/victoriametrics/vminsert/tags
|
||||
- `vmselect` - https://hub.docker.com/r/victoriametrics/vmselect/tags
|
||||
- `vmstorage` - https://hub.docker.com/r/victoriametrics/vmstorage/tags
|
||||
|
||||
|
||||
## Building from sources
|
||||
|
||||
Source code for cluster version is available at [cluster branch](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
|
||||
|
||||
|
||||
### Production builds
|
||||
|
||||
There is no need in installing Go on a host system since binaries are built
|
||||
inside [the official docker container for Go](https://hub.docker.com/_/golang).
|
||||
This makes reproducible builds.
|
||||
So [install docker](https://docs.docker.com/install/) and run the following command:
|
||||
|
||||
```
|
||||
make vminsert-prod vmselect-prod vmstorage-prod
|
||||
```
|
||||
|
||||
Production binaries are built into statically linked binaries. They are put into `bin` folder with `-prod` suffixes:
|
||||
```
|
||||
$ make vminsert-prod vmselect-prod vmstorage-prod
|
||||
$ ls -1 bin
|
||||
vminsert-prod
|
||||
vmselect-prod
|
||||
vmstorage-prod
|
||||
```
|
||||
|
||||
### Development Builds
|
||||
|
||||
1. [Install go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make` from [the repository root](https://github.com/VictoriaMetrics/VictoriaMetrics). It should build `vmstorage`, `vmselect`
|
||||
and `vminsert` binaries and put them into the `bin` folder.
|
||||
|
||||
|
||||
### Building docker images
|
||||
|
||||
Run `make package`. It will build the following docker images locally:
|
||||
|
||||
* `victoriametrics/vminsert:<PKG_TAG>`
|
||||
* `victoriametrics/vmselect:<PKG_TAG>`
|
||||
* `victoriametrics/vmstorage:<PKG_TAG>`
|
||||
|
||||
`<PKG_TAG>` is auto-generated image tag, which depends on source code in [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package`.
|
||||
|
||||
By default images are built on top of [alpine](https://hub.docker.com/_/scratch) image in order to improve debuggability.
|
||||
It is possible to build an image on top of any other base image by setting it via `<ROOT_IMAGE>` environment variable.
|
||||
For example, the following command builds images on top of [scratch](https://hub.docker.com/_/scratch) image:
|
||||
|
||||
```bash
|
||||
ROOT_IMAGE=scratch make package
|
||||
```
|
||||
|
||||
## Operation
|
||||
|
||||
## Cluster setup
|
||||
|
||||
A minimal cluster must contain the following nodes:
|
||||
|
||||
* a single `vmstorage` node with `-retentionPeriod` and `-storageDataPath` flags
|
||||
* a single `vminsert` node with `-storageNode=<vmstorage_host>`
|
||||
* a single `vmselect` node with `-storageNode=<vmstorage_host>`
|
||||
|
||||
It is recommended to run at least two nodes for each service
|
||||
for high availability purposes.
|
||||
|
||||
An http load balancer such as [vmauth](https://docs.victoriametrics.com/vmauth.html) or `nginx` must be put in front of `vminsert` and `vmselect` nodes. It must contain the following routing configs according to [the url format](#url-format):
|
||||
- requests starting with `/insert` must be routed to port `8480` on `vminsert` nodes.
|
||||
- requests starting with `/select` must be routed to port `8481` on `vmselect` nodes.
|
||||
|
||||
Ports may be altered by setting `-httpListenAddr` on the corresponding nodes.
|
||||
|
||||
It is recommended setting up [monitoring](#monitoring) for the cluster.
|
||||
|
||||
The following tools can simplify cluster setup:
|
||||
* [An example docker-compose config for VictoriaMetrics cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/cluster/deployment/docker/docker-compose.yml)
|
||||
* [Helm charts for VictoriaMetrics](https://github.com/VictoriaMetrics/helm-charts)
|
||||
* [Kubernetes operator for VictoriaMetrics](https://github.com/VictoriaMetrics/operator)
|
||||
|
||||
|
||||
It is possible manualy setting up a toy cluster on a single host. In this case every cluster component - `vminsert`, `vmselect` and `vmstorage` - must have distinct values for `-httpListenAddr` command-line flag. This flag specifies http address for accepting http requests for [monitoring](#monitoring) and [profiling](#profiling). `vmstorage` node must have distinct values for the following additional command-line flags in order to prevent resource usage clash:
|
||||
* `-storageDataPath` - every `vmstorage` node must have a dedicated data storage.
|
||||
* `-vminsertAddr` - every `vmstorage` node must listen for a distinct tcp address for accepting data from `vminsert` nodes.
|
||||
* `-vmselectAddr` - every `vmstorage` node must listen for a distinct tcp address for accepting requests from `vmselect` nodes.
|
||||
|
||||
|
||||
### Environment variables
|
||||
|
||||
Each flag values can be set thru environment variables by following these rules:
|
||||
|
||||
- The `-envflag.enable` flag must be set
|
||||
- Each `.` in flag names must be substituted by `_` (for example `-insert.maxQueueDuration <duration>` will translate to `insert_maxQueueDuration=<duration>`)
|
||||
- For repeating flags, an alternative syntax can be used by joining the different values into one using `,` as separator (for example `-storageNode <nodeA> -storageNode <nodeB>` will translate to `storageNode=<nodeA>,<nodeB>`)
|
||||
- It is possible setting prefix for environment vars with `-envflag.prefix`. For instance, if `-envflag.prefix=VM_`, then env vars must be prepended with `VM_`
|
||||
|
||||
|
||||
## Monitoring
|
||||
|
||||
All the cluster components expose various metrics in Prometheus-compatible format at `/metrics` page on the TCP port set in `-httpListenAddr` command-line flag.
|
||||
By default the following TCP ports are used:
|
||||
- `vminsert` - 8480
|
||||
- `vmselect` - 8481
|
||||
- `vmstorage` - 8482
|
||||
|
||||
It is recommended setting up [vmagent](https://docs.victoriametrics.com/vmagent.html)
|
||||
or Prometheus to scrape `/metrics` pages from all the cluster components, so they can be monitored and analyzed
|
||||
with [the official Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176)
|
||||
or [an alternative dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11831). Graphs on these dashboards contain useful hints - hover the `i` icon at the top left corner of each graph in order to read it.
|
||||
|
||||
It is recommended setting up alerts in [vmalert](https://docs.victoriametrics.com/vmalert.html) or in Prometheus from [this config](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/cluster/deployment/docker/alerts.yml).
|
||||
|
||||
## Readonly mode
|
||||
|
||||
`vmstorage` nodes automatically switch to readonly mode when the directory pointed by `-storageDataPath` contains less than `-storage.minFreeDiskSpaceBytes` of free space. `vminsert` nodes stop sending data to such nodes and start re-routing the data to the remaining `vmstorage` nodes.
|
||||
|
||||
|
||||
|
||||
## URL format
|
||||
|
||||
* URLs for data ingestion: `http://<vminsert>:8480/insert/<accountID>/<suffix>`, where:
|
||||
- `<accountID>` is an arbitrary 32-bit integer identifying namespace for data ingestion (aka tenant). It is possible to set it as `accountID:projectID`,
|
||||
where `projectID` is also arbitrary 32-bit integer. If `projectID` isn't set, then it equals to `0`.
|
||||
- `<suffix>` may have the following values:
|
||||
- `prometheus` and `prometheus/api/v1/write` - for inserting data with [Prometheus remote write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
|
||||
- `datadog/api/v1/series` - for inserting data with [DataDog submit metrics API](https://docs.datadoghq.com/api/latest/metrics/#submit-metrics). See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-datadog-agent) for details.
|
||||
- `influx/write` and `influx/api/v2/write` - for inserting data with [InfluxDB line protocol](https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_tutorial/). See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf) for details.
|
||||
- `opentsdb/api/put` - for accepting [OpenTSDB HTTP /api/put requests](http://opentsdb.net/docs/build/html/api_http/put.html). This handler is disabled by default. It is exposed on a distinct TCP address set via `-opentsdbHTTPListenAddr` command-line flag. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#sending-opentsdb-data-via-http-apiput-requests) for details.
|
||||
- `prometheus/api/v1/import` - for importing data obtained via `api/v1/export` at `vmselect` (see below).
|
||||
- `prometheus/api/v1/import/native` - for importing data obtained via `api/v1/export/native` on `vmselect` (see below).
|
||||
- `prometheus/api/v1/import/csv` - for importing arbitrary CSV data. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-csv-data) for details.
|
||||
- `prometheus/api/v1/import/prometheus` - for importing data in [Prometheus text exposition format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) and in [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md). See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-prometheus-exposition-format) for details.
|
||||
|
||||
* URLs for [Prometheus querying API](https://prometheus.io/docs/prometheus/latest/querying/api/): `http://<vmselect>:8481/select/<accountID>/prometheus/<suffix>`, where:
|
||||
- `<accountID>` is an arbitrary number identifying data namespace for the query (aka tenant)
|
||||
- `<suffix>` may have the following values:
|
||||
- `api/v1/query` - performs [PromQL instant query](https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries).
|
||||
- `api/v1/query_range` - performs [PromQL range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
|
||||
- `api/v1/series` - performs [series query](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers).
|
||||
- `api/v1/labels` - returns a [list of label names](https://prometheus.io/docs/prometheus/latest/querying/api/#getting-label-names).
|
||||
- `api/v1/label/<label_name>/values` - returns values for the given `<label_name>` according [to API](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values).
|
||||
- `federate` - returns [federated metrics](https://prometheus.io/docs/prometheus/latest/federation/).
|
||||
- `api/v1/export` - exports raw data in JSON line format. See [this article](https://medium.com/@valyala/analyzing-prometheus-data-with-external-tools-5f3e5e147639) for details.
|
||||
- `api/v1/export/native` - exports raw data in native binary format. It may be imported into another VictoriaMetrics via `api/v1/import/native` (see above).
|
||||
- `api/v1/export/csv` - exports data in CSV. It may be imported into another VictoriaMetrics via `api/v1/import/csv` (see above).
|
||||
- `api/v1/series/count` - returns the total number of series.
|
||||
- `api/v1/status/tsdb` - for time series stats. See [these docs](https://docs.victoriametrics.com/#tsdb-stats) for details.
|
||||
- `api/v1/status/active_queries` - for currently executed active queries. Note that every `vmselect` maintains an independent list of active queries,
|
||||
which is returned in the response.
|
||||
- `api/v1/status/top_queries` - for listing the most frequently executed queries and queries taking the most duration.
|
||||
|
||||
* URLs for [Graphite Metrics API](https://graphite-api.readthedocs.io/en/latest/api.html#the-metrics-api): `http://<vmselect>:8481/select/<accountID>/graphite/<suffix>`, where:
|
||||
- `<accountID>` is an arbitrary number identifying data namespace for query (aka tenant)
|
||||
- `<suffix>` may have the following values:
|
||||
- `render` - implements Graphite Render API. See [these docs](https://graphite.readthedocs.io/en/stable/render_api.html). This functionality is available in [Enterprise package](https://victoriametrics.com/products/enterprise/).
|
||||
- `metrics/find` - searches Graphite metrics. See [these docs](https://graphite-api.readthedocs.io/en/latest/api.html#metrics-find).
|
||||
- `metrics/expand` - expands Graphite metrics. See [these docs](https://graphite-api.readthedocs.io/en/latest/api.html#metrics-expand).
|
||||
- `metrics/index.json` - returns all the metric names. See [these docs](https://graphite-api.readthedocs.io/en/latest/api.html#metrics-index-json).
|
||||
- `tags/tagSeries` - registers time series. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#adding-series-to-the-tagdb).
|
||||
- `tags/tagMultiSeries` - register multiple time series. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#adding-series-to-the-tagdb).
|
||||
- `tags` - returns tag names. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags).
|
||||
- `tags/<tag_name>` - returns tag values for the given `<tag_name>`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags).
|
||||
- `tags/findSeries` - returns series matching the given `expr`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#exploring-tags).
|
||||
- `tags/autoComplete/tags` - returns tags matching the given `tagPrefix` and/or `expr`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#auto-complete-support).
|
||||
- `tags/autoComplete/values` - returns tag values matching the given `valuePrefix` and/or `expr`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#auto-complete-support).
|
||||
- `tags/delSeries` - deletes series matching the given `path`. See [these docs](https://graphite.readthedocs.io/en/stable/tags.html#removing-series-from-the-tagdb).
|
||||
|
||||
* URL with basic Web UI: `http://<vmselect>:8481/select/<accountID>/vmui/`.
|
||||
|
||||
* URL for query stats across all tenants: `http://<vmselect>:8481/api/v1/status/top_queries`. It lists with the most frequently executed queries and queries taking the most duration.
|
||||
|
||||
* URL for time series deletion: `http://<vmselect>:8481/delete/<accountID>/prometheus/api/v1/admin/tsdb/delete_series?match[]=<timeseries_selector_for_delete>`.
|
||||
Note that the `delete_series` handler should be used only in exceptional cases such as deletion of accidentally ingested incorrect time series. It shouldn't
|
||||
be used on a regular basis, since it carries non-zero overhead.
|
||||
|
||||
* `vmstorage` nodes provide the following HTTP endpoints on `8482` port:
|
||||
- `/internal/force_merge` - initiate [forced compactions](https://docs.victoriametrics.com/#forced-merge) on the given `vmstorage` node.
|
||||
- `/snapshot/create` - create [instant snapshot](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282),
|
||||
which can be used for backups in background. Snapshots are created in `<storageDataPath>/snapshots` folder, where `<storageDataPath>` is the corresponding
|
||||
command-line flag value.
|
||||
- `/snapshot/list` - list available snasphots.
|
||||
- `/snapshot/delete?snapshot=<id>` - delete the given snapshot.
|
||||
- `/snapshot/delete_all` - delete all the snapshots.
|
||||
|
||||
Snapshots may be created independently on each `vmstorage` node. There is no need in synchronizing snapshots' creation
|
||||
across `vmstorage` nodes.
|
||||
|
||||
|
||||
## Cluster resizing and scalability
|
||||
|
||||
Cluster performance and capacity scales with adding new nodes.
|
||||
|
||||
* `vminsert` and `vmselect` nodes are stateless and may be added / removed at any time.
|
||||
Do not forget updating the list of these nodes on http load balancer.
|
||||
Adding more `vminsert` nodes scales data ingestion rate. See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/175#issuecomment-536925841)
|
||||
about ingestion rate scalability.
|
||||
Adding more `vmselect` nodes scales select queries rate.
|
||||
* `vmstorage` nodes own the ingested data, so they cannot be removed without data loss.
|
||||
Adding more `vmstorage` nodes scales cluster capacity.
|
||||
|
||||
Steps to add `vmstorage` node:
|
||||
|
||||
1. Start new `vmstorage` node with the same `-retentionPeriod` as existing nodes in the cluster.
|
||||
2. Gradually restart all the `vmselect` nodes with new `-storageNode` arg containing `<new_vmstorage_host>`.
|
||||
3. Gradually restart all the `vminsert` nodes with new `-storageNode` arg containing `<new_vmstorage_host>`.
|
||||
|
||||
|
||||
## Updating / reconfiguring cluster nodes
|
||||
|
||||
All the node types - `vminsert`, `vmselect` and `vmstorage` - may be updated via graceful shutdown.
|
||||
Send `SIGINT` signal to the corresponding process, wait until it finishes and then start new version
|
||||
with new configs.
|
||||
|
||||
Cluster should remain in working state if at least a single node of each type remains available during
|
||||
the update process. See [cluster availability](#cluster-availability) section for details.
|
||||
|
||||
See also more advanced [cardinality limiter in vmagent](https://docs.victoriametrics.com/vmagent.html#cardinality-limiter).
|
||||
|
||||
|
||||
## Cluster availability
|
||||
|
||||
* HTTP load balancer must stop routing requests to unavailable `vminsert` and `vmselect` nodes.
|
||||
* The cluster remains available if at least a single `vmstorage` node exists:
|
||||
|
||||
- `vminsert` re-routes incoming data from unavailable `vmstorage` nodes to healthy `vmstorage` nodes
|
||||
- `vmselect` continues serving partial responses if at least a single `vmstorage` node is available. If consistency over availability is preferred, then either pass `-search.denyPartialResponse` command-line flag to `vmselect` or pass `deny_partial_response=1` query arg in requests to `vmselect`.
|
||||
|
||||
`vmselect` doesn't serve partial responses for API handlers returning raw datapoints - [`/api/v1/export*` endpoints](https://docs.victoriametrics.com/#how-to-export-time-series), since users usually expect this data is always complete.
|
||||
|
||||
Data replication can be used for increasing storage durability. See [these docs](#replication-and-data-safety) for details.
|
||||
|
||||
|
||||
## Capacity planning
|
||||
|
||||
VictoriaMetrics uses lower amounts of CPU, RAM and storage space on production workloads compared to competing solutions (Prometheus, Thanos, Cortex, TimescaleDB, InfluxDB, QuestDB, M3DB) according to [our case studies](https://docs.victoriametrics.com/CaseStudies.html).
|
||||
|
||||
Each node type - `vminsert`, `vmselect` and `vmstorage` - can run on the most suitable hardware. Cluster capacity scales linearly with the available resources. The needed amounts of CPU and RAM per each node type highly depends on the workload - the number of [active time series](https://docs.victoriametrics.com/FAQ.html#what-is-active-time-series), [series churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate), query types, query qps, etc. It is recommended setting up a test VictoriaMetrics cluster for your production workload and iteratively scaling per-node resources and the number of nodes per node type until the cluster becomes stable. It is recommended setting up [monitoring for the cluster](#monitoring). It helps determining bottlenecks in cluster setup. It is also recommended following [the troubleshooting docs](https://docs.victoriametrics.com/#troubleshooting).
|
||||
|
||||
The needed storage space for the given retention (the retention is set via `-retentionPeriod` command-line flag at `vmstorage`) can be extrapolated from disk space usage in a test run. For example, if the storage space usage is 10GB after a day-long test run on a production workload, then it will need at least `10GB*100=1TB` of disk space for `-retentionPeriod=100d` (100-days retention period). Storage space usage can be monitored with [the official Grafana dashboard for VictoriaMetrics cluster](#monitoring).
|
||||
|
||||
It is recommended leaving the following amounts of spare resources:
|
||||
|
||||
* 50% of free RAM across all the node types for reducing the probability of OOM (out of memory) crashes and slowdowns during temporary spikes in workload.
|
||||
* 50% of spare CPU across all the node types for reducing the probability of slowdowns during temporary spikes in workload.
|
||||
* At least 30% of free storage space at the directory pointed by `-storageDataPath` command-line flag at `vmstorage` nodes. See also `-storage.minFreeDiskSpaceBytes` command-line flag [description for vmstorage](#list-of-command-line-flags-for-vmstorage).
|
||||
|
||||
|
||||
Some capacity planning tips for VictoriaMetrics cluster:
|
||||
|
||||
* The [replication](#replication-and-data-safety) increases the amounts of needed resources for the cluster by up to `N` times where `N` is replication factor.
|
||||
* Cluster capacity for [active time series](https://docs.victoriametrics.com/FAQ.html#what-is-active-time-series) can be increased by adding more `vmstorage` nodes and/or by increasing RAM and CPU resources per each `vmstorage` node.
|
||||
* Query latency can be reduced by increasing the number of `vmstorage` nodes and/or by increasing RAM and CPU resources per each `vmselect` node.
|
||||
* The total number of CPU cores needed for all the `vminsert` nodes can be calculated from the ingestion rate: `CPUs = ingestion_rate / 100K`.
|
||||
* The `-rpc.disableCompression` command-line flag at `vminsert` nodes can increase ingestion capacity at the cost of higher network bandwidth usage between `vminsert` and `vmstorage`.
|
||||
|
||||
|
||||
## High availability
|
||||
|
||||
It is recommended to run all the components for a single cluster in the same subnetwork with high bandwidth, low latency and low error rates.
|
||||
This improves cluster performance and availability.
|
||||
It isn't recommended spreading components for a single cluster across multiple availability zones, since cross-AZ network usually has lower bandwidth, higher latency
|
||||
and higher error rates comparing the network inside AZ.
|
||||
|
||||
If you need multi-AZ setup, then it is recommended running independed clusters in each AZ and setting up
|
||||
[vmagent](https://docs.victoriametrics.com/vmagent.html) in front of these clusters, so it could replicate incoming data
|
||||
into all the cluster. Then [promxy](https://github.com/jacksontj/promxy) could be used for querying the data from multiple clusters.
|
||||
|
||||
Another solution is to use [multi-level cluster setup](#multi-level-cluster-setup).
|
||||
|
||||
|
||||
## Multi-level cluster setup
|
||||
|
||||
`vminsert` nodes can accept data from another `vminsert` nodes starting from [v1.60.0](https://docs.victoriametrics.com/CHANGELOG.html#v1600) if `-clusternativeListenAddr` command-line flag is set. For example, if `vminsert` is started with `-clusternativeListenAddr=:8400` command-line flag, then it can accept data from another `vminsert` nodes at TCP port 8400 in the same way as `vmstorage` nodes do. This allows chaining `vminsert` nodes and building multi-level cluster topologies with flexible configs. For example, the top level of `vminsert` nodes can replicate data among the second level of `vminsert` nodes located in distinct availability zones (AZ), while the second-level `vminsert` nodes can spread the data among `vmstorage` nodes located in the same AZ. Such setup guarantees cluster availability if some AZ becomes unavailable. The data from all the `vmstorage` nodes in all the AZs can be read via `vmselect` nodes, which are configured to query all the `vmstorage` nodes in all the availability zones (e.g. all the `vmstorage` addresses are passed via `-storageNode` command-line flag to `vmselect` nodes). Additionally, `-replicationFactor=k+1` must be passed to `vmselect` nodes, where `k` is the lowest number of `vmstorage` nodes in a single AZ. See [replication docs](#replication-and-data-safety) for more details.
|
||||
|
||||
Another option is to set up [vmagent](https://docs.victoriametrics.com/vmagent.html) for replicating the data among multiple VictoriaMetrics clusters. See [these docs](https://docs.victoriametrics.com/vmagent.html#multitenancy) for details.
|
||||
|
||||
|
||||
## Helm
|
||||
|
||||
Helm chart simplifies managing cluster version of VictoriaMetrics in Kubernetes.
|
||||
It is available in the [helm-charts](https://github.com/VictoriaMetrics/helm-charts) repository.
|
||||
|
||||
|
||||
## Kubernetes operator
|
||||
|
||||
[K8s operator](https://github.com/VictoriaMetrics/operator) simplifies managing VictoriaMetrics components in Kubernetes.
|
||||
|
||||
|
||||
## Replication and data safety
|
||||
|
||||
By default VictoriaMetrics offloads replication to the underlying storage pointed by `-storageDataPath`.
|
||||
|
||||
The replication can be enabled by passing `-replicationFactor=N` command-line flag to `vminsert`.
|
||||
This guarantees that all the data remains available for querying if up to `N-1` `vmstorage` nodes are unavailable.
|
||||
The cluster must contain at least `2*N-1` `vmstorage` nodes, where `N`
|
||||
is replication factor, in order to maintain the given replication factor for newly ingested data when `N-1` of storage nodes are lost.
|
||||
For example, when `-replicationFactor=3` is passed to `vminsert`, then it replicates all the ingested data to 3 distinct `vmstorage` nodes,
|
||||
so up to 2 `vmstorage` nodes can be lost without data loss. The minimum number of `vmstorage` nodes should be equal to `2*3-1 = 5`, so when 2 `vmstorage` nodes are lost,
|
||||
the remaining 3 `vmstorage` nodes could provide the `-replicationFactor=3` for newly ingested data.
|
||||
|
||||
When the replication is enabled, `-dedup.minScrapeInterval=1ms` command-line flag must be passed to `vmselect` nodes.
|
||||
Optional `-replicationFactor=N` command-line flag can be passed to `vmselect` for improving query performance when up to `N-1` vmstorage nodes respond slowly and/or temporarily unavailable, since `vmselect` doesn't wait for responses from up to `N-1` `vmstorage` nodes. Sometimes `-replicationFactor` at `vmselect` nodes can result in partial responses. See [this issues](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1207) for details.
|
||||
The `-dedup.minScrapeInterval=1ms` de-duplicates replicated data during queries. If duplicate data is pushed to VictoriaMetrics from identically configured [vmagent](https://docs.victoriametrics.com/vmagent.html) instances or Prometheus instances, then the `-dedup.minScrapeInterval` must be set to bigger values according to [deduplication docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication).
|
||||
|
||||
Note that [replication doesn't save from disaster](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883),
|
||||
so it is recommended performing regular backups. See [these docs](#backups) for details.
|
||||
|
||||
Note that the replication increases resource usage - CPU, RAM, disk space, network bandwidth - by up to `-replicationFactor` times. So it may be worth
|
||||
offloading the replication to underlying storage pointed by `-storageDataPath` such as [Google Compute Engine persistent disk](https://cloud.google.com/compute/docs/disks/#pdspecs),
|
||||
which is protected from data loss and data corruption. It also provide consistently high performance
|
||||
and [may be resized](https://cloud.google.com/compute/docs/disks/add-persistent-disk) without downtime.
|
||||
HDD-based persistent disks should be enough for the majority of use cases.
|
||||
|
||||
It is recommended using durable replicated persistent volumes in Kubernetes.
|
||||
|
||||
|
||||
## Backups
|
||||
|
||||
It is recommended performing periodical backups from [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
|
||||
for protecting from user errors such as accidental data deletion.
|
||||
|
||||
The following steps must be performed for each `vmstorage` node for creating a backup:
|
||||
|
||||
1. Create an instant snapshot by navigating to `/snapshot/create` HTTP handler. It will create snapshot and return its name.
|
||||
2. Archive the created snapshot from `<-storageDataPath>/snapshots/<snapshot_name>` folder using [vmbackup](https://docs.victoriametrics.com/vmbackup.html).
|
||||
The archival process doesn't interfere with `vmstorage` work, so it may be performed at any suitable time.
|
||||
3. Delete unused snapshots via `/snapshot/delete?snapshot=<snapshot_name>` or `/snapshot/delete_all` in order to free up occupied storage space.
|
||||
|
||||
There is no need in synchronizing backups among all the `vmstorage` nodes.
|
||||
|
||||
Restoring from backup:
|
||||
|
||||
1. Stop `vmstorage` node with `kill -INT`.
|
||||
2. Restore data from backup using [vmrestore](https://docs.victoriametrics.com/vmrestore.html) into `-storageDataPath` directory.
|
||||
3. Start `vmstorage` node.
|
||||
|
||||
|
||||
## Downsampling
|
||||
|
||||
Downsampling is available in [enterprise version of VictoriaMetrics](https://victoriametrics.com/products/enterprise/). It is configured with `-downsampling.period` command-line flag. The same flag value must be passed to both `vmstorage` and `vmselect` nodes. See [these docs](https://docs.victoriametrics.com/#downsampling) for details.
|
||||
|
||||
|
||||
## Profiling
|
||||
|
||||
All the cluster components provide the following handlers for [profiling](https://blog.golang.org/profiling-go-programs):
|
||||
|
||||
* `http://vminsert:8480/debug/pprof/heap` for memory profile and `http://vminsert:8480/debug/pprof/profile` for CPU profile
|
||||
* `http://vmselect:8481/debug/pprof/heap` for memory profile and `http://vmselect:8481/debug/pprof/profile` for CPU profile
|
||||
* `http://vmstorage:8482/debug/pprof/heap` for memory profile and `http://vmstorage:8482/debug/pprof/profile` for CPU profile
|
||||
|
||||
Example command for collecting cpu profile from `vmstorage`:
|
||||
|
||||
```bash
|
||||
curl -s http://vmstorage:8482/debug/pprof/profile > cpu.pprof
|
||||
```
|
||||
|
||||
Example command for collecting memory profile from `vminsert`:
|
||||
|
||||
```bash
|
||||
curl -s http://vminsert:8480/debug/pprof/heap > mem.pprof
|
||||
```
|
||||
|
||||
|
||||
## Community and contributions
|
||||
|
||||
We are open to third-party pull requests provided they follow the [KISS design principle](https://en.wikipedia.org/wiki/KISS_principle):
|
||||
|
||||
- Prefer simple code and architecture.
|
||||
- Avoid complex abstractions.
|
||||
- Avoid magic code and fancy algorithms.
|
||||
- Avoid [big external dependencies](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d).
|
||||
- Minimize the number of moving parts in the distributed system.
|
||||
- Avoid automated decisions, which may hurt cluster availability, consistency or performance.
|
||||
|
||||
Adhering to the `KISS` principle simplifies the resulting code and architecture, so it can be reviewed, understood and verified by many people.
|
||||
|
||||
Due to `KISS`, cluster version of VictoriaMetrics has no the following "features" popular in distributed computing world:
|
||||
|
||||
- Fragile gossip protocols. See [failed attempt in Thanos](https://github.com/improbable-eng/thanos/blob/030bc345c12c446962225221795f4973848caab5/docs/proposals/completed/201809_gossip-removal.md).
|
||||
- Hard-to-understand-and-implement-properly [Paxos protocols](https://www.quora.com/In-distributed-systems-what-is-a-simple-explanation-of-the-Paxos-algorithm).
|
||||
- Complex replication schemes, which may go nuts in unforeseen edge cases. See [replication docs](#replication-and-data-safety) for details.
|
||||
- Automatic data reshuffling between storage nodes, which may hurt cluster performance and availability.
|
||||
- Automatic cluster resizing, which may cost you a lot of money if improperly configured.
|
||||
- Automatic discovering and addition of new nodes in the cluster, which may mix data between dev and prod clusters :)
|
||||
- Automatic leader election, which may result in split brain disaster on network errors.
|
||||
|
||||
|
||||
## Reporting bugs
|
||||
|
||||
Report bugs and propose new features [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
|
||||
|
||||
|
||||
## List of command-line flags
|
||||
|
||||
* [List of command-line flags for vminsert](#list-of-command-line-flags-for-vminsert)
|
||||
* [List of command-line flags for vmselect](#list-of-command-line-flags-for-vmselect)
|
||||
* [List of command-line flags for vmstorage](#list-of-command-line-flags-for-vmstorage)
|
||||
|
||||
|
||||
### List of command-line flags for vminsert
|
||||
|
||||
Below is the output for `/path/to/vminsert -help`:
|
||||
|
||||
```
|
||||
-clusternativeListenAddr string
|
||||
TCP address to listen for data from other vminsert nodes in multi-level cluster setup. See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multi-level-cluster-setup . Usually :8400 must be set. Doesn't work if empty
|
||||
-csvTrimTimestamp duration
|
||||
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-datadog.maxInsertRequestSize size
|
||||
The maximum size in bytes of a single DataDog POST request to /api/v1/series
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 67108864)
|
||||
-disableRerouting
|
||||
Whether to disable re-routing when some of vmstorage nodes accept incoming data at slower speed compared to other storage nodes. Disabled re-routing limits the ingestion rate by the slowest vmstorage node. On the other side, disabled re-routing minimizes the number of active time series in the cluster during rolling restarts and during spikes in series churn rate (default true)
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-eula
|
||||
By specifying this flag, you confirm that you have an enterprise license and accept the EULA https://victoriametrics.com/legal/eula/
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-graphiteListenAddr string
|
||||
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
|
||||
-graphiteTrimTimestamp duration
|
||||
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpListenAddr string
|
||||
Address to listen for http connections (default ":8480")
|
||||
-import.maxLineLen size
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.databaseNames array
|
||||
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-influx.maxLineSize size
|
||||
The maximum size in bytes for a single InfluxDB line during parsing
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr string
|
||||
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<victoriametrics>:8428/write
|
||||
-influxMeasurementFieldSeparator string
|
||||
Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol (default "_")
|
||||
-influxSkipMeasurement
|
||||
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
|
||||
-influxSkipSingleField
|
||||
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field
|
||||
-influxTrimTimestamp duration
|
||||
Trim timestamps for InfluxDB line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-insert.maxQueueDuration duration
|
||||
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-maxConcurrentInserts int
|
||||
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
|
||||
-maxInsertRequestSize size
|
||||
The maximum size in bytes of a single Prometheus remote_write API request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-maxLabelValueLen int
|
||||
The maximum length of label values in the accepted time series. Longer label values are truncated. In this case the vm_too_long_label_values_total metric at /metrics page is incremented (default 16384)
|
||||
-maxLabelsPerTimeseries int
|
||||
The maximum number of labels accepted per time series. Superfluous labels are dropped. In this case the vm_metrics_with_dropped_labels_total metric at /metrics page is incremented (default 30)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-opentsdbHTTPListenAddr string
|
||||
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbListenAddr string
|
||||
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-opentsdbhttp.maxInsertRequestSize size
|
||||
The maximum size of OpenTSDB HTTP put request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-opentsdbhttpTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-relabelConfig string
|
||||
Optional path to a file with relabeling rules, which are applied to all the ingested metrics. The path can point either to local file or to http url. See https://docs.victoriametrics.com/#relabeling for details. The config is reloaded on SIGHUP signal
|
||||
-relabelDebug
|
||||
Whether to log metrics before and after relabeling with -relabelConfig. If the -relabelDebug is enabled, then the metrics aren't sent to storage. This is useful for debugging the relabeling configs
|
||||
-replicationFactor int
|
||||
Replication factor for the ingested data, i.e. how many copies to make among distinct -storageNode instances. Note that vmselect must run with -dedup.minScrapeInterval=1ms for data de-duplication when replicationFactor is greater than 1. Higher values for -dedup.minScrapeInterval at vmselect is OK (default 1)
|
||||
-rpc.disableCompression
|
||||
Whether to disable compression of RPC traffic. This reduces CPU usage at the cost of higher network bandwidth usage
|
||||
-sortLabels
|
||||
Whether to sort labels for incoming samples before writing them to storage. This may be needed for reducing memory usage at storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}. Enabled sorting for labels can slow down ingestion performance a bit
|
||||
-storageNode array
|
||||
Comma-separated addresses of vmstorage nodes; usage: -storageNode=vmstorage-host1,...,vmstorage-hostN
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
|
||||
|
||||
### List of command-line flags for vmselect
|
||||
|
||||
Below is the output for `/path/to/vmselect -help`:
|
||||
|
||||
```
|
||||
-cacheDataPath string
|
||||
Path to directory for cache files. Cache isn't saved if empty
|
||||
-dedup.minScrapeInterval duration
|
||||
Leave only the first sample in every time series per each discrete interval equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication for details
|
||||
-downsampling.period array
|
||||
Comma-separated downsampling periods in the format 'offset:period'. For example, '30d:10m' instructs to leave a single sample per 10 minutes for samples older than 30 days. See https://docs.victoriametrics.com/#downsampling for details
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-eula
|
||||
By specifying this flag, you confirm that you have an enterprise license and accept the EULA https://victoriametrics.com/legal/eula/
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-graphiteTrimTimestamp duration
|
||||
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpListenAddr string
|
||||
Address to listen for http connections (default ":8481")
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-replicationFactor int
|
||||
How many copies of every time series is available on vmstorage nodes. See -replicationFactor command-line flag for vminsert nodes (default 1)
|
||||
-search.cacheTimestampOffset duration
|
||||
The maximum duration since the current time for response data, which is always queried from the original raw data, without using the response cache. Increase this value if you see gaps in responses due to time synchronization issues between VictoriaMetrics and data sources (default 5m0s)
|
||||
-search.denyPartialResponse
|
||||
Whether to deny partial responses if a part of -storageNode instances fail to perform queries; this trades availability over consistency; see also -search.maxQueryDuration
|
||||
-search.disableCache
|
||||
Whether to disable response caching. This may be useful during data backfilling
|
||||
-search.graphiteMaxPointsPerSeries int
|
||||
The maximum number of points per series Graphite render API can return (default 1000000)
|
||||
-search.graphiteStorageStep duration
|
||||
The interval between datapoints stored in the database. It is used at Graphite Render API handler for normalizing the interval between datapoints in case it isn't normalized. It can be overriden by sending 'storage_step' query arg to /render API or by sending the desired interval via 'Storage-Step' http header during querying /render API (default 10s)
|
||||
-search.latencyOffset duration
|
||||
The time when data points become visible in query results after the collection. Too small value can result in incomplete last points for query results (default 30s)
|
||||
-search.logSlowQueryDuration duration
|
||||
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
|
||||
-search.maxConcurrentRequests int
|
||||
The maximum number of concurrent search requests. It shouldn't be high, since a single request can saturate all the CPU cores. See also -search.maxQueueDuration (default 8)
|
||||
-search.maxExportDuration duration
|
||||
The maximum duration for /api/v1/export call (default 720h0m0s)
|
||||
-search.maxLookback duration
|
||||
Synonym to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via max_lookback arg. See also '-search.maxStalenessInterval' flag, which has the same meaining due to historical reasons
|
||||
-search.maxPointsPerTimeseries int
|
||||
The maximum points per a single timeseries returned from /api/v1/query_range. This option doesn't limit the number of scanned raw samples in the database. The main purpose of this option is to limit the number of per-series points returned to graphing UI such as Grafana. There is no sense in setting this limit to values bigger than the horizontal resolution of the graph (default 30000)
|
||||
-search.maxQueryDuration duration
|
||||
The maximum duration for query execution (default 30s)
|
||||
-search.maxQueryLen size
|
||||
The maximum search query length in bytes
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16384)
|
||||
-search.maxQueueDuration duration
|
||||
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached; see also -search.maxQueryDuration (default 10s)
|
||||
-search.maxSamplesPerQuery int
|
||||
The maximum number of raw samples a single query can process across all time series. This protects from heavy queries, which select unexpectedly high number of raw samples. See also -search.maxSamplesPerSeries (default 1000000000)
|
||||
-search.maxSamplesPerSeries int
|
||||
The maximum number of raw samples a single query can scan per each time series. See also -search.maxSamplesPerQuery (default 30000000)
|
||||
-search.maxStalenessInterval duration
|
||||
The maximum interval for staleness calculations. By default it is automatically calculated from the median interval between samples. This flag could be useful for tuning Prometheus data model closer to Influx-style data model. See https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness for details. See also '-search.maxLookback' flag, which has the same meaning due to historical reasons
|
||||
-search.maxStatusRequestDuration duration
|
||||
The maximum duration for /api/v1/status/* requests (default 5m0s)
|
||||
-search.maxStepForPointsAdjustment duration
|
||||
The maximum step when /api/v1/query_range handler adjusts points with timestamps closer than -search.latencyOffset to the current time. The adjustment is needed because such points may contain incomplete data (default 1m0s)
|
||||
-search.minStalenessInterval duration
|
||||
The minimum interval for staleness calculations. This flag could be useful for removing gaps on graphs generated from time series with irregular intervals between samples. See also '-search.maxStalenessInterval'
|
||||
-search.noStaleMarkers
|
||||
Set this flag to true if the database doesn't contain Prometheus stale markers, so there is no need in spending additional CPU time on its handling. Staleness markers may exist only in data obtained from Prometheus scrape targets
|
||||
-search.queryStats.lastQueriesCount int
|
||||
Query stats for /api/v1/status/top_queries is tracked on this number of last queries. Zero value disables query stats tracking (default 20000)
|
||||
-search.queryStats.minQueryDuration duration
|
||||
The minimum duration for queries to track in query stats at /api/v1/status/top_queries. Queries with lower duration are ignored in query stats (default 1ms)
|
||||
-search.resetCacheAuthKey string
|
||||
Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call
|
||||
-search.treatDotsAsIsInRegexps
|
||||
Whether to treat dots as is in regexp label filters used in queries. For example, foo{bar=~"a.b.c"} will be automatically converted to foo{bar=~"a\\.b\\.c"}, i.e. all the dots in regexp filters will be automatically escaped in order to match only dot char instead of matching any char. Dots in ".+", ".*" and ".{n}" regexps aren't escaped. This option is DEPRECATED in favor of {__graphite__="a.*.c"} syntax for selecting metrics matching the given Graphite metrics filter
|
||||
-selectNode array
|
||||
Comma-serparated addresses of vmselect nodes; usage: -selectNode=vmselect-host1,...,vmselect-hostN
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-storageNode array
|
||||
Comma-separated addresses of vmstorage nodes; usage: -storageNode=vmstorage-host1,...,vmstorage-hostN
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
|
||||
|
||||
### List of command-line flags for vmstorage
|
||||
|
||||
Below is the output for `/path/to/vmstorage -help`:
|
||||
|
||||
```
|
||||
-bigMergeConcurrency int
|
||||
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
|
||||
-dedup.minScrapeInterval duration
|
||||
Leave only the first sample in every time series per each discrete interval equal to -dedup.minScrapeInterval > 0. See https://docs.victoriametrics.com/#deduplication for details
|
||||
-denyQueriesOutsideRetention
|
||||
Whether to deny queries outside of the configured -retentionPeriod. When set, then /api/v1/query_range would return '503 Service Unavailable' error for queries with 'from' value outside -retentionPeriod. This may be useful when multiple data sources with distinct retentions are hidden behind query-tee
|
||||
-downsampling.period array
|
||||
Comma-separated downsampling periods in the format 'offset:period'. For example, '30d:10m' instructs to leave a single sample per 10 minutes for samples older than 30 days. See https://docs.victoriametrics.com/#downsampling for details
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-eula
|
||||
By specifying this flag, you confirm that you have an enterprise license and accept the EULA https://victoriametrics.com/legal/eula/
|
||||
-finalMergeDelay duration
|
||||
The delay before starting final merge for per-month partition after no new data is ingested into it. Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. Zero value disables final merge
|
||||
-forceFlushAuthKey string
|
||||
authKey, which must be passed in query string to /internal/force_flush pages
|
||||
-forceMergeAuthKey string
|
||||
authKey, which must be passed in query string to /internal/force_merge pages
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpListenAddr string
|
||||
Address to listen for http connections (default ":8482")
|
||||
-logNewSeries
|
||||
Whether to log new series. This option is for debug purposes only. It can lead to performance issues when big number of new series are ingested into VictoriaMetrics
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-precisionBits int
|
||||
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
|
||||
-retentionPeriod value
|
||||
Data with timestamps outside the retentionPeriod is automatically deleted
|
||||
The following optional suffixes are supported: h (hour), d (day), w (week), y (year). If suffix isn't set, then the duration is counted in months (default 1)
|
||||
-rpc.disableCompression
|
||||
Disable compression of RPC traffic. This reduces CPU usage at the cost of higher network bandwidth usage
|
||||
-search.maxTagKeys int
|
||||
The maximum number of tag keys returned per search (default 100000)
|
||||
-search.maxTagValueSuffixesPerSearch int
|
||||
The maximum number of tag value suffixes returned from /metrics/find (default 100000)
|
||||
-search.maxTagValues int
|
||||
The maximum number of tag values returned per search (default 100000)
|
||||
-search.maxUniqueTimeseries int
|
||||
The maximum number of unique time series a single query can process. This allows protecting against heavy queries, which select unexpectedly high number of series. See also -search.maxSamplesPerQuery and -search.maxSamplesPerSeries (default 300000)
|
||||
-smallMergeConcurrency int
|
||||
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
|
||||
-snapshotAuthKey string
|
||||
authKey, which must be passed in query string to /snapshot* pages
|
||||
-storage.maxDailySeries int
|
||||
The maximum number of unique series can be added to the storage during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See also -storage.maxHourlySeries
|
||||
-storage.maxHourlySeries int
|
||||
The maximum number of unique series can be added to the storage during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See also -storage.maxDailySeries
|
||||
-storage.minFreeDiskSpaceBytes size
|
||||
The minimum free disk space at -storageDataPath after which the storage stops accepting new data
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 10000000)
|
||||
-storageDataPath string
|
||||
Path to storage data (default "vmstorage-data")
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
-vminsertAddr string
|
||||
TCP address to accept connections from vminsert services (default ":8400")
|
||||
-vmselectAddr string
|
||||
TCP address to accept connections from vmselect services (default ":8401")
|
||||
```
|
||||
|
||||
|
||||
## VictoriaMetrics Logo
|
||||
|
||||
[Zip](VM_logo.zip) contains three folders with different image orientation (main color and inverted version).
|
||||
|
||||
Files included in each folder:
|
||||
|
||||
* 2 JPEG Preview files
|
||||
* 2 PNG Preview files with transparent background
|
||||
* 2 EPS Adobe Illustrator EPS10 files
|
||||
|
||||
|
||||
### Logo Usage Guidelines
|
||||
|
||||
#### Font used:
|
||||
|
||||
* Lato Black
|
||||
* Lato Regular
|
||||
|
||||
#### Color Palette:
|
||||
|
||||
* HEX [#110f0f](https://www.color-hex.com/color/110f0f)
|
||||
* HEX [#ffffff](https://www.color-hex.com/color/ffffff)
|
||||
|
||||
### We kindly ask:
|
||||
|
||||
- Please don't use any other font instead of suggested.
|
||||
- There should be sufficient clear space around the logo.
|
||||
- Do not change spacing, alignment, or relative locations of the design elements.
|
||||
- Do not change the proportions of any of the design elements or the design itself. You may resize as needed but must retain all proportions.
|
||||
362
FAQ.md
Normal file
@@ -0,0 +1,362 @@
|
||||
---
|
||||
sort: 14
|
||||
---
|
||||
|
||||
# FAQ
|
||||
|
||||
## What is the main purpose of VictoriaMetrics?
|
||||
|
||||
To provide the best monitoring solution.
|
||||
|
||||
|
||||
## Who uses VictoriaMetrics?
|
||||
|
||||
See [case studies](https://docs.victoriametrics.com/CaseStudies.html).
|
||||
|
||||
|
||||
## Which features does VictoriaMetrics have?
|
||||
|
||||
See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#prominent-features).
|
||||
|
||||
|
||||
## Are there performance comparisons with other solutions?
|
||||
|
||||
Yes. See [these benchmarks](https://docs.victoriametrics.com/Articles.html#benchmarks).
|
||||
|
||||
|
||||
## How to start using VictoriaMetrics?
|
||||
|
||||
See [these docs](https://docs.victoriametrics.com/Quick-Start.html).
|
||||
|
||||
|
||||
## Does VictoriaMetrics support replication?
|
||||
|
||||
Yes. See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety) for details.
|
||||
|
||||
|
||||
## Can I use VictoriaMetrics instead of Prometheus?
|
||||
|
||||
Yes in most cases. VictoriaMetrics can substitute Prometheus in the following aspects:
|
||||
|
||||
* Prometheus-compatible service discovery and target scraping can be done with [vmagent](https://docs.victoriametrics.com/vmagent.html) and with single-node VictoriaMetrics. See [these docs](https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter).
|
||||
* Prometheus-compatible alerting rules and recording rules can be processed with [vmalert](https://docs.victoriametrics.com/vmalert.html).
|
||||
* Prometheus-compatible querying in Grafana is supported by VictoriaMetrics. See [these docs](https://docs.victoriametrics.com/#grafana-setup).
|
||||
|
||||
|
||||
## What is the difference between vmagent and Prometheus?
|
||||
|
||||
While both [vmagent](https://docs.victoriametrics.com/vmagent.html) and Prometheus may scrape Prometheus targets (aka `/metrics` pages)
|
||||
according to the provided Prometheus-compatible [scrape configs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config)
|
||||
and send data to multiple remote storage systems, vmagent has the following additional features:
|
||||
|
||||
- vmagent usually requires lower amounts of CPU, RAM and disk IO compared to Prometheus when scraping an enormous number of targets (more than 1000)
|
||||
or targets with a great number of exposed metrics.
|
||||
- vmagent provides independent disk-backed buffers for each configured remote storage (see `-remoteWrite.url`). This means that slow or temporarily unavailable storage
|
||||
doesn't prevent it from sending data to healthy storage in parallel. Prometheus uses a single shared buffer for all the configured remote storage systems (see `remote_write->url`)
|
||||
with a hardcoded retention of 2 hours.
|
||||
- vmagent may accept, relabel and filter data obtained via multiple data ingestion protocols in addition to data scraped from Prometheus targets.
|
||||
That means it supports both `pull` and `push` protocols for data ingestion.
|
||||
See [these docs](https://docs.victoriametrics.com/vmagent.html#features) for details.
|
||||
- vmagent may be used in different use cases:
|
||||
- [IoT and edge monitoring](https://docs.victoriametrics.com/vmagent.html#iot-and-edge-monitoring)
|
||||
- [Drop-in replacement for Prometheus](https://docs.victoriametrics.com/vmagent.html#drop-in-replacement-for-prometheus)
|
||||
- [Replication and High Availability](https://docs.victoriametrics.com/vmagent.html#replication-and-high-availability)
|
||||
- [Relabeling and Filtering](https://docs.victoriametrics.com/vmagent.html#relabeling-and-filtering)
|
||||
- [Splitting data streams among multiple systems](https://docs.victoriametrics.com/vmagent.html#splitting-data-streams-among-multiple-systems)
|
||||
- [Prometheus remote_write proxy](https://docs.victoriametrics.com/vmagent.html#prometheus-remote_write-proxy)
|
||||
|
||||
|
||||
## What is the difference between vmagent and Prometheus agent?
|
||||
|
||||
Both [vmagent](https://docs.victoriametrics.com/vmagent.html) and [Prometheus agent](https://prometheus.io/blog/2021/11/16/agent/) serve the same purpose – to efficently scrape Prometheus-compatible targets at the edge. They have the following differences:
|
||||
|
||||
- vmagent usually requires lower amounts of CPU, RAM and disk IO compared to the Prometheus agent.
|
||||
- Prometheus agent supports only pull-based data collection (e.g. it can scrape Prometheus-compatible targets), while vmagent supports both pull and push data collection – it can accept data via many popular data ingestion protocols such as InfluxDB line protocol, Graphite protocol, OpenTSDB protocol, DataDog protocol, Prometheus protocol, CSV and JSON – see [these docs](https://docs.victoriametrics.com/vmagent.html#features).
|
||||
- vmagent can easily scale horizontally to multiple instances for scraping a big number of targets – see [these docs](https://docs.victoriametrics.com/vmagent.html#scraping-big-number-of-targets).
|
||||
- vmagent supports [improved relabeling](https://docs.victoriametrics.com/vmagent.html#relabeling).
|
||||
- vmagent can limit the number of scraped metrics per target – see [these docs](https://docs.victoriametrics.com/vmagent.html#cardinality-limiter).
|
||||
- vmagent supports loading scrape configs from multiple files – see [these docs](https://docs.victoriametrics.com/vmagent.html#loading-scrape-configs-from-multiple-files).
|
||||
- vmagent supports data reading and data writing from/to Kafka – see [these docs](https://docs.victoriametrics.com/vmagent.html#kafka-integration).
|
||||
- vmagent can read and update scrape configs from http and https URLs, while the Prometheus agent can read them only from the local file system.
|
||||
|
||||
|
||||
## Is it safe to enable [remote write](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage) in Prometheus?
|
||||
|
||||
Yes. Prometheus continues writing data to local storage after enabling remote write, so all the existing local storage data
|
||||
and new data is available for querying via Prometheus as usual.
|
||||
|
||||
It is recommended using [vmagent](https://docs.victoriametrics.com/vmagent.html) for scraping Prometheus targets
|
||||
and writing data to VictoriaMetrics.
|
||||
|
||||
|
||||
## How does VictoriaMetrics compare to other remote storage solutions for Prometheus such as [M3 from Uber](https://eng.uber.com/m3/), [Thanos](https://github.com/thanos-io/thanos), [Cortex](https://github.com/cortexproject/cortex), etc.?
|
||||
|
||||
VictoriaMetrics is simpler, faster, more cost-effective and it provides [MetricsQL query language](MetricsQL) based on PromQL. The simplicity is twofold:
|
||||
- It is simpler to configure and operate. There is no need for configuring [sidecars](https://github.com/thanos-io/thanos/blob/master/docs/components/sidecar.md),
|
||||
fighting the [gossip protocol](https://github.com/improbable-eng/thanos/blob/030bc345c12c446962225221795f4973848caab5/docs/proposals/completed/201809_gossip-removal.md)
|
||||
or setting up third-party systems such as [Consul](https://github.com/cortexproject/cortex/issues/157), [Cassandra](https://cortexmetrics.io/docs/production/cassandra/),
|
||||
[DynamoDB](https://cortexmetrics.io/docs/production/aws/) or [Memcached](https://cortexmetrics.io/docs/production/caching/).
|
||||
- VictoriaMetrics has a simpler architecture. This means fewer bugs and more useful features in the long run compared to competing TSDBs.
|
||||
|
||||
See [comparing Thanos to VictoriaMetrics cluster](https://medium.com/@valyala/comparing-thanos-to-victoriametrics-cluster-b193bea1683)
|
||||
and the [Remote Write Storage Wars](https://promcon.io/2019-munich/talks/remote-write-storage-wars/) talk from [PromCon 2019](https://promcon.io/2019-munich/talks/remote-write-storage-wars/).
|
||||
|
||||
VictoriaMetrics also [uses less RAM than Thanos components](https://github.com/thanos-io/thanos/issues/448).
|
||||
|
||||
|
||||
## What is the difference between VictoriaMetrics and [QuestDB](https://questdb.io/)?
|
||||
|
||||
- QuestDB needs more than 20x storage space than VictoriaMetrics. This translates to higher storage costs and slower queries over historical data, which must be read from the disk.
|
||||
- QuestDB is much harder to set up and operate than VictoriaMetrics. Compare [setup instructions for QuestDB](https://questdb.io/docs/get-started/binaries) to [setup instructions for VictoriaMetrics](https://docs.victoriametrics.com/#how-to-start-victoriametrics).
|
||||
- VictoriaMetrics provides the [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) query language, which is better suited for typical queries over time series data than the SQL-like query language provided by QuestDB. See [this article](https://valyala.medium.com/promql-tutorial-for-beginners-9ab455142085) for details.
|
||||
- VictoriaMetrics can be queried via the [Prometheus querying API](https://docs.victoriametrics.com/#prometheus-querying-api-usage) and via [Graphite's API](https://docs.victoriametrics.com/#graphite-api-usage).
|
||||
- Thanks to PromQL support, VictoriaMetrics [can be used as a drop-in replacement for Prometheus in Grafana](https://docs.victoriametrics.com/#grafana-setup), while QuestDB needs a full rewrite of existing dashboards in Grafana.
|
||||
- Thanks to Prometheus' remote_write API support, VictoriaMetrics can be used as a long-term storage for Prometheus or for [vmagent](https://docs.victoriametrics.com/vmagent.html), while QuestDB has no integration with Prometheus.
|
||||
- QuestDB [supports a smaller range of popular data ingestion protocols](https://questdb.io/docs/develop/insert-data) compared to VictoriaMetrics (compare to [the list of supported data ingestion protocols for VictoriaMetrics](https://docs.victoriametrics.com/#how-to-import-time-series-data)).
|
||||
- [VictoriaMetrics supports backfilling (e.g. storing historical data) out of the box](https://docs.victoriametrics.com/#backfilling), while QuestDB provides [very limited support for backfilling](https://questdb.io/blog/2021/05/10/questdb-release-6-0-tsbs-benchmark#the-problem-with-out-of-order-data).
|
||||
|
||||
|
||||
## What is the difference between VictoriaMetrics and [Cortex](https://github.com/cortexproject/cortex)?
|
||||
|
||||
VictoriaMetrics is similar to Cortex in the following aspects:
|
||||
|
||||
- Both systems accept data from [vmagent](https://docs.victoriametrics.com/vmagent.html) or Prometheus
|
||||
via the standard [remote_write API](https://prometheus.io/docs/practices/remote_write/), so there is no need for running sidecars
|
||||
unlike in [Thanos](https://github.com/thanos-io/thanos)' case.
|
||||
- Both systems support multi-tenancy out of the box. See [the corresponding docs for VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy).
|
||||
- Both systems support data replication. See [replication in Cortex](https://github.com/cortexproject/cortex/blob/fe56f1420099aa1bf1ce09316c186e05bddee879/docs/architecture.md#hashing) and [replication in VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety).
|
||||
- Both systems scale horizontally to multiple nodes. See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#cluster-resizing-and-scalability) for details.
|
||||
- Both systems support alerting and recording rules via the corresponding tools such as [vmalert](https://docs.victoriametrics.com/vmalert.html).
|
||||
- Both systems can be queried via the [Prometheus querying API](https://prometheus.io/docs/prometheus/latest/querying/api/) and integrate perfectly with Grafana.
|
||||
|
||||
The main differences between Cortex and VictoriaMetrics:
|
||||
|
||||
- Cortex re-uses Prometheus source code, while VictoriaMetrics is written from scratch.
|
||||
- Cortex heavily relies on third-party services such as Consul, Memcache, DynamoDB, BigTable, Cassandra, etc.
|
||||
This may increase operational complexity and reduce system reliability compared to VictoriaMetrics' case,
|
||||
which doesn't use any external services. Compare [Cortex' Architecture](https://github.com/cortexproject/cortex/blob/master/docs/architecture.md)
|
||||
to [VictoriaMetrics' architecture](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#architecture-overview).
|
||||
- VictoriaMetrics provides [production-ready single-node solution](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html),
|
||||
which is much easier to set up and operate than a Cortex cluster.
|
||||
- Cortex may lose up to 12 hours of recent data on Ingestor failure – see [the corresponding docs](https://github.com/cortexproject/cortex/blob/fe56f1420099aa1bf1ce09316c186e05bddee879/docs/architecture.md#ingesters-failure-and-data-loss).
|
||||
VictoriaMetrics may lose only a few seconds of recent data, which isn't synced to persistent storage yet.
|
||||
See [this article for details](https://medium.com/@valyala/wal-usage-looks-broken-in-modern-time-series-databases-b62a627ab704).
|
||||
- Cortex is usually slower and requires more CPU and RAM than VictoriaMetrics. See [this talk from adidas at PromCon 2019](https://promcon.io/2019-munich/talks/remote-write-storage-wars/) and [other case studies](https://docs.victoriametrics.com/CaseStudies.html).
|
||||
- VictoriaMetrics accepts data in multiple popular data ingestion protocols additionally to Prometheus remote_write protocol – InfluxDB, OpenTSDB, Graphite, CSV, JSON, native binary.
|
||||
See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-time-series-data) for details.
|
||||
- VictoriaMetrics provides the [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) query language, while Cortex provides the [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) query language.
|
||||
- VictoriaMetrics can be queried via [Graphite's API](https://docs.victoriametrics.com/#graphite-api-usage).
|
||||
|
||||
|
||||
## What is the difference between VictoriaMetrics and [Thanos](https://github.com/thanos-io/thanos)?
|
||||
|
||||
- Thanos re-uses Prometheus source code, while VictoriaMetrics is written from scratch.
|
||||
- VictoriaMetrics accepts data via the [standard remote_write API for Prometheus](https://prometheus.io/docs/practices/remote_write/),
|
||||
while Thanos uses a non-standard [sidecar](https://github.com/thanos-io/thanos/blob/master/docs/components/sidecar.md) which must run alongside each Prometheus instance.
|
||||
- The Thanos sidecar requires disabling data compaction in Prometheus, which may hurt Prometheus performance and increase RAM usage. See [these docs](https://thanos.io/tip/components/sidecar.md/) for more details.
|
||||
- Thanos stores data in object storage (Amazon S3 or Google GCS), while VictoriaMetrics stores data in block storage
|
||||
([GCP persistent disks](https://cloud.google.com/compute/docs/disks#pdspecs), Amazon EBS or bare metal HDD).
|
||||
While object storage is usually less expensive, block storage provides much lower latencies and higher throughput.
|
||||
VictoriaMetrics works perfectly with HDD-based block storage – there is no need for using more expensive SSD or NVMe disks in most cases.
|
||||
- Thanos may lose up to 2 hours of recent data, which wasn't uploaded yet to object storage. VictoriaMetrics may lose only a few seconds of recent data,
|
||||
which hasn't been synced to persistent storage yet. See [this article for details](https://medium.com/@valyala/wal-usage-looks-broken-in-modern-time-series-databases-b62a627ab704).
|
||||
- VictoriaMetrics provides a [production-ready single-node solution](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html),
|
||||
which is much easier to set up and operate than Thanos components.
|
||||
- Thanos may be harder to set up and operate compared to VictoriaMetrics, since it has more moving parts, which can be connected with fewer reliable networks.
|
||||
See [this article for details](https://medium.com/faun/comparing-thanos-to-victoriametrics-cluster-b193bea1683).
|
||||
- Thanos is usually slower and requires more CPU and RAM than VictoriaMetrics. See [this talk from adidas at PromCon 2019](https://promcon.io/2019-munich/talks/remote-write-storage-wars/).
|
||||
- VictoriaMetrics accepts data via multiple popular data ingestion protocols in addition to the Prometheus remote_write protocol – InfluxDB, OpenTSDB, Graphite, CSV, JSON, native binary.
|
||||
See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-time-series-data) for details.
|
||||
- VictoriaMetrics provides the [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) query language, while Thanos provides the [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) query language.
|
||||
- VictoriaMetrics can be queried via [Graphite's API](https://docs.victoriametrics.com/#graphite-api-usage).
|
||||
|
||||
|
||||
## How does VictoriaMetrics compare to [InfluxDB](https://www.influxdata.com/time-series-platform/influxdb/)?
|
||||
|
||||
- VictoriaMetrics requires [10x less RAM](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) and it [works faster](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae).
|
||||
- VictoriaMetrics needs lower amounts of storage space than InfluxDB for production data.
|
||||
- VictoriaMetrics provides a better query language – [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) – than InfluxQL or Flux. See [this tutorial](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085) for details.
|
||||
- VictoriaMetrics accepts data in multiple popular data ingestion protocols in addition to InfluxDB – Prometheus remote_write, OpenTSDB, Graphite, CSV, JSON, native binary.
|
||||
See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-time-series-data) for details.
|
||||
- VictoriaMetrics can be queried via [Graphite's API](https://docs.victoriametrics.com/#graphite-api-usage).
|
||||
|
||||
|
||||
## How does VictoriaMetrics compare to [TimescaleDB](https://www.timescale.com/)?
|
||||
|
||||
- TimescaleDB insists on using SQL as a query language. While SQL is more powerful than PromQL, this power is rarely required during typical usages of a TSDB. Real-world queries usually [look clearer and simpler when written in PromQL than in SQL](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085).
|
||||
- VictoriaMetrics requires [up to 70x less storage space compared to TimescaleDB](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4) for storing the same amount of time series data. The gap in storage space usage can be lowered from 70x to 3x if [compression in TimescaleDB is properly configured](https://docs.timescale.com/latest/using-timescaledb/compression) (it isn't an easy task in general :)).
|
||||
- VictoriaMetrics requires up to 10x less CPU and RAM resources than TimescaleDB for processing production data. See [this article](https://abiosgaming.com/press/high-cardinality-aggregations/) for details.
|
||||
- TimescaleDB is [harder to set up, configure and operate](https://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/self-hosted/ubuntu/installation-apt-ubuntu/) than VictoriaMetrics (see [how to run VictoriaMetrics](https://docs.victoriametrics.com/#how-to-start-victoriametrics)).
|
||||
- VictoriaMetrics accepts data in multiple popular data ingestion protocols – InfluxDB, OpenTSDB, Graphite, CSV – while TimescaleDB supports only SQL inserts.
|
||||
- VictoriaMetrics can be queried via [Graphite's API](https://docs.victoriametrics.com/#graphite-api-usage).
|
||||
|
||||
|
||||
## Does VictoriaMetrics use Prometheus technologies like other clustered TSDBs built on top of Prometheus such as [Thanos](https://github.com/thanos-io/thanos) or [Cortex](https://github.com/cortexproject/cortex)?
|
||||
|
||||
No. VictoriaMetrics core is written in Go from scratch by [fasthttp](https://github.com/valyala/fasthttp)'s [author](https://github.com/valyala).
|
||||
The architecture is [optimized for storing and querying large amounts of time series data with high cardinality](https://medium.com/devopslinks/victoriametrics-creating-the-best-remote-storage-for-prometheus-5d92d66787ac). VictoriaMetrics storage uses [certain ideas from ClickHouse](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282). Special thanks to [Alexey Milovidov](https://github.com/alexey-milovidov).
|
||||
|
||||
|
||||
## What is the pricing for VictoriaMetrics?
|
||||
|
||||
The following versions are open source and free:
|
||||
* [Single-node version](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html).
|
||||
* [Cluster version](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
|
||||
|
||||
We provide commercial support for both versions. [Contact us](mailto:info@victoriametrics.com) for the pricing.
|
||||
|
||||
The following commercial versions of VictoriaMetrics are available:
|
||||
* [Managed VictoriaMetrics at AWS](https://aws.amazon.com/marketplace/pp/prodview-4tbfq5icmbmyc) (aka managed Prometheus).
|
||||
|
||||
The following commercial versions of VictoriaMetrics are planned:
|
||||
* Managed VictoriaMetrics at Google Cloud.
|
||||
* Cloud monitoring solution based on VictoriaMetrics.
|
||||
|
||||
[Contact us](mailto:info@victoriametrics.com) for more information on our plans.
|
||||
|
||||
|
||||
## Why doesn't VictoriaMetrics support the [Prometheus remote read API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cremote_read%3E)?
|
||||
|
||||
The remote read API requires transferring all the raw data for all the requested metrics over the given time range. For instance,
|
||||
if a query covers 1000 metrics with 10K values each, then the remote read API has to return `1000*10K`=10M metric values to Prometheus.
|
||||
This is slow and expensive.
|
||||
Prometheus' remote read API isn't intended for querying foreign data – aka `global query view`. See [this issue](https://github.com/prometheus/prometheus/issues/4456) for details.
|
||||
|
||||
So just query VictoriaMetrics directly via [vmui](https://docs.victoriametrics.com/#vmui), the [Prometheus Querying API](https://docs.victoriametrics.com/#prometheus-querying-api-usage)
|
||||
or via [Prometheus datasource in Grafana](https://docs.victoriametrics.com/#grafana-setup).
|
||||
|
||||
|
||||
## Does VictoriaMetrics deduplicate data from Prometheus instances scraping the same targets (aka `HA pairs`)?
|
||||
|
||||
Yes. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication) for details.
|
||||
|
||||
|
||||
## Where is the source code of VictoriaMetrics?
|
||||
|
||||
Source code for the following versions is available in the following places:
|
||||
* [Single-node version](https://github.com/VictoriaMetrics/VictoriaMetrics)
|
||||
* [Cluster version](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster)
|
||||
|
||||
|
||||
## Is VictoriaMetrics a good fit for data from IoT sensors and industrial sensors?
|
||||
|
||||
VictoriaMetrics is able to handle data from hundreds of millions of IoT sensors and industrial sensors.
|
||||
It supports [high cardinality data](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b),
|
||||
perfectly [scales up on a single node](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae)
|
||||
and scales horizontally to multiple nodes.
|
||||
|
||||
|
||||
## Where can I ask questions about VictoriaMetrics?
|
||||
|
||||
Questions about VictoriaMetrics can be asked via the following channels:
|
||||
|
||||
- [Slack channel](https://slack.victoriametrics.com/)
|
||||
- [Telegram channel](https://t.me/VictoriaMetrics_en)
|
||||
- [Google group](https://groups.google.com/forum/#!forum/victorametrics-users)
|
||||
|
||||
|
||||
## Where can I file bugs and feature requests regarding VictoriaMetrics?
|
||||
|
||||
File bugs and feature requests [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
|
||||
|
||||
|
||||
## Where can I find information about multi-tenancy?
|
||||
|
||||
See [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy). Multitenancy is supported only by the [cluster version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) of VictoriaMetrics.
|
||||
|
||||
|
||||
## How to set a memory limit for VictoriaMetrics components?
|
||||
|
||||
All the VictoriaMetrics components provide command-line flags to control the size of internal buffers and caches: `-memory.allowedPercent` and `-memory.allowedBytes` (pass `-help` to any VictoriaMetrics component in order to see the description for these flags). These limits don't take into account additional memory, which may be needed for processing incoming queries. Hard limits may be enforced only by the OS via [cgroups](https://en.wikipedia.org/wiki/Cgroups), Docker (see [these docs](https://docs.docker.com/config/containers/resource_constraints)) or Kubernetes (see [these docs](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers)).
|
||||
|
||||
Memory usage for VictoriaMetrics components can be tuned according to the following docs:
|
||||
|
||||
* [Capacity planning for single-node VictoriaMetrics](https://docs.victoriametrics.com/#capacity-planning)
|
||||
* [Capacity planning for cluster VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#capacity-planning)
|
||||
* [Troubleshooting for vmagent](https://docs.victoriametrics.com/vmagent.html#troubleshooting)
|
||||
* [Troubleshooting for single-node VictoriaMetrics](https://docs.victoriametrics.com/#troubleshooting)
|
||||
|
||||
|
||||
## How can I run VictoriaMetrics on FreeBSD?
|
||||
|
||||
VictoriaMetrics is included in FreeBSD ports, so just install it from there. See [this link](https://www.freebsd.org/cgi/ports.cgi?query=victoria&stype=all).
|
||||
|
||||
|
||||
## Does VictoriaMetrics support the Graphite query language?
|
||||
|
||||
Yes. See [these docs](https://docs.victoriametrics.com/#graphite-api-usage).
|
||||
|
||||
|
||||
## What is an active time series?
|
||||
|
||||
A time series is uniquely identified by its name plus a set of its labels. For example, `temperature{city="NY",country="US"}` and `temperature{city="SF",country="US"}` are two distinct series, since they differ by the `city` label. A time series is considered active if it receives at least a single new sample during the last hour.
|
||||
|
||||
|
||||
## What is high churn rate?
|
||||
|
||||
If old time series are constantly substituted by new time series at a high rate, then such a state is called `high churn rate`. High churn rate has the following negative consequences:
|
||||
|
||||
* Increased total number of time series stored in the database.
|
||||
* Increased size of inverted index, which is stored at `<-storageDataPath>/indexdb`, since the inverted index contains entries for every label of every time series with at least a single ingested sample.
|
||||
* Slow-down of queries over multiple days.
|
||||
|
||||
The main reason for high churn rate is a metric label with frequently changed value. Examples of such labels:
|
||||
|
||||
* `queryid`, which changes with each query at `postgres_exporter`.
|
||||
* `app_name` or `deployment_id`, which changes with each new deployment in Kubernetes.
|
||||
* A label derived from the current time such as `timestamp`, `minute` or `hour`.
|
||||
* A `hash` or `uuid` label, which changes frequently.
|
||||
|
||||
The solution against high churn rate is to identify and eliminate labels with frequently changed values. The [/api/v1/status/tsdb](https://docs.victoriametrics.com/#tsdb-stats) page can help determining these labels.
|
||||
|
||||
|
||||
## What is high cardinality?
|
||||
|
||||
High cardinality usually means a high number of [active time series](#what-is-active-time-series). High cardinality may lead to high memory usage and/or to a high percentage of [slow inserts](#what-is-slow-insert). The source of high cardinality is usually a label with a large number of unique values, which presents a big share of the ingested time series. The solution is to identify and remove the source of high cardinality with the help of [/api/v1/status/tsdb](https://docs.victoriametrics.com/#tsdb-stats).
|
||||
|
||||
|
||||
## What is a slow insert?
|
||||
|
||||
VictoriaMetrics maintains in-memory cache for mapping of [active time series](#what-is-active-time-series) into internal series ids. The cache size depends on the available memory for VictoriaMetrics in the host system. If the information about all the active time series doesn't fit the cache, then VictoriaMetrics needs to read and unpack the information from disk on every incoming sample for time series missing in the cache. This operation is much slower than the cache lookup, so such an insert is named a `slow insert`. A high percentage of slow inserts on the [official dashboard for VictoriaMetrics](https://docs.victoriametrics.com/#monitoring) indicates a memory shortage for the current number of [active time series](#what-is-active-time-series). Such a condition usually leads to a significant slowdown for data ingestion and to significantly increased disk IO and CPU usage. The solution is to add more memory or to reduce the number of [active time series](#what-is-active-time-series). The `/api/v1/status/tsdb` page can be helpful for locating the source of high number of active time seriess – see [these docs](https://docs.victoriametrics.com/#tsdb-stats).
|
||||
|
||||
|
||||
## How to optimize MetricsQL query?
|
||||
|
||||
See [this article](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986).
|
||||
|
||||
|
||||
## Why isn't MetricsQL 100% compatible with PromQL?
|
||||
|
||||
[MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) provides better user experience than PromQL. It fixes a few annoying issues in PromQL. This prevents MetricsQL to be 100% compatible with PromQL. See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details.
|
||||
|
||||
|
||||
## How to migrate data from Prometheus to VictoriaMetrics?
|
||||
|
||||
Please see [these docs](https://docs.victoriametrics.com/vmctl.html#migrating-data-from-prometheus).
|
||||
|
||||
|
||||
## How to migrate data from InfluxDB to VictoriaMetrics?
|
||||
|
||||
Please see [these docs](https://docs.victoriametrics.com/vmctl.html#migrating-data-from-influxdb-1x).
|
||||
|
||||
|
||||
## How to migrate data from OpenTSDB to VictoriaMetrics?
|
||||
|
||||
Please see [these docs](https://docs.victoriametrics.com/vmctl.html#migrating-data-from-opentsdb).
|
||||
|
||||
|
||||
## How to migrate data from Graphite to VictoriaMetrics?
|
||||
|
||||
Please use the [whisper-to-graphite](https://github.com/bzed/whisper-to-graphite) tool for reading data from Graphite and pushing them to VictoriaMetrics via [Graphite's import API](https://docs.victoriametrics.com/#how-to-send-data-from-graphite-compatible-agents-such-as-statsd).
|
||||
|
||||
|
||||
## Why do the same metrics have differences in VictoriaMetrics' and Prometheus' dashboards?
|
||||
|
||||
There could be a slight difference in stored values for time series. Due to different compression algorithms, VM may reduce the precision for float values with more than 12 significant decimal digits. Please see [this article](https://valyala.medium.com/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87).
|
||||
|
||||
The query engine may behave differently for some functions. Please see [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e).
|
||||
|
||||
|
||||
## If downsampling and deduplication are enabled how will this work?
|
||||
|
||||
[Deduplication](https://docs.victoriametrics.com/#deduplication) is a special case of zero-offset [downsampling](https://docs.victoriametrics.com/#downsampling). So, if both downsampling and deduplication are enabled, then deduplication is replaced by zero-offset downsampling
|
||||
3
Gemfile
Normal file
@@ -0,0 +1,3 @@
|
||||
source "https://rubygems.org"
|
||||
gem "jekyll-rtd-theme", "~> 2.0.6"
|
||||
gem "github-pages", group: :jekyll_plugins
|
||||
960
MetricsQL.md
Normal file
@@ -0,0 +1,960 @@
|
||||
---
|
||||
sort: 13
|
||||
---
|
||||
|
||||
# MetricsQL
|
||||
|
||||
[VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) implements MetricsQL - query language inspired by [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/).
|
||||
MetricsQL is backwards-compatible with PromQL, so Grafana dashboards backed by Prometheus datasource should work the same after switching from Prometheus to VictoriaMetrics.
|
||||
However, there are some [intentional differences](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) between these two languages.
|
||||
|
||||
[Standalone MetricsQL package](https://godoc.org/github.com/VictoriaMetrics/metricsql) can be used for parsing MetricsQL in external apps.
|
||||
|
||||
If you are unfamiliar with PromQL, then it is suggested reading [this tutorial for beginners](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085).
|
||||
|
||||
The following functionality is implemented differently in MetricsQL compared to PromQL. This improves user experience:
|
||||
* MetricsQL takes into account the previous point before the window in square brackets for range functions such as [rate](#rate) and [increase](#increase). This allows returning the exact results users expect for `increase(metric[$__interval])` queries instead of incomplete results Prometheus returns for such queries.
|
||||
* MetricsQL doesn't extrapolate range function results. This addresses [this issue from Prometheus](https://github.com/prometheus/prometheus/issues/3746). See technical details about VictoriaMetrics and Prometheus calculations for [rate](#rate) and [increase](#increase) [in this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1215#issuecomment-850305711).
|
||||
* MetricsQL returns the expected non-empty responses for [rate](#rate) with `step` values smaller than scrape interval. This addresses [this issue from Grafana](https://github.com/grafana/grafana/issues/11451). See also [this blog post](https://www.percona.com/blog/2020/02/28/better-prometheus-rate-function-with-victoriametrics/).
|
||||
* MetricsQL treats `scalar` type the same as `instant vector` without labels, since subtle differences between these types usually confuse users. See [the corresponding Prometheus docs](https://prometheus.io/docs/prometheus/latest/querying/basics/#expression-language-data-types) for details.
|
||||
* MetricsQL removes all the `NaN` values from the output, so some queries like `(-1)^0.5` return empty results in VictoriaMetrics, while returning a series of `NaN` values in Prometheus. Note that Grafana doesn't draw any lines or dots for `NaN` values, so the end result looks the same for both VictoriaMetrics and Prometheus.
|
||||
* MetricsQL keeps metric names after applying functions, which don't change the meaning of the original time series. For example, [min_over_time(foo)](#min_over_time) or [round(foo)](#round) leaves `foo` metric name in the result. See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/674) for details.
|
||||
|
||||
Read more about the diffferences between PromQL and MetricsQL in [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e).
|
||||
|
||||
Other PromQL functionality should work the same in MetricsQL. [File an issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues) if you notice discrepancies between PromQL and MetricsQL results other than mentioned above.
|
||||
|
||||
## MetricsQL features
|
||||
|
||||
MetricsQL implements [PromQL](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085) and provides additional functionality mentioned below, which is aimed towards solving practical cases. Feel free [filing a feature request](https://github.com/VictoriaMetrics/VictoriaMetrics/issues) if you think MetricsQL misses certain useful functionality.
|
||||
|
||||
This functionality can be evaluated at [an editable Grafana dashboard](https://play-grafana.victoriametrics.com/d/4ome8yJmz/node-exporter-on-victoriametrics-demo) or at your own [VictoriaMetrics instance](https://docs.victoriametrics.com/#how-to-start-victoriametrics).
|
||||
|
||||
- Graphite-compatible filters can be passed via `{__graphite__="foo.*.bar"}` syntax. See [these docs](https://docs.victoriametrics.com/#selecting-graphite-metrics). VictoriaMetrics also can be used as Graphite datasource in Grafana. See [these docs](https://docs.victoriametrics.com/#graphite-api-usage) for details. See also [label_graphite_group](#label_graphite_group) function, which can be used for extracting the given groups from Graphite metric name.
|
||||
- Lookbehind window in square brackets may be omitted. VictoriaMetrics automatically selects the lookbehind window depending on the current step used for building the graph (e.g. `step` query arg passed to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries)). For instance, the following query is valid in VictoriaMetrics: `rate(node_network_receive_bytes_total)`. It is equivalent to `rate(node_network_receive_bytes_total[$__interval])` when used in Grafana.
|
||||
- [Aggregate functions](#aggregate-functions) accept arbitrary number of args. For example, `avg(q1, q2, q3)` would return the average values for every point across time series returned by `q1`, `q2` and `q3`.
|
||||
- [@ modifier](https://prometheus.io/docs/prometheus/latest/querying/basics/#modifier) can be put anywhere in the query. For example, `sum(foo) @ end()` calculates `sum(foo)` at the `end` timestamp of the selected time range `[start ... end]`.
|
||||
- Arbitrary subexpression can be used as [@ modifier](https://prometheus.io/docs/prometheus/latest/querying/basics/#modifier). For example, `foo @ (end() - 1h)` calculates `foo` at the `end - 1 hour` timestamp on the selected time range `[start ... end]`.
|
||||
- [offset](https://prometheus.io/docs/prometheus/latest/querying/basics/#offset-modifier), lookbehind window in square brackets and `step` value for [subquery](#subqueries) may refer to the current step aka `$__interval` value from Grafana with `[Ni]` syntax. For instance, `rate(metric[10i] offset 5i)` would return per-second rate over a range covering 10 previous steps with the offset of 5 steps.
|
||||
- [offset](https://prometheus.io/docs/prometheus/latest/querying/basics/#offset-modifier) may be put anywere in the query. For instance, `sum(foo) offset 24h`.
|
||||
- Lookbehind window in square brackets and [offset](https://prometheus.io/docs/prometheus/latest/querying/basics/#offset-modifier) may be fractional. For instance, `rate(node_network_receive_bytes_total[1.5m] offset 0.5d)`.
|
||||
- The duration suffix is optional. The duration is in seconds if the suffix is missing. For example, `rate(m[300] offset 1800)` is equivalent to `rate(m[5m]) offset 30m`.
|
||||
- The duration can be placed anywhere in the query. For example, `sum_over_time(m[1h]) / 1h` is equivalent to `sum_over_time(m[1h]) / 3600`.
|
||||
- Trailing commas on all the lists are allowed - label filters, function args and with expressions. For instance, the following queries are valid: `m{foo="bar",}`, `f(a, b,)`, `WITH (x=y,) x`. This simplifies maintenance of multi-line queries.
|
||||
- Metric names and metric labels may contain escaped chars. For instance, `foo\-bar{baz\=aa="b"}` is valid expression. It returns time series with name `foo-bar` containing label `baz=aa` with value `b`. Additionally, `\xXX` escape sequence is supported, where `XX` is hexadecimal representation of escaped char.
|
||||
- Aggregate functions support optional `limit N` suffix in order to limit the number of output series. For example, `sum(x) by (y) limit 3` limits the number of output time series after the aggregation to 3. All the other time series are dropped.
|
||||
- [histogram_quantile](#histogram_quantile) accepts optional third arg - `boundsLabel`. In this case it returns `lower` and `upper` bounds for the estimated percentile. See [this issue for details](https://github.com/prometheus/prometheus/issues/5706).
|
||||
- `default` binary operator. `q1 default q2` fills gaps in `q1` with the corresponding values from `q2`.
|
||||
- `if` binary operator. `q1 if q2` removes values from `q1` for missing values from `q2`.
|
||||
- `ifnot` binary operator. `q1 ifnot q2` removes values from `q1` for existing values from `q2`.
|
||||
- String literals may be concatenated. This is useful with `WITH` templates: `WITH (commonPrefix="long_metric_prefix_") {__name__=commonPrefix+"suffix1"} / {__name__=commonPrefix+"suffix2"}`.
|
||||
- `WITH` templates. This feature simplifies writing and managing complex queries. Go to [WITH templates playground](https://play.victoriametrics.com/promql/expand-with-exprs) and try it.
|
||||
- `keep_metric_names` modifier can be applied to all the [rollup functions](#rollup-functions) and [transform functions](#transform-functions). This modifier prevents from dropping metric names in function results. For example, `rate({__name__=~"foo|bar"}[5m]) keep_metric_names` leaves `foo` and `bar` metric names in the resulting time series.
|
||||
|
||||
|
||||
## MetricsQL functions
|
||||
|
||||
If you are unfamiliar with PromQL, then please read [this tutorial](https://medium.com/@valyala/promql-tutorial-for-beginners-9ab455142085) at first.
|
||||
|
||||
MetricsQL provides the following functions:
|
||||
|
||||
* [Rollup functions](#rollup-functions)
|
||||
* [Transform functions](#transform-functions)
|
||||
* [Label manipulation functions](#label-manipulation-functions)
|
||||
* [Aggregate functions](#aggregate-functions)
|
||||
|
||||
|
||||
### Rollup functions
|
||||
|
||||
**Rollup functions** (aka range functions or window functions) calculate rollups over **raw samples** on the given lookbehind window for the [selected time series](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). For example, `avg_over_time(temperature[24h])` calculates the average temperature over raw samples for the last 24 hours. Additional details:
|
||||
* If rollup functions are used for building graphs in Grafana, then the rollup is calculated independently per each point on the graph. For example, every point for `avg_over_time(temperature[24h])` graph shows the average temperature for the last 24 hours ending at this point. The interval between points is set as `step` query arg passed by Grafana to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
|
||||
* If the given [series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) returns multiple time series, then rollups are calculated individually per each returned series.
|
||||
* If lookbehind window in square brackets is missing, then MetricsQL automatically sets the lookbehind window to the interval between points on the graph (aka `step` query arg at [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries), `$__interval` value from Grafana or `1i` duration in MetricsQL). For example, `rate(http_requests_total)` is equivalent to `rate(http_requests_total[$__interval])` in Grafana. It is also equivalent to `rate(http_requests_total[1i])`.
|
||||
* Every [series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) in MetricsQL must be wrapped into a rollup function. Otherwise it is automatically wrapped into [default_rollup](#default_rollup). For example, `foo{bar="baz"}` is automatically converted to `default_rollup(foo{bar="baz"}[1i])` before performing the calculations.
|
||||
* If something other than [series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) is passed to rollup function, then the inner arg is automatically converted to a [subquery](#subqueries).
|
||||
* All the rollup functions accept optional `keep_metric_names` modifier. If it is set, then the function keeps metric names in results. For example, `rate({__name__=~"foo|bar}[5m]) keep_metric_names` leaves `foo` and `bar` metric names in results.
|
||||
|
||||
See also [implicit query conversions](#implicit-query-conversions).
|
||||
|
||||
|
||||
#### absent_over_time
|
||||
|
||||
`absent_over_time(series_selector[d])` returns 1 if the given lookbehind window `d` doesn't contain raw samples. Otherwise it returns an empty result. This function is supported by PromQL. See also [present_over_time](#present_over_time).
|
||||
|
||||
#### aggr_over_time
|
||||
|
||||
`aggr_over_time(("rollup_func1", "rollup_func2", ...), series_selector[d])` calculates all the listed `rollup_func*` for raw samples on the given lookbehind window `d`. The calculations are perfomed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). `rollup_func*` can contain any rollup function. For instance, `aggr_over_time(("min_over_time", "max_over_time", "rate"), m[d])` would calculate [min_over_time](#min_over_time), [max_over_time](#max_over_time) and [rate](#rate) for `m[d]`.
|
||||
|
||||
#### ascent_over_time
|
||||
|
||||
`ascent_over_time(series_selector[d])` calculates ascent of raw sample values on the given lookbehind window `d`. The calculations are performed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Useful for tracking height gains in GPS tracking. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [descent_over_time](#descent_over_time).
|
||||
|
||||
#### avg_over_time
|
||||
|
||||
`avg_over_time(series_selector[d])` calculates the average value over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). This function is supported by PromQL. See also [median_over_time](#median_over_time).
|
||||
|
||||
#### changes
|
||||
|
||||
`changes(series_selector[d])` calculates the number of times the raw samples changed on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Unlike `changes()` in Prometheus it takes into account the change from the last sample before the given lookbehind window `d`. See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [changes_prometheus](#changes_prometheus).
|
||||
|
||||
#### changes_prometheus
|
||||
|
||||
`changes_prometheus(series_selector[d])` calculates the number of times the raw samples changed on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). It doesn't take into account the change from the last sample before the given lookbehind window `d` in the same way as Prometheus does. See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [changes](#changes).
|
||||
|
||||
#### count_eq_over_time
|
||||
|
||||
`count_eq_over_time(series_selector[d], eq)` calculates the number of raw samples on the given lookbehind window `d`, which are equal to `eq`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [count_over_time](#count_over_time).
|
||||
|
||||
#### count_gt_over_time
|
||||
|
||||
`count_gt_over_time(series_selector[d], gt)` calculates the number of raw samples on the given lookbehind window `d`, which are bigger than `gt`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [count_over_time](#count_over_time).
|
||||
|
||||
#### count_le_over_time
|
||||
|
||||
`count_le_over_time(series_selector[d], le)` calculates the number of raw samples on the given lookbehind window `d`, which don't exceed `le`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [count_over_time](#count_over_time).
|
||||
|
||||
#### count_ne_over_time
|
||||
|
||||
`count_ne_over_time(series_selector[d], ne)` calculates the number of raw samples on the given lookbehind window `d`, which aren't equal to `ne`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [count_over_time](#count_over_time).
|
||||
|
||||
#### count_over_time
|
||||
|
||||
`count_over_time(series_selector[d])` calculates the number of raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [count_le_over_time](#count_le_over_time), [count_gt_over_time](#count_gt_over_time), [count_eq_over_time](#count_eq_over_time) and [count_ne_over_time](#count_ne_over_time).
|
||||
|
||||
#### decreases_over_time
|
||||
|
||||
`decreases_over_time(series_selector[d])` calculates the number of raw sample value decreases over the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [increases_over_time](#increases_over_time).
|
||||
|
||||
#### default_rollup
|
||||
|
||||
`default_rollup(series_selector[d])` returns the last raw sample value on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors).
|
||||
|
||||
#### delta
|
||||
|
||||
`delta(series_selector[d])` calculates the difference between the last sample before the given lookbehind window `d` and the last sample at the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). The behaviour of `delta()` function in MetricsQL is slighly different to the behaviour of `delta()` function in Prometheus. See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [increase](#increase) and [delta_prometheus](#delta_prometheus).
|
||||
|
||||
#### delta_prometheus
|
||||
|
||||
`delta_prometheus(series_selector[d])` calculates the difference between the first and the last samples at the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). The behaviour of `delta_prometheus()` is close to the behaviour of `delta()` function in Prometheus. See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [delta](#delta).
|
||||
|
||||
#### deriv
|
||||
|
||||
`deriv(series_selector[d])` calculates per-second derivative over the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). The derivative is calculated using linear regression. Metric names are stripped from the resulting rollups. This function is supported by PromQL. Add `keep_metric_names` modifier in order to keep metric names. See also [deriv_fast](#deriv_fast) and [ideriv](#ideriv).
|
||||
|
||||
#### deriv_fast
|
||||
|
||||
`deriv_fast(series_selector[d])` calculates per-second derivative using the first and the last raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [deriv](#deriv) and [ideriv](#ideriv).
|
||||
|
||||
#### descent_over_time
|
||||
|
||||
`descent_over_time(series_selector[d])` calculates descent of raw sample values on the given lookbehind window `d`. The calculations are performed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Useful for tracking height loss in GPS tracking. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [ascent_over_time](#ascent_over_time).
|
||||
|
||||
#### distinct_over_time
|
||||
|
||||
`distinct_over_time(series_selector[d])` returns the number of distinct raw sample values on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### duration_over_time
|
||||
|
||||
`duration_over_time(series_selector[d], max_interval)` returns the duration in seconds when time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) were present over the given lookbehind window `d`. It is expected that intervals between adjacent samples per each series don't exceed the `max_interval`. Otherwise such intervals are considered as gaps and aren't counted. See also [lifetime](#lifetime) and [lag](#lag).
|
||||
|
||||
#### first_over_time
|
||||
|
||||
`first_over_time(series_selector[d])` returns the first raw sample value on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). See also [last_over_time](#last_over_time) and [tfirst_over_time](#tfirst_over_time).
|
||||
|
||||
#### geomean_over_time
|
||||
|
||||
`geomean_over_time(series_selector[d])` calculates [geometric mean](https://en.wikipedia.org/wiki/Geometric_mean) over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### histogram_over_time
|
||||
|
||||
`histogram_over_time(series_selector[d])` calculates [VictoriaMetrics histogram](https://godoc.org/github.com/VictoriaMetrics/metrics#Histogram) over raw samples on the given lookbehind window `d`. It is calculated individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). The resulting histograms are useful to pass to [histogram_quantile](#histogram_quantile) for calculating quantiles over multiple gauges. For example, the following query calculates median temperature by country over the last 24 hours: `histogram_quantile(0.5, sum(histogram_over_time(temperature[24h])) by (vmrange,country))`.
|
||||
|
||||
#### hoeffding_bound_lower
|
||||
|
||||
`hoeffding_bound_lower(phi, series_selector[d])` calculates lower [Hoeffding bound](https://en.wikipedia.org/wiki/Hoeffding%27s_inequality) for the given `phi` in the range `[0...1]`. See also [hoeffding_bound_upper](#hoeffding_bound_upper).
|
||||
|
||||
#### hoeffding_bound_upper
|
||||
|
||||
`hoeffding_bound_upper(phi, series_selector[d])` calculates upper [Hoeffding bound](https://en.wikipedia.org/wiki/Hoeffding%27s_inequality) for the given `phi` in the range `[0...1]`. See also [hoeffding_bound_lower](#hoeffding_bound_lower).
|
||||
|
||||
#### holt_winters
|
||||
|
||||
`holt_winters(series_selector[d], sf, tf)` calculates Holt-Winters value (aka [double exponential smoothing](https://en.wikipedia.org/wiki/Exponential_smoothing#Double_exponential_smoothing)) for raw samples over the given lookbehind window `d` using the given smoothing factor `sf` and the given trend factor `tf`. Both `sf` and `tf` must be in the range `[0...1]`. It is expected that the [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) returns time series of [gauge type](https://prometheus.io/docs/concepts/metric_types/#gauge). This function is supported by PromQL.
|
||||
|
||||
#### idelta
|
||||
|
||||
`idelta(series_selector[d])` calculates the difference between the last two raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### ideriv
|
||||
|
||||
`ideriv(series_selector[d])` calculates the per-second derivative based on the last two raw samples over the given lookbehind window `d`. The derivative is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [deriv](#deriv).
|
||||
|
||||
#### increase
|
||||
|
||||
`increase(series_selector[d])` calculates the increase over the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). It is expected that the `series_selector` returns time series of [counter type](https://prometheus.io/docs/concepts/metric_types/#counter). Unlike Prometheus it takes into account the last sample before the given lookbehind window `d` when calculating the result. See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [increase_pure](#increase_pure), [increase_prometheus](#increase_prometheus) and [delta](#delta).
|
||||
|
||||
#### increase_prometheus
|
||||
|
||||
`increase_prometheus(series_selector[d])` calculates the increase over the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). It is expected that the `series_selector` returns time series of [counter type](https://prometheus.io/docs/concepts/metric_types/#counter). It doesn't take into account the last sample before the given lookbehind window `d` when calculating the result in the same way as Prometheus does. See [this article](https://medium.com/@romanhavronenko/victoriametrics-promql-compliance-d4318203f51e) for details. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [increase_pure](#increase_pure) and [increase](#increase).
|
||||
|
||||
#### increase_pure
|
||||
|
||||
`increase_pure(series_selector[d])` works the same as [increase](#increase) except of the following corner case - it assumes that [counters](https://prometheus.io/docs/concepts/metric_types/#counter) always start from 0, while [increase](#increase) ignores the first value in a series if it is too big.
|
||||
|
||||
#### increases_over_time
|
||||
|
||||
`increases_over_time(series_selector[d])` calculates the number of raw sample value increases over the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [decreases_over_time](#decreases_over_time).
|
||||
|
||||
#### integrate
|
||||
|
||||
`integrate(series_selector[d])` calculates the integral over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### irate
|
||||
|
||||
`irate(series_selector[d])` calculates the "instant" per-second increase rate over the last two raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). It is expected that the `series_selector` returns time series of [counter type](https://prometheus.io/docs/concepts/metric_types/#counter). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [rate](#rate).
|
||||
|
||||
#### lag
|
||||
|
||||
`lag(series_selector[d])` returns the duration in seconds between the last sample on the given lookbehind window `d` and the timestamp of the current point. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [lifetime](#lifetime) and [duration_over_time](#duration_over_time).
|
||||
|
||||
#### last_over_time
|
||||
|
||||
`last_over_time(series_selector[d])` returns the last raw sample value on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). This function is supported by PromQL. See also [first_over_time](#first_over_time) and [tlast_over_time](#tlast_over_time).
|
||||
|
||||
#### lifetime
|
||||
|
||||
`lifetime(series_selector[d])` returns the duration in seconds between the last and the first sample on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [duration_over_time](#duration_over_time) and [lag](#lag).
|
||||
|
||||
#### max_over_time
|
||||
|
||||
`max_over_time(series_selector[d])` calculates the maximum value over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). This function is supported by PromQL. See also [tmax_over_time](#tmax_over_time).
|
||||
|
||||
#### median_over_time
|
||||
|
||||
`median_over_time(series_selector[d])` calculates median value over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). See also [avg_over_time](#avg_over_time).
|
||||
|
||||
#### min_over_time
|
||||
|
||||
`min_over_time(series_selector[d])` calculates the minimum value over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). This function is supported by PromQL. See also [tmin_over_time](#tmin_over_time).
|
||||
|
||||
#### mode_over_time
|
||||
|
||||
`mode_over_time(series_selector[d])` calculates [mode](https://en.wikipedia.org/wiki/Mode_(statistics)) for raw samples on the given lookbehind window `d`. It is calculated individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). It is expected that raw sample values are discrete.
|
||||
|
||||
#### predict_linear
|
||||
|
||||
`predict_linear(series_selector[d], t)` calculates the value `t` seconds in the future using linear interpolation over raw samples on the given lookbehind window `d`. The predicted value is calculated individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). This function is supported by PromQL.
|
||||
|
||||
#### present_over_time
|
||||
|
||||
`present_over_time(series_selector[d])` returns 1 if there is at least a single raw sample on the given lookbehind window `d`. Otherwise an empty result is returned. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### quantile_over_time
|
||||
|
||||
`quantile_over_time(phi, series_selector[d])` calculates `phi`-quantile over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). The `phi` value must be in the range `[0...1]`. This function is supported by PromQL. See also [quantiles_over_time](#quantiles_over_time).
|
||||
|
||||
#### quantiles_over_time
|
||||
|
||||
`quantiles_over_time("phiLabel", phi1, ..., phiN, series_selector[d])` calculates `phi*`-quantiles over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). The function returns individual series per each `phi*` with `{phiLabel="phi*"}` label. `phi*` values must be in the range `[0...1]`. See also [quantile_over_time](#quantile_over_time).
|
||||
|
||||
#### range_over_time
|
||||
|
||||
`range_over_time(series_selector[d])` calculates value range over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). E.g. it calculates `max_over_time(series_selector[d]) - min_over_time(series_selector[d])`. Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### rate
|
||||
|
||||
`rate(series_selector[d])` calculates the average per-second increase rate over the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). It is expected that the `series_selector` returns time series of [counter type](https://prometheus.io/docs/concepts/metric_types/#counter). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### rate_over_sum
|
||||
|
||||
`rate_over_sum(series_selector[d])` calculates per-second rate over the sum of raw samples on the given lookbehind window `d`. The calculations are performed indiviually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### resets
|
||||
|
||||
`resets(series_selector[d])` returns the number of [counter](https://prometheus.io/docs/concepts/metric_types/#counter) resets over the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). It is expected that the `series_selector` returns time series of [counter type](https://prometheus.io/docs/concepts/metric_types/#counter). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### rollup
|
||||
|
||||
`rollup(series_selector[d])` calculates `min`, `max` and `avg` values for raw samples on the given lookbehind window `d`. These values are calculated individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors).
|
||||
|
||||
#### rollup_candlestick
|
||||
|
||||
`rollup_candlestick(series_selector[d])` calculates `open`, `high`, `low` and `close` values (aka OHLC) over raw samples on the given lookbehind window `d`. The calculations are perfomed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). This function is useful for financial applications.
|
||||
|
||||
#### rollup_delta
|
||||
|
||||
`rollup_delta(series_selector[d])` calculates differences between adjancent raw samples on the given lookbehind window `d` and returns `min`, `max` and `avg` values for the calculated differences. The calculations are performed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [rollup_increase](#rollup_increase).
|
||||
|
||||
#### rollup_deriv
|
||||
|
||||
`rollup_deriv(series_selector[d])` calculates per-second derivatives for adjancent raw samples on the given lookbehind window `d` and returns `min`, `max` and `avg` values for the calculated per-second derivatives. The calculations are performed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### rollup_increase
|
||||
|
||||
`rollup_increase(series_selector[d])` calculates increases for adjancent raw samples on the given lookbehind window `d` and returns `min`, `max` and `avg` values for the calculated increases. The calculations are performed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [rollup_delta](#rollup_delta).
|
||||
|
||||
#### rollup_rate
|
||||
|
||||
`rollup_rate(series_selector[d])` calculates per-second change rates for adjancent raw samples on the given lookbehind window `d` and returns `min`, `max` and `avg` values for the calculated per-second change rates. The calculations are perfomed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### rollup_scrape_interval
|
||||
|
||||
`rollup_scrape_interval(series_selector[d])` calculates the interval in seconds between adjancent raw samples on the given lookbehind window `d` and returns `min`, `max` and `avg` values for the calculated interval. The calculations are perfomed individually per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [scrape_interval](#scrape_interval).
|
||||
|
||||
#### scrape_interval
|
||||
|
||||
`scrape_interval(series_selector[d])` calculates the average interval in seconds between raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [rollup_scrape_interval](#rollup_scrape_interval).
|
||||
|
||||
#### share_gt_over_time
|
||||
|
||||
`share_gt_over_time(series_selector[d], gt)` returns share (in the range `[0...1]`) of raw samples on the given lookbehind window `d`, which are bigger than `gt`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. Useful for calculating SLI and SLO. Example: `share_gt_over_time(up[24h], 0)` - returns service availability for the last 24 hours. See also [share_le_over_time](#share_le_over_time).
|
||||
|
||||
#### share_le_over_time
|
||||
|
||||
`share_le_over_time(series_selector[d], le)` returns share (in the range `[0...1]`) of raw samples on the given lookbehind window `d`, which are smaller or equal to `le`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. Useful for calculating SLI and SLO. Example: `share_le_over_time(memory_usage_bytes[24h], 100*1024*1024)` returns the share of time series values for the last 24 hours when memory usage was below or equal to 100MB. See also [share_gt_over_time](#share_gt_over_time).
|
||||
|
||||
#### stale_samples_over_time
|
||||
|
||||
`stale_samples_over_time(series_selector[d])` calculates the number of [staleness markers](https://docs.victoriametrics.com/vmagent.html#prometheus-staleness-markers) on the given lookbehind window `d` per each time series matching the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### stddev_over_time
|
||||
|
||||
`stddev_over_time(series_selector[d])` calculates standard deviation over raw samples on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [stdvar_over_time](#stdvar_over_time).
|
||||
|
||||
#### stdvar_over_time
|
||||
|
||||
`stdvar_over_time(series_selector[d])` calculates stadnard variance over raw samples on the given lookbheind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [stddev_over_time](#stddev_over_time).
|
||||
|
||||
#### sum_over_time
|
||||
|
||||
`sum_over_time(series_selector[d])` calculates the sum of raw sample values on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### sum2_over_time
|
||||
|
||||
`sum2_over_time(series_selector[d])` calculates the sum of squares for raw sample values on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### timestamp
|
||||
|
||||
`timestamp(series_selector[d])` returns the timestamp in seconds for the last raw sample on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [timestamp_with_name](#timestamp_with_name).
|
||||
|
||||
#### timestamp_with_name
|
||||
|
||||
`timestamp_with_name(series_selector[d])` returns the timestamp in seconds for the last raw sample on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are preserved in the resulting rollups. See also [timestamp](#timestamp).
|
||||
|
||||
#### tfirst_over_time
|
||||
|
||||
`tfirst_over_time(series_selector[d])` returns the timestamp in seconds for the first raw sample on the given lookbehind window `d` per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [first_over_time](#first_over_time).
|
||||
|
||||
#### tlast_over_time
|
||||
|
||||
`tlast_over_time(series_selector[d])` is an alias for [timestamp](#timestamp).
|
||||
|
||||
#### tmax_over_time
|
||||
|
||||
`tmax_over_time(series_selector[d])` returns the timestamp in seconds for the raw sample with the maximum value on the given lookbehind window `d`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [max_over_time](#max_over_time).
|
||||
|
||||
#### tmin_over_time
|
||||
|
||||
`tmin_over_time(series_selector[d])` returns the timestamp in seconds for the raw sample with the minimum value on the given lookbehind window `d`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names. See also [min_over_time](#min_over_time).
|
||||
|
||||
#### zscore_over_time
|
||||
|
||||
`zscore_over_time(series_selector[d])` calculates returns [z-score](https://en.wikipedia.org/wiki/Standard_score) for raw samples on the given lookbehind window `d`. It is calculated independently per each time series returned from the given [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors). Metric names are stripped from the resulting rollups. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
|
||||
### Transform functions
|
||||
|
||||
**Transform functions** calculate transformations over rollup results. For example, `abs(delta(temperature[24h]))` calculates the absolute value for every point of every time series returned from the rollup `delta(temperature[24h])`. Additional details:
|
||||
* If transform function is applied directly to a [series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors), then the [default_rollup()](#default_rollup) function is automatically applied before calculating the transformations. For example, `abs(temperature)` is implicitly transformed to `abs(default_rollup(temperature[1i]))`.
|
||||
* All the transform functions accept optional `keep_metric_names` modifier. If it is set, then the function doesn't drop metric names from the resulting time series. For example, `ln({__name__=~"foo|bar"}) keep_metric_names` leaves `foo` and `bar` metric names in results.
|
||||
|
||||
See also [implicit query conversions](#implicit-query-conversions).
|
||||
|
||||
|
||||
#### abs
|
||||
|
||||
`abs(q)` calculates the absolute value for every point of every time series returned by `q`. This function is supported by PromQL.
|
||||
|
||||
#### absent
|
||||
|
||||
`absent(q)` returns 1 if `q` has no points. Otherwise returns an empty result. This function is supported by PromQL.
|
||||
|
||||
#### acos
|
||||
|
||||
`acos(q)` returns [inverse cosine](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [asin](#asin) and [cos](#cos).
|
||||
|
||||
#### acosh
|
||||
|
||||
`acosh(q)` returns [inverse hyperbolic cosine](https://en.wikipedia.org/wiki/Inverse_hyperbolic_functions#Inverse_hyperbolic_cosine) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [sinh](#cosh).
|
||||
|
||||
#### asin
|
||||
|
||||
`asin(q)` returns [inverse sine](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [acos](#acos) and [sin](#sin).
|
||||
|
||||
#### asinh
|
||||
|
||||
`asinh(q)` returns [inverse hyperbolic sine](https://en.wikipedia.org/wiki/Inverse_hyperbolic_functions#Inverse_hyperbolic_sine) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [sinh](#sinh).
|
||||
|
||||
#### atan
|
||||
|
||||
`atan(q)` returns [inverse tangent](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [tan](#tan).
|
||||
|
||||
#### atanh
|
||||
|
||||
`atanh(q)` returns [inverse hyperbolic tangent](https://en.wikipedia.org/wiki/Inverse_hyperbolic_functions#Inverse_hyperbolic_tangent) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [tanh](#tanh).
|
||||
|
||||
#### bitmap_and
|
||||
|
||||
`bitmap_and(q, mask)` - calculates bitwise `v & mask` for every `v` point of every time series returned from `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### bitmap_or
|
||||
|
||||
`bitmap_or(q, mask)` calculates bitwise `v | mask` for every `v` point of every time series returned from `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### bitmap_xor
|
||||
|
||||
`bitmap_xor(q, mask)` calculates bitwise `v ^ mask` for every `v` point of every time series returned from `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### buckets_limit
|
||||
|
||||
`buckets_limit(limit, buckets)` limits the number of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`. See also [prometheus_buckets](#prometheus_buckets) and [histogram_quantile](#histogram_quantile).
|
||||
|
||||
#### ceil
|
||||
|
||||
`ceil(q)` rounds every point for every time series returned by `q` to the upper nearest integer. This function is supported by PromQL. See also [floor](#floor) and [round](#round).
|
||||
|
||||
#### clamp
|
||||
|
||||
`clamp(q, min, max)` clamps every point for every time series returned by `q` with the given `min` and `max` values. This function is supported by PromQL. See also [clamp_min](#clamp_min) and [clamp_max](#clamp_max).
|
||||
|
||||
#### clamp_max
|
||||
|
||||
`clamp_max(q, max)` clamps every point for every time series returned by `q` with the given `max` value. This function is supported by PromQL. See also [clamp](#clamp) and [clamp_min](#clamp_min).
|
||||
|
||||
#### clamp_min
|
||||
|
||||
`clamp_min(q, min)` clamps every pount for every time series returned by `q` with the given `min` value. This function is supported by PromQL. See also [clamp](#clamp) and [clamp_max](#clamp_max).
|
||||
|
||||
#### cos
|
||||
|
||||
`cos(q)` returns `cos(v)` for every `v` point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [sin](#sin).
|
||||
|
||||
#### cosh
|
||||
|
||||
`cosh(q)` returns [hyperbolic cosine](https://en.wikipedia.org/wiki/Hyperbolic_functions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. This function is supported by PromQL. See also [acosh](#acosh).
|
||||
|
||||
#### day_of_month
|
||||
|
||||
`day_of_month(q)` returns the day of month for every point of every time series returned by `q`. It is expected that `q` returns unix timestamps. The returned values are in the range `[1...31]`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### day_of_week
|
||||
|
||||
`day_of_week(q)` returns the day of week for every point of every time series returned by `q`. It is expected that `q` returns unix timestamps. The returned values are in the range `[0...6]`, where `0` means Sunday and `6` means Saturday. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### days_in_month
|
||||
|
||||
`days_in_month(q)` returns the number of days in the month identified by every point of every time series returned by `q`. It is expected that `q` returns unix timestamps. The returned values are in the range `[28...31]`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### deg
|
||||
|
||||
`deg(q)` converts [Radians to degrees](https://en.wikipedia.org/wiki/Radian#Conversions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [rad](#rad).
|
||||
|
||||
#### end
|
||||
|
||||
`end()` returns the unix timestamp in seconds for the last point. See also [start](#start). It is known as `end` query arg passed to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
|
||||
|
||||
#### exp
|
||||
|
||||
`exp(q)` calculates the `e^v` for every point `v` of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. See also [ln](#ln). This function is supported by PromQL.
|
||||
|
||||
#### floor
|
||||
|
||||
`floor(q)` rounds every point for every time series returned by `q` to the lower nearest integer. See also [ceil](#ceil) and [round](#round). This function is supported by PromQL.
|
||||
|
||||
#### histogram_avg
|
||||
|
||||
`histogram_avg(buckets)` calculates the average value for the given `buckets`. It can be used for calculating the average over the given time range across multiple time series. For exmple, `histogram_avg(sum(histogram_over_time(response_time_duration_seconds[5m])) by (vmrange,job))` would return the average response time per each `job` over the last 5 minutes.
|
||||
|
||||
#### histogram_quantile
|
||||
|
||||
`histogram_quantile(phi, buckets)` calculates `phi`-quantile over the given [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350). `phi` must be in the range `[0...1]`. For example, `histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket[5m]) by (le))` would return median request duration for all the requests during the last 5 minutes. It accepts optional third arg - `boundsLabel`. In this case it returns `lower` and `upper` bounds for the estimated percentile. See [this issue for details](https://github.com/prometheus/prometheus/issues/5706). This function is supported by PromQL (except of the `boundLabel` arg). See also [histogram_quantiles](#histogram_quantiles) and [histogram_share](#histogram_share).
|
||||
|
||||
#### histogram_quantiles
|
||||
|
||||
`histogram_quantiles("phiLabel", phi1, ..., phiN, buckets)` calculates the given `phi*`-quantiles over the given [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350). `phi*` must be in the range `[0...1]`. Each calculated quantile is returned in a separate time series with the corresponding `{phiLabel="phi*"}` label. See also [histogram_quantile](#histogram_quantile).
|
||||
|
||||
#### histogram_share
|
||||
|
||||
`histogram_share(le, buckets)` calculates the share (in the range `[0...1]`) for `buckets` that fall below `le`. Useful for calculating SLI and SLO. This is inverse to [histogram_quantile](#histogram_quantile).
|
||||
|
||||
#### histogram_stddev
|
||||
|
||||
`histogram_stddev(buckets)` calculates standard deviation for the given `buckets`.
|
||||
|
||||
#### histogram_stdvar
|
||||
|
||||
`histogram_stdvar(buckets)` calculates standard variance for the given `buckets`. It can be used for calculating standard deviation over the given time range across multiple time series. For example, `histogram_stdvar(sum(histogram_over_time(temperature[24])) by (vmrange,country))` would return standard deviation for the temperature per each country over the last 24 hours.
|
||||
|
||||
#### hour
|
||||
|
||||
`hour(q)` returns the hour for every point of every time series returned by `q`. It is expected that `q` returns unix timestamps. The returned values are in the range `[0...23]`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### interpolate
|
||||
|
||||
`interpolate(q)` fills gaps with linearly interpolated values calculated from the last and the next non-empty points per each time series returned by `q`. See also [keep_last_value](#keep_last_value) and [keep_next_value](#keep_next_value).
|
||||
|
||||
#### keep_last_value
|
||||
|
||||
`keep_last_value(q)` fills gaps with the value of the last non-empty point in every time series returned by `q`. See also [keep_next_value](#keep_next_value) and [interpolate](#interpolate).
|
||||
|
||||
#### keep_next_value
|
||||
|
||||
`keep_next_value(q)` fills gaps with the value of the next non-empty point in every time series returned by `q`. See also [keep_last_value](#keep_last_value) and [interpolate](#interpolate).
|
||||
|
||||
#### limit_offset
|
||||
|
||||
`limit_offset(limit, offset, q)` skips `offset` time series from series returned by `q` and then returns up to `limit` of the remaining time series per each group. This allows implementing simple paging for `q` time series. See also [limitk](#limitk).
|
||||
|
||||
#### ln
|
||||
|
||||
`ln(q)` calculates `ln(v)` for every point `v` of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [exp](#exp) and [log2](#log2).
|
||||
|
||||
#### log2
|
||||
|
||||
`log2(q)` calculates `log2(v)` for every point `v` of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [log10](#log10) and [ln](#ln).
|
||||
|
||||
#### log10
|
||||
|
||||
`log10(q)` calculates `log10(v)` for every point `v` of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [log2](#log2) and [ln](#ln).
|
||||
|
||||
#### minute
|
||||
|
||||
`minute(q)` returns the minute for every point of every time series returned by `q`. It is expected that `q` returns unix timestamps. The returned values are in the range `[0...59]`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### month
|
||||
|
||||
`month(q)` returns the month for every point of every time series returned by `q`. It is expected that `q` returns unix timestamps. The returned values are in the range `[1...12]`, where `1` means January and `12` means December. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### now
|
||||
|
||||
`now()` returns the current timestamp as a floating-point value in seconds. See also [time](#time).
|
||||
|
||||
#### pi
|
||||
|
||||
`pi()` returns [Pi number](https://en.wikipedia.org/wiki/Pi). This function is supported by PromQL.
|
||||
|
||||
#### rad
|
||||
|
||||
`rad(q)` converts [degrees to Radians](https://en.wikipedia.org/wiki/Radian#Conversions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL. See also [deg](#deg).
|
||||
|
||||
|
||||
#### prometheus_buckets
|
||||
|
||||
`prometheus_buckets(buckets)` converts [VictoriaMetrics histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) with `vmrange` labels to Prometheus histogram buckets with `le` labels. This may be useful for building heatmaps in Grafana. See also [histogram_quantile](#histogram_quantile) and [buckets_limit](#buckets_limit).
|
||||
|
||||
#### rand
|
||||
|
||||
`rand(seed)` returns pseudo-random numbers on the range `[0...1]` with even distribution. Optional `seed` can be used as a seed for pseudo-random number generator. See also [rand_normal](#rand_normal) and [rand_exponential](#rand_exponential).
|
||||
|
||||
#### rand_exponential
|
||||
|
||||
`rand_exponential(seed)` returns pseudo-random numbers with [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution). Optional `seed` can be used as a seed for pseudo-random number generator. See also [rand](#rand) and [rand_normal](#rand_normal).
|
||||
|
||||
#### rand_normal
|
||||
|
||||
`rand_normal(seed)` returns pesudo-random numbers with [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution). Optional `seed` can be used as a seed for pseudo-random number generator. See also [rand](#rand) and [rand_exponential](#rand_exponential).
|
||||
|
||||
#### range_avg
|
||||
|
||||
`range_avg(q)` calculates the avg value across points per each time series returned by `q`.
|
||||
|
||||
#### range_first
|
||||
|
||||
`range_first(q)` returns the value for the first point per each time series returned by `q`.
|
||||
|
||||
#### range_last
|
||||
|
||||
`range_last(q)` returns the value for the last point per each time series returned by `q`.
|
||||
|
||||
#### range_max
|
||||
|
||||
`range_max(q)` calculates the max value across points per each time series returned by `q`.
|
||||
|
||||
#### range_median
|
||||
|
||||
`range_median(q)` calculates the median value across points per each time series returned by `q`.
|
||||
|
||||
#### range_min
|
||||
|
||||
`range_min(q)` calculates the min value across points per each time series returned by `q`.
|
||||
|
||||
#### range_quantile
|
||||
|
||||
`range_quantile(phi, q)` returns `phi`-quantile across points per each time series returned by `q`. `phi` must be in the range `[0...1]`.
|
||||
|
||||
#### range_sum
|
||||
|
||||
`range_sum(q)` calculates the sum of points per each time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### remove_resets
|
||||
|
||||
`remove_resets(q)` removes counter resets from time series returned by `q`.
|
||||
|
||||
#### round
|
||||
|
||||
`round(q, nearest)` round every point of every time series returned by `q` to the `nearest` multiple. If `nearest` is missing then the rounding is performed to the nearest integer. This function is supported by PromQL. See also [floor](#floor) and [ceil](#ceil).
|
||||
|
||||
#### ru
|
||||
|
||||
`ru(free, max)` calculates resource utilization in the range `[0%...100%]` for the given `free` and `max` resources. For instance, `ru(node_memory_MemFree_bytes, node_memory_MemTotal_bytes)` returns memory utilization over [node_exporter](https://github.com/prometheus/node_exporter) metrics.
|
||||
|
||||
#### running_avg
|
||||
|
||||
`running_avg(q)` calculates the running avg per each time series returned by `q`.
|
||||
|
||||
#### running_max
|
||||
|
||||
`running_max(q)` calculates the running max per each time series returned by `q`.
|
||||
|
||||
#### running_min
|
||||
|
||||
`running_min(q)` calculates the running min per each time series returned by `q`.
|
||||
|
||||
#### running_sum
|
||||
|
||||
`running_sum(q)` calculates the running sum per each time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names.
|
||||
|
||||
#### scalar
|
||||
|
||||
`scalar(q)` returns `q` if `q` contains only a single time series. Otherwise it returns nothing. This function is supported by PromQL.
|
||||
|
||||
#### sgn
|
||||
|
||||
`sgn(q)` returns `1` if `v>0`, `-1` if `v<0` and `0` if `v==0` for every point `v` of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### sin
|
||||
|
||||
`sin(q)` returns `sin(v)` for every `v` point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by MetricsQL. See also [cos](#cos).
|
||||
|
||||
#### sinh
|
||||
|
||||
`sinh(q)` returns [hyperbolic sine](https://en.wikipedia.org/wiki/Hyperbolic_functions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by MetricsQL. See also [cosh](#cosh).
|
||||
|
||||
#### tan
|
||||
|
||||
`tan(q)` returns `tan(v)` for every `v` point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by MetricsQL. See also [atan](#atan).
|
||||
|
||||
#### tanh
|
||||
|
||||
`tanh(q)` returns [hyperbolic tangent](https://en.wikipedia.org/wiki/Hyperbolic_functions) for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by MetricsQL. See also [atanh](#atanh).
|
||||
|
||||
#### smooth_exponential
|
||||
|
||||
`smooth_exponential(q, sf)` smooths points per each time series returned by `q` using [exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average) with the given smooth factor `sf`.
|
||||
|
||||
#### sort
|
||||
|
||||
`sort(q)` sorts series in ascending order by the last point in every time series returned by `q`. This function is supported by PromQL. See also [sort_desc](#sort_desc).
|
||||
|
||||
#### sort_by_label
|
||||
|
||||
`sort_by_label(q, label1, ... labelN)` sorts series in ascending order by the given set of labels. For example, `sort_by_label(foo, "bar")` would sort `foo` series by values of the label `bar` in these series. See also [sort_by_label_desc](#sort_by_label_desc).
|
||||
|
||||
#### sort_by_label_desc
|
||||
|
||||
`sort_by_label_desc(q, label1, ... labelN)` sorts series in descending order by the given set of labels. For example, `sort_by_label(foo, "bar")` would sort `foo` series by values of the label `bar` in these series. See also [sort_by_label](#sort_by_label).
|
||||
|
||||
#### sort_desc
|
||||
|
||||
`sort_desc(q)` sorts series in descending order by the last point in every time series returned by `q`. This function is supported by PromQL. See also [sort](#sort).
|
||||
|
||||
#### sqrt
|
||||
|
||||
`sqrt(q)` calculates square root for every point of every time series returned by `q`. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
#### start
|
||||
|
||||
`start()` returns unix timestamp in seconds for the first point. See also [end](#end). It is known as `start` query arg passed to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
|
||||
|
||||
#### step
|
||||
|
||||
`step()` returns the step in seconds (aka interval) between the returned points. It is known as `step` query arg passed to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
|
||||
|
||||
#### time
|
||||
|
||||
`time()` returns unix timestamp for every returned point. This function is supported by PromQL.
|
||||
|
||||
#### timezone_offset
|
||||
|
||||
`timezone_offset(tz)` returns offset in seconds for the given timezone `tz` relative to UTC. This can be useful when combining with datetime-related functions. For example, `day_of_week(time()+timezone_offset("America/Los_Angeles"))` would return weekdays for `America/Los_Angeles` time zone. Special `Local` time zone can be used for returning an offset for the time zone set on the host where VictoriaMetrics runs. See [the list of supported timezones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
|
||||
|
||||
#### ttf
|
||||
|
||||
`ttf(free)` estimates the time in seconds needed to exhaust `free` resources. For instance, `ttf(node_filesystem_avail_byte)` returns the time to storage space exhaustion. This function may be useful for capacity planning.
|
||||
|
||||
#### union
|
||||
|
||||
`union(q1, ..., qN)` returns a union of time series returned from `q1`, ..., `qN`. The `union` function name can be skipped - the following queries are quivalent: `union(q1, q2)` and `(q1, q2)`. It is expected that each `q*` query returns time series with unique sets of labels. Otherwise only the first time series out of series with identical set of labels is returned. Use [alias](#alias) and [label_set](#label_set) functions for giving unique labelsets per each `q*` query:
|
||||
|
||||
#### vector
|
||||
|
||||
`vector(q)` returns `q`, e.g. it does nothing in MetricsQL. This function is supported by PromQL.
|
||||
|
||||
#### year
|
||||
|
||||
`year(q)` returns the year for every point of every time series returned by `q`. It is expected that `q` returns unix timestamps. Metric names are stripped from the resulting series. Add `keep_metric_names` modifier in order to keep metric names. This function is supported by PromQL.
|
||||
|
||||
|
||||
### Label manipulation functions
|
||||
|
||||
**Label manipulation functions** perform manipulations with lables on the selected rollup results. Additional details:
|
||||
* If label manipulation function is applied directly to a [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors), then the [default_rollup()](#default_rollup) function is automatically applied before performing the label transformation. For example, `alias(temperature, "foo")` is implicitly transformed to `alias(default_rollup(temperature[1i]), "foo")`.
|
||||
|
||||
See also [implicit query conversions](#implicit-query-conversions).
|
||||
|
||||
|
||||
#### alias
|
||||
|
||||
`alias(q, "name")` sets the given `name` to all the time series returned by `q`. For example, `alias(up, "foobar")` would rename `up` series to `foobar` series.
|
||||
|
||||
#### label_copy
|
||||
|
||||
`label_copy(q, "src_label1", "dst_label1", ..., "src_labelN", "dst_labelN")` copies label values from `src_label*` to `dst_label*` for all the time series returned by `q`. If `src_label` is empty, then the corresponding `dst_label` is left untouched.
|
||||
|
||||
#### label_del
|
||||
|
||||
`label_del(q, "label1", ..., "labelN")` deletes the given `label*` labels from all the time series returned by `q`.
|
||||
|
||||
#### label_graphite_group
|
||||
|
||||
`label_graphite_group(q, groupNum1, ... groupNumN)` replaces metric names returned from `q` with the given Graphite group values concatenated via `.` char. For example, `label_graphite_group({__graphite__="foo*.bar.*"}, 0, 2)` would substitute `foo<any_value>.bar.<other_value>` metric names with `foo<any_value>.<other_value>`. This function is useful for aggregating Graphite metrics with [aggregate functions](#aggregate-functions). For example, the following query would return per-app memory usage:
|
||||
|
||||
```
|
||||
sum by (__name__) (
|
||||
label_graphite_group({__graphite__="app*.host*.memory_usage"}, 0)
|
||||
)
|
||||
```
|
||||
|
||||
#### label_join
|
||||
|
||||
`label_join(q, "dst_label", "separator", "src_label1", ..., "src_labelN")` joins `src_label*` values with the given `separator` and stores the result in `dst_label`. This is performed individually per each time series returned by `q`. For example, `label_join(up{instance="xxx",job="yyy"}, "foo", "-", "instance", "job")` would store `xxx-yyy` label value into `foo` label. This function is supported by PromQL.
|
||||
|
||||
#### label_keep
|
||||
|
||||
`label_keep(q, "label1", ..., "labelN")` deletes all the labels except of the listed `label*` labels in all the time series returned by `q`.
|
||||
|
||||
#### label_lowercase
|
||||
|
||||
`label_lowercase(q, "label1", ..., "labelN")` lowercases values for the given `label*` labels in all the time series returned by `q`.
|
||||
|
||||
#### label_map
|
||||
|
||||
`label_map(q, "label", "src_value1", "dst_value1", ..., "src_valueN", "dst_valueN")` maps `label` values from `src_*` to `dst*` for all the time seires returned by `q`.
|
||||
|
||||
#### label_match
|
||||
|
||||
`label_match(q, "label", "regexp")` drops time series from `q` with `label` not matching the given `regexp`. This function can be useful after [rollup](#rollup)-like functions, which may return multiple time series for every input series. See also [label_mismatch](#label_mismatch).
|
||||
|
||||
#### label_mismatch
|
||||
|
||||
`label_mismatch(q, "label", "regexp")` drops time series from `q` with `label` matching the given `regexp`. This function can be useful after [rollup](#rollup)-like functions, which may return multiple time series for every input series. See also [label_match](#label_match).
|
||||
|
||||
#### label_move
|
||||
|
||||
`label_move(q, "src_label1", "dst_label1", ..., "src_labelN", "dst_labelN")` moves label values from `src_label*` to `dst_label*` for all the time series returned by `q`. If `src_label` is empty, then the corresponding `dst_label` is left untouched.
|
||||
|
||||
#### label_replace
|
||||
|
||||
`label_replace(q, "dst_label", "replacement", "src_label", "regex")` applies the given `regex` to `src_label` and stores the `replacement` in `dst_label` if the given `regex` matches `src_label`. The `replacement` may contain references to regex captures such as `$1`, `$2`, etc. These references are substituted by the corresponding regex captures. For example, `label_replace(up{job="node-exporter"}, "foo", "bar-$1", "job", "node-(.+)")` would store `bar-node-exporter` label value into `foo` label. This function is supported by PromQL.
|
||||
|
||||
#### label_set
|
||||
|
||||
`label_set(q, "label1", "value1", ..., "labelN", "valueN")` sets `{label1="value1", ..., labelN="valueN"}` labels to all the time series returned by `q`.
|
||||
|
||||
#### label_transform
|
||||
|
||||
`label_transform(q, "label", "regexp", "replacement")` substitutes all the `regexp` occurences by the given `replacement` in the given `label`.
|
||||
|
||||
#### label_uppercase
|
||||
|
||||
`label_uppercase(q, "label1", ..., "labelN")` uppercases values for the given `label*` labels in all the time series returned by `q`.
|
||||
|
||||
#### label_value
|
||||
|
||||
`label_value(q, "label")` returns number values for the given `label` for every time series returned by `q`. For example, if `label_value(foo, "bar")` is applied to `foo{bar="1.234"}`, then it will return a time series `foo{bar="1.234"}` with `1.234` value.
|
||||
|
||||
|
||||
### Aggregate functions
|
||||
|
||||
**Aggregate functions** calculate aggregates over groups of rollup results. Additional details:
|
||||
* By default a single group is used for aggregation. Multiple independent groups can be set up by specifying grouping labels in `by` and `without` modifiers. For example, `count(up) by (job)` would group rollup results by `job` label value and calculate the [count](#count) aggregate function independently per each group, while `count(up) without (instance)` would group rollup results by all the labels except `instance` before calculating [count](#count) aggregate function independently per each group. Multiple labels can be put in `by` and `without` modifiers.
|
||||
* If the aggregate function is applied directly to a [series_selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors), then the [default_rollup()](#default_rollup) function is automatically applied before cacluating the aggregate. For example, `count(up)` is implicitly transformed to `count(default_rollup(up[1i]))`.
|
||||
* Aggregate functions accept arbitrary number of args. For example, `avg(q1, q2, q3)` would return the average values for every point across time series returned by `q1`, `q2` and `q3`.
|
||||
* Aggregate functions support optional `limit N` suffix, which can be used for limiting the number of output groups. For example, `sum(x) by (y) limit 3` limits the number of groups for the aggregation to 3. All the other groups are ignored.
|
||||
|
||||
See also [implicit query conversions](#implicit-query-conversions).
|
||||
|
||||
|
||||
#### any
|
||||
|
||||
`any(q) by (group_labels)` returns a single series per `group_labels` out of time series returned by `q`. See also [group](#group).
|
||||
|
||||
#### avg
|
||||
|
||||
`avg(q) by (group_labels)` returns the average value per `group_labels` for time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### bottomk
|
||||
|
||||
`bottomk(k, q)` returns up to `k` points with the smallest values across all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL. See also [topk](#topk).
|
||||
|
||||
#### bottomk_avg
|
||||
|
||||
`bottomk_avg(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the smallest averages. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `bottomk_avg(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the smallest averages plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [topk_avg](#topk_avg).
|
||||
|
||||
#### bottomk_last
|
||||
|
||||
`bottomk_last(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the smallest last values. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `bottomk_max(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the smallest maximums plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [topk_last](#topk_last).
|
||||
|
||||
#### bottomk_max
|
||||
|
||||
`bottomk_max(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the smallest maximums. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `bottomk_max(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the smallest maximums plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [topk_max](#topk_max).
|
||||
|
||||
#### bottomk_median
|
||||
|
||||
`bottomk_median(k, q, "other_label=other_value")` returns up to `k` time series from `q with the smallest medians. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `bottomk_median(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the smallest medians plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [topk_median](#topk_median).
|
||||
|
||||
#### bottomk_min
|
||||
|
||||
`bottomk_min(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the smallest minimums. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `bottomk_min(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the smallest minimums plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [topk_min](#topk_min).
|
||||
|
||||
#### count
|
||||
|
||||
`count(q) by (group_labels)` returns the number of non-empty points per `group_labels` for time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### count_values
|
||||
|
||||
`count_values("label", q)` counts the number of points with the same value and stores the counts in a time series with an additional `label`, wich contains each initial value. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### distinct
|
||||
|
||||
`distinct(q)` calculates the number of unique values per each group of points with the same timestamp.
|
||||
|
||||
#### geomean
|
||||
|
||||
`geomean(q)` calculates geometric mean per each group of points with the same timestamp.
|
||||
|
||||
#### group
|
||||
|
||||
`group(q) by (group_labels)` returns `1` per each `group_labels` for time series returned by `q`. This function is supported by PromQL. See also [any](#any).
|
||||
|
||||
#### histogram
|
||||
|
||||
`histogram(q)` calculates [VictoriaMetrics histogram](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) per each group of points with the same timestamp. Useful for visualizing big number of time series via a heatmap. See [this article](https://medium.com/@valyala/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) for more details.
|
||||
|
||||
#### limitk
|
||||
|
||||
`limitk(k, q) by (group_labels)` returns up to `k` time series per each `group_labels` out of time series returned by `q`. The returned set of time series remain the same across calls. See also [limit_offset](#limit_offset).
|
||||
|
||||
#### mad
|
||||
|
||||
`mad(q) by (group_labels)` returns the [Median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. See also [outliers_mad](#outliers_mad) and [stddev](#stddev).
|
||||
|
||||
#### max
|
||||
|
||||
`max(q) by (group_labels)` returns the maximum value per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### median
|
||||
|
||||
`median(q) by (group_labels)` returns the median value per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp.
|
||||
|
||||
#### min
|
||||
|
||||
`min(q) by (group_labels)` returns the minimum value per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### mode
|
||||
|
||||
`mode(q) by (group_labels)` returns [mode](https://en.wikipedia.org/wiki/Mode_(statistics)) per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp.
|
||||
|
||||
#### outliers_mad
|
||||
|
||||
`outliers_mad(tolerance, q)` returns time series from `q` with at least a single point outside [Median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) (aka MAD) multiplied by `tolerance`. E.g. it returns time series with at least a single point below `median(q) - mad(q)` or a single point above `median(q) + mad(q)`. See also [outliersk](#outliersk) and [mad](#mad).
|
||||
|
||||
#### outliersk
|
||||
|
||||
`outliersk(k, q)` returns up to `k` time series with the biggest standard deviation (aka outliers) out of time series returned by `q`. See also [outliers_mad](#outliers_mad).
|
||||
|
||||
#### quantile
|
||||
|
||||
`quantile(phi, q) by (group_labels)` calculates `phi`-quantile per each `group_labels` for all the time series returned by `q`. `phi` must be in the range `[0...1]`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL. See also [quantiles](#quantiles).
|
||||
|
||||
#### quantiles
|
||||
|
||||
`quantiles("phiLabel", phi1, ..., phiN, q)` calculates `phi*`-quantiles for all the time series returned by `q` and return them in time series with `{phiLabel="phi*"}` label. `phi*` must be in the range `[0...1]`. The aggregate is calculated individually per each group of points with the same timestamp. See also [quantile](#quantile).
|
||||
|
||||
#### stddev
|
||||
|
||||
`stddev(q) by (group_labels)` calculates standard deviation per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### stdvar
|
||||
|
||||
`stdvar(q) by (group_labels)` calculates standard variance per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### sum
|
||||
|
||||
`sum(q) by (group_labels)` returns the sum per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL.
|
||||
|
||||
#### sum2
|
||||
|
||||
`sum2(q) by (group_labels)` calculates the sum of squares per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp.
|
||||
|
||||
#### topk
|
||||
|
||||
`topk(k, q)` returns up to `k` points with the biggest values across all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. This function is supported by PromQL. See also [bottomk](#bottomk).
|
||||
|
||||
#### topk_avg
|
||||
|
||||
`topk_avg(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the biggest averages. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `topk_avg(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the biggest averages plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [bottomk_avg](#bottomk_avg).
|
||||
|
||||
#### topk_last
|
||||
|
||||
`topk_last(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the biggest last values. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `topk_max(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the biggest amaximums plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [bottomk_last](#bottomk_last).
|
||||
|
||||
#### topk_max
|
||||
|
||||
`topk_max(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the biggest maximums. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `topk_max(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the biggest amaximums plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [bottomk_max](#bottomk_max).
|
||||
|
||||
#### topk_median
|
||||
|
||||
`topk_median(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the biggest medians. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `topk_median(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the biggest medians plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [bottomk_median](#bottomk_median).
|
||||
|
||||
#### topk_min
|
||||
|
||||
`topk_min(k, q, "other_label=other_value")` returns up to `k` time series from `q` with the biggest minimums. If an optional `other_label=other_value` arg is set, then the sum of the remaining time series is returned with the given label. For example, `topk_min(3, sum(process_resident_memory_bytes) by (job), "job=other")` would return up to 3 time series with the biggest minimums plus a time series with `{job="other"}` label with the sum of the remaining series if any. See also [bottomk_min](#bottomk_min).
|
||||
|
||||
#### zscore
|
||||
|
||||
`zscore(q) by (group_labels)` returns [z-score](https://en.wikipedia.org/wiki/Standard_score) values per each `group_labels` for all the time series returned by `q`. The aggregate is calculated individually per each group of points with the same timestamp. Useful for detecting anomalies in the group of related time series.
|
||||
|
||||
|
||||
## Subqueries
|
||||
|
||||
MetricsQL supports and extends PromQL subqueries. See [this article](https://valyala.medium.com/prometheus-subqueries-in-victoriametrics-9b1492b720b3) for details. Any [rollup function](#rollup-functions) for something other than [series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) form a subquery. Nested rollup functions can be implicit thanks to the [implicit query conversions](#implicit-query-conversions). For example, `delta(sum(m))` is implicitly converted to `delta(sum(default_rollup(m[1i]))[1i:1i])`, so it becomes a subquery, since it contains [default_rollup](#default_rollup) nested into [delta](#delta).
|
||||
|
||||
VictoriaMetrics performs subqueries in the following way:
|
||||
|
||||
* It calculates the inner rollup function using the `step` value from the outer rollup function. For example, for expression `max_over_time(rate(http_requests_total[5m])[1h:30s])` the inner function `rate(http_requests_total[5m])` is calculated with `step=30s`. The resulting data points are aligned by the `step`.
|
||||
* It calculates the outer rollup function over the results of the inner rollup function using the `step` value passed by Grafana to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
|
||||
|
||||
|
||||
## Implicit query conversions
|
||||
|
||||
VictoriaMetrics performs the following implicit conversions for incoming queries before starting the calculations:
|
||||
|
||||
* If lookbehind window in square brackets is missing inside [rollup function](#rollup-functions), then `[1i]` is automatically added there. The `[1i]` means one `step` value, which is passed to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries). It is also known as `$__interval` in Grafana. For example, `rate(http_requests_count)` is automatically transformed to `rate(http_requests_count[1i])`.
|
||||
* All the [series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors), which aren't wrapped into [rollup functions](#rollup-functions), are automatically wrapped into [default_rollup](#default_rollup) function. Examples:
|
||||
* `foo` is transformed to `default_rollup(foo[1i])`
|
||||
* `foo + bar` is transformed to `default_rollup(foo[1i]) + default_rollup(bar[1i])`
|
||||
* `count(up)` is transformed to `count(default_rollup(up[1i]))`, because [count](#count) isn't a [rollup function](#rollup-functions) - it is [aggregate function](#aggregate-functions)
|
||||
* `abs(temperature)` is transformed to `abs(default_rollup(temperature[1i]))`, because [abs](#abs) isn't a [rollup function](#rollup-functions) - it is [transform function](#transform-functions)
|
||||
* If `step` in square brackets is missing inside [subquery](#subqueries), then `1i` step is automatically added there. For example, `avg_over_time(rate(http_requests_total[5m])[1h])` is automatically converted to `avg_over_time(rate(http_requests_total[5m])[1h:1i])`.
|
||||
* If something other than [series selector](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) is passed to [rollup function](#rollup-functions), then a [subquery](#subqueries) with `1i` lookbehind window and `1i` step is automatically formed. For example, `rate(sum(up))` is automatically converted to `rate((sum(default_rollup(up[1i])))[1i:1i])`.
|
||||
BIN
PerTenantStatistic-stats.jpg
Normal file
|
After Width: | Height: | Size: 83 KiB |
81
PerTenantStatistic.md
Normal file
@@ -0,0 +1,81 @@
|
||||
---
|
||||
sort: 18
|
||||
---
|
||||
|
||||
# VictoriaMetrics Cluster Per Tenant Statistic
|
||||
|
||||
***The per-tenant statistic is a part of [enterprise package](https://victoriametrics.com/products/enterprise/). It is available for download and evaluation at [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)***
|
||||
|
||||
<img alt="cluster-per-tenant-stat" src="PerTenantStatistic-stats.jpg">
|
||||
|
||||
VictoriaMetrics cluster for enterprise provides various metrics and statistics usage per tenant:
|
||||
- `vminsert`
|
||||
* `vm_tenant_inserted_rows_total` - total number of inserted rows. Find out which tenant
|
||||
puts the most of the pressure on the storage.
|
||||
|
||||
- `vmselect`
|
||||
* `vm_tenant_select_requests_duration_ms_total` - query latency.
|
||||
Helps to identify tenants with the heaviest queries.
|
||||
* `vm_tenant_select_requests_total` - total number of requests.
|
||||
Discover which tenant sends the most of the queries and how it changes with time.
|
||||
|
||||
- `vmstorage`
|
||||
* `vm_tenant_active_timeseries` - number of active time series.
|
||||
This metric correlates with memory usage, so can be used to find the most expensive
|
||||
tenant in terms of memory.
|
||||
* `vm_tenant_used_tenant_bytes` - disk space usage. Helps to track disk space usage
|
||||
per tenant.
|
||||
* `vm_tenant_timeseries_created_total` - number of new time series created. Helps to track
|
||||
the churn rate per tenant, or identify inefficient usage of the system.
|
||||
|
||||
Collect the metrics by any scrape agent you like (`vmagent`, `victoriametrics`, Prometheus, etc) and put into TSDB.
|
||||
It is ok to use existing cluster for storing such metrics, but make sure to use a different tenant for it to avoid collisions.
|
||||
Or just run a separate TSDB (VM single, Promethes, etc.) to keep the data isolated from the main cluster.
|
||||
|
||||
Example of the scraping configuration for statistic is the following:
|
||||
|
||||
```yaml
|
||||
scrape_configs:
|
||||
- job_name: cluster
|
||||
scrape_interval: 10s
|
||||
static_configs:
|
||||
- targets: ['vmselect:8481','vmstorage:8482','vminsert:8480']
|
||||
```
|
||||
|
||||
## Visualization
|
||||
|
||||
Visualisation of statistics can be done in Grafana using the following
|
||||
[dashboard](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster/dashboards/clusterbytenant.json).
|
||||
|
||||
|
||||
## Integration with vmgateway
|
||||
|
||||
`vmgateway` supports integration with Per Tenant Statistics data for rate limiting purposes.
|
||||
More information can be found [here](https://docs.victoriametrics.com/vmgateway.html)
|
||||
|
||||
## Integration with vmalert
|
||||
|
||||
You can generate alerts based on each tenant's resource usage and send notifications
|
||||
to prevent limits exhaustion.
|
||||
|
||||
Here is an alert example for high churn rate by the tenant:
|
||||
|
||||
```yaml
|
||||
|
||||
- alert: TooHighChurnRate
|
||||
expr: |
|
||||
(
|
||||
sum(rate(vm_tenant_timeseries_created_total[5m])) by(accountID,projectID)
|
||||
/
|
||||
sum(rate(vm_tenant_inserted_rows_total[5m])) by(accountID,projectID)
|
||||
) > 0.1
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Churn rate is more than 10% for the last 15m"
|
||||
description: "VM constantly creates new time series in the tenant: {{ $labels.accountID }}:{{ $labels.projectID }}.\n
|
||||
This effect is known as Churn Rate.\n
|
||||
High Churn Rate is tightly connected with database performance and may
|
||||
result in unexpected OOM's or slow queries."
|
||||
```
|
||||
20
Quick-Start.md
Normal file
@@ -0,0 +1,20 @@
|
||||
---
|
||||
sort: 12
|
||||
---
|
||||
|
||||
# Quick Start
|
||||
|
||||
The following commands download the latest available [Docker image of VictoriaMetrics](https://hub.docker.com/r/victoriametrics/victoria-metrics) and start it at port 8428, while storing the ingested data at `victoria-metrics-data` subdirectory under the current directory:
|
||||
|
||||
```bash
|
||||
docker pull victoriametrics/victoria-metrics:latest
|
||||
docker run -it --rm -v `pwd`/victoria-metrics-data:/victoria-metrics-data -p 8428:8428 victoriametrics/victoria-metrics:latest
|
||||
```
|
||||
|
||||
Open `http://localhost:8428` in web browser and read [these docs](https://docs.victoriametrics.com/#operation).
|
||||
|
||||
VictoriaMetrics is also available in binaries (see [this page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)) and in source code (see [how to build VictoriaMetrics from sources](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-build-from-sources)).
|
||||
|
||||
There are also the following versions of VictoriaMetrics available:
|
||||
* [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) - horizontally scalable VictoriaMetrics, which scales to multiple nodes.
|
||||
* [Managed VictoriaMetrics at AWS](https://aws.amazon.com/marketplace/pp/prodview-4tbfq5icmbmyc).
|
||||
72
Release-Guide.md
Normal file
@@ -0,0 +1,72 @@
|
||||
---
|
||||
sort: 17
|
||||
---
|
||||
|
||||
# Release process guidance
|
||||
|
||||
## Release version and Docker images
|
||||
|
||||
0. Document all the changes for new release in [CHANGELOG.md](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/CHANGELOG.md).
|
||||
1. Create the following release tags:
|
||||
* `git tag v1.xx.y` in `master` branch
|
||||
* `git tag v1.xx.y-cluster` in `cluster` branch
|
||||
* `git tag v1.xx.y-enterprise` in `enterprise` branch
|
||||
* `git tag v1.xx.y-enterprise-cluster` in `enterprise-cluster` branch
|
||||
2. Run `TAG=v1.xx.y make publish-release`. It will create `*.tar.gz` release archives with the corresponding `_checksums.txt` files inside `bin` directory and publish Docker images for the given `TAG`, `TAG-cluster`, `TAG-enterprise` and `TAG-enterprise-cluster`.
|
||||
5. Push release tag to https://github.com/VictoriaMetrics/VictoriaMetrics : `git push origin v1.xx.y`.
|
||||
6. Go to https://github.com/VictoriaMetrics/VictoriaMetrics/releases , create new release from the pushed tag on step 5 and upload `*.tar.gz` archive with the corresponding `_checksums.txt` from step 2.
|
||||
|
||||
## Building snap package.
|
||||
|
||||
pre-requirements:
|
||||
- snapcraft binary, can be installed with commands:
|
||||
for MacOS `brew install snapcraft` and [install mutipass](https://discourse.ubuntu.com/t/installing-multipass-on-macos/8329),
|
||||
for Ubuntu - `sudo snap install snapcraft --classic`
|
||||
- exported snapcraft login to `~/.snap/login.json` with `snapcraft export-login login.json && mkdir -p ~/.snap && mv login.json ~/.snap/`
|
||||
- already created release at github (it operates `git describe` version, so git tag must be annotated).
|
||||
|
||||
0. checkout to the latest git tag for single-node version.
|
||||
1. execute `make release-snap` - it must build and upload snap package.
|
||||
2. promote release to current, if needed manually at release page [snapcraft-releases](https://snapcraft.io/victoriametrics/releases)
|
||||
|
||||
### Public Announcement
|
||||
|
||||
- Publish message in Slack at https://victoriametrics.slack.com
|
||||
- Post at Twitter at https://twitter.com/MetricsVictoria
|
||||
- Post in Reddit at https://www.reddit.com/r/VictoriaMetrics/
|
||||
- Post in Linkedin at https://www.linkedin.com/company/victoriametrics/
|
||||
- Publish message in Telegram at https://t.me/VictoriaMetrics_en and https://t.me/VictoriaMetrics_ru1
|
||||
- Publish message in google groups at https://groups.google.com/forum/#!forum/victorametrics-users
|
||||
|
||||
## Helm Charts
|
||||
|
||||
The helm chart repository [https://github.com/VictoriaMetrics/helm-charts/](https://github.com/VictoriaMetrics/helm-charts/)
|
||||
|
||||
|
||||
### Bump the version of images.
|
||||
In that case, don't need to bump the helm chart version
|
||||
|
||||
1. Need to update [`values.yaml`](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-cluster/values.yaml), bump version for `vmselect`, `vminsert` and `vmstorage`
|
||||
2. Specify the correct version in [`Chart.yaml`](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-cluster/Chart.yaml)
|
||||
3. Update version [README.md](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-cluster/README.md), specify the new version in the documentation
|
||||
4. Push changes to master. `master` is a source of truth
|
||||
5. Rebase `master` into `gh-pages` branch
|
||||
6. Run `make package` which creates or updates zip file with the packed chart
|
||||
7. Run `make merge`. It creates or updates metadata for charts in index.yaml
|
||||
8. Push the changes to `gh-pages` branch
|
||||
|
||||
### Updating the chart.
|
||||
1. Update chart version in [`Chart.yaml`](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-cluster/Chart.yaml)
|
||||
2. Update [README.md](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-cluster/README.md) file, reflect changes in the documentation.
|
||||
3. Repeat the procedure from step _4_ previous section.
|
||||
|
||||
|
||||
## Wiki pages
|
||||
|
||||
All changes from `docs` folder and `.md` extension automatically push to Wiki
|
||||
|
||||
**_Note_**: no vice versa, direct changes on Wiki will be overitten after any changes in `docs/*.md`
|
||||
|
||||
## Github pages
|
||||
|
||||
All changes in `README.md`, `docs` folder and `.md` extension automatically push to Wiki
|
||||
1878
Single-server-VictoriaMetrics.md
Normal file
18
_config.yml
Normal file
@@ -0,0 +1,18 @@
|
||||
title: VictoriaMetrics
|
||||
author: VictoriaMetrics
|
||||
email: info@victoriametrics.com
|
||||
description: >
|
||||
The High Performance Open Source Time Series Database & Monitoring Solution
|
||||
|
||||
remote_theme: rundocs/jekyll-rtd-theme
|
||||
|
||||
readme_index:
|
||||
with_frontmatter: true
|
||||
|
||||
google:
|
||||
gtag: UA-129683199-1
|
||||
|
||||
plugins:
|
||||
- github-pages
|
||||
|
||||
|
||||
2
_includes/extra/head.html
Normal file
@@ -0,0 +1,2 @@
|
||||
<script src="/assets/js/clipboard.min.js"></script>
|
||||
<link rel="stylesheet" href="/assets/css/clipboard.css">
|
||||
35
_includes/extra/script.js
Normal file
@@ -0,0 +1,35 @@
|
||||
/* When in mobile layout, the anchor navigation for submenus
|
||||
* doesn't work due to fixed body height when menu is toggled.
|
||||
* This script intercepts clicks on links, toggles the menu off
|
||||
* and performs the anchor navigation. */
|
||||
$(document).on("click", '.shift li.toc a', function(e) {
|
||||
let segments = this.href.split('#');
|
||||
if (segments.length < 2) {
|
||||
/* ignore links without anchor */
|
||||
return true;
|
||||
}
|
||||
|
||||
e.preventDefault();
|
||||
$("#toggle").click();
|
||||
setTimeout(function () {
|
||||
location.hash = segments.pop();
|
||||
},1)
|
||||
});
|
||||
|
||||
/* Clipboard-copy snippet from https://github.com/marcoaugustoandrade/jekyll-clipboardjs/blob/master/copy.js */
|
||||
let codes = document.querySelectorAll('.with-copy .highlight > pre > code');
|
||||
let countID = 0;
|
||||
codes.forEach((code) => {
|
||||
|
||||
code.setAttribute("id", "code" + countID);
|
||||
|
||||
let btn = document.createElement('button');
|
||||
btn.innerHTML = "Copy";
|
||||
btn.className = "btn-copy";
|
||||
btn.setAttribute("data-clipboard-action", "copy");
|
||||
btn.setAttribute("data-clipboard-target", "#code" + countID);
|
||||
code.before(btn);
|
||||
countID++;
|
||||
});
|
||||
|
||||
let clipboard = new ClipboardJS('.btn-copy');
|
||||
46
_includes/extra/styles.scss
Normal file
@@ -0,0 +1,46 @@
|
||||
* {
|
||||
box-sizing: border-box !important;
|
||||
font-family: 'Lato', sans-serif;
|
||||
}
|
||||
|
||||
@media (min-width: 1280px) {
|
||||
.content-wrap {
|
||||
max-width: none;
|
||||
}
|
||||
}
|
||||
|
||||
// Logo
|
||||
.fa-home{
|
||||
vertical-align: middle;
|
||||
width: 14px;
|
||||
}
|
||||
|
||||
.fa-home:before {
|
||||
content: url(../assets/images/vm_logo.svg);
|
||||
}
|
||||
|
||||
// Navigation
|
||||
.header {
|
||||
background: black;
|
||||
}
|
||||
|
||||
.toctree .caption {
|
||||
color: #fc5d20 !important;
|
||||
}
|
||||
|
||||
.toctree .fa {
|
||||
margin-right: 5px;
|
||||
}
|
||||
|
||||
// Footer
|
||||
.status .branch .name{
|
||||
color: #b35cec !important;
|
||||
}
|
||||
|
||||
// Content
|
||||
.markdown-body h1,
|
||||
.markdown-body h2,
|
||||
.markdown-body h3,
|
||||
.markdown-body h4 {
|
||||
font-family: 'Lato', sans-serif !important;
|
||||
}
|
||||
23
assets/css/clipboard.css
Normal file
@@ -0,0 +1,23 @@
|
||||
/* modification for button copy */
|
||||
.btn-copy {
|
||||
position: absolute;
|
||||
color: #FFF;
|
||||
background: rgba(0, 0, 0, .15);
|
||||
border: 0;
|
||||
padding: 2px 10px;
|
||||
cursor: pointer;
|
||||
right: 0px;
|
||||
top: 0px;
|
||||
}
|
||||
|
||||
.btn-copy:hover {
|
||||
background: rgba(0, 0, 0, .25);
|
||||
}
|
||||
|
||||
/* shift snippet syntax type label if Copy button is enabled */
|
||||
.with-copy div.highlighter-rouge:after {
|
||||
right: 50px !important;
|
||||
}
|
||||
.markdown-body .highlight pre, .markdown-body pre {
|
||||
padding: 20px !important;
|
||||
}
|
||||
30
assets/images/favicon.svg
Normal file
@@ -0,0 +1,30 @@
|
||||
<?xml version="1.0" encoding="utf-8"?>
|
||||
<!-- Generator: Adobe Illustrator 24.3.0, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
|
||||
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
|
||||
viewBox="0 0 16 16" style="enable-background:new 0 0 16 16;" xml:space="preserve">
|
||||
<style type="text/css">
|
||||
.st0{fill:#110F0F;}
|
||||
</style>
|
||||
<g>
|
||||
<g>
|
||||
<path class="st0" d="M6.81,7.04c0.27,0.23,0.72,0.4,1.17,0.41v0c0.01,0,0.02,0,0.02,0c0.01,0,0.02,0,0.02,0v0
|
||||
c0.45-0.01,0.9-0.18,1.17-0.41c1.41-1.2,5.5-4.95,5.5-4.95c1.1-1.02-1.95-2.03-6.67-2.04v0c0,0-0.01,0-0.01,0
|
||||
c-0.01,0-0.01,0-0.02,0c-0.01,0-0.01,0-0.02,0c0,0-0.01,0-0.01,0v0C3.26,0.06,0.21,1.07,1.31,2.09C1.31,2.09,5.39,5.84,6.81,7.04z
|
||||
"/>
|
||||
</g>
|
||||
<g>
|
||||
<path class="st0" d="M9.19,9.07C8.93,9.3,8.48,9.47,8.02,9.48v0c-0.01,0-0.02,0-0.02,0v0c0,0,0,0,0,0c0,0,0,0,0,0v0
|
||||
c-0.01,0-0.02,0-0.02,0v0C7.52,9.47,7.07,9.3,6.81,9.07C5.83,8.24,2.38,5.11,1.09,3.93c0,0.77,0,1.41,0,1.81
|
||||
c0,0.2,0.08,0.46,0.21,0.59c0.88,0.81,4.28,3.92,5.51,4.97c0.27,0.23,0.72,0.4,1.17,0.41v0c0.01,0,0.02,0,0.02,0
|
||||
c0.01,0,0.02,0,0.02,0v0c0.45-0.01,0.9-0.18,1.17-0.41c1.22-1.04,4.63-4.16,5.51-4.97c0.14-0.12,0.21-0.39,0.21-0.59
|
||||
c0-0.4,0-1.04,0-1.81C13.62,5.11,10.17,8.24,9.19,9.07z"/>
|
||||
</g>
|
||||
<g>
|
||||
<path class="st0" d="M9.19,13.32c-0.27,0.23-0.72,0.4-1.17,0.41v0c-0.01,0-0.02,0-0.02,0v0c0,0,0,0,0,0c0,0,0,0,0,0v0
|
||||
c-0.01,0-0.02,0-0.02,0v0c-0.45-0.01-0.9-0.18-1.17-0.41c-0.98-0.83-4.42-3.96-5.72-5.14c0,0.77,0,1.41,0,1.81
|
||||
c0,0.2,0.08,0.46,0.21,0.59c0.88,0.81,4.28,3.92,5.51,4.97c0.27,0.23,0.72,0.4,1.17,0.41v0c0.01,0,0.02,0,0.02,0
|
||||
c0.01,0,0.02,0,0.02,0v0c0.45-0.01,0.9-0.18,1.17-0.41c1.22-1.04,4.63-4.16,5.51-4.97c0.14-0.12,0.21-0.39,0.21-0.59
|
||||
c0-0.4,0-1.04,0-1.81C13.62,9.36,10.17,12.48,9.19,13.32z"/>
|
||||
</g>
|
||||
</g>
|
||||
</svg>
|
||||
|
After Width: | Height: | Size: 1.8 KiB |
1
assets/images/vm_logo.svg
Normal file
@@ -0,0 +1 @@
|
||||
<svg id="VM_logo" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 464.61 533.89"><defs><style>.cls-1{fill:#fff;}</style></defs><path class="cls-1" d="M459.86,467.77c9,7.67,24.12,13.49,39.3,13.69v0h1.68v0c15.18-.2,30.31-6,39.3-13.69,47.43-40.45,184.65-166.24,184.65-166.24,36.84-34.27-65.64-68.28-223.95-68.47h-1.68c-158.31.19-260.79,34.2-224,68.47C275.21,301.53,412.43,427.32,459.86,467.77Z" transform="translate(-267.7 -233.05)"/><path class="cls-1" d="M540.1,535.88c-9,7.67-24.12,13.5-39.3,13.7h-1.6c-15.18-.2-30.31-6-39.3-13.7-32.81-28-148.56-132.93-192.16-172.7v60.74c0,6.67,2.55,15.52,7.09,19.68,29.64,27.18,143.94,131.8,185.07,166.88,9,7.67,24.12,13.49,39.3,13.69v0h1.6v0c15.18-.2,30.31-6,39.3-13.69,41.13-35.08,155.43-139.7,185.07-166.88,4.54-4.16,7.09-13,7.09-19.68V363.18C688.66,403,572.91,507.9,540.1,535.88Z" transform="translate(-267.7 -233.05)"/><path class="cls-1" d="M540.1,678.64c-9,7.67-24.12,13.49-39.3,13.69v0h-1.6v0c-15.18-.2-30.31-6-39.3-13.69-32.81-28-148.56-132.94-192.16-172.7v60.73c0,6.67,2.55,15.53,7.09,19.69,29.64,27.17,143.94,131.8,185.07,166.87,9,7.67,24.12,13.5,39.3,13.7h1.6c15.18-.2,30.31-6,39.3-13.7,41.13-35.07,155.43-139.7,185.07-166.87,4.54-4.16,7.09-13,7.09-19.69V505.94C688.66,545.7,572.91,650.66,540.1,678.64Z" transform="translate(-267.7 -233.05)"/></svg>
|
||||
|
After Width: | Height: | Size: 1.3 KiB |
7
assets/js/clipboard.min.js
vendored
Normal file
1
googlec3812dcf278679ec.html
Normal file
@@ -0,0 +1 @@
|
||||
google-site-verification: googlec3812dcf278679ec.html
|
||||
11
guides/README.md
Normal file
@@ -0,0 +1,11 @@
|
||||
---
|
||||
sort: 21
|
||||
---
|
||||
|
||||
# Guides
|
||||
|
||||
1. [K8s monitoring via VM Single](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-single.html)
|
||||
2. [K8s monitoring via VM Cluster](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster.html)
|
||||
3. [HA monitoring setup in K8s via VM Cluster](https://docs.victoriametrics.com/guides/k8s-ha-monitoring-via-vm-cluster.html)
|
||||
4. [Getting started with VM Operator](https://docs.victoriametrics.com/guides/getting-started-with-vm-operator.html)
|
||||
5. [Multi Retention Setup within VictoriaMetrics Cluster](https://docs.victoriametrics.com/guides/guide-vmcluster-multiple-retention-setup.html)
|
||||
317
guides/getting-started-with-vm-operator.md
Normal file
@@ -0,0 +1,317 @@
|
||||
# Getting started with VM Operator
|
||||
|
||||
**The guide covers:**
|
||||
|
||||
* The setup of a [VM Operator](https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-operator) via Helm in [Kubernetes](https://kubernetes.io/) with Helm charts.
|
||||
* The setup of a [VictoriaMetrics Cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) via [VM Operator](https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-operator).
|
||||
* How to add CRD for a [VictoriaMetrics Cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) via [VM Operator](https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-operator).
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com)
|
||||
|
||||
**Preconditions**
|
||||
|
||||
* [Kubernetes cluster 1.20.9-gke.1001](https://cloud.google.com/kubernetes-engine). We use a GKE cluster from [GCP](https://cloud.google.com/) but this guide also applies to any Kubernetes cluster. For example, [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3](https://helm.sh/docs/intro/install).
|
||||
* [kubectl 1.21+](https://kubernetes.io/docs/tasks/tools/install-kubectl).
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
See how to work with a [VictoriaMetrics Helm repository in previous guide](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster.html#1-victoriametrics-helm-repository).
|
||||
|
||||
## 2. Install the VM Operator from the Helm chart
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm install operator vm/victoria-metrics-operator
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME: vmoperator
|
||||
LAST DEPLOYED: Thu Sep 30 17:30:30 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
victoria-metrics-operator has been installed. Check its status by running:
|
||||
kubectl --namespace default get pods -l "app.kubernetes.io/instance=vmoperator"
|
||||
|
||||
Get more information on https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-operator.
|
||||
See "Getting started guide for VM Operator" on https://docs.victoriametrics.com/guides/getting-started-with-vm-operator.html.
|
||||
```
|
||||
|
||||
Run the following command to check that VM Operator is up and running:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl --namespace default get pods -l "app.kubernetes.io/instance=vmoperator"
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
```bash
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
vmoperator-victoria-metrics-operator-67cff44cd6-s47n6 1/1 Running 0 77s
|
||||
```
|
||||
|
||||
## 3. Install VictoriaMetrics Cluster
|
||||
|
||||
> For this example we will use default value for `name: example-vmcluster-persistent`. Change it value up to your needs.
|
||||
|
||||
Run the following command to install [VictoriaMetrics Cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) via [VM Operator](https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-operator):
|
||||
|
||||
<div class="with-copy" markdown="1" id="example-cluster-config">
|
||||
|
||||
```bash
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMCluster
|
||||
metadata:
|
||||
name: example-vmcluster-persistent
|
||||
spec:
|
||||
# Add fields here
|
||||
retentionPeriod: "12"
|
||||
vmstorage:
|
||||
replicaCount: 2
|
||||
vmselect:
|
||||
replicaCount: 2
|
||||
vminsert:
|
||||
replicaCount: 2
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
|
||||
```bash
|
||||
vmcluster.operator.victoriametrics.com/example-vmcluster-persistent created
|
||||
```
|
||||
|
||||
* By applying this CRD we install the [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) to the default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) of your k8s cluster with following params:
|
||||
* `retentionPeriod: "12"` defines the [retention](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#retention) to 12 months.
|
||||
* `replicaCount: 2` creates two replicas of vmselect, vminsert and vmstorage.
|
||||
|
||||
Please note that it may take some time for the pods to start. To check that the pods are started, run the following command:
|
||||
<div class="with-copy" markdown="1" id="example-cluster-config">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmcluster
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
```bash
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
vminsert-example-vmcluster-persistent-845849cb84-9vb6f 1/1 Running 0 5m15s
|
||||
vminsert-example-vmcluster-persistent-845849cb84-r7mmk 1/1 Running 0 5m15s
|
||||
vmselect-example-vmcluster-persistent-0 1/1 Running 0 5m21s
|
||||
vmselect-example-vmcluster-persistent-1 1/1 Running 0 5m21s
|
||||
vmstorage-example-vmcluster-persistent-0 1/1 Running 0 5m25s
|
||||
vmstorage-example-vmcluster-persistent-1 1/1 Running 0 5m25s
|
||||
```
|
||||
|
||||
There is an extra command to get information about the cluster state:
|
||||
<div class="with-copy" markdown="1" id="services">
|
||||
|
||||
```bash
|
||||
kubectl get vmclusters
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
```bash
|
||||
NAME INSERT COUNT STORAGE COUNT SELECT COUNT AGE STATUS
|
||||
example-vmcluster-persistent 2 2 2 5m53s operational
|
||||
```
|
||||
|
||||
Internet traffic goes through the Kubernetes Load balancer which use the set of Pods targeted by a [Kubernetes Service](https://kubernetes.io/docs/concepts/services-networking/service/). The service in [VictoriaMetrics Cluster architecture](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#architecture-overview) which accepts the ingested data named `vminsert` and in Kubernetes it is a `vminsert ` service. So we need to use it for remote_write url.
|
||||
|
||||
To get the name of `vminsert` services, please run the following command:
|
||||
|
||||
<div class="with-copy" markdown="1" id="services">
|
||||
|
||||
```bash
|
||||
kubectl get svc | grep vminsert
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
|
||||
```bash
|
||||
vminsert-example-vmcluster-persistent ClusterIP 10.107.47.136 <none> 8480/TCP 5m58s
|
||||
```
|
||||
|
||||
To scrape metrics from Kubernetes with a VictoriaMetrics Cluster we will need to install [VMAgent](https://docs.victoriametrics.com/vmagent.html) with some additional configurations.
|
||||
Copy `vminsert-example-vmcluster-persistent` (or whatever user put into metadata.name field [https://docs.victoriametrics.com/guides/getting-started-with-vm-operator.html#example-cluster-config](https://docs.victoriametrics.com/guides/getting-started-with-vm-operator.html#example-cluster-config)) service name and add it to the `remoteWrite` URL from [quick-start example](https://github.com/VictoriaMetrics/operator/blob/master/docs/quick-start.MD#vmagent).
|
||||
Here is an example of the full configuration that we need to apply:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeNamespaceSelector: {}
|
||||
podScrapeNamespaceSelector: {}
|
||||
podScrapeSelector: {}
|
||||
serviceScrapeSelector: {}
|
||||
nodeScrapeSelector: {}
|
||||
nodeScrapeNamespaceSelector: {}
|
||||
staticScrapeSelector: {}
|
||||
staticScrapeNamespaceSelector: {}
|
||||
replicaCount: 1
|
||||
remoteWrite:
|
||||
- url: "http://vminsert-example-vmcluster-persistent.default.svc.cluster.local:8480/insert/0/prometheus/api/v1/write"
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
|
||||
The expected output:
|
||||
```bash
|
||||
vmagent.operator.victoriametrics.com/example-vmagent created
|
||||
```
|
||||
|
||||
>`remoteWrite.url` for VMAgent consists of the following parts:
|
||||
> "service_name.VMCluster_namespace.svc.kubernetes_cluster_domain" that in our case will look like vminsert-example-vmcluster-persistent.default.svc.cluster.local
|
||||
|
||||
Verify that `VMAgent` is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmagent
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmagent-example-vmagent-7996844b5f-b5rzs 2/2 Running 0 9s
|
||||
```
|
||||
|
||||
> There are two containers for VMagent: the first one is a VMagent and the second one is a sidecard with a secret. VMagent use a secret with configuration wich is mounted to the special sidecar. It observes the changes with configuration and send a signal to reload configuration for the VMagent.
|
||||
|
||||
Run the following command to make `VMAgent`'s port accessible from the local machine:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
</div>
|
||||
|
||||
```bash
|
||||
kubectl port-forward svc/vmagent-example-vmagent 8429:8429
|
||||
```
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
Forwarding from 127.0.0.1:8429 -> 8429
|
||||
Forwarding from [::1]:8429 -> 8429
|
||||
```
|
||||
|
||||
To check that `VMAgent` collects metrics from the k8s cluster open in the browser [http://127.0.0.1:8429/targets](http://127.0.0.1:8429/targets) .
|
||||
You will see something like this:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-via-vm-operator.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
`VMAgent` connects to [kubernetes service discovery](https://kubernetes.io/docs/concepts/services-networking/service/) and gets targets which needs to be scraped. This service discovery is controlled by [VictoriaMetrics Operator](https://github.com/VictoriaMetrics/operator)
|
||||
|
||||
## 4. Verifying VictoriaMetrics cluster
|
||||
|
||||
See [how to install and connect Grafana to VictoriaMetrics](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster.html#4-install-and-connect-grafana-to-victoriametrics-with-helm) but with one addition - we should get the name of `vmselect` service from the freshly installed VictoriaMetrics Cluster because it will now be different.
|
||||
|
||||
To get the new service name, please run the following command:
|
||||
|
||||
<div class="with-copy" markdown="1" id="services">
|
||||
|
||||
```bash
|
||||
kubectl get svc | grep vmselect
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
|
||||
```bash
|
||||
vmselect-example-vmcluster-persistent ClusterIP None <none> 8481/TCP 7m
|
||||
```
|
||||
|
||||
The final config will look like this:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
datasources:
|
||||
- name: victoriametrics
|
||||
type: prometheus
|
||||
orgId: 1
|
||||
url: http://vmselect-example-vmcluster-persistent.default.svc.cluster.local:8481/select/0/prometheus/
|
||||
access: proxy
|
||||
isDefault: true
|
||||
updateIntervalSeconds: 10
|
||||
editable: true
|
||||
|
||||
dashboardProviders:
|
||||
dashboardproviders.yaml:
|
||||
apiVersion: 1
|
||||
providers:
|
||||
- name: 'default'
|
||||
orgId: 1
|
||||
folder: ''
|
||||
type: file
|
||||
disableDeletion: true
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/default
|
||||
|
||||
dashboards:
|
||||
default:
|
||||
victoriametrics:
|
||||
gnetId: 11176
|
||||
revision: 17
|
||||
datasource: victoriametrics
|
||||
vmagent:
|
||||
gnetId: 12683
|
||||
revision: 7
|
||||
datasource: victoriametrics
|
||||
kubernetes:
|
||||
gnetId: 14205
|
||||
revision: 1
|
||||
datasource: victoriametrics
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
|
||||
## 5. Check the result you obtained in your browser
|
||||
|
||||
To check that [VictoriaMetrics](https://victoriametrics.com) collecting metrics from the k8s cluster open in your browser [http://127.0.0.1:3000/dashboards](http://127.0.0.1:3000/dashboards) and choose the `VictoriaMetrics - cluster` dashboard. Use `admin` for login and the `password` that you previously got from kubectl.
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-via-vm-operator-grafana1.png" width="800" alt="grafana dashboards">
|
||||
</p>
|
||||
|
||||
The expected output is:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-via-vm-operator-grafana2.png" width="800" alt="grafana dashboards">
|
||||
</p>
|
||||
|
||||
## 6. Summary
|
||||
|
||||
* We set up Kubernetes Operator for VictoriaMetrics with using CRD.
|
||||
* We collected metrics from all running services and stored them in the VictoriaMetrics database.
|
||||
BIN
guides/guide-vmcluster-dashes-agent.png
Normal file
|
After Width: | Height: | Size: 51 KiB |
BIN
guides/guide-vmcluster-grafana-dash.png
Normal file
|
After Width: | Height: | Size: 113 KiB |
BIN
guides/guide-vmcluster-k8s-dashboard.png
Normal file
|
After Width: | Height: | Size: 149 KiB |
BIN
guides/guide-vmcluster-k8s-ha-explore-count-up-graph.png
Normal file
|
After Width: | Height: | Size: 57 KiB |
BIN
guides/guide-vmcluster-k8s-ha-explore-count-up-graph2.png
Normal file
|
After Width: | Height: | Size: 59 KiB |
BIN
guides/guide-vmcluster-k8s-ha-explore-count-up.png
Normal file
|
After Width: | Height: | Size: 44 KiB |
BIN
guides/guide-vmcluster-k8s-ha-explore.png
Normal file
|
After Width: | Height: | Size: 46 KiB |
BIN
guides/guide-vmcluster-k8s-scheme.png
Normal file
|
After Width: | Height: | Size: 127 KiB |
BIN
guides/guide-vmcluster-k8s-via-vm-operator-grafana1.png
Normal file
|
After Width: | Height: | Size: 55 KiB |
BIN
guides/guide-vmcluster-k8s-via-vm-operator-grafana2.png
Normal file
|
After Width: | Height: | Size: 236 KiB |
BIN
guides/guide-vmcluster-k8s-via-vm-operator.png
Normal file
|
After Width: | Height: | Size: 204 KiB |
BIN
guides/guide-vmcluster-multiple-retention-scheme.png
Normal file
|
After Width: | Height: | Size: 83 KiB |
40
guides/guide-vmcluster-multiple-retention-setup.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# Multi Retention Setup within VictoriaMetrics Cluster
|
||||
|
||||
|
||||
**Objective**
|
||||
|
||||
Setup Victoria Metrics TSDB with support of multiple retention periods within one installation.
|
||||
|
||||
**Challenge**
|
||||
|
||||
VictoriaMetrics instance (single node or vmstorage node) supports only one retention period.
|
||||
|
||||
|
||||
**Solution**
|
||||
|
||||
A multi-retention setup can be implemented by dividing a [victoriametrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) into logical groups with different retentions.
|
||||
|
||||
Example:
|
||||
Setup should handle 3 different retention groups 3months, 1year and 3 years.
|
||||
Solution contains 3 groups of vmstorages + vminserst and one group of vmselects. Routing is done by [vmagent](https://docs.victoriametrics.com/vmagent.html) and [relabeling configuration](https://docs.victoriametrics.com/vmagent.html#relabeling)
|
||||
|
||||
The diagram below shows a proposed solution
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-multiple-retention-scheme.png" width="800">
|
||||
</p>
|
||||
|
||||
**Implementation Details**
|
||||
1. Groups of vminserts A know about only vmstorages A and this is explicitly specified in [-storageNode configuration](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#cluster-setup).
|
||||
2. Groups of vminserts B know about only vmstorages B and this is explicitly specified in `-storageNode` configuration.
|
||||
3. Groups of vminserts C know about only vmstorages A and this is explicitly specified in `-storageNode` configuration.
|
||||
4. Vmselect reads data from all vmstorage nodes.
|
||||
5. Vmagent routes incoming metrics to the given set of `vminsert` nodes using relabeling rules specified at `-remoteWrite.urlRelabelConfig`. See [these docs](https://docs.victoriametrics.com/vmagent.html#relabeling).
|
||||
|
||||
**Multi-Tenant Setup**
|
||||
|
||||
Every group of vmstorages can handle one tenant or multiple one. Different groups can have overlapping tenants. As vmselect reads from all vmstorage nodes, the data is aggregated on its level.
|
||||
|
||||
**Additional Enhancements**
|
||||
|
||||
You can set up [vmauth](https://docs.victoriametrics.com/vmauth.html) for routing data to the given vminsert group depending on the needed retention.
|
||||
BIN
guides/guide-vmcluster-vmagent-grafana-dash.png
Normal file
|
After Width: | Height: | Size: 104 KiB |
222
guides/guide-vmcluster-vmagent-values.yaml
Normal file
@@ -0,0 +1,222 @@
|
||||
remoteWriteUrls:
|
||||
- http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 10s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: vmagent
|
||||
static_configs:
|
||||
- targets: ["localhost:8429"]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
- job_name: "kubernetes-service-endpoints"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-service-endpoints-slow"
|
||||
scrape_interval: 5m
|
||||
scrape_timeout: 30s
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-services"
|
||||
metrics_path: /probe
|
||||
params:
|
||||
module: [http_2xx]
|
||||
kubernetes_sd_configs:
|
||||
- role: service
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_probe]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
- target_label: __address__
|
||||
replacement: blackbox
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
target_label: kubernetes_name
|
||||
- job_name: "kubernetes-pods"
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||||
action: replace
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
target_label: __address__
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_pod_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_pod_name]
|
||||
action: replace
|
||||
target_label: kubernetes_pod_name
|
||||
BIN
guides/guide-vmsingle-grafana-dashboards.png
Normal file
|
After Width: | Height: | Size: 50 KiB |
BIN
guides/guide-vmsingle-grafana-k8s-dashboard.png
Normal file
|
After Width: | Height: | Size: 171 KiB |
BIN
guides/guide-vmsingle-grafana.png
Normal file
|
After Width: | Height: | Size: 151 KiB |
BIN
guides/guide-vmsingle-k8s-scheme.png
Normal file
|
After Width: | Height: | Size: 43 KiB |
81
guides/guide-vmsingle-values.yaml
Normal file
@@ -0,0 +1,81 @@
|
||||
server:
|
||||
scrape:
|
||||
enabled: true
|
||||
configMap: ""
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
scrape_configs:
|
||||
- job_name: victoriametrics
|
||||
static_configs:
|
||||
- targets: [ "localhost:8428" ]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
434
guides/k8s-ha-monitoring-via-vm-cluster.md
Normal file
@@ -0,0 +1,434 @@
|
||||
# HA monitoring setup in Kubernetes via VictoriaMetrics Cluster
|
||||
|
||||
|
||||
**The guide covers:**
|
||||
|
||||
* High availability monitoring via [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) in [Kubernetes](https://kubernetes.io/) with Helm charts
|
||||
* How to store metrics
|
||||
* How to scrape metrics from k8s components using a service discovery
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com)
|
||||
|
||||
**Preconditions**
|
||||
|
||||
* [Kubernetes cluster 1.19.12-gke.2100](https://cloud.google.com/kubernetes-engine). We use GKE cluster from [GCP](https://cloud.google.com/) but this guide also applies to any Kubernetes cluster. For example, [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3 ](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.21](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
* [jq](https://stedolan.github.io/jq/download/) tool
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
Please see the relevant [VictoriaMetrics Helm repository](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster.html#1-victoriametrics-helm-repository) section in previous guides.
|
||||
|
||||
|
||||
## 2. Install VictoriaMetrics Cluster from the Helm chart
|
||||
|
||||
Execute the following command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install vmcluster vm/victoria-metrics-cluster -f -
|
||||
vmselect:
|
||||
extraArgs:
|
||||
dedup.minScrapeInterval: 1ms
|
||||
replicationFactor: 2
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8481"
|
||||
replicaCount: 3
|
||||
|
||||
vminsert:
|
||||
extraArgs:
|
||||
replicationFactor: 2
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8480"
|
||||
replicaCount: 3
|
||||
|
||||
vmstorage:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8482"
|
||||
replicaCount: 3
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
* The `Helm install vmcluster vm/victoria-metrics-cluster` command installs [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) to the default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
|
||||
* `dedup.minScrapeInterval: 1ms` configures [de-deplication](https://docs.victoriametrics.com/#deduplication) for the cluster that de-duplicates data points in the same time series if they fall within the same discrete 1s bucket. The earliest data point will be kept. In the case of equal timestamps, an arbitrary data point will be kept.
|
||||
* `replicationFactor: 2` Replication factor for the ingested data, i.e. how many copies should be made among distinct `-storageNode` instances. If the replication factor is greater than one, the deduplication must be enabled on the remote storage side.
|
||||
* `podAnnotations: prometheus.io/scrape: "true"` enables the scraping of metrics from the vmselect, vminsert and vmstorage pods.
|
||||
* `podAnnotations:prometheus.io/port: "some_port" ` enables the scraping of metrics from the vmselect, vminsert and vmstorage pods from corresponding ports.
|
||||
* `replicaCount: 3` creates three replicas of vmselect, vminsert and vmstorage.
|
||||
|
||||
|
||||
The expected result of the command execution is the following:
|
||||
|
||||
```bash
|
||||
NAME: vmcluster
|
||||
LAST DEPLOYED: Thu Jul 29 13:33:51 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
Write API:
|
||||
|
||||
The VictoriaMetrics write api can be accessed via port 8480 via the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local
|
||||
|
||||
Get the VictoriaMetrics insert service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vminsert" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8480
|
||||
|
||||
You need to update your Prometheus configuration file and add the following lines to it:
|
||||
|
||||
prometheus.yml
|
||||
|
||||
remote_write:
|
||||
- url: "http://<insert-service>/insert/0/prometheus/"
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
remote_write:
|
||||
- url: "http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/"
|
||||
Read API:
|
||||
|
||||
The VictoriaMetrics read api can be accessed via port 8481 with the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local
|
||||
|
||||
Get the VictoriaMetrics select service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vmselect" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8481
|
||||
|
||||
You need to specify select service URL into your Grafana:
|
||||
NOTE: you need to use the Prometheus Data Source
|
||||
|
||||
Input this URL field into Grafana
|
||||
|
||||
http://<select-service>/select/0/prometheus/
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local:8481/select/0/prometheus/"
|
||||
|
||||
```
|
||||
|
||||
Verify that the VictoriaMetrics cluster pods are up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmcluster
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-4mh9d 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-4ppl7 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-782qk 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-4v4ws 1/1 Running 0 2m27s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-kwc7q 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-v7pmk 1/1 Running 0 2m28s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 2m27s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 2m3s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-2 1/1 Running 0 99s
|
||||
```
|
||||
|
||||
## 3. Install vmagent from the Helm chart
|
||||
|
||||
To scrape metrics from Kubernetes with a VictoriaMetrics Cluster we will need to install [vmagent](https://docs.victoriametrics.com/vmagent.html) with some additional configurations. To do so, please run the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
helm install vmagent vm/victoria-metrics-agent -f https://docs.victoriametrics.com/guides/guide-vmcluster-vmagent-values.yaml
|
||||
```
|
||||
</div>
|
||||
|
||||
Here is full file content `guide-vmcluster-vmagent-values.yaml`
|
||||
|
||||
```yaml
|
||||
remoteWriteUrls:
|
||||
- http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
|
||||
scrape_configs:
|
||||
- job_name: vmagent
|
||||
static_configs:
|
||||
- targets: ["localhost:8429"]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
```
|
||||
|
||||
* `remoteWriteUrls: - http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/` configures `vmagent` to write scraped metrics to the `vmselect service`.
|
||||
* The `metric_relabel_configs` section allows you to process Kubernetes metrics for the Grafana dashboard.
|
||||
|
||||
|
||||
Verify that `vmagent`'s pod is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmagent
|
||||
```
|
||||
</div>
|
||||
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmagent-victoria-metrics-agent-57ddbdc55d-h4ljb 1/1 Running 0 13s
|
||||
```
|
||||
|
||||
## 4. Verifying HA of VictoraMetrics Cluster
|
||||
|
||||
Run the following command to check that VictoriaMetrics services are up and running:
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep victoria-metrics
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmagent-victoria-metrics-agent-57ddbdc55d-h4ljb 1/1 Running 0 75s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-s8v7x 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-xlm9d 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vminsert-78b84d8cd9-xqxrh 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-7dg95 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-ck7qb 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmselect-69c5f48bc6-jjqsl 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 89s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 63s
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-2 1/1 Running 0 34s
|
||||
```
|
||||
|
||||
To verify that metrics are present in the VictoriaMetrics send a curl request to the `vmselect` service from kubernetes or setup Grafana and check it via the web interface.
|
||||
|
||||
Run the following command to see the list of services:
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
k get svc | grep vmselect
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output:
|
||||
|
||||
```bash
|
||||
vmcluster-victoria-metrics-cluster-vmselect ClusterIP 10.88.2.69 <none> 8481/TCP 1m
|
||||
```
|
||||
|
||||
Run the following command to make `vmselect`'s port accessable from the local machine:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl port-forward svc/vmcluster-victoria-metrics-cluster-vmselect 8481:8481
|
||||
```
|
||||
</div>
|
||||
|
||||
Execute the following command to get metrics via `curl`:
|
||||
|
||||
```bash
|
||||
curl -sg 'http://127.0.0.1:8481/select/0/prometheus/api/v1/query_range?query=count(up{kubernetes_pod_name=~".*vmselect.*"})&start=-10m&step=1m' | jq
|
||||
```
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
{
|
||||
"status": "success",
|
||||
"isPartial": false,
|
||||
"data": {
|
||||
"resultType": "matrix",
|
||||
"result": [
|
||||
{
|
||||
"metric": {},
|
||||
"values": [
|
||||
[
|
||||
1628065480.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065540.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065600.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065660.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065720.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065780.657,
|
||||
"3"
|
||||
],
|
||||
[
|
||||
1628065840.657,
|
||||
"3"
|
||||
]
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
* Query `http://127.0.0.1:8481/select/0/prometheus/api/v1/query_range` uses [VictoriaMetrics querying API](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format) to fetch previously stored data points;
|
||||
* Argument `query=count(up{kubernetes_pod_name=~".*vmselect.*"})` specifies the query we want to execute. Specifically, we calculate the number of `vmselect` pods.
|
||||
* Additional arguments `start=-10m&step=1m'` set requested time range from -10 minutes (10 minutes ago) to now (default value if `end` argument is omitted) and step (the distance between returned data points) of 1 minute;
|
||||
* By adding `| jq` we pass the output to the jq utility which outputs information in json format
|
||||
|
||||
The expected result of the query `count(up{kubernetes_pod_name=~".*vmselect.*"})` should be equal to `3` - the number of replicas we set via `replicaCount` parameter.
|
||||
|
||||
|
||||
To test via Grafana, we need to install it first. [Install and connect Grafana to VictoriaMetrics](https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster.html#4-install-and-connect-grafana-to-victoriametrics-with-helm), login into Grafana and open the metrics [Explore](http://127.0.0.1:3000/explore) page.
|
||||
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore.png" width="800" alt="grafana explore">
|
||||
</p>
|
||||
|
||||
Choose `victoriametrics` from the list of datasources and enter `count(up{kubernetes_pod_name=~".*vmselect.*"})` to the **Metric browser** field as shown on the screenshot, then press **Run query** button:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
The expected output is:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up-graph.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
## 5. High Availability
|
||||
|
||||
To test if High Availability works, we need to shutdown one of the `vmstorages`. To do this, run the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl scale sts vmcluster-victoria-metrics-cluster-vmstorage --replicas=2
|
||||
```
|
||||
</div>
|
||||
|
||||
Verify that now we have two running `vmstorages` in the cluster by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmstorage
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
```bash
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 44m
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 43m
|
||||
```
|
||||
|
||||
Return to Grafana Explore and press the **Run query** button again.
|
||||
|
||||
The expected output is:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up-graph.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
As you can see, after we scaled down the `vmstorage` replicas number from three to two pods, metrics are still available and correct. The response is not partial as it was before scaling. Also we see that query `count(up{kubernetes_pod_name=~".*vmselect.*"})` returns the same value as before.
|
||||
|
||||
To confirm that the number of `vmstorage` pods is equivalent to two, execute the following request in Grafana Explore:
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-ha-explore-count-up-graph2.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
|
||||
## 6. Final thoughts
|
||||
|
||||
* We set up VictoriaMetrics for Kubernetes cluster with HA.
|
||||
* We collected metrics from running services and stored them in the VictoriaMetrics database.
|
||||
* We configured `dedup.minScrapeInterval` and `replicationFactor: 2` for VictoriaMetrics cluster for high availability purposes.
|
||||
* We tested and made sure that metrics are available even if one of `vmstorages` nodes was turned off.
|
||||
551
guides/k8s-monitoring-via-vm-cluster.md
Normal file
@@ -0,0 +1,551 @@
|
||||
# Kubernetes monitoring with VictoriaMetrics Cluster
|
||||
|
||||
|
||||
**This guide covers:**
|
||||
|
||||
* The setup of a [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) in [Kubernetes](https://kubernetes.io/) via Helm charts
|
||||
* How to scrape metrics from k8s components using service discovery
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com) tsdb
|
||||
|
||||
**Precondition**
|
||||
|
||||
We will use:
|
||||
* [Kubernetes cluster 1.19.9-gke.1900](https://cloud.google.com/kubernetes-engine)
|
||||
> We use GKE cluster from [GCP](https://cloud.google.com/) but this guide also applies on any Kubernetes cluster. For example [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3 ](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.21](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-scheme.png" width="800" alt="VictoriaMetrics Cluster on Kubernetes cluster">
|
||||
</p>
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
> For this guide we will use Helm 3 but if you already use Helm 2 please see this [https://github.com/VictoriaMetrics/helm-charts#for-helm-v2](https://github.com/VictoriaMetrics/helm-charts#for-helm-v2)
|
||||
|
||||
You need to add the VictoriaMetrics Helm repository to install VictoriaMetrics components. We’re going to use [VictoriaMetrics Cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html). You can do this by running the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add vm https://victoriametrics.github.io/helm-charts/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Update Helm repositories:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
To verify that everything is set up correctly you may run this command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm search repo vm/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
vm/victoria-metrics-agent 0.7.20 v1.62.0 Victoria Metrics Agent - collects metrics from ...
|
||||
vm/victoria-metrics-alert 0.3.34 v1.62.0 Victoria Metrics Alert - executes a list of giv...
|
||||
vm/victoria-metrics-auth 0.2.23 1.62.0 Victoria Metrics Auth - is a simple auth proxy ...
|
||||
vm/victoria-metrics-cluster 0.8.32 1.62.0 Victoria Metrics Cluster version - high-perform...
|
||||
vm/victoria-metrics-k8s-stack 0.2.9 1.16.0 Kubernetes monitoring on VictoriaMetrics stack....
|
||||
vm/victoria-metrics-operator 0.1.17 0.16.0 Victoria Metrics Operator
|
||||
vm/victoria-metrics-single 0.7.5 1.62.0 Victoria Metrics Single version - high-performa...
|
||||
```
|
||||
|
||||
## 2. Install VictoriaMetrics Cluster from the Helm chart
|
||||
|
||||
Run this command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install vmcluster vm/victoria-metrics-cluster -f -
|
||||
vmselect:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8481"
|
||||
|
||||
vminsert:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8480"
|
||||
|
||||
vmstorage:
|
||||
podAnnotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8482"
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
* By running `Helm install vmcluster vm/victoria-metrics-cluster` we install [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) to default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) inside your cluster.
|
||||
* By adding `podAnnotations: prometheus.io/scrape: "true"` we enable the scraping of metrics from the vmselect, vminsert and vmstorage pods.
|
||||
* By adding `podAnnotations:prometheus.io/port: "some_port" ` we enable the scraping of metrics from the vmselect, vminsert and vmstorage pods from their ports as well.
|
||||
|
||||
|
||||
As a result of this command you will see the following output:
|
||||
|
||||
```bash
|
||||
NAME: vmcluster
|
||||
LAST DEPLOYED: Thu Jul 1 09:41:57 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
Write API:
|
||||
|
||||
The Victoria Metrics write api can be accessed via port 8480 with the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local
|
||||
|
||||
Get the Victoria Metrics insert service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vminsert" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8480
|
||||
|
||||
You need to update your Prometheus configuration file and add the following lines to it:
|
||||
|
||||
prometheus.yml
|
||||
|
||||
remote_write:
|
||||
- url: "http://<insert-service>/insert/0/prometheus/"
|
||||
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
remote_write:
|
||||
- url: "http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/"
|
||||
Read API:
|
||||
|
||||
The VictoriaMetrics read api can be accessed via port 8481 with the following DNS name from within your cluster:
|
||||
vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local
|
||||
|
||||
Get the VictoriaMetrics select service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=vmselect" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8481
|
||||
|
||||
You will need to specify select service URL in your Grafana:
|
||||
NOTE: you need to use Prometheus Data Source
|
||||
|
||||
Input this URL field in Grafana
|
||||
|
||||
http://<select-service>/select/0/prometheus/
|
||||
|
||||
|
||||
for example - inside the Kubernetes cluster:
|
||||
|
||||
http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local:8481/select/0/prometheus/"
|
||||
|
||||
```
|
||||
|
||||
For us it’s important to remember the url for the datasource (copy lines from the output).
|
||||
|
||||
Verify that [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) pods are up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
vmcluster-victoria-metrics-cluster-vminsert-689cbc8f55-95szg 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vminsert-689cbc8f55-f852l 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmselect-977d74cdf-bbgp5 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmselect-977d74cdf-vzp6z 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-0 1/1 Running 0 16m
|
||||
vmcluster-victoria-metrics-cluster-vmstorage-1 1/1 Running 0 16m
|
||||
```
|
||||
|
||||
## 3. Install vmagent from the Helm chart
|
||||
|
||||
To scrape metrics from Kubernetes with a [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) we need to install [vmagent](https://docs.victoriametrics.com/vmagent.html) with additional configuration. To do so, please run these commands in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
helm install vmagent vm/victoria-metrics-agent -f https://docs.victoriametrics.com/guides/guide-vmcluster-vmagent-values.yaml
|
||||
```
|
||||
</div>
|
||||
|
||||
Here is full file content `guide-vmcluster-vmagent-values.yaml`
|
||||
|
||||
```yaml
|
||||
remoteWriteUrls:
|
||||
- http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/
|
||||
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 10s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: vmagent
|
||||
static_configs:
|
||||
- targets: ["localhost:8429"]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [__meta_kubernetes_node_name]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
- job_name: "kubernetes-service-endpoints"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-service-endpoints-slow"
|
||||
scrape_interval: 5m
|
||||
scrape_timeout: 30s
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_scheme]
|
||||
action: replace
|
||||
target_label: __scheme__
|
||||
regex: (https?)
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[
|
||||
__address__,
|
||||
__meta_kubernetes_service_annotation_prometheus_io_port,
|
||||
]
|
||||
action: replace
|
||||
target_label: __address__
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
action: replace
|
||||
target_label: kubernetes_name
|
||||
- source_labels: [__meta_kubernetes_pod_node_name]
|
||||
action: replace
|
||||
target_label: kubernetes_node
|
||||
- job_name: "kubernetes-services"
|
||||
metrics_path: /probe
|
||||
params:
|
||||
module: [http_2xx]
|
||||
kubernetes_sd_configs:
|
||||
- role: service
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[__meta_kubernetes_service_annotation_prometheus_io_probe]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
- target_label: __address__
|
||||
replacement: blackbox
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_service_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_service_name]
|
||||
target_label: kubernetes_name
|
||||
- job_name: "kubernetes-pods"
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
relabel_configs:
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
|
||||
action: keep
|
||||
regex: true
|
||||
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
|
||||
action: replace
|
||||
target_label: __metrics_path__
|
||||
regex: (.+)
|
||||
- source_labels:
|
||||
[__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
|
||||
action: replace
|
||||
regex: ([^:]+)(?::\d+)?;(\d+)
|
||||
replacement: $1:$2
|
||||
target_label: __address__
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_pod_label_(.+)
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
action: replace
|
||||
target_label: kubernetes_namespace
|
||||
- source_labels: [__meta_kubernetes_pod_name]
|
||||
action: replace
|
||||
target_label: kubernetes_pod_name
|
||||
```
|
||||
|
||||
* By adding `remoteWriteUrls: - http://vmcluster-victoria-metrics-cluster-vminsert.default.svc.cluster.local:8480/insert/0/prometheus/` we configuring [vmagent](https://docs.victoriametrics.com/vmagent.html) to write scraped metrics into the `vmselect service`.
|
||||
* The second part of this yaml file is needed to add the `metric_relabel_configs` section that helps us to show Kubernetes metrics on the Grafana dashboard.
|
||||
|
||||
|
||||
Verify that `vmagent`'s pod is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods | grep vmagent
|
||||
```
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
vmagent-victoria-metrics-agent-69974b95b4-mhjph 1/1 Running 0 11m
|
||||
```
|
||||
|
||||
|
||||
## 4. Install and connect Grafana to VictoriaMetrics with Helm
|
||||
|
||||
Add the Grafana Helm repository.
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add grafana https://grafana.github.io/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
</div>
|
||||
|
||||
See more information on Grafana ArtifactHUB [https://artifacthub.io/packages/helm/grafana/grafana](https://artifacthub.io/packages/helm/grafana/grafana)
|
||||
|
||||
To install the chart with the release name `my-grafana`, add the VictoriaMetrics datasource with official dashboard and the Kubernetes dashboard:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
datasources:
|
||||
- name: victoriametrics
|
||||
type: prometheus
|
||||
orgId: 1
|
||||
url: http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local:8481/select/0/prometheus/
|
||||
access: proxy
|
||||
isDefault: true
|
||||
updateIntervalSeconds: 10
|
||||
editable: true
|
||||
|
||||
dashboardProviders:
|
||||
dashboardproviders.yaml:
|
||||
apiVersion: 1
|
||||
providers:
|
||||
- name: 'default'
|
||||
orgId: 1
|
||||
folder: ''
|
||||
type: file
|
||||
disableDeletion: true
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/default
|
||||
|
||||
dashboards:
|
||||
default:
|
||||
victoriametrics:
|
||||
gnetId: 11176
|
||||
revision: 17
|
||||
datasource: victoriametrics
|
||||
vmagent:
|
||||
gnetId: 12683
|
||||
revision: 7
|
||||
datasource: victoriametrics
|
||||
kubernetes:
|
||||
gnetId: 14205
|
||||
revision: 1
|
||||
datasource: victoriametrics
|
||||
EOF
|
||||
```
|
||||
</div>
|
||||
|
||||
By running this command we:
|
||||
* Install Grafana from the Helm repository.
|
||||
* Provision a VictoriaMetrics data source with the url from the output above which we remembered.
|
||||
* Add this [https://grafana.com/grafana/dashboards/11176](https://grafana.com/grafana/dashboards/11176) dashboard for [VictoriaMetrics Cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
|
||||
* Add this [https://grafana.com/grafana/dashboards/12683](https://grafana.com/grafana/dashboards/12683) dashboard for [VictoriaMetrics Agent](https://docs.victoriametrics.com/vmagent.html).
|
||||
* Add this [https://grafana.com/grafana/dashboards/14205](https://grafana.com/grafana/dashboards/14205) dashboard to see Kubernetes cluster metrics.
|
||||
|
||||
|
||||
Please see the output log in your terminal. Copy, paste and run these commands.
|
||||
The first one will show `admin` password for the Grafana admin.
|
||||
The second and the third will forward Grafana to `127.0.0.1:3000`:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
|
||||
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
|
||||
kubectl --namespace default port-forward $POD_NAME 3000
|
||||
```
|
||||
</div>
|
||||
|
||||
## 5. Check the result you obtained in your browser
|
||||
|
||||
To check that [VictoriaMetrics](https://victoriametrics.com) collects metrics from k8s cluster open in browser [http://127.0.0.1:3000/dashboards](http://127.0.0.1:3000/dashboards) and choose the `Kubernetes Cluster Monitoring (via Prometheus)` dashboard. Use `admin` for login and `password` that you previously got from kubectl.
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-dashes-agent.png" width="800" alt="grafana dashboards">
|
||||
</p>
|
||||
|
||||
You will see something like this:
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-k8s-dashboard.png" width="800" alt="Kubernetes metrics provided by vmcluster">
|
||||
</p>
|
||||
|
||||
The VictoriaMetrics dashboard is also available to use:
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-grafana-dash.png" width="800" alt="VictoriaMetrics cluster dashboard">
|
||||
</p>
|
||||
|
||||
vmagent has it’s own dashboard:
|
||||
<p align="center">
|
||||
<img src="guide-vmcluster-vmagent-grafana-dash.png" width="800" alt="vmagent dashboard">
|
||||
</p>
|
||||
|
||||
## 6. Final thoughts
|
||||
|
||||
* We set up TimeSeries Database for your Kubernetes cluster.
|
||||
* We collected metrics from all running pods,nodes, … and stored them in a VictoriaMetrics database.
|
||||
* We visualized resources used in the Kubernetes cluster by using Grafana dashboards.
|
||||
351
guides/k8s-monitoring-via-vm-single.md
Normal file
@@ -0,0 +1,351 @@
|
||||
# Kubernetes monitoring via VictoriaMetrics Single
|
||||
|
||||
|
||||
**This guide covers:**
|
||||
|
||||
* The setup of a [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) in [Kubernetes](https://kubernetes.io/) via Helm charts
|
||||
* How to scrape metrics from k8s components using service discovery
|
||||
* How to visualize stored data
|
||||
* How to store metrics in [VictoriaMetrics](https://victoriametrics.com) tsdb
|
||||
|
||||
**Precondition**
|
||||
|
||||
We will use:
|
||||
* [Kubernetes cluster 1.19.9-gke.1900](https://cloud.google.com/kubernetes-engine)
|
||||
> We use GKE cluster from [GCP](https://cloud.google.com/) but this guide is also applied on any Kubernetes cluster. For example [Amazon EKS](https://aws.amazon.com/ru/eks/).
|
||||
* [Helm 3 ](https://helm.sh/docs/intro/install)
|
||||
* [kubectl 1.21](https://kubernetes.io/docs/tasks/tools/install-kubectl)
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-k8s-scheme.png" width="800" alt="VictoriaMetrics Single on Kubernetes cluster">
|
||||
</p>
|
||||
|
||||
## 1. VictoriaMetrics Helm repository
|
||||
|
||||
> For this guide we will use Helm 3 but if you already use Helm 2 please see this [https://github.com/VictoriaMetrics/helm-charts#for-helm-v2](https://github.com/VictoriaMetrics/helm-charts#for-helm-v2)
|
||||
|
||||
You need to add the VictoriaMetrics Helm repository to install VictoriaMetrics components. We’re going to use [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html). You can do this by running the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add vm https://victoriametrics.github.io/helm-charts/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Update Helm repositories:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
To verify that everything is set up correctly you may run this command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm search repo vm/
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
vm/victoria-metrics-agent 0.7.20 v1.62.0 Victoria Metrics Agent - collects metrics from ...
|
||||
vm/victoria-metrics-alert 0.3.34 v1.62.0 Victoria Metrics Alert - executes a list of giv...
|
||||
vm/victoria-metrics-auth 0.2.23 1.62.0 Victoria Metrics Auth - is a simple auth proxy ...
|
||||
vm/victoria-metrics-cluster 0.8.32 1.62.0 Victoria Metrics Cluster version - high-perform...
|
||||
vm/victoria-metrics-k8s-stack 0.2.9 1.16.0 Kubernetes monitoring on VictoriaMetrics stack....
|
||||
vm/victoria-metrics-operator 0.1.17 0.16.0 Victoria Metrics Operator
|
||||
vm/victoria-metrics-single 0.7.5 1.62.0 Victoria Metrics Single version - high-performa...
|
||||
```
|
||||
|
||||
|
||||
## 2. Install [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) from Helm Chart
|
||||
|
||||
Run this command in your terminal:
|
||||
|
||||
<div class="with-copy" markdown="1">.html
|
||||
|
||||
```yaml
|
||||
helm install vmsingle vm/victoria-metrics-single -f https://docs.victoriametrics.com/guides/guide-vmsingle-values.yaml
|
||||
```
|
||||
|
||||
Here is full file content `guide-vmsingle-values.yaml`
|
||||
|
||||
```yaml
|
||||
server:
|
||||
scrape:
|
||||
enabled: true
|
||||
configMap: ""
|
||||
config:
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
scrape_configs:
|
||||
- job_name: victoriametrics
|
||||
static_configs:
|
||||
- targets: [ "localhost:8428" ]
|
||||
- job_name: "kubernetes-apiservers"
|
||||
kubernetes_sd_configs:
|
||||
- role: endpoints
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels:
|
||||
[
|
||||
__meta_kubernetes_namespace,
|
||||
__meta_kubernetes_service_name,
|
||||
__meta_kubernetes_endpoint_port_name,
|
||||
]
|
||||
action: keep
|
||||
regex: default;kubernetes;https
|
||||
- job_name: "kubernetes-nodes"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics
|
||||
- job_name: "kubernetes-nodes-cadvisor"
|
||||
scheme: https
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
insecure_skip_verify: true
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
kubernetes_sd_configs:
|
||||
- role: node
|
||||
relabel_configs:
|
||||
- action: labelmap
|
||||
regex: __meta_kubernetes_node_label_(.+)
|
||||
- target_label: __address__
|
||||
replacement: kubernetes.default.svc:443
|
||||
- source_labels: [ __meta_kubernetes_node_name ]
|
||||
regex: (.+)
|
||||
target_label: __metrics_path__
|
||||
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
|
||||
metric_relabel_configs:
|
||||
- action: replace
|
||||
source_labels: [pod]
|
||||
regex: '(.+)'
|
||||
target_label: pod_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
source_labels: [container]
|
||||
regex: '(.+)'
|
||||
target_label: container_name
|
||||
replacement: '${1}'
|
||||
- action: replace
|
||||
target_label: name
|
||||
replacement: k8s_stub
|
||||
- action: replace
|
||||
source_labels: [id]
|
||||
regex: '^/system\.slice/(.+)\.service$'
|
||||
target_label: systemd_service_name
|
||||
replacement: '${1}'
|
||||
|
||||
```
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
* By running `helm install vmsingle vm/victoria-metrics-single` we install [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) to default [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) inside your cluster
|
||||
* By adding `scrape: enable: true` we add and enable autodiscovery scraping from kubernetes cluster to [VictoriaMetrics Single](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html)
|
||||
* On line 166 from [https://docs.victoriametrics.com/guides/guide-vmsingle-values.yaml](https://docs.victoriametrics.com/guides/guide-vmsingle-values.yaml) we added `metric_relabel_configs` section that will help us to show Kubernetes metrics on Grafana dashboard.
|
||||
|
||||
|
||||
As a result of the command you will see the following output:
|
||||
|
||||
```bash
|
||||
NAME: victoria-metrics
|
||||
LAST DEPLOYED: Fri Jun 25 12:06:13 2021
|
||||
NAMESPACE: default
|
||||
STATUS: deployed
|
||||
REVISION: 1
|
||||
TEST SUITE: None
|
||||
NOTES:
|
||||
The VictoriaMetrics write api can be accessed via port 8428 on the following DNS name from within your cluster:
|
||||
vmsingle-victoria-metrics-single-server.default.svc.cluster.local
|
||||
|
||||
|
||||
Metrics Ingestion:
|
||||
Get the Victoria Metrics service URL by running these commands in the same shell:
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app=server" -o jsonpath="{.items[0].metadata.name}")
|
||||
kubectl --namespace default port-forward $POD_NAME 8428
|
||||
|
||||
Write url inside the kubernetes cluster:
|
||||
http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428/api/v1/write
|
||||
|
||||
Metrics Scrape:
|
||||
Pull-based scrapes are enabled
|
||||
Scrape config can be displayed by running this command::
|
||||
kubectl get cm vmsingle-victoria-metrics-single-server-scrapeconfig -n default
|
||||
|
||||
The target’s information is accessible via api:
|
||||
Inside cluster:
|
||||
http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428/targets
|
||||
Outside cluster:
|
||||
You need to port-forward service (see instructions above) and call
|
||||
http://<service-host-port>/targets
|
||||
|
||||
Read Data:
|
||||
The following url can be used as the datasource url in Grafana::
|
||||
http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428
|
||||
|
||||
```
|
||||
|
||||
For us it’s important to remember the url for the datasource (copy lines from output).
|
||||
|
||||
Verify that VictoriaMetrics pod is up and running by executing the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get pods
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
The expected output is:
|
||||
|
||||
```bash
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
vmsingle-victoria-metrics-single-server-0 1/1 Running 0 68s
|
||||
```
|
||||
|
||||
|
||||
## 3. Install and connect Grafana to VictoriaMetrics with Helm
|
||||
|
||||
Add the Grafana Helm repository.
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
helm repo add grafana https://grafana.github.io/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
By installing the Chart with the release name `my-grafana`, you add the VictoriaMetrics datasource with official dashboard and kubernetes dashboard:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```yaml
|
||||
cat <<EOF | helm install my-grafana grafana/grafana -f -
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
datasources:
|
||||
- name: victoriametrics
|
||||
type: prometheus
|
||||
orgId: 1
|
||||
url: http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428
|
||||
access: proxy
|
||||
isDefault: true
|
||||
updateIntervalSeconds: 10
|
||||
editable: true
|
||||
|
||||
dashboardProviders:
|
||||
dashboardproviders.yaml:
|
||||
apiVersion: 1
|
||||
providers:
|
||||
- name: 'default'
|
||||
orgId: 1
|
||||
folder: ''
|
||||
type: file
|
||||
disableDeletion: true
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/default
|
||||
|
||||
dashboards:
|
||||
default:
|
||||
victoriametrics:
|
||||
gnetId: 10229
|
||||
revision: 22
|
||||
datasource: victoriametrics
|
||||
kubernetes:
|
||||
gnetId: 14205
|
||||
revision: 1
|
||||
datasource: victoriametrics
|
||||
EOF
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
By running this command we:
|
||||
* Install Grafana from Helm repository.
|
||||
* Provision VictoriaMetrics datasource with the url from the output above which we copied before.
|
||||
* Add this [https://grafana.com/grafana/dashboards/10229](https://grafana.com/grafana/dashboards/10229) dashboard for VictoriaMetrics.
|
||||
* Add this [https://grafana.com/grafana/dashboards/14205](https://grafana.com/grafana/dashboards/14205) dashboard to see Kubernetes cluster metrics.
|
||||
|
||||
|
||||
Check the output log in your terminal.
|
||||
To see the password for Grafana `admin` user use the following command:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Expose Grafana service on `127.0.0.1:3000`:
|
||||
|
||||
<div class="with-copy" markdown="1">
|
||||
|
||||
```bash
|
||||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
|
||||
|
||||
kubectl --namespace default port-forward $POD_NAME 3000
|
||||
```
|
||||
|
||||
</div>
|
||||
|
||||
Now Grafana should be accessible on the [http://127.0.0.1:3000](http://127.0.0.1:3000) address.
|
||||
|
||||
|
||||
## 4. Check the obtained result in your browser
|
||||
|
||||
To check that VictoriaMetrics has collects metrics from the k8s cluster open in browser [http://127.0.0.1:3000/dashboards](http://127.0.0.1:3000/dashboards) and choose `Kubernetes Cluster Monitoring (via Prometheus)` dashboard. Use `admin` for login and `password` that you previously obtained from kubectl.
|
||||
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-grafana-dashboards.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
You will see something like this:
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-grafana-k8s-dashboard.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
VictoriaMetrics dashboard also available to use:
|
||||
<p align="center">
|
||||
<img src="guide-vmsingle-grafana.png" width="800" alt="">
|
||||
</p>
|
||||
|
||||
## 5. Final thoughts
|
||||
|
||||
* We have set up TimeSeries Database for your k8s cluster.
|
||||
* Collected metrics from all running pods,nodes, … and store them in VictoriaMetrics database.
|
||||
* Visualize resources used in Kubernetes cluster by Grafana dashboards.
|
||||
BIN
logo.png
Normal file
|
After Width: | Height: | Size: 15 KiB |
19
operator/README.md
Normal file
@@ -0,0 +1,19 @@
|
||||
---
|
||||
sort: 22
|
||||
---
|
||||
|
||||
# VictoriaMetrics Operator
|
||||
|
||||
1. [VictoriaMetrics Operator](VictoriaMetrics-Operator.html)
|
||||
2. [Additional Scrape Configuration](additional-scrape.html)
|
||||
3. [API Docs](api.html)
|
||||
4. [Authorization and exposing components](auth.html)
|
||||
5. [vmbackupmanager](backups.html)
|
||||
6. [Design](design.html)
|
||||
7. [High Availability](high-availability.html)
|
||||
8. [VMAlert, VMAgent, VMAlertmanager, VMSingle version](managing-versions.html)
|
||||
9. [Victoria Metrics Operator Quick Start](quick-start.html)
|
||||
10. [VMAgent relabel](relabeling.html)
|
||||
11. [CRD Validation](resources-validation.html)
|
||||
12. [Security](security.html)
|
||||
13. [Auto Generated vars for package config](vars.html)
|
||||
79
operator/VictoriaMetrics-Operator.md
Normal file
@@ -0,0 +1,79 @@
|
||||
---
|
||||
sort: 1
|
||||
---
|
||||
|
||||
# VictoriaMetrics operator
|
||||
|
||||
## Overview
|
||||
|
||||
Design and implementation inspired by [prometheus-operator](https://github.com/prometheus-operator/prometheus-operator). It's great a tool for managing monitoring configuration of your applications. VictoriaMetrics operator has api capability with it.
|
||||
So you can use familiar CRD objects: `ServiceMonitor`, `PodMonitor`, `PrometheusRule` and `Probe`. Or you can use VictoriaMetrics CRDs:
|
||||
- `VMServiceScrape` - defines scraping metrics configuration from pods backed by services.
|
||||
- `VMPodScrape` - defines scraping metrics configuration from pods.
|
||||
- `VMRule` - defines alerting or recording rules.
|
||||
- `VMProbe` - defines a probing configuration for targets with blackbox exporter.
|
||||
|
||||
Besides, operator allows you to manage VictoriaMetrics applications inside kubernetes cluster and simplifies this process [quick-start](/Operator/quick-start.html)
|
||||
With CRD (Custom Resource Definition) you can define application configuration and apply it to your cluster [crd-objects](/Operator/api.html).
|
||||
|
||||
Operator simplifies VictoriaMetrics cluster installation, upgrading and managing.
|
||||
|
||||
It has integration with VictoriaMetrics `vmbackupmanager` - advanced tools for making backups. Check backup [docs](/Operator/backups.html)
|
||||
|
||||
## Use cases
|
||||
|
||||
For kubernetes-cluster administrators, it simplifies installation, configuration, management for `VictoriaMetrics` application. And the main feature of operator - is ability to delegate applications monitoring configuration to the end-users.
|
||||
|
||||
For applications developers, its great possibility for managing observability of applications. You can define metrics scraping and alerting configuration for your application and manage it with an application deployment process. Just define app_deployment.yaml, app_vmpodscrape.yaml and app_vmrule.yaml. That's it, you can apply it to a kubernetes cluster. Check [quick-start](/Operator/quick-start.html) for an example.
|
||||
|
||||
## Operator vs helm-chart
|
||||
|
||||
VictoriaMetrics provides [helm charts](https://github.com/VictoriaMetrics/helm-charts). Operator makes the same, simplifies it and provides advanced features.
|
||||
|
||||
|
||||
## Configuration
|
||||
|
||||
Operator configured by env variables, list of it can be found at [link](/vars.html)
|
||||
|
||||
It defines default configuration options, like images for components, timeouts, features.
|
||||
|
||||
|
||||
## Kubernetes' compatibility versions
|
||||
|
||||
operator tested at kubernetes versions
|
||||
from 1.16 to 1.22
|
||||
|
||||
For clusters version below 1.16 you must use legacy CRDs from [path](config/crd/legacy)
|
||||
and disable CRD controller with flag: `--controller.disableCRDOwnership=true`
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
- cannot apply crd at kubernetes 1.18 + version and kubectl reports error:
|
||||
```bash
|
||||
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmalertmanagers.operator.victoriametrics.com" is invalid: [spec.validation.openAPIV3Schema.properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property, spec.validation.openAPIV3Schema.properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property]
|
||||
Error from server (Invalid): error when creating "release/crds/crd.yaml": CustomResourceDefinition.apiextensions.k8s.io "vmalerts.operator.victoriametrics.com" is invalid: [
|
||||
```
|
||||
upgrade to the latest release version. There is a bug with kubernetes objects at the early releases.
|
||||
|
||||
|
||||
## Development
|
||||
|
||||
- operator-sdk verson v1.0.0 + [https://github.com/operator-framework/operator-sdk]
|
||||
- golang 1.15 +
|
||||
- minikube or kind
|
||||
|
||||
start:
|
||||
```bash
|
||||
make run
|
||||
```
|
||||
|
||||
for test execution run:
|
||||
```bash
|
||||
#unit tests
|
||||
|
||||
make test
|
||||
|
||||
# you need minikube or kind for e2e, do not run it on live cluster
|
||||
#e2e tests with local binary
|
||||
make e2e-local
|
||||
```
|
||||
85
operator/additional-scrape.MD
Normal file
@@ -0,0 +1,85 @@
|
||||
---
|
||||
sort: 2
|
||||
---
|
||||
|
||||
# Additional Scrape Configuration
|
||||
|
||||
AdditionalScrapeConfigs allows specifying a key of a Secret containing
|
||||
additional Prometheus scrape configurations or define scrape configuration at CRD spec.
|
||||
Scrape configurations specified
|
||||
are appended to the configurations generated by the operator.
|
||||
|
||||
Job configurations specified must have the form as specified in the official
|
||||
[Prometheus documentation](
|
||||
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
|
||||
As scrape configs are appended, the user is responsible to make sure it is
|
||||
valid.
|
||||
|
||||
## Creating an additional configuration inline at CRD
|
||||
|
||||
Add needed scrape configuration directly to the vmagent spec.inlineScrapeConfig
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeSelector: {}
|
||||
replicas: 1
|
||||
serviceAccountName: vmagent
|
||||
inlineScrapeConfig: |
|
||||
- job_name: "prometheus"
|
||||
static_configs:
|
||||
- targets: ["localhost:9090"]
|
||||
remoteWrite:
|
||||
- url: "http://vmagent-example-vmsingle.default.svc:8429/api/v1/write"
|
||||
EOF
|
||||
```
|
||||
|
||||
NOTE: Do not use password and tokens with inlineScrapeConfig.
|
||||
|
||||
|
||||
## Creating an additional configuration with secret
|
||||
|
||||
First, you will need to create the additional configuration.
|
||||
Below we are making a simple "prometheus" config. Name this
|
||||
`prometheus-additional.yaml` or something similar.
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: v1
|
||||
kind: Secret
|
||||
metadata:
|
||||
name: additional-scrape-configs
|
||||
stringData:
|
||||
prometheus-additional.yaml: |
|
||||
- job_name: "prometheus"
|
||||
static_configs:
|
||||
- targets: ["localhost:9090"]
|
||||
EOF
|
||||
```
|
||||
|
||||
Finally, reference this additional configuration in your `vmagent.yaml` CRD.
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeSelector: {}
|
||||
replicas: 1
|
||||
serviceAccountName: vmagent
|
||||
additionalScrapeConfigs:
|
||||
name: additional-scrape-configs
|
||||
key: prometheus-additional.yaml
|
||||
remoteWrite:
|
||||
- url: "http://vmagent-example-vmsingle.default.svc:8429/api/v1/write"
|
||||
EOF
|
||||
```
|
||||
|
||||
NOTE: Use only one secret for ALL additional scrape configurations.
|
||||
|
||||
2004
operator/api.MD
Normal file
182
operator/auth.MD
Normal file
@@ -0,0 +1,182 @@
|
||||
---
|
||||
sort: 4
|
||||
---
|
||||
|
||||
# Authorization and exposing components
|
||||
|
||||
## Exposing components
|
||||
|
||||
|
||||
CRD objects doesn't have `ingress` configuration. Instead, you can use `VMAuth` as proxy between ingress-controller and VM app components.
|
||||
It adds missing authorization and access control features and enforces it.
|
||||
|
||||
Access can be given with `VMUser` definition. It supports basic auth and bearer token authentication.
|
||||
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAuth
|
||||
metadata:
|
||||
name: main-router
|
||||
spec:
|
||||
userNamespaceSelector: {}
|
||||
userSelector: {}
|
||||
ingress: {}
|
||||
EOF
|
||||
```
|
||||
|
||||
Advanced configuration with cert-manager annotations:
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAuth
|
||||
metadata:
|
||||
name: router-main
|
||||
spec:
|
||||
podMetadata:
|
||||
labels:
|
||||
component: vmauth
|
||||
userSelector: {}
|
||||
userNamespaceSelector: {}
|
||||
replicaCount: 2
|
||||
resources:
|
||||
requests:
|
||||
cpu: "250m"
|
||||
memory: "350Mi"
|
||||
limits:
|
||||
cpu: "500m"
|
||||
memory: "850Mi"
|
||||
ingress:
|
||||
tlsSecretName: vmauth-tls
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: base
|
||||
class_name: nginx
|
||||
tlsHosts:
|
||||
- vm-access.example.com
|
||||
EOF
|
||||
```
|
||||
|
||||
|
||||
simple static routing with read-only access to vmagent for username - `user-1` with password `Asafs124142`
|
||||
```yaml
|
||||
# curl vmauth:8427/metrics -u 'user-1:Asafs124142'
|
||||
cat << EOF | kubectl apply -f
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMUser
|
||||
metadata:
|
||||
name: user-1
|
||||
spec:
|
||||
password: Asafs124142
|
||||
targetRefs:
|
||||
- static:
|
||||
url: http://vmagent-base.default.svc:8429
|
||||
paths: ["/targets/api/v1","/targets","/metrics"]
|
||||
EOF
|
||||
```
|
||||
|
||||
With bearer token access:
|
||||
|
||||
```yaml
|
||||
# curl vmauth:8427/metrics -H 'Authorization: Bearer Asafs124142'
|
||||
cat << EOF | kubectl apply -f
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMUser
|
||||
metadata:
|
||||
name: user-2
|
||||
spec:
|
||||
bearerToken: Asafs124142
|
||||
targetRefs:
|
||||
- static:
|
||||
url: http://vmagent-base.default.svc:8429
|
||||
paths: ["/targets/api/v1","/targets","/metrics"]
|
||||
EOF
|
||||
```
|
||||
|
||||
It's also possible to use service discovery for objects:
|
||||
```yaml
|
||||
# curl vmauth:8427/metrics -H 'Authorization: Bearer Asafs124142'
|
||||
cat << EOF | kubectl apply -f
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMUser
|
||||
metadata:
|
||||
name: user-3
|
||||
spec:
|
||||
bearerToken: Asafs124142
|
||||
targetRefs:
|
||||
- crd:
|
||||
kind: VMAgent
|
||||
name: base
|
||||
namespace: default
|
||||
paths: ["/targets/api/v1","/targets","/metrics"]
|
||||
EOF
|
||||
```
|
||||
|
||||
Cluster components supports auto path generation for single tenant view:
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMUser
|
||||
metadata:
|
||||
name: vmuser-tenant-1
|
||||
spec:
|
||||
bearerToken: some-token
|
||||
targetRefs:
|
||||
- crd:
|
||||
kind: VMCluster/vminsert
|
||||
name: test-persistent
|
||||
namespace: default
|
||||
target_path_suffix: "/insert/1"
|
||||
- crd:
|
||||
kind: VMCluster/vmselect
|
||||
name: test-persistent
|
||||
namespace: default
|
||||
target_path_suffix: "/select/1"
|
||||
- static:
|
||||
url: http://vmselect-test-persistent.default.svc:8481/
|
||||
paths:
|
||||
- /internal/resetRollupResultCache
|
||||
EOF
|
||||
```
|
||||
|
||||
For each `VMUser` operator generates corresponding secret with username/password or bearer token at the same namespace as `VMUser`.
|
||||
|
||||
## Basic auth for targets
|
||||
|
||||
To authenticate a `VMServiceScrape`s over a metrics endpoint use [`basicAuth`](../api.html#basicauth)
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMServiceScrape
|
||||
metadata:
|
||||
labels:
|
||||
k8s-apps: basic-auth-example
|
||||
name: basic-auth-example
|
||||
spec:
|
||||
endpoints:
|
||||
- basicAuth:
|
||||
password:
|
||||
name: basic-auth
|
||||
key: password
|
||||
username:
|
||||
name: basic-auth
|
||||
key: user
|
||||
port: metrics
|
||||
selector:
|
||||
matchLabels:
|
||||
app: myapp
|
||||
EOF
|
||||
```
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: v1
|
||||
kind: Secret
|
||||
metadata:
|
||||
name: basic-auth
|
||||
data:
|
||||
password: dG9vcg== # toor
|
||||
user: YWRtaW4= # admin
|
||||
type: Opaque
|
||||
EOF
|
||||
```
|
||||
136
operator/backups.MD
Normal file
@@ -0,0 +1,136 @@
|
||||
---
|
||||
sort: 5
|
||||
---
|
||||
|
||||
# Backups
|
||||
|
||||
## vmbackupmanager
|
||||
|
||||
## vmbackupmanager is proprietary software.
|
||||
|
||||
Before using it, you must have signed contract and accept EULA https://victoriametrics.com/assets/VM_EULA.pdf
|
||||
|
||||
## Usage examples
|
||||
|
||||
`VMSingle` and `VMCluster` has built-in backup configuration, it uses `vmbackupmanager` - proprietary tool for backups.
|
||||
It supports incremental backups (hours, daily, etc) with popular object storages (aws s3, google cloud storage).
|
||||
|
||||
You can enable it with the simple configuration, define secret
|
||||
|
||||
```yaml
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Secret
|
||||
metadata:
|
||||
name: remote-storage-keys
|
||||
type: Opaque
|
||||
stringData:
|
||||
credentials: |-
|
||||
[default]
|
||||
aws_access_key_id = your_access_key_id
|
||||
aws_secret_access_key = your_secret_access_key
|
||||
---
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMSingle
|
||||
metadata:
|
||||
name: example-vmsingle
|
||||
spec:
|
||||
# Add fields here
|
||||
retentionPeriod: "1"
|
||||
vmBackup:
|
||||
# This is Enterprise Package feature you need to have signed contract to use it
|
||||
# and accept the EULA https://victoriametrics.com/assets/VM_EULA.pdf
|
||||
acceptEULA: true
|
||||
destination: "s3://your_bucket/folder"
|
||||
credentialsSecret:
|
||||
name: remote-storage-keys
|
||||
key: credentials
|
||||
---
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMCluster
|
||||
metadata:
|
||||
name: example-vmcluster-persistent
|
||||
spec:
|
||||
retentionPeriod: "4"
|
||||
replicationFactor: 2
|
||||
vmstorage:
|
||||
replicaCount: 2
|
||||
vmBackup:
|
||||
# This is Enterprise Package feature you need to have signed contract to use it
|
||||
# and accept the EULA https://victoriametrics.com/assets/VM_EULA.pdf
|
||||
acceptEULA: true
|
||||
destination: "s3://your_bucket/folder"
|
||||
credentialsSecret:
|
||||
name: remote-storage-keys
|
||||
key: credentials
|
||||
|
||||
```
|
||||
|
||||
NOTE: for cluster version operator adds suffix for `destination: "s3://your_bucket/folder"`, it becomes `"s3://your_bucket/folder/$(POD_NAME)"`.
|
||||
It's needed to make consistent backups for each storage node.
|
||||
|
||||
You can read more about backup configuration options and mechanics [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmbackup)
|
||||
|
||||
Possible configuration options for backup crd can be found at [link](/docs/api.html#vmbackup)
|
||||
|
||||
|
||||
## Restoring backups
|
||||
|
||||
|
||||
It can be done with [vmrestore](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmrestore)
|
||||
|
||||
There two ways:
|
||||
|
||||
First:
|
||||
You have to stop `VMSingle` by scaling it replicas to zero and manually restore data to the database directory.
|
||||
|
||||
Steps:
|
||||
1) edit `VMSingle` CRD, set replicaCount: 0
|
||||
2) wait until database stops
|
||||
3) ssh to some server, where you can mount `VMSingle` disk and mount it manually
|
||||
4) restore files with `vmrestore`
|
||||
5) umount disk
|
||||
6) edit `VMSingle` CRD, set replicaCount: 1
|
||||
7) wait database start
|
||||
|
||||
Second:
|
||||
|
||||
1) add init container with vmrestore command to `VMSingle` CRD, example:
|
||||
```yaml
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMSingle
|
||||
metadata:
|
||||
name: vmsingle-restored
|
||||
namespace: monitoring-system
|
||||
spec:
|
||||
initContainers:
|
||||
- name: vmrestore
|
||||
image: victoriametrics/vmrestore:latest
|
||||
volumeMounts:
|
||||
- mountPath: /victoria-metrics-data
|
||||
name: data
|
||||
- mountPath: /etc/vm/creds
|
||||
name: secret-remote-storage-keys
|
||||
readOnly: true
|
||||
args:
|
||||
- -storageDataPath=/victoria-metrics-data
|
||||
- -src=s3://your_bucket/folder/latest
|
||||
- -credsFilePath=/etc/vm/creds/credentials
|
||||
vmBackup:
|
||||
# This is Enterprise Package feature you need to have signed contract to use it
|
||||
# and accept the EULA https://victoriametrics.com/assets/VM_EULA.pdf
|
||||
acceptEULA: true
|
||||
destination: "s3://your_bucket/folder"
|
||||
extraArgs:
|
||||
runOnStart: "true"
|
||||
image:
|
||||
repository: victoriametrics/vmbackupmanager
|
||||
tag: v1.67.0-enterprise
|
||||
credentialsSecret:
|
||||
name: remote-storage-keys
|
||||
key: credentials
|
||||
|
||||
```
|
||||
2) apply it, and db will be restored from s3
|
||||
|
||||
3) remove initContainers and apply crd.
|
||||
229
operator/design.MD
Normal file
@@ -0,0 +1,229 @@
|
||||
---
|
||||
sort: 6
|
||||
---
|
||||
|
||||
# Design
|
||||
|
||||
This document describes the design and interaction between the custom resource definitions (CRD) that the Victoria
|
||||
Metrics Operator introduces.
|
||||
|
||||
Operator introduces the following custom resources:
|
||||
|
||||
* [VMSingle](#vmsingle)
|
||||
* [VMCluster](#vmcluster)
|
||||
* [VMAgent](#vmagent)
|
||||
* [VMAlert](#vmalert)
|
||||
* [VMServiceScrape](#vmservicescrape)
|
||||
* [VMPodScrape](#vmpodscrape)
|
||||
* [VMAlertmanager](#vmalertmanager)
|
||||
* [VMAlertmanagerConfig](#vmalertmanagerconfig)
|
||||
* [VMRule](#vmrule)
|
||||
* [VMProbe](#vmprobe)
|
||||
* [VMNodeScrape](#vmodescrape)
|
||||
* [VMStaticScrape](#vmstaticscrape)
|
||||
* [VMAuth](#vmauth)
|
||||
* [VMUser](#vmuser)
|
||||
|
||||
## VMSingle
|
||||
|
||||
The `VMSingle` CRD declaratively defines a [single-node VM](https://github.com/VictoriaMetrics/VictoriaMetrics)
|
||||
installation to run in a Kubernetes cluster.
|
||||
|
||||
For each `VMSingle` resource, the Operator deploys a properly configured `Deployment` in the same namespace.
|
||||
The VMSingle `Pod`s are configured to mount an empty dir or `PersistentVolumeClaimSpec` for storing data.
|
||||
Deployment update strategy set to [recreate](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment).
|
||||
No more than one replica allowed.
|
||||
|
||||
For each `VMSingle` resource, the Operator adds `Service` and `VMServiceScrape` in the same namespace prefixed with
|
||||
name `<VMSingle-name>`.
|
||||
|
||||
## VMCluster
|
||||
|
||||
The `VMCluster` CRD defines a [cluster version VM](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
|
||||
|
||||
For each `VMCluster` resource, the Operator creates `VMStorage` as `StatefulSet`, `VMSelect` as `StatefulSet` and `VMInsert`
|
||||
as deployment. For `VMStorage` and `VMSelect` headless services are created. `VMInsert` is created as service with clusterIP.
|
||||
|
||||
There is a strict order for these objects creation and reconciliation:
|
||||
1. `VMStorage` is synced - the Operator waits until all its pods are ready;
|
||||
2. Then it syncs `VMSelect` with the same manner;
|
||||
3. `VMInsert` is the last object to sync.
|
||||
|
||||
All statefulsets are created with [OnDelete](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#on-delete)
|
||||
update type. It allows to manually manage the rolling update process for Operator by deleting pods one by one and waiting
|
||||
for the ready status.
|
||||
|
||||
Rolling update process may be configured by the operator env variables.
|
||||
The most important is `VM_PODWAITREADYTIMEOUT=80s` - it controls how long to wait for pod's ready status.
|
||||
|
||||
## VMAgent
|
||||
|
||||
The `VMAgent` CRD declaratively defines a desired [VMAgent](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmagent)
|
||||
setup to run in a Kubernetes cluster.
|
||||
|
||||
For each `VMAgent` resource Operator deploys a properly configured `Deployment` in the same namespace.
|
||||
The VMAgent `Pod`s are configured to mount a `Secret` prefixed with `<VMAgent-name>` containing the configuration
|
||||
for VMAgent.
|
||||
|
||||
For each `VMAgent` resource, the Operator adds `Service` and `VMServiceScrape` in the same namespace prefixed with
|
||||
name `<VMAgent-name>`.
|
||||
|
||||
The CRD specifies which `VMServiceScrape` should be covered by the deployed VMAgent instances based on label selection.
|
||||
The Operator then generates a configuration based on the included `VMServiceScrape`s and updates the `Secret` which
|
||||
contains the configuration. It continuously does so for all changes that are made to the `VMServiceScrape`s or the
|
||||
`VMAgent` resource itself.
|
||||
|
||||
If no selection of `VMServiceScrape`s is provided - Operator leaves management of the `Secret` to the user,
|
||||
so user can set custom configuration while still benefiting from the Operator's capabilities of managing VMAgent setups.
|
||||
|
||||
## VMAlert
|
||||
|
||||
The `VMAlert` CRD declaratively defines a desired [VMAlert](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmalert)
|
||||
setup to run in a Kubernetes cluster.
|
||||
|
||||
For each `VMAlert` resource, the Operator deploys a properly configured `Deployment` in the same namespace.
|
||||
The VMAlert `Pod`s are configured to mount a list of `Configmaps` prefixed with `<VMAlert-name>-number` containing
|
||||
the configuration for alerting rules.
|
||||
|
||||
For each `VMAlert` resource, the Operator adds `Service` and `VMServiceScrape` in the same namespace prefixed with
|
||||
name `<VMAlert-name>`.
|
||||
|
||||
The CRD specifies which `VMRule`s should be covered by the deployed VMAlert instances based on label selection.
|
||||
The Operator then generates a configuration based on the included `VMRule`s and updates the `Configmaps` containing
|
||||
the configuration. It continuously does so for all changes that are made to `VMRule`s or to the `VMAlert` resource itself.
|
||||
|
||||
Alerting rules are filtered by selector `ruleNamespaceSelector` in `VMAlert` CRD definition. For selecting rules from all
|
||||
namespaces you must specify it to empty value:
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
ruleNamespaceSelector: {}
|
||||
```
|
||||
|
||||
|
||||
## VMServiceScrape
|
||||
|
||||
The `VMServiceScrape` CRD allows to define a dynamic set of services for monitoring. Services
|
||||
and scraping configurations can be matched via label selections. This allows an organization to introduce conventions
|
||||
for how metrics should be exposed. Following these conventions new services will be discovered automatically without
|
||||
need to reconfigure.
|
||||
|
||||
Monitoring configuration based on `discoveryRole` setting. By default, `endpoints` is used to get objects from kubernetes api.
|
||||
Its also possible to use `discoveryRole: service` or `discoveryRole: endpointslices`
|
||||
|
||||
`Endpoints` objects are essentially lists of IP addresses.
|
||||
Typically, `Endpoints` objects are populated by `Service` object. `Service` object discovers `Pod`s by a label
|
||||
selector and adds those to the `Endpoints` object.
|
||||
|
||||
A `Service` may expose one or more service ports backed by a list of one or multiple endpoints pointing to
|
||||
specific `Pod`s. The same reflected in the respective `Endpoints` object as well.
|
||||
|
||||
The `VMServiceScrape` object discovers `Endpoints` objects and configures VMAgent to monitor `Pod`s.
|
||||
|
||||
The `Endpoints` section of the `VMServiceScrapeSpec` is used to configure which `Endpoints` ports should be scraped.
|
||||
For advanced use cases, one may want to monitor ports of backing `Pod`s, which are not a part of the service endpoints.
|
||||
Therefore, when specifying an endpoint in the `endpoints` section, they are strictly used.
|
||||
|
||||
> Note: `endpoints` (lowercase) is the field in the `VMServiceScrape` CRD, while `Endpoints` (capitalized) is the Kubernetes object kind.
|
||||
|
||||
Both `VMServiceScrape` and discovered targets may belong to any namespace. It is important for cross-namespace monitoring
|
||||
use cases, e.g. for meta-monitoring. Using the `serviceScrapeSelector` of the `VMAgentSpec`
|
||||
one can restrict the namespaces from which `VMServiceScrape`s are selected from by the respective VMAgent server.
|
||||
Using the `namespaceSelector` of the `VMServiceScrape` one can restrict the namespaces from which `Endpoints` can be
|
||||
discovered from. To discover targets in all namespaces the `namespaceSelector` has to be empty:
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
namespaceSelector: {}
|
||||
```
|
||||
|
||||
## VMPodScrape
|
||||
|
||||
The `VMPodScrape` CRD allows to declaratively define how a dynamic set of pods should be monitored.
|
||||
Use label selections to match pods for scraping. This allows an organization to introduce conventions
|
||||
for how metrics should be exposed. Following these conventions new services will be discovered automatically without
|
||||
need to reconfigure.
|
||||
|
||||
A `Pod` is a collection of one or more containers which can expose Prometheus metrics on a number of ports.
|
||||
|
||||
The `VMPodScrape` object discovers pods and generates the relevant scraping configuration.
|
||||
|
||||
The `PodMetricsEndpoints` section of the `VMPodScrapeSpec` is used to configure which ports of a pod are going to be
|
||||
scraped for metrics and with which parameters.
|
||||
|
||||
Both `VMPodScrapes` and discovered targets may belong to any namespace. It is important for cross-namespace monitoring
|
||||
use cases, e.g. for meta-monitoring. Using the `namespaceSelector` of the `VMPodScrapeSpec` one can restrict the
|
||||
namespaces from which `Pods` are discovered from. To discover targets in all namespaces the `namespaceSelector` has to
|
||||
be empty:
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
namespaceSelector:
|
||||
any: true
|
||||
```
|
||||
|
||||
## VMAlertmanager
|
||||
|
||||
The `VMAlertmanager` CRD declaratively defines a desired Alertmanager setup to run in a Kubernetes cluster.
|
||||
It provides options to configure replication and persistent storage.
|
||||
|
||||
For each `Alertmanager` resource, the Operator deploys a properly configured `StatefulSet` in the same namespace.
|
||||
The Alertmanager pods are configured to include a `Secret` called `<alertmanager-name>` which holds the used
|
||||
configuration file in the key `alertmanager.yaml`.
|
||||
|
||||
When there are two or more configured replicas the Operator runs the Alertmanager instances in high availability mode.
|
||||
|
||||
## VMAlertmanagerConfig
|
||||
|
||||
The `VMAlertmanagerConfig` provides way to configure `VMAlertmanager` configuration with CRD. It allows to define different configuration parts,
|
||||
which will be merged by operator into config. It behaves like other config parts - `VMServiceScrape` and etc.
|
||||
|
||||
## VMRule
|
||||
|
||||
The `VMRule` CRD declaratively defines a desired Prometheus rule to be consumed by one or more VMAlert instances.
|
||||
|
||||
Alerts and recording rules can be saved and applied as YAML files, and dynamically loaded without requiring any restart.
|
||||
|
||||
|
||||
## VMPrometheusConverter
|
||||
|
||||
By default, the Operator converts and updates existing prometheus-operator API objects:
|
||||
|
||||
`ServiceMonitor` into `VMServiceScrape`
|
||||
`PodMonitor` into `VMPodScrape`
|
||||
`PrometheusRule` into `VMRule`
|
||||
`Probe` into `VMProbe`
|
||||
Removing prometheus-operator API objects wouldn't delete any converted objects. So you can safely migrate or run
|
||||
two operators at the same time.
|
||||
|
||||
|
||||
## VMProbe
|
||||
|
||||
The `VMProbe` CRD provides probing target ability with a prober. The most common prober is [blackbox exporter](https://github.com/prometheus/blackbox_exporter).
|
||||
By specifying configuration at CRD, operator generates config for `VMAgent` and syncs it. Its possible to use static targets
|
||||
or use standard k8s discovery mechanism with `Ingress`.
|
||||
You have to configure blackbox exporter before you can use this feature. The second requirement is `VMAgent` selectors,
|
||||
it must match your `VMProbe` by label or namespace selector.
|
||||
|
||||
## VMNodeScrape
|
||||
|
||||
The `VMNodeScrape` CRD provides discovery mechanism for scraping metrics kubernetes nodes.
|
||||
By specifying configuration at CRD, operator generates config for `VMAgent` and syncs it. Its useful for cadvisor scraping,
|
||||
node-exporter or other node-based exporters. `VMAgent` nodeScrapeSelector must match `VMNodeScrape` labels.
|
||||
|
||||
## VMStaticScrape
|
||||
|
||||
The `VMStaticScrape` CRD provides mechanism for scraping metrics from static targets, configured by CRD targets.
|
||||
By specifying configuration at CRD, operator generates config for `VMAgent` and syncs it. It's useful for external targets management,
|
||||
when service-discovery is not available. `VMAgent` staticScrapeSelector must match `VMStaticScrape` labels.
|
||||
|
||||
## VMAuth
|
||||
|
||||
The `VMAuth` CRD provides mechanism for exposing application with authorization to outside world or to other applications inside kubernetes cluster.
|
||||
For first case, user can configure `ingress` setting at `VMAuth` CRD. For second one, operator will create secret with `username` and `password` at `VMUser` CRD name.
|
||||
So it will be possible to access this credentials from any application by targeting corresponding kubernetes secret.
|
||||
|
||||
## VMUser
|
||||
|
||||
The `VMUser` CRD describes user configuration, its authentication methods `basic auth` or `Authorization` header. User access permissions, with possible routing information.
|
||||
User can define routing target with `static` config, by entering target `url`, or with `CRDRef`, in this case, operator queries kubernetes API, retrieves information about CRD and builds proper url.
|
||||
322
operator/high-availability.MD
Normal file
@@ -0,0 +1,322 @@
|
||||
---
|
||||
sort: 7
|
||||
---
|
||||
|
||||
# High Availability
|
||||
|
||||
High availability is not only important for customer-facing software but if the monitoring infrastructure is not highly available, then there is a risk that operations people are not notified for alerts. Therefore high availability must be just as thought through for the monitoring stack, as for anything else.
|
||||
|
||||
## VMAgent
|
||||
|
||||
To run VMAgent in a highly available manner you have to configure deduplication at Victoria Metrics first [doc](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/docs/Single-server-VictoriaMetrics.md#deduplication)
|
||||
|
||||
Then increase replicas for VMAgent.
|
||||
|
||||
create `VMSingle` with dedup flag
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMSingle
|
||||
metadata:
|
||||
name: example-vmsingle-persisted
|
||||
spec:
|
||||
retentionPeriod: "1"
|
||||
extraArgs:
|
||||
dedup.minScrapeInterval: 60s
|
||||
EOF
|
||||
```
|
||||
create `VMAgent` with 2 replicas
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeNamespaceSelector: {}
|
||||
podScrapeNamespaceSelector: {}
|
||||
podScrapeSelector: {}
|
||||
serviceScrapeSelector: {}
|
||||
scrapeInterval: 60s
|
||||
vmAgentExternalLabelName: vmagent-ha
|
||||
replicaCount: 2
|
||||
remoteWrite:
|
||||
- url: "http://vmsingle-example-vmsingle-persisted.default.svc:8429/api/v1/write"
|
||||
EOF
|
||||
|
||||
```
|
||||
|
||||
Sharding for `VMAgent` distributes scraping between multiple deployments of `VMAgent`.
|
||||
more info https://victoriametrics.github.io/vmagent.html#scraping-big-number-of-targets
|
||||
|
||||
Example usage:
|
||||
```yaml
|
||||
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeNamespaceSelector: {}
|
||||
podScrapeNamespaceSelector: {}
|
||||
podScrapeSelector: {}
|
||||
serviceScrapeSelector: {}
|
||||
scrapeInterval: 60s
|
||||
vmAgentExternalLabelName: vmagent-ha
|
||||
shardCount: 5
|
||||
replicaCount: 2
|
||||
remoteWrite:
|
||||
- url: "http://vmsingle-example-vmsingle-persisted.default.svc:8429/api/v1/write"
|
||||
EOF
|
||||
```
|
||||
|
||||
This configuration produces 5 deployments with 2 replicas at each. Each deployment has own shard num
|
||||
and scrapes only 1/5 of all targets.
|
||||
|
||||
|
||||
## VMAlert
|
||||
|
||||
It can be launched with multiple replicas without an additional configuration, alertmanager is responsible for alert deduplication.
|
||||
Note, if you want to use `VMAlert` with high-available `VMAlertmanager`, which has more then 1 replica. You have to specify all pod fqdns
|
||||
at `VMAlert.spec.notifiers.[url]`. Or you can use service discovery for notifier, examples:
|
||||
|
||||
alertmanager:
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Secret
|
||||
metadata:
|
||||
name: vmalertmanager-example-alertmanager
|
||||
labels:
|
||||
app: vm-operator
|
||||
type: Opaque
|
||||
stringData:
|
||||
alertmanager.yaml: |
|
||||
global:
|
||||
resolve_timeout: 5m
|
||||
route:
|
||||
group_by: ['job']
|
||||
group_wait: 30s
|
||||
group_interval: 5m
|
||||
repeat_interval: 12h
|
||||
receiver: 'webhook'
|
||||
receivers:
|
||||
- name: 'webhook'
|
||||
webhook_configs:
|
||||
- url: 'http://alertmanagerwh:30500/'
|
||||
|
||||
---
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAlertmanager
|
||||
metadata:
|
||||
name: example
|
||||
namespace: default
|
||||
labels:
|
||||
usage: dedicated
|
||||
spec:
|
||||
replicaCount: 2
|
||||
configSecret: vmalertmanager-example-alertmanager
|
||||
configSelector: {}
|
||||
configNamespaceSelector: {}
|
||||
```
|
||||
vmalert with fqdns:
|
||||
```yaml
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAlert
|
||||
metadata:
|
||||
name: example-ha
|
||||
namespace: default
|
||||
spec:
|
||||
datasource:
|
||||
url: http://vmsingle-example.default.svc:8429
|
||||
notifiers:
|
||||
- url: http://vmalertmanager-example-0.vmalertmanager-example.default.svc:9093
|
||||
- url: http://vmalertmanager-example-1.vmalertmanager-example.default.svc:9093
|
||||
```
|
||||
|
||||
vmalert with service discovery:
|
||||
```yaml
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAlert
|
||||
metadata:
|
||||
name: example-ha
|
||||
namespace: default
|
||||
spec:
|
||||
datasource:
|
||||
url: http://vmsingle-example.default.svc:8429
|
||||
notifiers:
|
||||
- selector:
|
||||
namespaceSelector:
|
||||
matchNames:
|
||||
- default
|
||||
labelSelector:
|
||||
matchLabels:
|
||||
usage: dedicated
|
||||
```
|
||||
|
||||
|
||||
## VMSingle
|
||||
|
||||
It doesn't support high availability by default, for such purpose use VMCluster or duplicate the setup.
|
||||
|
||||
|
||||
## VMCluster
|
||||
|
||||
Cluster version provides a full set of high availability features - metrics replication, node failover, horizontal scaling.
|
||||
|
||||
For using cluster version you have to create corresponding CRD object:
|
||||
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMCluster
|
||||
metadata:
|
||||
name: example-vmcluster-persistent
|
||||
spec:
|
||||
retentionPeriod: "4"
|
||||
replicationFactor: 2
|
||||
vmstorage:
|
||||
replicaCount: 2
|
||||
storageDataPath: "/vm-data"
|
||||
podMetadata:
|
||||
labels:
|
||||
owner: infra
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchExpressions:
|
||||
- key: "app.kubernetes.io/name"
|
||||
operator: In
|
||||
values:
|
||||
- "vmstorage"
|
||||
topologyKey: "kubernetes.io/hostname"
|
||||
storage:
|
||||
volumeClaimTemplate:
|
||||
spec:
|
||||
resources:
|
||||
requests:
|
||||
storage: 10Gi
|
||||
resources:
|
||||
limits:
|
||||
cpu: "2"
|
||||
memory: 2048Mi
|
||||
vmselect:
|
||||
replicaCount: 2
|
||||
cacheMountPath: "/select-cache"
|
||||
podMetadata:
|
||||
labels:
|
||||
owner: infra
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchExpressions:
|
||||
- key: "app.kubernetes.io/name"
|
||||
operator: In
|
||||
values:
|
||||
- "vmselect"
|
||||
topologyKey: "kubernetes.io/hostname"
|
||||
|
||||
storage:
|
||||
volumeClaimTemplate:
|
||||
spec:
|
||||
resources:
|
||||
requests:
|
||||
storage: 2Gi
|
||||
resources:
|
||||
limits:
|
||||
cpu: "1"
|
||||
memory: "500Mi"
|
||||
vminsert:
|
||||
replicaCount: 2
|
||||
podMetadata:
|
||||
labels:
|
||||
owner: infra
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchExpressions:
|
||||
- key: "app.kubernetes.io/name"
|
||||
operator: In
|
||||
values:
|
||||
- "vminsert"
|
||||
topologyKey: "kubernetes.io/hostname"
|
||||
resources:
|
||||
limits:
|
||||
cpu: "1"
|
||||
memory: "500Mi"
|
||||
|
||||
EOF
|
||||
```
|
||||
|
||||
Then wait for the cluster becomes ready
|
||||
|
||||
```bash
|
||||
kubectl get vmclusters -w
|
||||
NAME INSERT COUNT STORAGE COUNT SELECT COUNT AGE STATUS
|
||||
example-vmcluster-persistent 2 2 2 2s expanding
|
||||
example-vmcluster-persistent 2 2 2 30s operational
|
||||
```
|
||||
|
||||
Get links for connection by executing command:
|
||||
|
||||
```bash
|
||||
kubectl get svc -l app.kubernetes.io/instance=example-vmcluster-persistent
|
||||
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
|
||||
vminsert-example-vmcluster-persistent ClusterIP 10.96.34.94 <none> 8480/TCP 69s
|
||||
vmselect-example-vmcluster-persistent ClusterIP None <none> 8481/TCP 79s
|
||||
vmstorage-example-vmcluster-persistent ClusterIP None <none> 8482/TCP,8400/TCP,8401/TCP 85s
|
||||
```
|
||||
|
||||
Now you can connect vmagent to vminsert and vmalert to vmselect
|
||||
|
||||
>NOTE do not forget to create rbac for vmagent
|
||||
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeNamespaceSelector: {}
|
||||
serviceScrapeSelector: {}
|
||||
podScrapeNamespaceSelector: {}
|
||||
podScrapeSelector: {}
|
||||
# Add fields here
|
||||
replicaCount: 1
|
||||
remoteWrite:
|
||||
- url: "http://vminsert-example-vmcluster-persistent.default.svc.cluster.local:8480/insert/0/prometheus/api/v1/write"
|
||||
EOF
|
||||
```
|
||||
|
||||
Config for vmalert
|
||||
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAlert
|
||||
metadata:
|
||||
name: example-vmalert
|
||||
spec:
|
||||
# Add fields here
|
||||
replicas: 1
|
||||
datasource:
|
||||
url: "http://vmselect-example-vmcluster-persistent.default.svc.cluster.local:8481/select/0/prometheus"
|
||||
notifier:
|
||||
url: "http://alertmanager-operated.default.svc:9093"
|
||||
evaluationInterval: "10s"
|
||||
ruleSelector: {}
|
||||
EOF
|
||||
```
|
||||
|
||||
|
||||
## Alertmanager
|
||||
|
||||
The final step of the high availability scheme is Alertmanager, when an alert triggers, actually fires alerts against *all* instances of an Alertmanager cluster.
|
||||
|
||||
The Alertmanager, starting with the `v0.5.0` release, ships with a high availability mode. It implements a gossip protocol to synchronize instances of an Alertmanager cluster regarding notifications that have been sent out, to prevent duplicate notifications. It is an AP (available and partition tolerant) system. Being an AP system means that notifications are guaranteed to be sent at least once.
|
||||
|
||||
The Victoria Metrics Operator ensures that Alertmanager clusters are properly configured to run highly available on Kubernetes.
|
||||
BIN
operator/logo.png
Normal file
|
After Width: | Height: | Size: 15 KiB |
85
operator/managing-versions.MD
Normal file
@@ -0,0 +1,85 @@
|
||||
---
|
||||
sort: 8
|
||||
---
|
||||
|
||||
|
||||
# Managing application versions
|
||||
|
||||
## VMAlert, VMAgent, VMAlertmanager, VMSingle version
|
||||
|
||||
|
||||
for those objects you can specify following settings at `spec.Image`
|
||||
|
||||
for instance, to set `VMSingle` version add `spec.image.tag` name from [releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMSingle
|
||||
metadata:
|
||||
name: example-vmsingle
|
||||
spec:
|
||||
image:
|
||||
repository: victoriametrics/victoria-metrics
|
||||
tag: v1.39.2
|
||||
pullPolicy: Always
|
||||
retentionPeriod: "1"
|
||||
EOF
|
||||
```
|
||||
|
||||
Also, you can specify `imagePullSecrets` if you are pulling images from private repo:
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMSingle
|
||||
metadata:
|
||||
name: example-vmsingle
|
||||
spec:
|
||||
imagePullSecrets:
|
||||
- name: my-repo-secret
|
||||
image:
|
||||
repository: my-repo-url/victoria-metrics
|
||||
tag: v1.39.2
|
||||
retentionPeriod: "1"
|
||||
EOF
|
||||
```
|
||||
|
||||
|
||||
# VMCluster
|
||||
|
||||
for `VMCluster` you can specify tag and repository setting per cluster object.
|
||||
But `imagePullSecrets` is global setting for all `VMCluster` specification.
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMCluster
|
||||
metadata:
|
||||
name: example-vmcluster
|
||||
spec:
|
||||
imagePullSecrets:
|
||||
- name: my-repo-secret
|
||||
# Add fields here
|
||||
retentionPeriod: "1"
|
||||
vmstorage:
|
||||
replicaCount: 2
|
||||
image:
|
||||
repository: victoriametrics/vmstorage
|
||||
tag: v1.39.2-cluster
|
||||
pullPolicy: Always
|
||||
vmselect:
|
||||
replicaCount: 2
|
||||
image:
|
||||
repository: victoriametrics/vmselect
|
||||
tag: v1.39.2-cluster
|
||||
pullPolicy: Always
|
||||
vminsert:
|
||||
replicaCount: 2
|
||||
image:
|
||||
repository: victoriametrics/vminsert
|
||||
tag: v1.39.2-cluster
|
||||
pullPolicy: Always
|
||||
EOF
|
||||
```
|
||||
|
||||
|
||||
|
||||
1519
operator/quick-start.MD
Normal file
242
operator/relabeling.MD
Normal file
@@ -0,0 +1,242 @@
|
||||
---
|
||||
sort: 10
|
||||
---
|
||||
|
||||
# Relabeling
|
||||
|
||||
## VMAgent relabel
|
||||
|
||||
|
||||
`VMAgent` supports global relabeling for all metrics and per remoteWrite target relabel config.
|
||||
|
||||
> Note in some cases, you don't need relabeling,
|
||||
> key=value label pairs can be added to the all scrapped metrics with `spec.externalLabels` for `VMAgent`.
|
||||
>
|
||||
```yaml
|
||||
# simple label add config
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: stack
|
||||
spec:
|
||||
externalLabels:
|
||||
clusterid: some_cluster
|
||||
```
|
||||
|
||||
|
||||
|
||||
It supports relabeling with custom configMap or inline defined at CRD
|
||||
|
||||
## Configmap example
|
||||
|
||||
Quick tour how to to create `Confimap` with relabeling configuration
|
||||
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: vmagent-relabel
|
||||
data:
|
||||
global-relabel.yaml: |
|
||||
- target_label: bar
|
||||
- source_labels: [aa]
|
||||
separator: "foobar"
|
||||
regex: "foo.+bar"
|
||||
target_label: aaa
|
||||
replacement: "xxx"
|
||||
- action: keep
|
||||
source_labels: [aaa]
|
||||
- action: drop
|
||||
source_labels: [aaa]
|
||||
target-1-relabel.yaml: |
|
||||
- action: keep_if_equal
|
||||
source_labels: [foo, bar]
|
||||
- action: drop_if_equal
|
||||
source_labels: [foo, bar]
|
||||
EOF
|
||||
```
|
||||
|
||||
Second, add `relabelConfig` to `VMagent` spec for global relabeling with name of `Configmap` - `vmagent-relabel` and key `global-relabel.yaml`.
|
||||
For relabeling per remoteWrite target, add `urlRelabelConfig` name of `Configmap` - `vmagent-relabel` and key `target-1-relabel.yaml` to one of remoteWrite target for relabeling only
|
||||
for those target.
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeNamespaceSelector: {}
|
||||
podScrapeNamespaceSelector: {}
|
||||
podScrapeSelector: {}
|
||||
serviceScrapeSelector: {}
|
||||
replicaCount: 1
|
||||
serviceAccountName: vmagent
|
||||
relabelConfig:
|
||||
name: "vmagent-relabel"
|
||||
key: "global-relabel.yaml"
|
||||
remoteWrite:
|
||||
- url: "http://vmsingle-example-vmsingle-persisted.default.svc:8429/api/v1/write"
|
||||
- url: "http://vmsingle-example-vmsingle.default.svc:8429/api/v1/write"
|
||||
urlRelabelConfig:
|
||||
name: "vmagent-relabel"
|
||||
key: "target-1-relabel.yaml"
|
||||
EOF
|
||||
```
|
||||
|
||||
|
||||
## Inline example
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeNamespaceSelector: {}
|
||||
podScrapeNamespaceSelector: {}
|
||||
podScrapeSelector: {}
|
||||
serviceScrapeSelector: {}
|
||||
replicaCount: 1
|
||||
serviceAccountName: vmagent
|
||||
inlineRelabelConfig:
|
||||
- target_label: bar
|
||||
- source_labels: [aa]
|
||||
separator: "foobar"
|
||||
regex: "foo.+bar"
|
||||
target_label: aaa
|
||||
replacement: "xxx"
|
||||
- action: keep
|
||||
source_labels: [aaa]
|
||||
- action: drop
|
||||
source_labels: [aaa]
|
||||
remoteWrite:
|
||||
- url: "http://vmsingle-example-vmsingle-persisted.default.svc:8429/api/v1/write"
|
||||
- url: "http://vmsingle-example-vmsingle.default.svc:8429/api/v1/write"
|
||||
inlineUrlRelabelConfig:
|
||||
- action: keep_if_equal
|
||||
source_labels: [foo, bar]
|
||||
- action: drop_if_equal
|
||||
source_labels: [foo, bar]
|
||||
EOF
|
||||
```
|
||||
|
||||
|
||||
## Combined example
|
||||
|
||||
Its also possible to use both features in combination.
|
||||
|
||||
First will be added relabeling configs from `inlineRelabelConfig`, then `relabelConfig` from configmap.
|
||||
|
||||
```yaml
|
||||
cat << EOF | kubectl apply -f -
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: vmagent-relabel
|
||||
data:
|
||||
global-relabel.yaml: |
|
||||
- target_label: bar
|
||||
- source_labels: [aa]
|
||||
separator: "foobar"
|
||||
regex: "foo.+bar"
|
||||
target_label: aaa
|
||||
replacement: "xxx"
|
||||
- action: keep
|
||||
source_labels: [aaa]
|
||||
- action: drop
|
||||
source_labels: [aaa]
|
||||
target-1-relabel.yaml: |
|
||||
- action: keep_if_equal
|
||||
source_labels: [foo, bar]
|
||||
- action: drop_if_equal
|
||||
source_labels: [foo, bar]
|
||||
EOF
|
||||
```
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMAgent
|
||||
metadata:
|
||||
name: example-vmagent
|
||||
spec:
|
||||
serviceScrapeNamespaceSelector: {}
|
||||
podScrapeNamespaceSelector: {}
|
||||
podScrapeSelector: {}
|
||||
serviceScrapeSelector: {}
|
||||
replicaCount: 1
|
||||
serviceAccountName: vmagent
|
||||
inlineRelabelConfig:
|
||||
- target_label: bar1
|
||||
- source_labels: [aa]
|
||||
relabelConfig:
|
||||
name: "vmagent-relabel"
|
||||
key: "global-relabel.yaml"
|
||||
remoteWrite:
|
||||
- url: "http://vmsingle-example-vmsingle-persisted.default.svc:8429/api/v1/write"
|
||||
- url: "http://vmsingle-example-vmsingle.default.svc:8429/api/v1/write"
|
||||
urlRelabelConfig:
|
||||
name: "vmagent-relabel"
|
||||
key: "target-1-relabel.yaml"
|
||||
inlineUrlRelabelConfig:
|
||||
- action: keep_if_equal
|
||||
source_labels: [foo1, bar2]
|
||||
EOF
|
||||
```
|
||||
|
||||
|
||||
Resulted configmap:
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
data:
|
||||
global_relabeling.yaml: |
|
||||
- target_label: bar1
|
||||
- source_labels:
|
||||
- aa
|
||||
- target_label: bar
|
||||
- source_labels: [aa]
|
||||
separator: "foobar"
|
||||
regex: "foo.+bar"
|
||||
target_label: aaa
|
||||
replacement: "xxx"
|
||||
- action: keep
|
||||
source_labels: [aaa]
|
||||
- action: drop
|
||||
source_labels: [aaa]
|
||||
url_rebaling-1.yaml: |
|
||||
- source_labels:
|
||||
- foo1
|
||||
- bar2
|
||||
action: keep_if_equal
|
||||
- action: keep_if_equal
|
||||
source_labels: [foo, bar]
|
||||
- action: drop_if_equal
|
||||
source_labels: [foo, bar]
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
finalizers:
|
||||
- apps.victoriametrics.com/finalizer
|
||||
labels:
|
||||
app.kubernetes.io/component: monitoring
|
||||
app.kubernetes.io/instance: example-vmagent
|
||||
app.kubernetes.io/name: vmagent
|
||||
managed-by: vm-operator
|
||||
name: relabelings-assets-vmagent-example-vmagent
|
||||
namespace: default
|
||||
ownerReferences:
|
||||
- apiVersion: operator.victoriametrics.com/v1beta1
|
||||
blockOwnerDeletion: true
|
||||
controller: true
|
||||
kind: VMAgent
|
||||
name: example-vmagent
|
||||
uid: 7e9fb838-65da-4443-a43b-c00cd6c4db5b
|
||||
```
|
||||
|
||||
|
||||
## Additional information
|
||||
|
||||
`VMAgent` also has some extra options for relabeling actions, you can check it [docs](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmagent/README.md#relabeling)
|
||||
36
operator/resources-validation.MD
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
sort: 11
|
||||
---
|
||||
|
||||
# CRD Validation
|
||||
|
||||
## Description
|
||||
Operator supports validation admission webhook [docs](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
|
||||
|
||||
It checks resources configuration and returns errors to caller before resource will be created at kubernetes api.
|
||||
This should reduce errors and simplify debugging.
|
||||
|
||||
## Configuration
|
||||
|
||||
Validation hooks at operator side must be enabled with flags:
|
||||
```
|
||||
--webhook.enable
|
||||
# optional configuration for certDir and tls names.
|
||||
--webhook.certDir=/tmp/k8s-webhook-server/serving-certs/
|
||||
--webhook.keyName=tls.key
|
||||
--webhook.certName=tls.crt
|
||||
```
|
||||
|
||||
You have to mount correct certificates at give directory.
|
||||
It can be simplified with cert-manager and kustomize command: `kustomize build config/deployments/webhook/ `
|
||||
|
||||
|
||||
## Requirements
|
||||
|
||||
- Valid certificate with key must be provided to operator
|
||||
- Valid CABundle must be added to the `ValidatingWebhookConfiguration`
|
||||
|
||||
|
||||
## Useful links
|
||||
- [k8s admission webhooks](https://banzaicloud.com/blog/k8s-admission-webhooks/)
|
||||
- [olm webhooks](https://docs.openshift.com/container-platform/4.5/operators/user/olm-webhooks.html)
|
||||
85
operator/security.MD
Normal file
@@ -0,0 +1,85 @@
|
||||
---
|
||||
sort: 12
|
||||
---
|
||||
|
||||
# Security
|
||||
|
||||
VictoriaMetrics operator provides several security features, such as [PodSecurityPolicies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/), [PodSecurityContext](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/).
|
||||
|
||||
|
||||
## PodSecurityPolicy.
|
||||
|
||||
By default, operator creates serviceAccount for each cluster resource and binds default `PodSecurityPolicy` to it.
|
||||
|
||||
Default psp:
|
||||
```yaml
|
||||
apiVersion: policy/v1beta1
|
||||
kind: PodSecurityPolicy
|
||||
metadata:
|
||||
name: vmagent-example-vmagent
|
||||
spec:
|
||||
allowPrivilegeEscalation: false
|
||||
fsGroup:
|
||||
rule: RunAsAny
|
||||
hostNetwork: true
|
||||
requiredDropCapabilities:
|
||||
- ALL
|
||||
runAsUser:
|
||||
rule: RunAsAny
|
||||
seLinux:
|
||||
rule: RunAsAny
|
||||
supplementalGroups:
|
||||
rule: RunAsAny
|
||||
volumes:
|
||||
- persistentVolumeClaim
|
||||
- secret
|
||||
- emptyDir
|
||||
- configMap
|
||||
- projected
|
||||
- downwardAPI
|
||||
- nfs
|
||||
```
|
||||
|
||||
This behaviour may be disabled with env variable passed to operator:
|
||||
```yaml
|
||||
- name: VM_PSPAUTOCREATEENABLED
|
||||
value: "false"
|
||||
```
|
||||
|
||||
User may also override default pod security policy with setting: `spec.podSecurityPolicyName: "psp-name"`.
|
||||
|
||||
|
||||
## PodSecurityContext
|
||||
|
||||
`PodSecurityContext` can be configured with spec setting. It may be useful for mounted volumes, with `VMSingle` for example:
|
||||
|
||||
```yaml
|
||||
apiVersion: operator.victoriametrics.com/v1beta1
|
||||
kind: VMSingle
|
||||
metadata:
|
||||
name: vmsingle-f
|
||||
namespace: monitoring-system
|
||||
spec:
|
||||
retentionPeriod: "2"
|
||||
removePvcAfterDelete: true
|
||||
securityContext:
|
||||
runAsUser: 1000
|
||||
fsGroup: 1000
|
||||
runAsGroup: 1000
|
||||
extraArgs:
|
||||
dedup.minScrapeInterval: 10s
|
||||
storage:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 25Gi
|
||||
resources:
|
||||
requests:
|
||||
cpu: "0.5"
|
||||
memory: "512Mi"
|
||||
limits:
|
||||
cpu: "1"
|
||||
memory: "1512Mi"
|
||||
|
||||
```
|
||||
110
operator/vars.MD
Normal file
@@ -0,0 +1,110 @@
|
||||
# Auto Generated vars for package config
|
||||
updated at Fri Jan 21 15:57:41 UTC 2022
|
||||
|
||||
|
||||
| varible name | variable default value | variable required | variable description |
|
||||
| --- | --- | --- | --- |
|
||||
| VM_USECUSTOMCONFIGRELOADER | false | false | enables custom config reloader for vmauth and vmagent,it should speed-up config reloading process. |
|
||||
| VM_CUSTOMCONFIGRELOADERIMAGE | victoriametrics/operator:config-reloader-0.1.0 | false | - |
|
||||
| VM_PSPAUTOCREATEENABLED | true | false | - |
|
||||
| VM_VMALERTDEFAULT_IMAGE | victoriametrics/vmalert | false | - |
|
||||
| VM_VMALERTDEFAULT_VERSION | v1.72.0 | false | - |
|
||||
| VM_VMALERTDEFAULT_PORT | 8080 | false | - |
|
||||
| VM_VMALERTDEFAULT_USEDEFAULTRESOURCES | true | false | - |
|
||||
| VM_VMALERTDEFAULT_RESOURCE_LIMIT_MEM | 500Mi | false | - |
|
||||
| VM_VMALERTDEFAULT_RESOURCE_LIMIT_CPU | 200m | false | - |
|
||||
| VM_VMALERTDEFAULT_RESOURCE_REQUEST_MEM | 200Mi | false | - |
|
||||
| VM_VMALERTDEFAULT_RESOURCE_REQUEST_CPU | 50m | false | - |
|
||||
| VM_VMALERTDEFAULT_CONFIGRELOADERCPU | 100m | false | - |
|
||||
| VM_VMALERTDEFAULT_CONFIGRELOADERMEMORY | 25Mi | false | - |
|
||||
| VM_VMALERTDEFAULT_CONFIGRELOADIMAGE | jimmidyson/configmap-reload:v0.3.0 | false | - |
|
||||
| VM_VMAGENTDEFAULT_IMAGE | victoriametrics/vmagent | false | - |
|
||||
| VM_VMAGENTDEFAULT_VERSION | v1.72.0 | false | - |
|
||||
| VM_VMAGENTDEFAULT_CONFIGRELOADIMAGE | quay.io/prometheus-operator/prometheus-config-reloader:v0.48.1 | false | - |
|
||||
| VM_VMAGENTDEFAULT_PORT | 8429 | false | - |
|
||||
| VM_VMAGENTDEFAULT_USEDEFAULTRESOURCES | true | false | - |
|
||||
| VM_VMAGENTDEFAULT_RESOURCE_LIMIT_MEM | 500Mi | false | - |
|
||||
| VM_VMAGENTDEFAULT_RESOURCE_LIMIT_CPU | 200m | false | - |
|
||||
| VM_VMAGENTDEFAULT_RESOURCE_REQUEST_MEM | 200Mi | false | - |
|
||||
| VM_VMAGENTDEFAULT_RESOURCE_REQUEST_CPU | 50m | false | - |
|
||||
| VM_VMAGENTDEFAULT_CONFIGRELOADERCPU | 100m | false | - |
|
||||
| VM_VMAGENTDEFAULT_CONFIGRELOADERMEMORY | 25Mi | false | - |
|
||||
| VM_VMSINGLEDEFAULT_IMAGE | victoriametrics/victoria-metrics | false | - |
|
||||
| VM_VMSINGLEDEFAULT_VERSION | v1.72.0 | false | - |
|
||||
| VM_VMSINGLEDEFAULT_PORT | 8429 | false | - |
|
||||
| VM_VMSINGLEDEFAULT_USEDEFAULTRESOURCES | true | false | - |
|
||||
| VM_VMSINGLEDEFAULT_RESOURCE_LIMIT_MEM | 1500Mi | false | - |
|
||||
| VM_VMSINGLEDEFAULT_RESOURCE_LIMIT_CPU | 1200m | false | - |
|
||||
| VM_VMSINGLEDEFAULT_RESOURCE_REQUEST_MEM | 500Mi | false | - |
|
||||
| VM_VMSINGLEDEFAULT_RESOURCE_REQUEST_CPU | 150m | false | - |
|
||||
| VM_VMSINGLEDEFAULT_CONFIGRELOADERCPU | 100m | false | - |
|
||||
| VM_VMSINGLEDEFAULT_CONFIGRELOADERMEMORY | 25Mi | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_USEDEFAULTRESOURCES | true | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_IMAGE | victoriametrics/vmselect | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_VERSION | v1.72.0-cluster | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_PORT | 8481 | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_LIMIT_MEM | 1000Mi | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_LIMIT_CPU | 500m | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_REQUEST_MEM | 500Mi | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSELECTDEFAULT_RESOURCE_REQUEST_CPU | 100m | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_IMAGE | victoriametrics/vmstorage | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_VERSION | v1.72.0-cluster | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_VMINSERTPORT | 8400 | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_VMSELECTPORT | 8401 | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_PORT | 8482 | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_RESOURCE_LIMIT_MEM | 1500Mi | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_RESOURCE_LIMIT_CPU | 1000m | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_RESOURCE_REQUEST_MEM | 500Mi | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMSTORAGEDEFAULT_RESOURCE_REQUEST_CPU | 250m | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_IMAGE | victoriametrics/vminsert | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_VERSION | v1.72.0-cluster | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_PORT | 8480 | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_RESOURCE_LIMIT_MEM | 500Mi | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_RESOURCE_LIMIT_CPU | 500m | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_RESOURCE_REQUEST_MEM | 200Mi | false | - |
|
||||
| VM_VMCLUSTERDEFAULT_VMINSERTDEFAULT_RESOURCE_REQUEST_CPU | 150m | false | - |
|
||||
| VM_VMALERTMANAGER_CONFIGRELOADERIMAGE | jimmidyson/configmap-reload:v0.3.0 | false | - |
|
||||
| VM_VMALERTMANAGER_CONFIGRELOADERCPU | 100m | false | - |
|
||||
| VM_VMALERTMANAGER_CONFIGRELOADERMEMORY | 25Mi | false | - |
|
||||
| VM_VMALERTMANAGER_ALERTMANAGERDEFAULTBASEIMAGE | prom/alertmanager | false | - |
|
||||
| VM_VMALERTMANAGER_ALERTMANAGERVERSION | v0.22.2 | false | - |
|
||||
| VM_VMALERTMANAGER_LOCALHOST | 127.0.0.1 | false | - |
|
||||
| VM_VMALERTMANAGER_USEDEFAULTRESOURCES | true | false | - |
|
||||
| VM_VMALERTMANAGER_RESOURCE_LIMIT_MEM | 256Mi | false | - |
|
||||
| VM_VMALERTMANAGER_RESOURCE_LIMIT_CPU | 100m | false | - |
|
||||
| VM_VMALERTMANAGER_RESOURCE_REQUEST_MEM | 56Mi | false | - |
|
||||
| VM_VMALERTMANAGER_RESOURCE_REQUEST_CPU | 30m | false | - |
|
||||
| VM_DISABLESELFSERVICESCRAPECREATION | false | false | - |
|
||||
| VM_VMBACKUP_IMAGE | victoriametrics/vmbackupmanager | false | - |
|
||||
| VM_VMBACKUP_VERSION | v1.72.0-enterprise | false | - |
|
||||
| VM_VMBACKUP_PORT | 8300 | false | - |
|
||||
| VM_VMBACKUP_USEDEFAULTRESOURCES | true | false | - |
|
||||
| VM_VMBACKUP_RESOURCE_LIMIT_MEM | 500Mi | false | - |
|
||||
| VM_VMBACKUP_RESOURCE_LIMIT_CPU | 500m | false | - |
|
||||
| VM_VMBACKUP_RESOURCE_REQUEST_MEM | 200Mi | false | - |
|
||||
| VM_VMBACKUP_RESOURCE_REQUEST_CPU | 150m | false | - |
|
||||
| VM_VMBACKUP_LOGLEVEL | INFO | false | - |
|
||||
| VM_VMAUTHDEFAULT_IMAGE | victoriametrics/vmauth | false | - |
|
||||
| VM_VMAUTHDEFAULT_VERSION | v1.72.0 | false | - |
|
||||
| VM_VMAUTHDEFAULT_CONFIGRELOADIMAGE | quay.io/prometheus-operator/prometheus-config-reloader:v0.48.1 | false | - |
|
||||
| VM_VMAUTHDEFAULT_PORT | 8427 | false | - |
|
||||
| VM_VMAUTHDEFAULT_USEDEFAULTRESOURCES | true | false | - |
|
||||
| VM_VMAUTHDEFAULT_RESOURCE_LIMIT_MEM | 300Mi | false | - |
|
||||
| VM_VMAUTHDEFAULT_RESOURCE_LIMIT_CPU | 200m | false | - |
|
||||
| VM_VMAUTHDEFAULT_RESOURCE_REQUEST_MEM | 100Mi | false | - |
|
||||
| VM_VMAUTHDEFAULT_RESOURCE_REQUEST_CPU | 50m | false | - |
|
||||
| VM_VMAUTHDEFAULT_CONFIGRELOADERCPU | 100m | false | - |
|
||||
| VM_VMAUTHDEFAULT_CONFIGRELOADERMEMORY | 25Mi | false | - |
|
||||
| VM_ENABLEDPROMETHEUSCONVERTER_PODMONITOR | true | false | - |
|
||||
| VM_ENABLEDPROMETHEUSCONVERTER_SERVICESCRAPE | true | false | - |
|
||||
| VM_ENABLEDPROMETHEUSCONVERTER_PROMETHEUSRULE | true | false | - |
|
||||
| VM_ENABLEDPROMETHEUSCONVERTER_PROBE | true | false | - |
|
||||
| VM_ENABLEDPROMETHEUSCONVERTEROWNERREFERENCES | false | false | - |
|
||||
| VM_HOST | 0.0.0.0 | false | - |
|
||||
| VM_LISTENADDRESS | 0.0.0.0 | false | - |
|
||||
| VM_DEFAULTLABELS | managed-by=vm-operator | false | - |
|
||||
| VM_LABELS | - | false | - |
|
||||
| VM_CLUSTERDOMAINNAME | - | false | - |
|
||||
| VM_PODWAITREADYTIMEOUT | 80s | false | - |
|
||||
| VM_PODWAITREADYINTERVALCHECK | 5s | false | - |
|
||||
| VM_PODWAITREADYINITDELAY | 10s | false | - |
|
||||
2
robots.txt
Normal file
@@ -0,0 +1,2 @@
|
||||
user-agent: *
|
||||
allow: /
|
||||
988
vmagent.md
Normal file
@@ -0,0 +1,988 @@
|
||||
---
|
||||
sort: 3
|
||||
---
|
||||
|
||||
# vmagent
|
||||
|
||||
`vmagent` is a tiny but mighty agent which helps you collect metrics from various sources
|
||||
and store them in [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)
|
||||
or any other Prometheus-compatible storage systems that support the `remote_write` protocol.
|
||||
|
||||
<img alt="vmagent" src="vmagent.png">
|
||||
|
||||
|
||||
## Motivation
|
||||
|
||||
While VictoriaMetrics provides an efficient solution to store and observe metrics, our users needed something fast
|
||||
and RAM friendly to scrape metrics from Prometheus-compatible exporters into VictoriaMetrics.
|
||||
Also, we found that our user's infrastructure are like snowflakes in that no two are alike. Therefore we decided to add more flexibility
|
||||
to `vmagent` such as the ability to push metrics instead of pulling them. We did our best and will continue to improve vmagent.
|
||||
|
||||
|
||||
## Features
|
||||
|
||||
* Can be used as a drop-in replacement for Prometheus for scraping targets such as [node_exporter](https://github.com/prometheus/node_exporter). See [Quick Start](#quick-start) for details.
|
||||
* Can read data from Kafka. See [these docs](#reading-metrics-from-kafka).
|
||||
* Can write data to Kafka. See [these docs](#writing-metrics-to-kafka).
|
||||
* Can add, remove and modify labels (aka tags) via Prometheus relabeling. Can filter data before sending it to remote storage. See [these docs](#relabeling) for details.
|
||||
* Accepts data via all ingestion protocols supported by VictoriaMetrics:
|
||||
* DataDog "submit metrics" API. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-datadog-agent).
|
||||
* InfluxDB line protocol via `http://<vmagent>:8429/write`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf).
|
||||
* Graphite plaintext protocol if `-graphiteListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-graphite-compatible-agents-such-as-statsd).
|
||||
* OpenTSDB telnet and http protocols if `-opentsdbListenAddr` command-line flag is set. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-opentsdb-compatible-agents).
|
||||
* Prometheus remote write protocol via `http://<vmagent>:8429/api/v1/write`.
|
||||
* JSON lines import protocol via `http://<vmagent>:8429/api/v1/import`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-json-line-format).
|
||||
* Native data import protocol via `http://<vmagent>:8429/api/v1/import/native`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-native-format).
|
||||
* Prometheus exposition format via `http://<vmagent>:8429/api/v1/import/prometheus`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-data-in-prometheus-exposition-format) for details.
|
||||
* Arbitrary CSV data via `http://<vmagent>:8429/api/v1/import/csv`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-import-csv-data).
|
||||
* Can replicate collected metrics simultaneously to multiple remote storage systems.
|
||||
* Works smoothly in environments with unstable connections to remote storage. If the remote storage is unavailable, the collected metrics
|
||||
are buffered at `-remoteWrite.tmpDataPath`. The buffered metrics are sent to remote storage as soon as the connection
|
||||
to the remote storage is repaired. The maximum disk usage for the buffer can be limited with `-remoteWrite.maxDiskUsagePerURL`.
|
||||
* Uses lower amounts of RAM, CPU, disk IO and network bandwidth compared with Prometheus.
|
||||
* Scrape targets can be spread among multiple `vmagent` instances when big number of targets must be scraped. See [these docs](#scraping-big-number-of-targets).
|
||||
* Can efficiently scrape targets that expose millions of time series such as [/federate endpoint in Prometheus](https://prometheus.io/docs/prometheus/latest/federation/). See [these docs](#stream-parsing-mode).
|
||||
* Can deal with [high cardinality](https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality) and [high churn rate](https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate) issues by limiting the number of unique time series at scrape time and before sending them to remote storage systems. See [these docs](#cardinality-limiter).
|
||||
* Can load scrape configs from multiple files. See [these docs](#loading-scrape-configs-from-multiple-files).
|
||||
|
||||
## Quick Start
|
||||
|
||||
Please download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases), unpack it
|
||||
and configure the following flags to the `vmagent` binary in order to start scraping Prometheus targets:
|
||||
|
||||
* `-promscrape.config` with the path to Prometheus config file (usually located at `/etc/prometheus/prometheus.yml`). The path can point either to local file or to http url.
|
||||
* `-remoteWrite.url` with the remote storage endpoint such as VictoriaMetrics, the `-remoteWrite.url` argument can be specified multiple times to replicate data concurrently to an arbitrary number of remote storage systems.
|
||||
|
||||
Example command line:
|
||||
|
||||
```
|
||||
/path/to/vmagent -promscrape.config=/path/to/prometheus.yml -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
|
||||
```
|
||||
|
||||
If you only need to collect InfluxDB data, then the following command is sufficient:
|
||||
|
||||
```
|
||||
/path/to/vmagent -remoteWrite.url=https://victoria-metrics-host:8428/api/v1/write
|
||||
```
|
||||
|
||||
Then send InfluxDB data to `http://vmagent-host:8429`. See [these docs](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf) for more details.
|
||||
|
||||
`vmagent` is also available in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags).
|
||||
|
||||
Pass `-help` to `vmagent` in order to see [the full list of supported command-line flags with their descriptions](#advanced-usage).
|
||||
|
||||
|
||||
## Configuration update
|
||||
|
||||
`vmagent` should be restarted in order to update config options set via command-line args.
|
||||
|
||||
`vmagent` supports multiple approaches for reloading configs from updated config files such as `-promscrape.config`, `-remoteWrite.relabelConfig` and `-remoteWrite.urlRelabelConfig`:
|
||||
|
||||
* Sending `SUGHUP` signal to `vmagent` process:
|
||||
```bash
|
||||
kill -SIGHUP `pidof vmagent`
|
||||
```
|
||||
|
||||
* Sending HTTP request to `http://vmagent:8429/-/reload` endpoint.
|
||||
|
||||
There is also `-promscrape.configCheckInterval` command-line option, which can be used for automatic reloading configs from updated `-promscrape.config` file.
|
||||
|
||||
|
||||
## Use cases
|
||||
|
||||
|
||||
### IoT and Edge monitoring
|
||||
|
||||
`vmagent` can run and collect metrics in IoT and industrial networks with unreliable or scheduled connections to their remote storage.
|
||||
It buffers the collected data in local files until the connection to remote storage becomes available and then sends the buffered
|
||||
data to the remote storage. It re-tries sending the data to remote storage until any errors are resolved.
|
||||
The maximum buffer size can be limited with `-remoteWrite.maxDiskUsagePerURL`.
|
||||
|
||||
`vmagent` works on various architectures from the IoT world - 32-bit arm, 64-bit arm, ppc64, 386, amd64.
|
||||
See [the corresponding Makefile rules](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmagent/Makefile) for details.
|
||||
|
||||
|
||||
### Drop-in replacement for Prometheus
|
||||
|
||||
If you use Prometheus only for scraping metrics from various targets and forwarding those metrics to remote storage
|
||||
then `vmagent` can replace Prometheus. Typically, `vmagent` requires lower amounts of RAM, CPU and network bandwidth compared with Prometheus.
|
||||
See [these docs](#how-to-collect-metrics-in-prometheus-format) for details.
|
||||
|
||||
|
||||
### Replication and high availability
|
||||
|
||||
`vmagent` replicates the collected metrics among multiple remote storage instances configured via `-remoteWrite.url` args.
|
||||
If a single remote storage instance temporarily is out of service, then the collected data remains available in another remote storage instance.
|
||||
`vmagent` buffers the collected data in files at `-remoteWrite.tmpDataPath` until the remote storage becomes available again and then it sends the buffered data to the remote storage in order to prevent data gaps.
|
||||
|
||||
|
||||
### Relabeling and filtering
|
||||
|
||||
`vmagent` can add, remove or update labels on the collected data before sending it to the remote storage. Additionally,
|
||||
it can remove unwanted samples via Prometheus-like relabeling before sending the collected data to remote storage.
|
||||
Please see [these docs](#relabeling) for details.
|
||||
|
||||
|
||||
### Splitting data streams among multiple systems
|
||||
|
||||
`vmagent` supports splitting the collected data between muliple destinations with the help of `-remoteWrite.urlRelabelConfig`,
|
||||
which is applied independently for each configured `-remoteWrite.url` destination. For example, it is possible to replicate or split
|
||||
data among long-term remote storage, short-term remote storage and a real-time analytical system [built on top of Kafka](https://github.com/Telefonica/prometheus-kafka-adapter).
|
||||
Note that each destination can receive it's own subset of the collected data due to per-destination relabeling via `-remoteWrite.urlRelabelConfig`.
|
||||
|
||||
|
||||
### Prometheus remote_write proxy
|
||||
|
||||
`vmagent` can be used as a proxy for Prometheus data sent via Prometheus `remote_write` protocol. It can accept data via the `remote_write` API
|
||||
at the`/api/v1/write` endpoint. Then apply relabeling and filtering and proxy it to another `remote_write` system .
|
||||
The `vmagent` can be configured to encrypt the incoming `remote_write` requests with `-tls*` command-line flags.
|
||||
Also, Basic Auth can be enabled for the incoming `remote_write` requests with `-httpAuth.*` command-line flags.
|
||||
|
||||
|
||||
### remote_write for clustered version
|
||||
|
||||
While `vmagent` can accept data in several supported protocols (OpenTSDB, Influx, Prometheus, Graphite) and scrape data from various targets, writes are always peformed in Promethes remote_write protocol. Therefore for the [clustered version](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html), `-remoteWrite.url` the command-line flag should be configured as `<schema>://<vminsert-host>:8480/insert/<accountID>/prometheus/api/v1/write` according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format). There is also support for multitenant writes. See [these docs](#multitenancy).
|
||||
|
||||
## Multitenancy
|
||||
|
||||
By default `vmagent` collects the data without tenant identifiers and routes it to the configured `-remoteWrite.url`. But it can accept multitenant data if `-remoteWrite.multitenantURL` is set. In this case it accepts multitenant data at `http://vmagent:8429/insert/<accountID>/...` in the same way as cluster version of VictoriaMetrics does according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format) and routes it to `<-remoteWrite.multitenantURL>/insert/<accountID>/prometheus/api/v1/write`. If multiple `-remoteWrite.multitenantURL` command-line options are set, then `vmagent` replicates the collected data across all the configured urls. This allows using a single `vmagent` instance in front of VictoriaMetrics clusters for processing the data from all the tenants.
|
||||
|
||||
|
||||
## How to collect metrics in Prometheus format
|
||||
|
||||
Specify the path to `prometheus.yml` file via `-promscrape.config` command-line flag. `vmagent` takes into account the following
|
||||
sections from [Prometheus config file](https://prometheus.io/docs/prometheus/latest/configuration/configuration/):
|
||||
|
||||
* `global`
|
||||
* `scrape_configs`
|
||||
|
||||
All other sections are ignored, including the [remote_write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) section.
|
||||
Use `-remoteWrite.*` command-line flag instead for configuring remote write settings.
|
||||
|
||||
The following scrape types in [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) section are supported:
|
||||
|
||||
* `static_configs` - is for scraping statically defined targets. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#static_config) for details.
|
||||
* `file_sd_configs` - is for scraping targets defined in external files (aka file-based service discover).
|
||||
See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config) for details
|
||||
* `kubernetes_sd_configs` - for scraping targets in Kubernetes (k8s).
|
||||
See [kubernetes_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) for details.
|
||||
* `ec2_sd_configs` - is for scraping targets in Amazon EC2.
|
||||
See [ec2_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config) for details.
|
||||
`vmagent` doesn't support the `profile` config param yet.
|
||||
* `gce_sd_configs` - is for scraping targets in Google Compute Engine (GCE).
|
||||
See [gce_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config) for details.
|
||||
`vmagent` provides the following additional functionality for `gce_sd_config`:
|
||||
* if `project` arg is missing then `vmagent` uses the project for the instance where it runs;
|
||||
* if `zone` arg is missing then `vmagent` uses the zone for the instance where it runs;
|
||||
* if `zone` arg is equal to `"*"`, then `vmagent` discovers all the zones for the given project;
|
||||
* `zone` may contain an arbitrary number of zones, i.e. `zone: [us-east1-a, us-east1-b]`.
|
||||
* `consul_sd_configs` - is for scraping the targets registered in Consul.
|
||||
See [consul_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config) for details.
|
||||
* `dns_sd_configs` - is for scraping targets discovered from DNS records (SRV, A and AAAA).
|
||||
See [dns_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config) for details.
|
||||
* `openstack_sd_configs` - is for scraping OpenStack targets.
|
||||
See [openstack_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config) for details.
|
||||
[OpenStack identity API v3](https://docs.openstack.org/api-ref/identity/v3/) is supported only.
|
||||
* `docker_sd_configs` - is for scraping Docker targets.
|
||||
See [docker_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config) for details.
|
||||
* `dockerswarm_sd_configs` - is for scraping Docker Swarm targets.
|
||||
See [dockerswarm_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config) for details.
|
||||
* `eureka_sd_configs` - is for scraping targets registered in [Netflix Eureka](https://github.com/Netflix/eureka).
|
||||
See [eureka_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config) for details.
|
||||
* `digitalocean_sd_configs` is for scraping targerts registered in [DigitalOcean](https://www.digitalocean.com/)
|
||||
See [digitalocean_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config) for details.
|
||||
* `http_sd_configs` is for scraping targerts registered in http service discovery.
|
||||
See [http_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config) for details.
|
||||
|
||||
Please file feature requests to [our issue tracker](https://github.com/VictoriaMetrics/VictoriaMetrics/issues) if you need other service discovery mechanisms to be supported by `vmagent`.
|
||||
|
||||
`vmagent` also support the following additional options in `scrape_configs` section:
|
||||
|
||||
* `disable_compression: true` - to disable response compression on a per-job basis. By default `vmagent` requests compressed responses from scrape targets
|
||||
to save network bandwidth.
|
||||
* `disable_keepalive: true` - to disable [HTTP keep-alive connections](https://en.wikipedia.org/wiki/HTTP_persistent_connection) on a per-job basis.
|
||||
By default, `vmagent` uses keep-alive connections to scrape targets to reduce overhead on connection re-establishing.
|
||||
* `series_limit: N` - for limiting the number of unique time series a single scrape target can expose. See [these docs](#cardinality-limiter).
|
||||
* `stream_parse: true` - for scraping targets in a streaming manner. This may be useful for targets exporting big number of metrics. See [these docs](#stream-parsing-mode).
|
||||
* `scrape_align_interval: duration` - for aligning scrapes to the given interval instead of using random offset in the range `[0 ... scrape_interval]` for scraping each target. The random offset helps spreading scrapes evenly in time.
|
||||
* `scrape_offset: duration` - for specifying the exact offset for scraping instead of using random offset in the range `[0 ... scrape_interval]`.
|
||||
* `relabel_debug: true` - for enabling debug logging during relabeling of the discovered targets. See [these docs](#relabeling).
|
||||
* `metric_relabel_debug: true` - for enabling debug logging during relabeling of the scraped metrics. See [these docs](#relabeling).
|
||||
|
||||
Note that `vmagent` doesn't support `refresh_interval` option for these scrape configs. Use the corresponding `-promscrape.*CheckInterval`
|
||||
command-line flag instead. For example, `-promscrape.consulSDCheckInterval=60s` sets `refresh_interval` for all the `consul_sd_configs`
|
||||
entries to 60s. Run `vmagent -help` in order to see default values for the `-promscrape.*CheckInterval` flags.
|
||||
|
||||
The file pointed by `-promscrape.config` may contain `%{ENV_VAR}` placeholders which are substituted by the corresponding `ENV_VAR` environment variable values.
|
||||
|
||||
|
||||
## Loading scrape configs from multiple files
|
||||
|
||||
`vmagent` supports loading scrape configs from multiple files specified in the `scrape_config_files` section of `-promscrape.config` file. For example, the following `-promscrape.config` instructs `vmagent` loading scrape configs from all the `*.yml` files under `configs` directory, from `single_scrape_config.yml` local file and from `https://config-server/scrape_config.yml` url:
|
||||
|
||||
```yml
|
||||
scrape_config_files:
|
||||
- configs/*.yml
|
||||
- single_scrape_config.yml
|
||||
- https://config-server/scrape_config.yml
|
||||
```
|
||||
|
||||
Every referred file can contain arbitrary number of [supported scrape configs](#how-to-collect-metrics-in-prometheus-format). There is no need in specifying top-level `scrape_configs` section in these files. For example:
|
||||
|
||||
```yml
|
||||
- job_name: foo
|
||||
static_configs:
|
||||
- targets: ["vmagent:8429"]
|
||||
- job_name: bar
|
||||
kubernetes_sd_configs:
|
||||
- role: pod
|
||||
```
|
||||
|
||||
`vmagent` dynamically reloads these files on `SIGHUP` signal or on the request to `http://vmagent:8429/-/reload`.
|
||||
|
||||
|
||||
## Adding labels to metrics
|
||||
|
||||
Labels can be added to metrics by the following mechanisms:
|
||||
|
||||
* The `global -> external_labels` section in `-promscrape.config` file. These labels are added only to metrics scraped from targets configured in the `-promscrape.config` file. They aren't added to metrics collected via other [data ingestion protocols](https://docs.victoriametrics.com/#how-to-import-time-series-data).
|
||||
* The `-remoteWrite.label` command-line flag. These labels are added to all the collected metrics before sending them to `-remoteWrite.url`. For example, the following command will start `vmagent`, which will add `{datacenter="foobar"}` label to all the metrics pushed to all the configured remote storage systems (all the `-remoteWrite.url` flag values):
|
||||
|
||||
```
|
||||
/path/to/vmagent -remoteWrite.label=datacenter=foobar ...
|
||||
```
|
||||
|
||||
|
||||
## Relabeling
|
||||
|
||||
`vmagent` and VictoriaMetrics support Prometheus-compatible relabeling.
|
||||
They provide the following additional actions on top of actions from the [Prometheus relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config):
|
||||
|
||||
* `replace_all`: replaces all of the occurences of `regex` in the values of `source_labels` with the `replacement` and stores the results in the `target_label`.
|
||||
* `labelmap_all`: replaces all of the occurences of `regex` in all the label names with the `replacement`.
|
||||
* `keep_if_equal`: keeps the entry if all the label values from `source_labels` are equal.
|
||||
* `drop_if_equal`: drops the entry if all the label values from `source_labels` are equal.
|
||||
* `keep_metrics`: keeps all the metrics with names matching the given `regex`.
|
||||
* `drop_metrics`: drops all the metrics with names matching the given `regex`.
|
||||
|
||||
The `regex` value can be split into multiple lines for improved readability and maintainability. These lines are automatically joined with `|` char when parsed. For example, the following configs are equivalent:
|
||||
|
||||
```yaml
|
||||
- action: keep_metrics
|
||||
regex: "metric_a|metric_b|foo_.+"
|
||||
```
|
||||
|
||||
```yaml
|
||||
- action: keep_metrics
|
||||
regex:
|
||||
- "metric_a"
|
||||
- "metric_b"
|
||||
- "foo_.+"
|
||||
```
|
||||
|
||||
The relabeling can be defined in the following places:
|
||||
|
||||
* At the `scrape_config -> relabel_configs` section in `-promscrape.config` file. This relabeling is applied to target labels. This relabeling can be debugged by passing `relabel_debug: true` option to the corresponding `scrape_config` section. In this case `vmagent` logs target labels before and after the relabeling and then drops the logged target.
|
||||
* At the `scrape_config -> metric_relabel_configs` section in `-promscrape.config` file. This relabeling is applied to all the scraped metrics in the given `scrape_config`. This relabeling can be debugged by passing `metric_relabel_debug: true` option to the corresponding `scrape_config` section. In this case `vmagent` logs metrics before and after the relabeling and then drops the logged metrics.
|
||||
* At the `-remoteWrite.relabelConfig` file. This relabeling is applied to all the collected metrics before sending them to remote storage. This relabeling can be debugged by passing `-remoteWrite.relabelDebug` command-line option to `vmagent`. In this case `vmagent` logs metrics before and after the relabeling and then drops all the logged metrics instead of sending them to remote storage.
|
||||
* At the `-remoteWrite.urlRelabelConfig` files. This relabeling is applied to metrics before sending them to the corresponding `-remoteWrite.url`. This relabeling can be debugged by passing `-remoteWrite.urlRelabelDebug` command-line options to `vmagent`. In this case `vmagent` logs metrics before and after the relabeling and then drops all the logged metrics instead of sending them to the corresponding `-remoteWrite.url`.
|
||||
|
||||
You can read more about relabeling in the following articles:
|
||||
|
||||
* [How to use Relabeling in Prometheus and VictoriaMetrics](https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2)
|
||||
* [Life of a label](https://www.robustperception.io/life-of-a-label)
|
||||
* [Discarding targets and timeseries with relabeling](https://www.robustperception.io/relabelling-can-discard-targets-timeseries-and-alerts)
|
||||
* [Dropping labels at scrape time](https://www.robustperception.io/dropping-metrics-at-scrape-time-with-prometheus)
|
||||
* [Extracting labels from legacy metric names](https://www.robustperception.io/extracting-labels-from-legacy-metric-names)
|
||||
* [relabel_configs vs metric_relabel_configs](https://www.robustperception.io/relabel_configs-vs-metric_relabel_configs)
|
||||
|
||||
|
||||
## Prometheus staleness markers
|
||||
|
||||
`vmagent` sends [Prometheus staleness markers](https://www.robustperception.io/staleness-and-promql) to `-remoteWrite.url` in the following cases:
|
||||
|
||||
* If they are passed to `vmagent` via [Prometheus remote_write protocol](#prometheus-remote_write-proxy).
|
||||
* If the metric disappears from the list of scraped metrics, then stale marker is sent to this particular metric.
|
||||
* If the scrape target becomes temporarily unavailable, then stale markers are sent for all the metrics scraped from this target.
|
||||
* If the scrape target is removed from the list of targets, then stale markers are sent for all the metrics scraped from this target.
|
||||
|
||||
Prometheus staleness markers' tracking needs additional memory, since it must store the previous response body per each scrape target in order to compare it to the current response body. The memory usage may be reduced by passing `-promscrape.noStaleMarkers` command-line flag to `vmagent`. This disables staleness tracking. This also disables tracking the number of new time series per each scrape with the auto-generated `scrape_series_added` metric. See [these docs](https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series) for details.
|
||||
|
||||
|
||||
## Stream parsing mode
|
||||
|
||||
By default `vmagent` reads the full response body from scrape target into memory, then parses it, applies [relabeling](#relabeling) and then pushes the resulting metrics to the configured `-remoteWrite.url`. This mode works good for the majority of cases when the scrape target exposes small number of metrics (e.g. less than 10 thousand). But this mode may take big amounts of memory when the scrape target exposes big number of metrics. In this case it is recommended enabling stream parsing mode. When this mode is enabled, then `vmagent` reads response from scrape target in chunks, then immediately processes every chunk and pushes the processed metrics to remote storage. This allows saving memory when scraping targets that expose millions of metrics.
|
||||
|
||||
Stream parsing mode is automatically enabled for scrape targets returning response bodies with sizes bigger than the `-promscrape.minResponseSizeForStreamParse` command-line flag value. Additionally, the stream parsing mode can be explicitly enabled in the following places:
|
||||
|
||||
- Via `-promscrape.streamParse` command-line flag. In this case all the scrape targets defined in the file pointed by `-promscrape.config` are scraped in stream parsing mode.
|
||||
- Via `stream_parse: true` option at `scrape_configs` section. In this case all the scrape targets defined in this section are scraped in stream parsing mode.
|
||||
- Via `__stream_parse__=true` label, which can be set via [relabeling](#relabeling) at `relabel_configs` section. In this case stream parsing mode is enabled for the corresponding scrape targets. Typical use case: to set the label via [Kubernetes annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) for targets exposing big number of metrics.
|
||||
|
||||
Examples:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: 'big-federate'
|
||||
stream_parse: true
|
||||
static_configs:
|
||||
- targets:
|
||||
- big-prometeus1
|
||||
- big-prometeus2
|
||||
honor_labels: true
|
||||
metrics_path: /federate
|
||||
params:
|
||||
'match[]': ['{__name__!=""}']
|
||||
```
|
||||
|
||||
Note that `sample_limit` and `series_limit` options cannot be used in stream parsing mode because the parsed data is pushed to remote storage as soon as it is parsed.
|
||||
|
||||
|
||||
## Scraping big number of targets
|
||||
|
||||
A single `vmagent` instance can scrape tens of thousands of scrape targets. Sometimes this isn't enough due to limitations on CPU, network, RAM, etc.
|
||||
In this case scrape targets can be split among multiple `vmagent` instances (aka `vmagent` horizontal scaling, sharding and clustering).
|
||||
Each `vmagent` instance in the cluster must use identical `-promscrape.config` files with distinct `-promscrape.cluster.memberNum` values.
|
||||
The flag value must be in the range `0 ... N-1`, where `N` is the number of `vmagent` instances in the cluster.
|
||||
The number of `vmagent` instances in the cluster must be passed to `-promscrape.cluster.membersCount` command-line flag. For example, the following commands
|
||||
spread scrape targets among a cluster of two `vmagent` instances:
|
||||
|
||||
```
|
||||
/path/to/vmagent -promscrape.cluster.membersCount=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
|
||||
/path/to/vmagent -promscrape.cluster.membersCount=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
|
||||
```
|
||||
|
||||
By default each scrape target is scraped only by a single `vmagent` instance in the cluster. If there is a need for replicating scrape targets among multiple `vmagent` instances,
|
||||
then `-promscrape.cluster.replicationFactor` command-line flag must be set to the desired number of replicas. For example, the following commands
|
||||
start a cluster of three `vmagent` instances, where each target is scraped by two `vmagent` instances:
|
||||
|
||||
```
|
||||
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=0 -promscrape.config=/path/to/config.yml ...
|
||||
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=1 -promscrape.config=/path/to/config.yml ...
|
||||
/path/to/vmagent -promscrape.cluster.membersCount=3 -promscrape.cluster.replicationFactor=2 -promscrape.cluster.memberNum=2 -promscrape.config=/path/to/config.yml ...
|
||||
```
|
||||
|
||||
If each target is scraped by multiple `vmagent` instances, then data deduplication must be enabled at remote storage pointed by `-remoteWrite.url`.
|
||||
See [these docs](https://docs.victoriametrics.com/#deduplication) for details.
|
||||
|
||||
|
||||
## Scraping targets via a proxy
|
||||
|
||||
`vmagent` supports scraping targets via http, https and socks5 proxies. Proxy address must be specified in `proxy_url` option. For example, the following scrape config instructs
|
||||
target scraping via https proxy at `https://proxy-addr:1234`:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: foo
|
||||
proxy_url: https://proxy-addr:1234
|
||||
```
|
||||
|
||||
Proxy can be configured with the following optional settings:
|
||||
|
||||
* `proxy_authorization` for generic token authorization. See [Prometheus docs for details on authorization section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config)
|
||||
* `proxy_bearer_token` and `proxy_bearer_token_file` for Bearer token authorization
|
||||
* `proxy_basic_auth` for Basic authorization. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
|
||||
* `proxy_tls_config` for TLS config. See [these docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config).
|
||||
|
||||
For example:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: foo
|
||||
proxy_url: https://proxy-addr:1234
|
||||
proxy_basic_auth:
|
||||
username: foobar
|
||||
password: secret
|
||||
proxy_tls_config:
|
||||
insecure_skip_verify: true
|
||||
cert_file: /path/to/cert
|
||||
key_file: /path/to/key
|
||||
ca_file: /path/to/ca
|
||||
server_name: real-server-name
|
||||
```
|
||||
|
||||
## Cardinality limiter
|
||||
|
||||
By default `vmagent` doesn't limit the number of time series each scrape target can expose. The limit can be enforced in the following places:
|
||||
|
||||
- Via `-promscrape.seriesLimitPerTarget` command-line option. This limit is applied individually to all the scrape targets defined in the file pointed by `-promscrape.config`.
|
||||
- Via `series_limit` config option at `scrape_config` section. This limit is applied individually to all the scrape targets defined in the given `scrape_config`.
|
||||
- Via `__series_limit__` label, which can be set with [relabeling](#relabeling) at `relabel_configs` section. This limit is applied to the corresponding scrape targets. Typical use case: to set the limit via [Kubernetes annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) for targets, which may expose too high number of time series.
|
||||
|
||||
All the scraped metrics are dropped for time series exceeding the given limit. The exceeded limit can be [monitored](#monitoring) via `promscrape_series_limit_rows_dropped_total` metric.
|
||||
|
||||
See also `sample_limit` option at [scrape_config section](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
|
||||
|
||||
By default `vmagent` doesn't limit the number of time series written to remote storage systems specified at `-remoteWrite.url`. The limit can be enforced by setting the following command-line flags:
|
||||
|
||||
* `-remoteWrite.maxHourlySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last hour. Useful for limiting the number of active time series.
|
||||
* `-remoteWrite.maxDailySeries` - limits the number of unique time series `vmagent` can write to remote storage systems during the last day. Useful for limiting daily churn rate.
|
||||
|
||||
Both limits can be set simultaneously. If any of these limits is reached, then samples for new time series are dropped instead of sending them to remote storage systems. A sample of dropped series is put in the log with `WARNING` level.
|
||||
|
||||
The exceeded limits can be [monitored](#monitoring) with the following metrics:
|
||||
|
||||
* `vmagent_hourly_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded hourly limit on the number of unique time series.
|
||||
* `vmagent_daily_series_limit_rows_dropped_total` - the number of metrics dropped due to exceeded daily limit on the number of unique time series.
|
||||
|
||||
These limits are approximate, so `vmagent` can underflow/overflow the limit by a small percentage (usually less than 1%).
|
||||
|
||||
|
||||
## Monitoring
|
||||
|
||||
`vmagent` exports various metrics in Prometheus exposition format at `http://vmagent-host:8429/metrics` page. We recommend setting up regular scraping of this page
|
||||
either through `vmagent` itself or by Prometheus so that the exported metrics may be analyzed later.
|
||||
Use official [Grafana dashboard](https://grafana.com/grafana/dashboards/12683) for `vmagent` state overview. Graphs on this dashboard contain useful hints - hover the `i` icon at the top left corner of each graph in order to read it.
|
||||
If you have suggestions for improvements or have found a bug - please open an issue on github or add a review to the dashboard.
|
||||
|
||||
`vmagent` also exports the status for various targets at the following handlers:
|
||||
|
||||
* `http://vmagent-host:8429/targets`. This handler returns human-readable status for every active target.
|
||||
This page is easy to query from the command line with `wget`, `curl` or similar tools.
|
||||
It accepts optional `show_original_labels=1` query arg which shows the original labels per each target before applying the relabeling.
|
||||
This information may be useful for debugging target relabeling.
|
||||
* `http://vmagent-host:8429/api/v1/targets`. This handler returns data compatible with [the corresponding page from Prometheus API](https://prometheus.io/docs/prometheus/latest/querying/api/#targets).
|
||||
|
||||
* `http://vmagent-host:8429/ready`. This handler returns http 200 status code when `vmagent` finishes it's initialization for all service_discovery configs.
|
||||
It may be useful to perform `vmagent` rolling update without any scrape loss.
|
||||
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
* We recommend you [set up the official Grafana dashboard](#monitoring) in order to monitor the state of `vmagent'.
|
||||
|
||||
* We recommend you increase the maximum number of open files in the system (`ulimit -n`) when scraping a big number of targets,
|
||||
as `vmagent` establishes at least a single TCP connection per target.
|
||||
|
||||
* If `vmagent` uses too big amounts of memory, then the following options can help:
|
||||
* Disabling staleness tracking with `-promscrape.noStaleMarkers` option. See [these docs](#prometheus-staleness-markers).
|
||||
* Enabling stream parsing mode if `vmagent` scrapes targets with millions of metrics per target. See [these docs](#stream-parsing-mode).
|
||||
* Reducing the number of output queues with `-remoteWrite.queues` command-line option.
|
||||
* Reducing the amounts of RAM vmagent can use for in-memory buffering with `-memory.allowedPercent` or `-memory.allowedBytes` command-line option. Another option is to reduce memory limits in Docker and/or Kuberntes if `vmagent` runs under these systems.
|
||||
* Reducing the number of CPU cores vmagent can use by passing `GOMAXPROCS=N` environment variable to `vmagent`, where `N` is the desired limit on CPU cores. Another option is to reduce CPU limits in Docker or Kubernetes if `vmagent` runs under these systems.
|
||||
* Passing `-promscrape.dropOriginalLabels` command-line option to `vmagent`, so it drops `"discoveredLabels"` and `"droppedTargets"` lists at `/api/v1/targets` page. This reduces memory usage when scraping big number of targets at the cost of reduced debuggability for improperly configured per-target relabeling.
|
||||
|
||||
* When `vmagent` scrapes many unreliable targets, it can flood the error log with scrape errors. These errors can be suppressed
|
||||
by passing `-promscrape.suppressScrapeErrors` command-line flag to `vmagent`. The most recent scrape error per each target can be observed at `http://vmagent-host:8429/targets`
|
||||
and `http://vmagent-host:8429/api/v1/targets`.
|
||||
|
||||
* The `/api/v1/targets` page could be useful for debugging relabeling process for scrape targets.
|
||||
This page contains original labels for targets dropped during relabeling (see "droppedTargets" section in the page output). By default the `-promscrape.maxDroppedTargets` targets are shown here. If your setup drops more targets during relabeling, then increase `-promscrape.maxDroppedTargets` command-line flag value to see all the dropped targets. Note that tracking each dropped target requires up to 10Kb of RAM. Therefore big values for `-promscrape.maxDroppedTargets` may result in increased memory usage if a big number of scrape targets are dropped during relabeling.
|
||||
|
||||
* We recommend you increase `-remoteWrite.queues` if `vmagent_remotewrite_pending_data_bytes` metric exported at `http://vmagent-host:8429/metrics` page grows constantly. It is also recommended increasing `-remoteWrite.maxBlockSize` and `-remoteWrite.maxRowsPerBlock` command-line options in this case. This can improve data ingestion performance to the configured remote storage systems at the cost of higher memory usage.
|
||||
|
||||
* If you see gaps in the data pushed by `vmagent` to remote storage when `-remoteWrite.maxDiskUsagePerURL` is set, try increasing `-remoteWrite.queues`. Such gaps may appear because `vmagent` cannot keep up with sending the collected data to remote storage. Therefore it starts dropping the buffered data if the on-disk buffer size exceeds `-remoteWrite.maxDiskUsagePerURL`.
|
||||
|
||||
* `vmagent` drops data blocks if remote storage replies with `400 Bad Request` and `409 Conflict` HTTP responses. The number of dropped blocks can be monitored via `vmagent_remotewrite_packets_dropped_total` metric exported at [/metrics page](#monitoring).
|
||||
|
||||
* Use `-remoteWrite.queues=1` when `-remoteWrite.url` points to remote storage, which doesn't accept out-of-order samples (aka data backfilling). Such storage systems include Prometheus, Cortex and Thanos, which typically emit `out of order sample` errors. The best solution is to use remote storage with [backfilling support](https://docs.victoriametrics.com/#backfilling).
|
||||
|
||||
* `vmagent` buffers scraped data at the `-remoteWrite.tmpDataPath` directory until it is sent to `-remoteWrite.url`.
|
||||
The directory can grow large when remote storage is unavailable for extended periods of time and if `-remoteWrite.maxDiskUsagePerURL` isn't set.
|
||||
If you don't want to send all the data from the directory to remote storage then simply stop `vmagent` and delete the directory.
|
||||
|
||||
* By default `vmagent` masks `-remoteWrite.url` with `secret-url` values in logs and at `/metrics` page because
|
||||
the url may contain sensitive information such as auth tokens or passwords.
|
||||
Pass `-remoteWrite.showURL` command-line flag when starting `vmagent` in order to see all the valid urls.
|
||||
|
||||
* By default `vmagent` evenly spreads scrape load in time. If a particular scrape target must be scraped at the beginning of some interval,
|
||||
then `scrape_align_interval` option must be used. For example, the following config aligns hourly scrapes to the beginning of hour:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: foo
|
||||
scrape_interval: 1h
|
||||
scrape_align_interval: 1h
|
||||
```
|
||||
|
||||
* By default `vmagent` evenly spreads scrape load in time. If a particular scrape target must be scraped at specific offset, then `scrape_offset` option must be used.
|
||||
For example, the following config instructs `vmagent` to scrape the target at 10 seconds of every minute:
|
||||
|
||||
```yml
|
||||
scrape_configs:
|
||||
- job_name: foo
|
||||
scrape_interval: 1m
|
||||
scrape_offset: 10s
|
||||
```
|
||||
|
||||
* If you see `skipping duplicate scrape target with identical labels` errors when scraping Kubernetes pods, then it is likely these pods listen to multiple ports
|
||||
or they use an init container. These errors can either be fixed or suppressed with the `-promscrape.suppressDuplicateScrapeTargetErrors` command-line flag.
|
||||
See the available options below if you prefer fixing the root cause of the error:
|
||||
|
||||
The following relabeling rule may be added to `relabel_configs` section in order to filter out pods with unneeded ports:
|
||||
```yml
|
||||
- action: keep_if_equal
|
||||
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
|
||||
```
|
||||
|
||||
The following relabeling rule may be added to `relabel_configs` section in order to filter out init container pods:
|
||||
```yml
|
||||
- action: drop
|
||||
source_labels: [__meta_kubernetes_pod_container_init]
|
||||
regex: true
|
||||
```
|
||||
|
||||
## Kafka integration
|
||||
|
||||
[Enterprise version](https://victoriametrics.com/products/enterprise/) of `vmagent` can read and write metrics from / to Kafka:
|
||||
|
||||
* [Reading metrics from Kafka](#reading-metrics-from-kafka)
|
||||
* [Writing metrics to Kafka](#writing-metrics-to-kafka)
|
||||
|
||||
The enterprise version of vmagent is available for evaluation at [releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) page in `vmutils-*-enteprise.tar.gz` archives and in [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags) with tags containing `enterprise` suffix.
|
||||
|
||||
|
||||
### Reading metrics from Kafka
|
||||
|
||||
[Enterprise version](https://victoriametrics.com/products/enterprise/) of `vmagent` can read metrics in various formats from Kafka messages. These formats can be configured with `-kafka.consumer.topic.defaultFormat` or `-kafka.consumer.topic.format` command-line options. The following formats are supported:
|
||||
|
||||
* `promremotewrite` - [Prometheus remote_write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write). Messages in this format can be sent by vmagent - see [these docs](#writing-metrics-to-kafka).
|
||||
* `influx` - [InfluxDB line protocol format](https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_tutorial/).
|
||||
* `prometheus` - [Prometheus text exposition format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format) and [OpenMetrics format](https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md).
|
||||
* `graphite` - [Graphite plaintext format](https://graphite.readthedocs.io/en/latest/feeding-carbon.html#the-plaintext-protocol).
|
||||
* `jsonline` - [JSON line format](https://docs.victoriametrics.com/#how-to-import-data-in-json-line-format).
|
||||
|
||||
Every Kafka message may contain multiple lines in `influx`, `prometheus`, `graphite` and `jsonline` format delimited by `\n`.
|
||||
|
||||
`vmagent` consumes messages from Kafka topics specified by `-kafka.consumer.topic` command-line flag. Multiple topics can be specified by passing multiple `-kafka.consumer.topic` command-line flags to `vmagent`.
|
||||
|
||||
`vmagent` consumes messages from Kafka brokers specified by `-kafka.consumer.topic.brokers` command-line flag. Multiple brokers can be specified per each `-kafka.consumer.topic` by passing a list of brokers delimited by `;`. For example, `-kafka.consumer.topic.brokers=host1:9092;host2:9092`.
|
||||
|
||||
The following command starts `vmagent`, which reads metrics in InfluxDB line protocol format from Kafka broker at `localhost:9092` from the topic `metrics-by-telegraf` and sends them to remote storage at `http://localhost:8428/api/v1/write`:
|
||||
|
||||
```bash
|
||||
./bin/vmagent -remoteWrite.url=http://localhost:8428/api/v1/write \
|
||||
-kafka.consumer.topic.brokers=localhost:9092 \
|
||||
-kafka.consumer.topic.format=influx \
|
||||
-kafka.consumer.topic=metrics-by-telegraf \
|
||||
-kafka.consumer.topic.groupID=some-id
|
||||
```
|
||||
|
||||
It is expected that [Telegraf](https://github.com/influxdata/telegraf) sends metrics to the `metrics-by-telegraf` topic with the following config:
|
||||
|
||||
```yaml
|
||||
[[outputs.kafka]]
|
||||
brokers = ["localhost:9092"]
|
||||
topic = "influx"
|
||||
data_format = "influx"
|
||||
```
|
||||
|
||||
|
||||
#### Command-line flags for Kafka consumer
|
||||
|
||||
These command-line flags are available only in [enterprise](https://victoriametrics.com/products/enterprise/) version of `vmagent`, which can be downloaded for evaluation from [releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) page (see `vmutils-*-enteprise.tar.gz` archives) and from [docker images](https://hub.docker.com/r/victoriametrics/vmagent/tags) with tags containing `enterprise` suffix.
|
||||
|
||||
```
|
||||
-kafka.consumer.topic array
|
||||
Kafka topic names for data consumption.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-kafka.consumer.topic.basicAuth.password array
|
||||
Optional basic auth password for -kafka.consumer.topic. Must be used in conjunction with any supported auth methods for kafka client, specified by flag -kafka.consumer.topic.options='security.protocol=SASL_SSL;sasl.mechanisms=PLAIN'
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-kafka.consumer.topic.basicAuth.username array
|
||||
Optional basic auth username for -kafka.consumer.topic. Must be used in conjunction with any supported auth methods for kafka client, specified by flag -kafka.consumer.topic.options='security.protocol=SASL_SSL;sasl.mechanisms=PLAIN'
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-kafka.consumer.topic.brokers array
|
||||
List of brokers to connect for given topic, e.g. -kafka.consumer.topic.broker=host-1:9092;host-2:9092
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-kafka.consumer.topic.defaultFormat string
|
||||
Expected data format in the topic if -kafka.consumer.topic.format is skipped. (default "promremotewrite")
|
||||
-kafka.consumer.topic.format array
|
||||
data format for corresponding kafka topic. Valid formats: influx, prometheus, promremotewrite, graphite, jsonline
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-kafka.consumer.topic.groupID array
|
||||
Defines group.id for topic
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-kafka.consumer.topic.isGzipped array
|
||||
Enables gzip setting for topic messages payload. Only prometheus, jsonline and influx formats accept gzipped messages.
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-kafka.consumer.topic.options array
|
||||
Optional key=value;key1=value2 settings for topic consumer. See full configuration options at https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
```
|
||||
|
||||
### Writing metrics to Kafka
|
||||
|
||||
[Enterprise version](https://victoriametrics.com/products/enterprise/) of `vmagent` writes data to Kafka with `at-least-once` semantics if `-remoteWrite.url` contains e.g. Kafka url. For example, if `vmagent` is started with `-remoteWrite.url=kafka://localhost:9092/?topic=prom-rw`, then it would send Prometheus remote_write messages to Kafka bootstrap server at `localhost:9092` with the topic `prom-rw`. These messages can be read later from Kafka by another `vmagent` - see [these docs](#reading-metrics-from-kafka) for details.
|
||||
|
||||
Additional Kafka options can be passed as query params to `-remoteWrite.url`. For instance, `kafka://localhost:9092/?topic=prom-rw&client.id=my-favorite-id` sets `client.id` Kafka option to `my-favorite-id`. The full list of Kafka options is available [here](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md).
|
||||
|
||||
|
||||
#### Kafka broker authorization and authentication
|
||||
|
||||
Two types of auth are supported:
|
||||
|
||||
* sasl with username and password:
|
||||
|
||||
```bash
|
||||
./bin/vmagent -remoteWrite.url=kafka://localhost:9092/?topic=prom-rw&security.protocol=SASL_SSL&sasl.mechanisms=PLAIN -remoteWrite.basicAuth.username=user -remoteWrite.basicAuth.password=password
|
||||
```
|
||||
|
||||
* tls certificates:
|
||||
|
||||
```bash
|
||||
./bin/vmagent -remoteWrite.url=kafka://localhost:9092/?topic=prom-rw&security.protocol=SSL -remoteWrite.tlsCAFile=/opt/ca.pem -remoteWrite.tlsCertFile=/opt/cert.pem -remoteWrite.tlsKeyFile=/opt/key.pem
|
||||
```
|
||||
|
||||
|
||||
## How to build from sources
|
||||
|
||||
We recommend using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmagent` is located in the `vmutils-*` archives .
|
||||
|
||||
|
||||
### Development build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmagent` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds the `vmagent` binary and puts it into the `bin` folder.
|
||||
|
||||
### Production build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmagent-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmagent-prod` binary and puts it into the `bin` folder.
|
||||
|
||||
### Building docker images
|
||||
|
||||
Run `make package-vmagent`. It builds `victoriametrics/vmagent:<PKG_TAG>` docker image locally.
|
||||
`<PKG_TAG>` is an auto-generated image tag, which depends on source code in [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmagent`.
|
||||
|
||||
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
|
||||
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
|
||||
|
||||
```bash
|
||||
ROOT_IMAGE=scratch make package-vmagent
|
||||
```
|
||||
|
||||
### ARM build
|
||||
|
||||
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
|
||||
|
||||
### Development ARM build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmagent-arm` or `make vmagent-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics)
|
||||
It builds `vmagent-arm` or `vmagent-arm64` binary respectively and puts it into the `bin` folder.
|
||||
|
||||
### Production ARM build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmagent-arm-prod` or `make vmagent-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmagent-arm-prod` or `vmagent-arm64-prod` binary respectively and puts it into the `bin` folder.
|
||||
|
||||
|
||||
## Profiling
|
||||
|
||||
`vmagent` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):
|
||||
|
||||
* Memory profile can be collected with the following command:
|
||||
|
||||
```bash
|
||||
curl -s http://<vmagent-host>:8429/debug/pprof/heap > mem.pprof
|
||||
```
|
||||
|
||||
* CPU profile can be collected with the following command:
|
||||
|
||||
```bash
|
||||
curl -s http://<vmagent-host>:8429/debug/pprof/profile > cpu.pprof
|
||||
```
|
||||
|
||||
The command for collecting CPU profile waits for 30 seconds before returning.
|
||||
|
||||
The collected profiles may be analyzed with [go tool pprof](https://github.com/google/pprof).
|
||||
|
||||
|
||||
## Advanced usage
|
||||
|
||||
`vmagent` can be fine-tuned with various command-line flags. Run `./vmagent -help` in order to see the full list of these flags with their desciptions and default values:
|
||||
|
||||
```
|
||||
./vmagent -help
|
||||
|
||||
vmagent collects metrics data via popular data ingestion protocols and routes them to VictoriaMetrics.
|
||||
|
||||
See the docs at https://docs.victoriametrics.com/vmagent.html .
|
||||
|
||||
-configAuthKey string
|
||||
Authorization key for accessing /config page. It must be passed via authKey query arg
|
||||
-csvTrimTimestamp duration
|
||||
Trim timestamps when importing csv data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-datadog.maxInsertRequestSize size
|
||||
The maximum size in bytes of a single DataDog POST request to /api/v1/series
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 67108864)
|
||||
-dryRun
|
||||
Whether to check only config files without running vmagent. The following files are checked: -promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig . Unknown config entries are allowed in -promscrape.config by default. This can be changed with -promscrape.config.strictParse
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-graphiteListenAddr string
|
||||
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
|
||||
-graphiteTrimTimestamp duration
|
||||
Trim timestamps for Graphite data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address to listen for http connections. Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. Note that /targets and /metrics pages aren't available if -httpListenAddr='' (default ":8429")
|
||||
-import.maxLineLen size
|
||||
The maximum length in bytes of a single line accepted by /api/v1/import; the line length can be limited with 'max_rows_per_line' query arg passed to /api/v1/export
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 104857600)
|
||||
-influx.databaseNames array
|
||||
Comma-separated list of database names to return from /query and /influx/query API. This can be needed for accepting data from Telegraf plugins such as https://github.com/fangli/fluent-plugin-influxdb
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-influx.maxLineSize size
|
||||
The maximum size in bytes for a single InfluxDB line during parsing
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 262144)
|
||||
-influxListenAddr string
|
||||
TCP and UDP address to listen for InfluxDB line protocol data. Usually :8189 must be set. Doesn't work if empty. This flag isn't needed when ingesting data over HTTP - just send it to http://<vmagent>:8429/write
|
||||
-influxMeasurementFieldSeparator string
|
||||
Separator for '{measurement}{separator}{field_name}' metric name when inserted via InfluxDB line protocol (default "_")
|
||||
-influxSkipMeasurement
|
||||
Uses '{field_name}' as a metric name while ignoring '{measurement}' and '-influxMeasurementFieldSeparator'
|
||||
-influxSkipSingleField
|
||||
Uses '{measurement}' instead of '{measurement}{separator}{field_name}' for metic name if InfluxDB line contains only a single field
|
||||
-influxTrimTimestamp duration
|
||||
Trim timestamps for InfluxDB line protocol data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-insert.maxQueueDuration duration
|
||||
The maximum duration for waiting in the queue for insert requests due to -maxConcurrentInserts (default 1m0s)
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-maxConcurrentInserts int
|
||||
The maximum number of concurrent inserts. Default value should work for most cases, since it minimizes the overhead for concurrent inserts. This option is tigthly coupled with -insert.maxQueueDuration (default 16)
|
||||
-maxInsertRequestSize size
|
||||
The maximum size in bytes of a single Prometheus remote_write API request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-opentsdbHTTPListenAddr string
|
||||
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbListenAddr string
|
||||
TCP and UDP address to listen for OpentTSDB metrics. Telnet put messages and HTTP /api/put messages are simultaneously served on TCP port. Usually :4242 must be set. Doesn't work if empty
|
||||
-opentsdbTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB 'telnet put' data to this duration. Minimum practical duration is 1s. Higher duration (i.e. 1m) may be used for reducing disk space usage for timestamp data (default 1s)
|
||||
-opentsdbhttp.maxInsertRequestSize size
|
||||
The maximum size of OpenTSDB HTTP put request
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 33554432)
|
||||
-opentsdbhttpTrimTimestamp duration
|
||||
Trim timestamps for OpenTSDB HTTP data to this duration. Minimum practical duration is 1ms. Higher duration (i.e. 1s) may be used for reducing disk space usage for timestamp data (default 1ms)
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-promscrape.cluster.memberNum int
|
||||
The number of number in the cluster of scrapers. It must be an unique value in the range 0 ... promscrape.cluster.membersCount-1 across scrapers in the cluster
|
||||
-promscrape.cluster.membersCount int
|
||||
The number of members in a cluster of scrapers. Each member must have an unique -promscrape.cluster.memberNum in the range 0 ... promscrape.cluster.membersCount-1 . Each member then scrapes roughly 1/N of all the targets. By default cluster scraping is disabled, i.e. a single scraper scrapes all the targets
|
||||
-promscrape.cluster.replicationFactor int
|
||||
The number of members in the cluster, which scrape the same targets. If the replication factor is greater than 2, then the deduplication must be enabled at remote storage side. See https://docs.victoriametrics.com/#deduplication (default 1)
|
||||
-promscrape.config string
|
||||
Optional path to Prometheus config file with 'scrape_configs' section containing targets to scrape. The path can point to local file and to http url. See https://docs.victoriametrics.com/#how-to-scrape-prometheus-exporters-such-as-node-exporter for details
|
||||
-promscrape.config.dryRun
|
||||
Checks -promscrape.config file for errors and unsupported fields and then exits. Returns non-zero exit code on parsing errors and emits these errors to stderr. See also -promscrape.config.strictParse command-line flag. Pass -loggerLevel=ERROR if you don't need to see info messages in the output.
|
||||
-promscrape.config.strictParse
|
||||
Whether to allow only supported fields in -promscrape.config . By default unsupported fields are silently skipped
|
||||
-promscrape.configCheckInterval duration
|
||||
Interval for checking for changes in '-promscrape.config' file. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
-promscrape.consul.waitTime duration
|
||||
Wait time used by Consul service discovery. Default value is used if not set
|
||||
-promscrape.consulSDCheckInterval duration
|
||||
Interval for checking for changes in Consul. This works only if consul_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config for details (default 30s)
|
||||
-promscrape.digitaloceanSDCheckInterval duration
|
||||
Interval for checking for changes in digital ocean. This works only if digitalocean_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#digitalocean_sd_config for details (default 1m0s)
|
||||
-promscrape.disableCompression
|
||||
Whether to disable sending 'Accept-Encoding: gzip' request headers to all the scrape targets. This may reduce CPU usage on scrape targets at the cost of higher network bandwidth utilization. It is possible to set 'disable_compression: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.disableKeepAlive
|
||||
Whether to disable HTTP keep-alive connections when scraping all the targets. This may be useful when targets has no support for HTTP keep-alive connection. It is possible to set 'disable_keepalive: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control. Note that disabling HTTP keep-alive may increase load on both vmagent and scrape targets
|
||||
-promscrape.discovery.concurrency int
|
||||
The maximum number of concurrent requests to Prometheus autodiscovery API (Consul, Kubernetes, etc.) (default 100)
|
||||
-promscrape.discovery.concurrentWaitTime duration
|
||||
The maximum duration for waiting to perform API requests if more than -promscrape.discovery.concurrency requests are simultaneously performed (default 1m0s)
|
||||
-promscrape.dnsSDCheckInterval duration
|
||||
Interval for checking for changes in dns. This works only if dns_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config for details (default 30s)
|
||||
-promscrape.dockerSDCheckInterval duration
|
||||
Interval for checking for changes in docker. This works only if docker_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#docker_sd_config for details (default 30s)
|
||||
-promscrape.dockerswarmSDCheckInterval duration
|
||||
Interval for checking for changes in dockerswarm. This works only if dockerswarm_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config for details (default 30s)
|
||||
-promscrape.dropOriginalLabels
|
||||
Whether to drop original labels for scrape targets at /targets and /api/v1/targets pages. This may be needed for reducing memory usage when original labels for big number of scrape targets occupy big amounts of memory. Note that this reduces debuggability for improper per-target relabeling configs
|
||||
-promscrape.ec2SDCheckInterval duration
|
||||
Interval for checking for changes in ec2. This works only if ec2_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config for details (default 1m0s)
|
||||
-promscrape.eurekaSDCheckInterval duration
|
||||
Interval for checking for changes in eureka. This works only if eureka_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#eureka_sd_config for details (default 30s)
|
||||
-promscrape.fileSDCheckInterval duration
|
||||
Interval for checking for changes in 'file_sd_config'. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config for details (default 30s)
|
||||
-promscrape.gceSDCheckInterval duration
|
||||
Interval for checking for changes in gce. This works only if gce_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#gce_sd_config for details (default 1m0s)
|
||||
-promscrape.httpSDCheckInterval duration
|
||||
Interval for checking for changes in http endpoint service discovery. This works only if http_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config for details (default 1m0s)
|
||||
-promscrape.kubernetes.apiServerTimeout duration
|
||||
How frequently to reload the full state from Kuberntes API server (default 30m0s)
|
||||
-promscrape.kubernetesSDCheckInterval duration
|
||||
Interval for checking for changes in Kubernetes API server. This works only if kubernetes_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config for details (default 30s)
|
||||
-promscrape.maxDroppedTargets int
|
||||
The maximum number of droppedTargets to show at /api/v1/targets page. Increase this value if your setup drops more scrape targets during relabeling and you need investigating labels for all the dropped targets. Note that the increased number of tracked dropped targets may result in increased memory usage (default 1000)
|
||||
-promscrape.maxResponseHeadersSize size
|
||||
The maximum size of http response headers from Prometheus scrape targets
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 4096)
|
||||
-promscrape.maxScrapeSize size
|
||||
The maximum size of scrape response in bytes to process from Prometheus targets. Bigger responses are rejected
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 16777216)
|
||||
-promscrape.minResponseSizeForStreamParse size
|
||||
The minimum target response size for automatic switching to stream parsing mode, which can reduce memory usage. See https://docs.victoriametrics.com/vmagent.html#stream-parsing-mode
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 1000000)
|
||||
-promscrape.noStaleMarkers
|
||||
Whether to disable sending Prometheus stale markers for metrics when scrape target disappears. This option may reduce memory usage if stale markers aren't needed for your setup. This option also disables populating the scrape_series_added metric. See https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
|
||||
-promscrape.openstackSDCheckInterval duration
|
||||
Interval for checking for changes in openstack API server. This works only if openstack_sd_configs is configured in '-promscrape.config' file. See https://prometheus.io/docs/prometheus/latest/configuration/configuration/#openstack_sd_config for details (default 30s)
|
||||
-promscrape.seriesLimitPerTarget int
|
||||
Optional limit on the number of unique time series a single scrape target can expose. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter for more info
|
||||
-promscrape.streamParse
|
||||
Whether to enable stream parsing for metrics obtained from scrape targets. This may be useful for reducing memory usage when millions of metrics are exposed per each scrape target. It is posible to set 'stream_parse: true' individually per each 'scrape_config' section in '-promscrape.config' for fine grained control
|
||||
-promscrape.suppressDuplicateScrapeTargetErrors
|
||||
Whether to suppress 'duplicate scrape target' errors; see https://docs.victoriametrics.com/vmagent.html#troubleshooting for details
|
||||
-promscrape.suppressScrapeErrors
|
||||
Whether to suppress scrape errors logging. The last error for each target is always available at '/targets' page even if scrape errors logging is suppressed
|
||||
-remoteWrite.basicAuth.password array
|
||||
Optional basic auth password to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.basicAuth.passwordFile array
|
||||
Optional path to basic auth password to use for -remoteWrite.url. The file is re-read every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.basicAuth.username array
|
||||
Optional basic auth username to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.bearerToken array
|
||||
Optional bearer auth token to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.bearerTokenFile array
|
||||
Optional path to bearer token file to use for -remoteWrite.url. The token is re-read from the file every second. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.flushInterval duration
|
||||
Interval for flushing the data to remote storage. This option takes effect only when less than 10K data points per second are pushed to -remoteWrite.url (default 1s)
|
||||
-remoteWrite.label array
|
||||
Optional label in the form 'name=value' to add to all the metrics before sending them to -remoteWrite.url. Pass multiple -remoteWrite.label flags in order to add multiple labels to metrics before sending them to remote storage
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.maxBlockSize size
|
||||
The maximum block size to send to remote storage. Bigger blocks may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxRowsPerBlock
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 8388608)
|
||||
-remoteWrite.maxDailySeries int
|
||||
The maximum number of unique series vmagent can send to remote storage systems during the last 24 hours. Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter
|
||||
-remoteWrite.maxDiskUsagePerURL size
|
||||
The maximum file-based buffer size in bytes at -remoteWrite.tmpDataPath for each -remoteWrite.url. When buffer size reaches the configured maximum, then old data is dropped when adding new data to the buffer. Buffered data is stored in ~500MB chunks, so the minimum practical value for this flag is 500000000. Disk usage is unlimited if the value is set to 0
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-remoteWrite.maxHourlySeries int
|
||||
The maximum number of unique series vmagent can send to remote storage systems during the last hour. Excess series are logged and dropped. This can be useful for limiting series cardinality. See https://docs.victoriametrics.com/vmagent.html#cardinality-limiter
|
||||
-remoteWrite.maxRowsPerBlock int
|
||||
The maximum number of samples to send in each block to remote storage. Higher number may improve performance at the cost of the increased memory usage. See also -remoteWrite.maxBlockSize (default 10000)
|
||||
-remoteWrite.multitenantURL array
|
||||
Base path for multitenant remote storage URL to write data to. See https://docs.victoriametrics.com/vmagent.html#multitenancy for details. Example url: http://<vminsert>:8480 . Pass multiple -remoteWrite.multitenantURL flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.oauth2.clientID array
|
||||
Optional OAuth2 clientID to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.oauth2.clientSecret array
|
||||
Optional OAuth2 clientSecret to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.oauth2.clientSecretFile array
|
||||
Optional OAuth2 clientSecretFile to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.oauth2.scopes array
|
||||
Optional OAuth2 scopes to use for -remoteWrite.url. Scopes must be delimited by ';'. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.oauth2.tokenUrl array
|
||||
Optional OAuth2 tokenURL to use for -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.proxyURL array
|
||||
Optional proxy URL for writing data to -remoteWrite.url. Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.queues int
|
||||
The number of concurrent queues to each -remoteWrite.url. Set more queues if default number of queues isn't enough for sending high volume of collected data to remote storage. Default value is 2 * numberOfAvailableCPUs (default 8)
|
||||
-remoteWrite.rateLimit array
|
||||
Optional rate limit in bytes per second for data sent to -remoteWrite.url. By default the rate limit is disabled. It can be useful for limiting load on remote storage when big amounts of buffered data is sent after temporary unavailability of the remote storage
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.relabelConfig string
|
||||
Optional path to file with relabel_config entries. The path can point either to local file or to http url. These entries are applied to all the metrics before sending them to -remoteWrite.url. See https://docs.victoriametrics.com/vmagent.html#relabeling for details
|
||||
-remoteWrite.relabelDebug
|
||||
Whether to log metrics before and after relabeling with -remoteWrite.relabelConfig. If the -remoteWrite.relabelDebug is enabled, then the metrics aren't sent to remote storage. This is useful for debugging the relabeling configs
|
||||
-remoteWrite.roundDigits array
|
||||
Round metric values to this number of decimal digits after the point before writing them to remote storage. Examples: -remoteWrite.roundDigits=2 would round 1.236 to 1.24, while -remoteWrite.roundDigits=-1 would round 126.78 to 130. By default digits rounding is disabled. Set it to 100 for disabling it for a particular remote storage. This option may be used for improving data compression for the stored metrics
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.sendTimeout array
|
||||
Timeout for sending a single block of data to -remoteWrite.url
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.showURL
|
||||
Whether to show -remoteWrite.url in the exported metrics. It is hidden by default, since it can contain sensitive info such as auth key
|
||||
-remoteWrite.significantFigures array
|
||||
The number of significant figures to leave in metric values before writing them to remote storage. See https://en.wikipedia.org/wiki/Significant_figures . Zero value saves all the significant figures. This option may be used for improving data compression for the stored metrics. See also -remoteWrite.roundDigits
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.tlsCAFile array
|
||||
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.tlsCertFile array
|
||||
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.tlsInsecureSkipVerify array
|
||||
Whether to skip tls verification when connecting to -remoteWrite.url
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.tlsKeyFile array
|
||||
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.tlsServerName array
|
||||
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used. If multiple args are set, then they are applied independently for the corresponding -remoteWrite.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.tmpDataPath string
|
||||
Path to directory where temporary data for remote write component is stored. See also -remoteWrite.maxDiskUsagePerURL (default "vmagent-remotewrite-data")
|
||||
-remoteWrite.url array
|
||||
Remote storage URL to write data to. It must support Prometheus remote_write API. It is recommended using VictoriaMetrics as remote storage. Example url: http://<victoriametrics-host>:8428/api/v1/write . Pass multiple -remoteWrite.url flags in order to replicate data to multiple remote storage systems. See also -remoteWrite.multitenantURL
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.urlRelabelConfig array
|
||||
Optional path to relabel config for the corresponding -remoteWrite.url. The path can point either to local file or to http url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-remoteWrite.urlRelabelDebug array
|
||||
Whether to log metrics before and after relabeling with -remoteWrite.urlRelabelConfig. If the -remoteWrite.urlRelabelDebug is enabled, then the metrics aren't sent to the corresponding -remoteWrite.url. This is useful for debugging the relabeling configs
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-sortLabels
|
||||
Whether to sort labels for incoming samples before writing them to all the configured remote storage systems. This may be needed for reducing memory usage at remote storage when the order of labels in incoming samples is random. For example, if m{k1="v1",k2="v2"} may be sent as m{k2="v2",k1="v1"}Enabled sorting for labels can slow down ingestion performance a bit
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
|
||||
BIN
vmagent.png
Normal file
|
After Width: | Height: | Size: 69 KiB |
780
vmalert.md
Normal file
@@ -0,0 +1,780 @@
|
||||
---
|
||||
sort: 4
|
||||
---
|
||||
|
||||
# vmalert
|
||||
|
||||
`vmalert` executes a list of the given [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/)
|
||||
or [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
|
||||
rules against configured `-datasource.url`. For sending alerting notifications
|
||||
vmalert relies on [Alertmanager]((https://github.com/prometheus/alertmanager)) configured via `-notifier.url` flag.
|
||||
Recording rules results are persisted via [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
|
||||
protocol and require `-remoteWrite.url` to be configured.
|
||||
Vmalert is heavily inspired by [Prometheus](https://prometheus.io/docs/alerting/latest/overview/)
|
||||
implementation and aims to be compatible with its syntax.
|
||||
|
||||
## Features
|
||||
* Integration with [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics) TSDB;
|
||||
* VictoriaMetrics [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html)
|
||||
support and expressions validation;
|
||||
* Prometheus [alerting rules definition format](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#defining-alerting-rules)
|
||||
support;
|
||||
* Integration with [Alertmanager](https://github.com/prometheus/alertmanager);
|
||||
* Keeps the alerts [state on restarts](#alerts-state-on-restarts);
|
||||
* Graphite datasource can be used for alerting and recording rules. See [these docs](#graphite);
|
||||
* Recording and Alerting rules backfilling (aka `replay`). See [these docs](#rules-backfilling);
|
||||
* Lightweight without extra dependencies.
|
||||
|
||||
## Limitations
|
||||
* `vmalert` execute queries against remote datasource which has reliability risks because of the network.
|
||||
It is recommended to configure alerts thresholds and rules expressions with the understanding that network
|
||||
requests may fail;
|
||||
* by default, rules execution is sequential within one group, but persistence of execution results to remote
|
||||
storage is asynchronous. Hence, user shouldn't rely on chaining of recording rules when result of previous
|
||||
recording rule is reused in the next one;
|
||||
|
||||
## QuickStart
|
||||
|
||||
To build `vmalert` from sources:
|
||||
```
|
||||
git clone https://github.com/VictoriaMetrics/VictoriaMetrics
|
||||
cd VictoriaMetrics
|
||||
make vmalert
|
||||
```
|
||||
The build binary will be placed in `VictoriaMetrics/bin` folder.
|
||||
|
||||
To start using `vmalert` you will need the following things:
|
||||
* list of rules - PromQL/MetricsQL expressions to execute;
|
||||
* datasource address - reachable MetricsQL endpoint to run queries against;
|
||||
* notifier address [optional] - reachable [Alert Manager](https://github.com/prometheus/alertmanager) instance for processing,
|
||||
aggregating alerts, and sending notifications.
|
||||
* remote write address [optional] - [remote write](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations)
|
||||
compatible storage to persist rules and alerts state info;
|
||||
* remote read address [optional] - MetricsQL compatible datasource to restore alerts state from.
|
||||
|
||||
Then configure `vmalert` accordingly:
|
||||
```
|
||||
./bin/vmalert -rule=alert.rules \ # Path to the file with rules configuration. Supports wildcard
|
||||
-datasource.url=http://localhost:8428 \ # PromQL compatible datasource
|
||||
-notifier.url=http://localhost:9093 \ # AlertManager URL (required if alerting rules are used)
|
||||
-notifier.url=http://127.0.0.1:9093 \ # AlertManager replica URL
|
||||
-remoteWrite.url=http://localhost:8428 \ # Remote write compatible storage to persist rules and alerts state info (required if recording rules are used)
|
||||
-remoteRead.url=http://localhost:8428 \ # MetricsQL compatible datasource to restore alerts state from
|
||||
-external.label=cluster=east-1 \ # External label to be applied for each rule
|
||||
-external.label=replica=a # Multiple external labels may be set
|
||||
```
|
||||
|
||||
Note there's a separate `remoteRead.url` to allow writing results of
|
||||
alerting/recording rules into a different storage than the initial data that's
|
||||
queried. This allows using `vmalert` to aggregate data from a short-term,
|
||||
high-frequency, high-cardinality storage into a long-term storage with
|
||||
decreased cardinality and a bigger interval between samples.
|
||||
|
||||
See the full list of configuration flags in [configuration](#configuration) section.
|
||||
|
||||
If you run multiple `vmalert` services for the same datastore or AlertManager - do not forget
|
||||
to specify different `external.label` flags in order to define which `vmalert` generated rules or alerts.
|
||||
|
||||
Configuration for [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/)
|
||||
and [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) rules is very
|
||||
similar to Prometheus rules and configured using YAML. Configuration examples may be found
|
||||
in [testdata](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/config/testdata) folder.
|
||||
Every `rule` belongs to a `group` and every configuration file may contain arbitrary number of groups:
|
||||
```yaml
|
||||
groups:
|
||||
[ - <rule_group> ]
|
||||
```
|
||||
|
||||
### Groups
|
||||
|
||||
Each group has the following attributes:
|
||||
```yaml
|
||||
# The name of the group. Must be unique within a file.
|
||||
name: <string>
|
||||
|
||||
# How often rules in the group are evaluated.
|
||||
[ interval: <duration> | default = -evaluationInterval flag ]
|
||||
|
||||
# How many rules execute at once within a group. Increasing concurrency may speed
|
||||
# up round execution speed.
|
||||
[ concurrency: <integer> | default = 1 ]
|
||||
|
||||
# Optional type for expressions inside the rules. Supported values: "graphite" and "prometheus".
|
||||
# By default "prometheus" type is used.
|
||||
[ type: <string> ]
|
||||
|
||||
# Warning: DEPRECATED
|
||||
# Please use `params` instead:
|
||||
# params:
|
||||
# extra_label: ["job=nodeexporter", "env=prod"]
|
||||
extra_filter_labels:
|
||||
[ <labelname>: <labelvalue> ... ]
|
||||
|
||||
# Optional list of HTTP URL parameters
|
||||
# applied for all rules requests within a group
|
||||
# For example:
|
||||
# params:
|
||||
# nocache: ["1"] # disable caching for vmselect
|
||||
# denyPartialResponse: ["true"] # fail if one or more vmstorage nodes returned an error
|
||||
# extra_label: ["env=dev"] # apply additional label filter "env=dev" for all requests
|
||||
# see more details at https://docs.victoriametrics.com#prometheus-querying-api-enhancements
|
||||
params:
|
||||
[ <string>: [<string>, ...]]
|
||||
|
||||
# Optional list of labels added to every rule within a group.
|
||||
# It has priority over the external labels.
|
||||
# Labels are commonly used for adding environment
|
||||
# or tenant-specific tag.
|
||||
labels:
|
||||
[ <labelname>: <labelvalue> ... ]
|
||||
|
||||
rules:
|
||||
[ - <rule> ... ]
|
||||
```
|
||||
|
||||
### Rules
|
||||
|
||||
Every rule contains `expr` field for [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/)
|
||||
or [MetricsQL](https://docs.victoriametrics.com/MetricsQL.html) expression. Vmalert will execute the configured
|
||||
expression and then act according to the Rule type.
|
||||
|
||||
There are two types of Rules:
|
||||
* [alerting](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) -
|
||||
Alerting rules allow defining alert conditions via `expr` field and to send notifications to
|
||||
[Alertmanager](https://github.com/prometheus/alertmanager) if execution result is not empty.
|
||||
* [recording](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) -
|
||||
Recording rules allow defining `expr` which result will be then backfilled to configured
|
||||
`-remoteWrite.url`. Recording rules are used to precompute frequently needed or computationally
|
||||
expensive expressions and save their result as a new set of time series.
|
||||
|
||||
`vmalert` forbids defining duplicates - rules with the same combination of name, expression, and labels
|
||||
within one group.
|
||||
|
||||
#### Alerting rules
|
||||
|
||||
The syntax for alerting rule is the following:
|
||||
```yaml
|
||||
# The name of the alert. Must be a valid metric name.
|
||||
alert: <string>
|
||||
|
||||
# The expression to evaluate. The expression language depends on the type value.
|
||||
# By default PromQL/MetricsQL expression is used. If group.type="graphite", then the expression
|
||||
# must contain valid Graphite expression.
|
||||
expr: <string>
|
||||
|
||||
# Alerts are considered firing once they have been returned for this long.
|
||||
# Alerts which have not yet been fired for long enough are considered pending.
|
||||
# If param is omitted or set to 0 then alerts will be immediately considered
|
||||
# as firing once they return.
|
||||
[ for: <duration> | default = 0s ]
|
||||
|
||||
# Labels to add or overwrite for each alert.
|
||||
labels:
|
||||
[ <labelname>: <tmpl_string> ]
|
||||
|
||||
# Annotations to add to each alert.
|
||||
annotations:
|
||||
[ <labelname>: <tmpl_string> ]
|
||||
```
|
||||
|
||||
It is allowed to use [Go templating](https://golang.org/pkg/text/template/) in annotations
|
||||
to format data, iterate over it or execute expressions.
|
||||
Additionally, `vmalert` provides some extra templating functions
|
||||
listed [here](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/app/vmalert/notifier/template_func.go).
|
||||
|
||||
#### Recording rules
|
||||
|
||||
The syntax for recording rules is following:
|
||||
```yaml
|
||||
# The name of the time series to output to. Must be a valid metric name.
|
||||
record: <string>
|
||||
|
||||
# The expression to evaluate. The expression language depends on the type value.
|
||||
# By default MetricsQL expression is used. If group.type="graphite", then the expression
|
||||
# must contain valid Graphite expression.
|
||||
expr: <string>
|
||||
|
||||
# Labels to add or overwrite before storing the result.
|
||||
labels:
|
||||
[ <labelname>: <labelvalue> ]
|
||||
```
|
||||
|
||||
For recording rules to work `-remoteWrite.url` must be specified.
|
||||
|
||||
|
||||
### Alerts state on restarts
|
||||
|
||||
`vmalert` has no local storage, so alerts state is stored in the process memory. Hence, after restart of `vmalert`
|
||||
the process alerts state will be lost. To avoid this situation, `vmalert` should be configured via the following flags:
|
||||
* `-remoteWrite.url` - URL to VictoriaMetrics (Single) or vminsert (Cluster). `vmalert` will persist alerts state
|
||||
into the configured address in the form of time series named `ALERTS` and `ALERTS_FOR_STATE` via remote-write protocol.
|
||||
These are regular time series and maybe queried from VM just as any other time series.
|
||||
The state is stored to the configured address on every rule evaluation.
|
||||
* `-remoteRead.url` - URL to VictoriaMetrics (Single) or vmselect (Cluster). `vmalert` will try to restore alerts state
|
||||
from configured address by querying time series with name `ALERTS_FOR_STATE`.
|
||||
|
||||
Both flags are required for proper state restoration. Restore process may fail if time series are missing
|
||||
in configured `-remoteRead.url`, weren't updated in the last `1h` (controlled by `-remoteRead.lookback`)
|
||||
or received state doesn't match current `vmalert` rules configuration.
|
||||
|
||||
|
||||
### Multitenancy
|
||||
|
||||
There are the following approaches exist for alerting and recording rules across
|
||||
[multiple tenants](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#multitenancy):
|
||||
|
||||
* To run a separate `vmalert` instance per each tenant.
|
||||
The corresponding tenant must be specified in `-datasource.url` command-line flag
|
||||
according to [these docs](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format).
|
||||
For example, `/path/to/vmalert -datasource.url=http://vmselect:8481/select/123/prometheus`
|
||||
would run alerts against `AccountID=123`. For recording rules the `-remoteWrite.url` command-line
|
||||
flag must contain the url for the specific tenant as well.
|
||||
For example, `-remoteWrite.url=http://vminsert:8480/insert/123/prometheus` would write recording
|
||||
rules to `AccountID=123`.
|
||||
|
||||
* To specify `tenant` parameter per each alerting and recording group if
|
||||
[enterprise version of vmalert](https://victoriametrics.com/products/enterprise/) is used
|
||||
with `-clusterMode` command-line flag. For example:
|
||||
|
||||
```yaml
|
||||
groups:
|
||||
- name: rules_for_tenant_123
|
||||
tenant: "123"
|
||||
rules:
|
||||
# Rules for accountID=123
|
||||
|
||||
- name: rules_for_tenant_456:789
|
||||
tenant: "456:789"
|
||||
rules:
|
||||
# Rules for accountID=456, projectID=789
|
||||
```
|
||||
|
||||
If `-clusterMode` is enabled, then `-datasource.url`, `-remoteRead.url` and `-remoteWrite.url` must
|
||||
contain only the hostname without tenant id. For example: `-datasource.url=http://vmselect:8481`.
|
||||
`vmalert` automatically adds the specified tenant to urls per each recording rule in this case.
|
||||
|
||||
If `-clusterMode` is enabled and the `tenant` in a particular group is missing, then the tenant value
|
||||
is obtained from `-defaultTenant.prometheus` or `-defaultTenant.graphite` depending on the `type` of the group.
|
||||
|
||||
The enterprise version of vmalert is available in `vmutils-*-enterprise.tar.gz` files
|
||||
at [release page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) and in `*-enterprise`
|
||||
tags at [Docker Hub](https://hub.docker.com/r/victoriametrics/vmalert/tags).
|
||||
|
||||
### Topology examples
|
||||
|
||||
The following sections are showing how `vmalert` may be used and configured
|
||||
for different scenarios.
|
||||
|
||||
Please note, not all flags in examples are required:
|
||||
* `-remoteWrite.url` and `-remoteRead.url` are optional and are needed only if
|
||||
you have recording rules or want to store [alerts state](#alerts-state-on-restarts) on `vmalert` restarts;
|
||||
* `-notifier.url` is optional and is needed only if you have alerting rules.
|
||||
|
||||
#### Single-node VictoriaMetrics
|
||||
|
||||
The simplest configuration where one single-node VM server is used for
|
||||
rules execution, storing recording rules results and alerts state.
|
||||
|
||||
`vmalert` configuration flags:
|
||||
```
|
||||
./bin/vmalert -rule=rules.yml \ # Path to the file with rules configuration. Supports wildcard
|
||||
-datasource.url=http://victoriametrics:8428 \ # VM-single addr for executing rules expressions
|
||||
-remoteWrite.url=http://victoriametrics:8428 \ # VM-single addr to persist alerts state and recording rules results
|
||||
-remoteRead.url=http://victoriametrics:8428 \ # VM-single addr for restoring alerts state after restart
|
||||
-notifier.url=http://alertmanager:9093 # AlertManager addr to send alerts when they trigger
|
||||
```
|
||||
|
||||
<img alt="vmalert single" width="500" src="vmalert_single.png">
|
||||
|
||||
|
||||
#### Cluster VictoriaMetrics
|
||||
|
||||
In [cluster mode](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html)
|
||||
VictoriaMetrics has separate components for writing and reading path:
|
||||
`vminsert` and `vmselect` components respectively. `vmselect` is used for executing rules expressions
|
||||
and `vminsert` is used to persist recording rules results and alerts state.
|
||||
Cluster mode could have multiple `vminsert` and `vmselect` components.
|
||||
|
||||
`vmalert` configuration flags:
|
||||
```
|
||||
./bin/vmalert -rule=rules.yml \ # Path to the file with rules configuration. Supports wildcard
|
||||
-datasource.url=http://vmselect:8481/select/0/prometheus # vmselect addr for executing rules expressions
|
||||
-remoteWrite.url=http://vminsert:8480/insert/0/prometheuss # vminsert addr to persist alerts state and recording rules results
|
||||
-remoteRead.url=http://vmselect:8481/select/0/prometheus # vmselect addr for restoring alerts state after restart
|
||||
-notifier.url=http://alertmanager:9093 # AlertManager addr to send alerts when they trigger
|
||||
```
|
||||
|
||||
<img alt="vmalert cluster" src="vmalert_cluster.png">
|
||||
|
||||
In case when you want to spread the load on these components - add balancers before them and configure
|
||||
`vmalert` with balancer's addresses. Please, see more about VM's cluster architecture
|
||||
[here](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#architecture-overview).
|
||||
|
||||
#### HA vmalert
|
||||
|
||||
For HA user can run multiple identically configured `vmalert` instances.
|
||||
It means all of them will execute the same rules, write state and results to
|
||||
the same destinations, and send alert notifications to multiple configured
|
||||
Alertmanagers.
|
||||
|
||||
`vmalert` configuration flags:
|
||||
```
|
||||
./bin/vmalert -rule=rules.yml \ # Path to the file with rules configuration. Supports wildcard
|
||||
-datasource.url=http://victoriametrics:8428 \ # VM-single addr for executing rules expressions
|
||||
-remoteWrite.url=http://victoriametrics:8428 \ # VM-single addr to persist alerts state and recording rules results
|
||||
-remoteRead.url=http://victoriametrics:8428 \ # VM-single addr for restoring alerts state after restart
|
||||
-notifier.url=http://alertmanager1:9093 \ # Multiple AlertManager addresses to send alerts when they trigger
|
||||
-notifier.url=http://alertmanagerN:9093 # The same alert will be sent to all configured notifiers
|
||||
```
|
||||
|
||||
<img alt="vmalert ha" width="800px" src="vmalert_ha.png">
|
||||
|
||||
To avoid recording rules results and alerts state duplication in VictoriaMetrics server
|
||||
don't forget to configure [deduplication](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#deduplication).
|
||||
|
||||
Alertmanager will automatically deduplicate alerts with identical labels, so ensure that
|
||||
all `vmalert`s are having the same config.
|
||||
|
||||
Don't forget to configure [cluster mode](https://prometheus.io/docs/alerting/latest/alertmanager/)
|
||||
for Alertmanagers for better reliability.
|
||||
|
||||
This example uses single-node VM server for the sake of simplicity.
|
||||
Check how to replace it with [cluster VictoriaMetrics](#cluster-victoriametrics) if needed.
|
||||
|
||||
|
||||
#### Downsampling and aggregation via vmalert
|
||||
|
||||
Example shows how to build a topology where `vmalert` will process data from one cluster
|
||||
and write results into another. Such clusters may be called as "hot" (low retention,
|
||||
high-speed disks, used for operative monitoring) and "cold" (long term retention,
|
||||
slower/cheaper disks, low resolution data). With help of `vmalert`, user can setup
|
||||
recording rules to process raw data from "hot" cluster (by applying additional transformations
|
||||
or reducing resolution) and push results to "cold" cluster.
|
||||
|
||||
`vmalert` configuration flags:
|
||||
```
|
||||
./bin/vmalert -rule=downsampling-rules.yml \ # Path to the file with rules configuration. Supports wildcard
|
||||
-datasource.url=http://raw-cluster-vmselect:8481/select/0/prometheus # vmselect addr for executing recordi ng rules expressions
|
||||
-remoteWrite.url=http://aggregated-cluster-vminsert:8480/insert/0/prometheuss # vminsert addr to persist recording rules results
|
||||
```
|
||||
|
||||
<img alt="vmalert multi cluster" src="vmalert_multicluster.png">
|
||||
|
||||
Please note, [replay](#rules-backfilling) feature may be used for transforming historical data.
|
||||
|
||||
Flags `-remoteRead.url` and `-notifier.url` are omitted since we assume only recording rules are used.
|
||||
|
||||
|
||||
### Web
|
||||
|
||||
`vmalert` runs a web-server (`-httpListenAddr`) for serving metrics and alerts endpoints:
|
||||
* `http://<vmalert-addr>` - UI;
|
||||
* `http://<vmalert-addr>/api/v1/groups` - list of all loaded groups and rules;
|
||||
* `http://<vmalert-addr>/api/v1/alerts` - list of all active alerts;
|
||||
* `http://<vmalert-addr>/api/v1/<groupID>/<alertID>/status" ` - get alert status by ID.
|
||||
Used as alert source in AlertManager.
|
||||
* `http://<vmalert-addr>/metrics` - application metrics.
|
||||
* `http://<vmalert-addr>/-/reload` - hot configuration reload.
|
||||
|
||||
|
||||
## Graphite
|
||||
|
||||
vmalert sends requests to `<-datasource.url>/render?format=json` during evaluation of alerting and recording rules
|
||||
if the corresponding group or rule contains `type: "graphite"` config option. It is expected that the `<-datasource.url>/render`
|
||||
implements [Graphite Render API](https://graphite.readthedocs.io/en/stable/render_api.html) for `format=json`.
|
||||
When using vmalert with both `graphite` and `prometheus` rules configured against cluster version of VM do not forget
|
||||
to set `-datasource.appendTypePrefix` flag to `true`, so vmalert can adjust URL prefix automatically based on the query type.
|
||||
|
||||
## Rules backfilling
|
||||
|
||||
vmalert supports alerting and recording rules backfilling (aka `replay`). In replay mode vmalert
|
||||
can read the same rules configuration as normal, evaluate them on the given time range and backfill
|
||||
results via remote write to the configured storage. vmalert supports any PromQL/MetricsQL compatible
|
||||
data source for backfilling.
|
||||
|
||||
### How it works
|
||||
|
||||
In `replay` mode vmalert works as a cli-tool and exits immediately after work is done.
|
||||
To run vmalert in `replay` mode:
|
||||
```
|
||||
./bin/vmalert -rule=path/to/your.rules \ # path to files with rules you usually use with vmalert
|
||||
-datasource.url=http://localhost:8428 \ # PromQL/MetricsQL compatible datasource
|
||||
-remoteWrite.url=http://localhost:8428 \ # remote write compatible storage to persist results
|
||||
-replay.timeFrom=2021-05-11T07:21:43Z \ # time from begin replay
|
||||
-replay.timeTo=2021-05-29T18:40:43Z # time to finish replay
|
||||
```
|
||||
|
||||
The output of the command will look like the following:
|
||||
```
|
||||
Replay mode:
|
||||
from: 2021-05-11 07:21:43 +0000 UTC # set by -replay.timeFrom
|
||||
to: 2021-05-29 18:40:43 +0000 UTC # set by -replay.timeTo
|
||||
max data points per request: 1000 # set by -replay.maxDatapointsPerQuery
|
||||
|
||||
Group "ReplayGroup"
|
||||
interval: 1m0s
|
||||
requests to make: 27
|
||||
max range per request: 16h40m0s
|
||||
> Rule "type:vm_cache_entries:rate5m" (ID: 1792509946081842725)
|
||||
27 / 27 [----------------------------------------------------------------------------------------------------] 100.00% 78 p/s
|
||||
> Rule "go_cgo_calls_count:rate5m" (ID: 17958425467471411582)
|
||||
27 / 27 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
|
||||
|
||||
Group "vmsingleReplay"
|
||||
interval: 30s
|
||||
requests to make: 54
|
||||
max range per request: 8h20m0s
|
||||
> Rule "RequestErrorsToAPI" (ID: 17645863024999990222)
|
||||
54 / 54 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
|
||||
> Rule "TooManyLogs" (ID: 9042195394653477652)
|
||||
54 / 54 [-----------------------------------------------------------------------------------------------------] 100.00% ? p/s
|
||||
2021-06-07T09:59:12.098Z info app/vmalert/replay.go:68 replay finished! Imported 511734 samples
|
||||
```
|
||||
|
||||
In `replay` mode all groups are executed sequentially one-by-one. Rules within the group are
|
||||
executed sequentially as well (`concurrency` setting is ignored). Vmalert sends rule's expression
|
||||
to [/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) endpoint
|
||||
of the configured `-datasource.url`. Returned data is then processed according to the rule type and
|
||||
backfilled to `-remoteWrite.url` via [remote Write protocol](https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations).
|
||||
Vmalert respects `evaluationInterval` value set by flag or per-group during the replay.
|
||||
Vmalert automatically disables caching on VictoriaMetrics side by sending `nocache=1` param. It allows
|
||||
to prevent cache pollution and unwanted time range boundaries adjustment during backfilling.
|
||||
|
||||
#### Recording rules
|
||||
|
||||
The result of recording rules `replay` should match with results of normal rules evaluation.
|
||||
|
||||
#### Alerting rules
|
||||
|
||||
The result of alerting rules `replay` is time series reflecting [alert's state](#alerts-state-on-restarts).
|
||||
To see if `replayed` alert has fired in the past use the following PromQL/MetricsQL expression:
|
||||
```
|
||||
ALERTS{alertname="your_alertname", alertstate="firing"}
|
||||
```
|
||||
Execute the query against storage which was used for `-remoteWrite.url` during the `replay`.
|
||||
|
||||
### Additional configuration
|
||||
|
||||
There are following non-required `replay` flags:
|
||||
|
||||
* `-replay.maxDatapointsPerQuery` - the max number of data points expected to receive in one request.
|
||||
In two words, it affects the max time range for every `/query_range` request. The higher the value,
|
||||
the fewer requests will be issued during `replay`.
|
||||
* `-replay.ruleRetryAttempts` - when datasource fails to respond vmalert will make this number of retries
|
||||
per rule before giving up.
|
||||
* `-replay.rulesDelay` - delay between sequential rules execution. Important in cases if there are chaining
|
||||
(rules which depend on each other) rules. It is expected, that remote storage will be able to persist
|
||||
previously accepted data during the delay, so data will be available for the subsequent queries.
|
||||
Keep it equal or bigger than `-remoteWrite.flushInterval`.
|
||||
|
||||
See full description for these flags in `./vmalert --help`.
|
||||
|
||||
### Limitations
|
||||
|
||||
* Graphite engine isn't supported yet;
|
||||
* `query` template function is disabled for performance reasons (might be changed in future);
|
||||
|
||||
|
||||
## Monitoring
|
||||
|
||||
`vmalert` exports various metrics in Prometheus exposition format at `http://vmalert-host:8880/metrics` page.
|
||||
We recommend setting up regular scraping of this page either through `vmagent` or by Prometheus so that the exported
|
||||
metrics may be analyzed later.
|
||||
|
||||
Use the official [Grafana dashboard](https://grafana.com/grafana/dashboards/14950) for `vmalert` overview. Graphs on this dashboard contain useful hints - hover the `i` icon at the top left corner of each graph in order to read it.
|
||||
If you have suggestions for improvements or have found a bug - please open an issue on github or add
|
||||
a review to the dashboard.
|
||||
|
||||
|
||||
## Configuration
|
||||
|
||||
### Flags
|
||||
|
||||
Pass `-help` to `vmalert` in order to see the full list of supported
|
||||
command-line flags with their descriptions.
|
||||
|
||||
The shortlist of configuration flags is the following:
|
||||
```
|
||||
-datasource.appendTypePrefix
|
||||
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
|
||||
-datasource.basicAuth.password string
|
||||
Optional basic auth password for -datasource.url
|
||||
-datasource.basicAuth.passwordFile string
|
||||
Optional path to basic auth password to use for -datasource.url
|
||||
-datasource.basicAuth.username string
|
||||
Optional basic auth username for -datasource.url
|
||||
-datasource.bearerToken string
|
||||
Optional bearer auth token to use for -datasource.url.
|
||||
-datasource.bearerTokenFile string
|
||||
Optional path to bearer token file to use for -datasource.url.
|
||||
-datasource.lookback duration
|
||||
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
|
||||
-datasource.maxIdleConnections int
|
||||
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
|
||||
-datasource.queryStep duration
|
||||
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query.If queryStep isn't specified, rule's evaluationInterval will be used instead.
|
||||
-datasource.roundDigits int
|
||||
Adds "round_digits" GET param to datasource requests. In VM "round_digits" limits the number of digits after the decimal point in response values.
|
||||
-datasource.tlsCAFile string
|
||||
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
|
||||
-datasource.tlsCertFile string
|
||||
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
|
||||
-datasource.tlsInsecureSkipVerify
|
||||
Whether to skip tls verification when connecting to -datasource.url
|
||||
-datasource.tlsKeyFile string
|
||||
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
|
||||
-datasource.tlsServerName string
|
||||
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
|
||||
-datasource.url string
|
||||
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
|
||||
-disableAlertgroupLabel
|
||||
Whether to disable adding group's name as label to generated alerts and time series.
|
||||
-dryRun -rule
|
||||
Whether to check only config files without running vmalert. The rules file are validated. The -rule flag must be specified.
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-evaluationInterval duration
|
||||
How often to evaluate the rules (default 1m0s)
|
||||
-external.alert.source string
|
||||
External Alert Source allows to override the Source link for alerts sent to AlertManager for cases where you want to build a custom link to Grafana, Prometheus or any other service.
|
||||
eg. 'explore?orgId=1&left=[\"now-1h\",\"now\",\"VictoriaMetrics\",{\"expr\": \"{{$expr|quotesEscape|crlfEscape|queryEscape}}\"},{\"mode\":\"Metrics\"},{\"ui\":[true,true,true,\"none\"]}]'.If empty '/api/v1/:groupID/alertID/status' is used
|
||||
-external.label array
|
||||
Optional label in the form 'name=value' to add to all generated recording rules and alerts. Pass multiple -label flags in order to add multiple label sets.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-external.url string
|
||||
External URL is used as alert's source for sent alerts to the notifier
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
Address to listen for http connections (default ":8880")
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-notifier.basicAuth.password array
|
||||
Optional basic auth password for -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.basicAuth.username array
|
||||
Optional basic auth username for -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsCAFile array
|
||||
Optional path to TLS CA file to use for verifying connections to -notifier.url. By default system CA is used
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsCertFile array
|
||||
Optional path to client-side TLS certificate file to use when connecting to -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsInsecureSkipVerify array
|
||||
Whether to skip tls verification when connecting to -notifier.url
|
||||
Supports array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsKeyFile array
|
||||
Optional path to client-side TLS certificate key to use when connecting to -notifier.url
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.tlsServerName array
|
||||
Optional TLS server name to use for connections to -notifier.url. By default the server name from -notifier.url is used
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-notifier.url array
|
||||
Prometheus alertmanager URL, e.g. http://127.0.0.1:9093
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-remoteRead.basicAuth.password string
|
||||
Optional basic auth password for -remoteRead.url
|
||||
-remoteRead.basicAuth.passwordFile string
|
||||
Optional path to basic auth password to use for -remoteRead.url
|
||||
-remoteRead.basicAuth.username string
|
||||
Optional basic auth username for -remoteRead.url
|
||||
-remoteRead.bearerToken string
|
||||
Optional bearer auth token to use for -remoteRead.url.
|
||||
-remoteRead.bearerTokenFile string
|
||||
Optional path to bearer token file to use for -remoteRead.url.
|
||||
-remoteRead.disablePathAppend
|
||||
Whether to disable automatic appending of '/api/v1/query' path to the configured -remoteRead.url.
|
||||
-remoteRead.ignoreRestoreErrors
|
||||
Whether to ignore errors from remote storage when restoring alerts state on startup. (default true)
|
||||
-remoteRead.lookback duration
|
||||
Lookback defines how far to look into past for alerts timeseries. For example, if lookback=1h then range from now() to now()-1h will be scanned. (default 1h0m0s)
|
||||
-remoteRead.tlsCAFile string
|
||||
Optional path to TLS CA file to use for verifying connections to -remoteRead.url. By default system CA is used
|
||||
-remoteRead.tlsCertFile string
|
||||
Optional path to client-side TLS certificate file to use when connecting to -remoteRead.url
|
||||
-remoteRead.tlsInsecureSkipVerify
|
||||
Whether to skip tls verification when connecting to -remoteRead.url
|
||||
-remoteRead.tlsKeyFile string
|
||||
Optional path to client-side TLS certificate key to use when connecting to -remoteRead.url
|
||||
-remoteRead.tlsServerName string
|
||||
Optional TLS server name to use for connections to -remoteRead.url. By default the server name from -remoteRead.url is used
|
||||
-remoteRead.url vmalert
|
||||
Optional URL to VictoriaMetrics or vmselect that will be used to restore alerts state. This configuration makes sense only if vmalert was configured with `remoteWrite.url` before and has been successfully persisted its state. E.g. http://127.0.0.1:8428. See also -remoteRead.disablePathAppend
|
||||
-remoteWrite.basicAuth.password string
|
||||
Optional basic auth password for -remoteWrite.url
|
||||
-remoteWrite.basicAuth.passwordFile string
|
||||
Optional path to basic auth password to use for -remoteWrite.url
|
||||
-remoteWrite.basicAuth.username string
|
||||
Optional basic auth username for -remoteWrite.url
|
||||
-remoteWrite.bearerToken string
|
||||
Optional bearer auth token to use for -remoteWrite.url.
|
||||
-remoteWrite.bearerTokenFile string
|
||||
Optional path to bearer token file to use for -remoteWrite.url.
|
||||
-remoteWrite.concurrency int
|
||||
Defines number of writers for concurrent writing into remote querier (default 1)
|
||||
-remoteWrite.disablePathAppend
|
||||
Whether to disable automatic appending of '/api/v1/write' path to the configured -remoteWrite.url.
|
||||
-remoteWrite.flushInterval duration
|
||||
Defines interval of flushes to remote write endpoint (default 5s)
|
||||
-remoteWrite.maxBatchSize int
|
||||
Defines defines max number of timeseries to be flushed at once (default 1000)
|
||||
-remoteWrite.maxQueueSize int
|
||||
Defines the max number of pending datapoints to remote write endpoint (default 100000)
|
||||
-remoteWrite.tlsCAFile string
|
||||
Optional path to TLS CA file to use for verifying connections to -remoteWrite.url. By default system CA is used
|
||||
-remoteWrite.tlsCertFile string
|
||||
Optional path to client-side TLS certificate file to use when connecting to -remoteWrite.url
|
||||
-remoteWrite.tlsInsecureSkipVerify
|
||||
Whether to skip tls verification when connecting to -remoteWrite.url
|
||||
-remoteWrite.tlsKeyFile string
|
||||
Optional path to client-side TLS certificate key to use when connecting to -remoteWrite.url
|
||||
-remoteWrite.tlsServerName string
|
||||
Optional TLS server name to use for connections to -remoteWrite.url. By default the server name from -remoteWrite.url is used
|
||||
-remoteWrite.url string
|
||||
Optional URL to VictoriaMetrics or vminsert where to persist alerts state and recording rules results in form of timeseries. For example, if -remoteWrite.url=http://127.0.0.1:8428 is specified, then the alerts state will be written to http://127.0.0.1:8428/api/v1/write . See also -remoteWrite.disablePathAppend
|
||||
-replay.maxDatapointsPerQuery int
|
||||
Max number of data points expected in one request. The higher the value, the less requests will be made during replay. (default 1000)
|
||||
-replay.ruleRetryAttempts int
|
||||
Defines how many retries to make before giving up on rule if request for it returns an error. (default 5)
|
||||
-replay.rulesDelay duration
|
||||
Delay between rules evaluation within the group. Could be important if there are chained rules inside of the groupand processing need to wait for previous rule results to be persisted by remote storage before evaluating the next rule.Keep it equal or bigger than -remoteWrite.flushInterval. (default 1s)
|
||||
-replay.timeFrom string
|
||||
The time filter in RFC3339 format to select time series with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
|
||||
-replay.timeTo string
|
||||
The time filter in RFC3339 format to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
|
||||
-rule array
|
||||
Path to the file with alert rules.
|
||||
Supports patterns. Flag can be specified multiple times.
|
||||
Examples:
|
||||
-rule="/path/to/file". Path to a single file with alerting rules
|
||||
-rule="dir/*.yaml" -rule="/*.yaml". Relative path to all .yaml files in "dir" folder,
|
||||
absolute path to all .yaml files in root.
|
||||
Rule files may contain %{ENV_VAR} placeholders, which are substituted by the corresponding env vars.
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-rule.configCheckInterval duration
|
||||
Interval for checking for changes in '-rule' files. By default the checking is disabled. Send SIGHUP signal in order to force config check for changes
|
||||
-rule.maxResolveDuration duration
|
||||
Limits the maximum duration for automatic alert expiration, which is by default equal to 3 evaluation intervals of the parent group.
|
||||
-rule.validateExpressions
|
||||
Whether to validate rules expressions via MetricsQL engine (default true)
|
||||
-rule.validateTemplates
|
||||
Whether to validate annotation and label templates (default true)
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
|
||||
|
||||
### Hot config reload
|
||||
`vmalert` supports "hot" config reload via the following methods:
|
||||
* send SIGHUP signal to `vmalert` process;
|
||||
* send GET request to `/-/reload` endpoint;
|
||||
* configure `-rule.configCheckInterval` flag for periodic reload
|
||||
on config change.
|
||||
|
||||
### URL params
|
||||
|
||||
To set additional URL params for `datasource.url`, `remoteWrite.url` or `remoteRead.url`
|
||||
just add them in address: `-datasource.url=http://localhost:8428?nocache=1`.
|
||||
|
||||
To set additional URL params for specific [group of rules](#Groups) modify
|
||||
the `params` group:
|
||||
```yaml
|
||||
groups:
|
||||
- name: TestGroup
|
||||
params:
|
||||
denyPartialResponse: ["true"]
|
||||
extra_label: ["env=dev"]
|
||||
```
|
||||
Please note, `params` are used only for executing rules expressions (requests to `datasource.url`).
|
||||
If there would be a conflict between URL params set in `datasource.url` flag and params in group definition
|
||||
the latter will have higher priority.
|
||||
|
||||
|
||||
## Contributing
|
||||
|
||||
`vmalert` is mostly designed and built by VictoriaMetrics community.
|
||||
Feel free to share your experience and ideas for improving this
|
||||
software. Please keep simplicity as the main priority.
|
||||
|
||||
## How to build from sources
|
||||
|
||||
It is recommended using
|
||||
[binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
|
||||
- `vmalert` is located in `vmutils-*` archives there.
|
||||
|
||||
|
||||
### Development build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmalert` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmalert` binary and puts it into the `bin` folder.
|
||||
|
||||
### Production build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmalert-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmalert-prod` binary and puts it into the `bin` folder.
|
||||
|
||||
|
||||
### ARM build
|
||||
|
||||
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
|
||||
|
||||
### Development ARM build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmalert-arm` or `make vmalert-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmalert-arm` or `vmalert-arm64` binary respectively and puts it into the `bin` folder.
|
||||
|
||||
### Production ARM build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmalert-arm-prod` or `make vmalert-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmalert-arm-prod` or `vmalert-arm64-prod` binary respectively and puts it into the `bin` folder.
|
||||
BIN
vmalert_cluster.png
Normal file
|
After Width: | Height: | Size: 73 KiB |
BIN
vmalert_ha.png
Normal file
|
After Width: | Height: | Size: 151 KiB |
BIN
vmalert_multicluster.png
Normal file
|
After Width: | Height: | Size: 122 KiB |
BIN
vmalert_single.png
Normal file
|
After Width: | Height: | Size: 62 KiB |
287
vmauth.md
Normal file
@@ -0,0 +1,287 @@
|
||||
---
|
||||
sort: 5
|
||||
---
|
||||
|
||||
# vmauth
|
||||
|
||||
`vmauth` is a simple auth proxy, router and [load balancer](#load-balancing) for [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It reads auth credentials from `Authorization` http header ([Basic Auth](https://en.wikipedia.org/wiki/Basic_access_authentication) and `Bearer token` is supported),
|
||||
matches them against configs pointed by [-auth.config](#auth-config) command-line flag and proxies incoming HTTP requests to the configured per-user `url_prefix` on successful match.
|
||||
The `-auth.config` can point to either local file or to http url.
|
||||
|
||||
## Quick start
|
||||
|
||||
Just download `vmutils-*` archive from [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases), unpack it
|
||||
and pass the following flag to `vmauth` binary in order to start authorizing and routing requests:
|
||||
|
||||
```
|
||||
/path/to/vmauth -auth.config=/path/to/auth/config.yml
|
||||
```
|
||||
|
||||
After that `vmauth` starts accepting HTTP requests on port `8427` and routing them according to the provided [-auth.config](#auth-config).
|
||||
The port can be modified via `-httpListenAddr` command-line flag.
|
||||
|
||||
The auth config can be reloaded either by passing `SIGHUP` signal to `vmauth` or by querying `/-/reload` http endpoint.
|
||||
|
||||
Docker images for `vmauth` are available [here](https://hub.docker.com/r/victoriametrics/vmauth/tags).
|
||||
|
||||
Pass `-help` to `vmauth` in order to see all the supported command-line flags with their descriptions.
|
||||
|
||||
Feel free [contacting us](mailto:info@victoriametrics.com) if you need customized auth proxy for VictoriaMetrics with the support of LDAP, SSO, RBAC, SAML,
|
||||
accounting and rate limiting such as [vmgateway](https://docs.victoriametrics.com/vmgateway.html).
|
||||
|
||||
## Load balancing
|
||||
|
||||
Each `url_prefix` in the [-auth.config](#auth-config) may contain either a single url or a list of urls. In the latter case `vmauth` balances load among the configured urls in a round-robin manner. This feature is useful for balancing the load among multiple `vmselect` and/or `vminsert` nodes in [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html).
|
||||
|
||||
## Auth config
|
||||
|
||||
`-auth.config` is represented in the following simple `yml` format:
|
||||
|
||||
```yml
|
||||
# Arbitrary number of usernames may be put here.
|
||||
# Username and bearer_token values must be unique.
|
||||
|
||||
users:
|
||||
# Requests with the 'Authorization: Bearer XXXX' header are proxied to http://localhost:8428 .
|
||||
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
|
||||
# Requests with the Basic Auth username=XXXX are proxied to http://localhost:8428 as well.
|
||||
- bearer_token: "XXXX"
|
||||
url_prefix: "http://localhost:8428"
|
||||
|
||||
# Requests with the 'Authorization: Bearer YYY' header are proxied to http://localhost:8428 ,
|
||||
# The `X-Scope-OrgID: foobar` http header is added to every proxied request.
|
||||
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
|
||||
- bearer_token: "YYY"
|
||||
url_prefix: "http://localhost:8428"
|
||||
headers:
|
||||
- "X-Scope-OrgID: foobar"
|
||||
|
||||
# The user for querying local single-node VictoriaMetrics.
|
||||
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
|
||||
# will be proxied to http://localhost:8428 .
|
||||
# For example, http://vmauth:8427/api/v1/query is proxied to http://localhost:8428/api/v1/query
|
||||
- username: "local-single-node"
|
||||
password: "***"
|
||||
url_prefix: "http://localhost:8428"
|
||||
|
||||
# The user for querying local single-node VictoriaMetrics with extra_label team=dev.
|
||||
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
|
||||
# will be routed to http://localhost:8428 with extra_label=team=dev query arg.
|
||||
# For example, http://vmauth:8427/api/v1/query is routed to http://localhost:8428/api/v1/query?extra_label=team=dev
|
||||
- username: "local-single-node"
|
||||
password: "***"
|
||||
url_prefix: "http://localhost:8428?extra_label=team=dev"
|
||||
|
||||
# The user for querying account 123 in VictoriaMetrics cluster
|
||||
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
|
||||
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
|
||||
# will be load-balanced among http://vmselect1:8481/select/123/prometheus and http://vmselect2:8481/select/123/prometheus
|
||||
# For example, http://vmauth:8427/api/v1/query is proxied to the following urls in a round-robin manner:
|
||||
# - http://vmselect1:8481/select/123/prometheus/api/v1/select
|
||||
# - http://vmselect2:8481/select/123/prometheus/api/v1/select
|
||||
- username: "cluster-select-account-123"
|
||||
password: "***"
|
||||
url_prefix:
|
||||
- "http://vmselect1:8481/select/123/prometheus"
|
||||
- "http://vmselect2:8481/select/123/prometheus"
|
||||
|
||||
# The user for inserting Prometheus data into VictoriaMetrics cluster under account 42
|
||||
# See https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format
|
||||
# All the requests to http://vmauth:8427 with the given Basic Auth (username:password)
|
||||
# will be load-balanced between http://vminsert1:8480/insert/42/prometheus and http://vminsert2:8480/insert/42/prometheus
|
||||
# For example, http://vmauth:8427/api/v1/write is proxied to the following urls in a round-robin manner:
|
||||
# - http://vminsert1:8480/insert/42/prometheus/api/v1/write
|
||||
# - http://vminsert2:8480/insert/42/prometheus/api/v1/write
|
||||
- username: "cluster-insert-account-42"
|
||||
password: "***"
|
||||
url_prefix:
|
||||
- "http://vminsert1:8480/insert/42/prometheus"
|
||||
- "http://vminsert2:8480/insert/42/prometheus"
|
||||
|
||||
# A single user for querying and inserting data:
|
||||
# - Requests to http://vmauth:8427/api/v1/query, http://vmauth:8427/api/v1/query_range
|
||||
# and http://vmauth:8427/api/v1/label/<label_name>/values are proxied to the following urls in a round-robin manner:
|
||||
# - http://vmselect1:8481/select/42/prometheus
|
||||
# - http://vmselect2:8481/select/42/prometheus
|
||||
# For example, http://vmauth:8427/api/v1/query is proxied to http://vmselect1:8480/select/42/prometheus/api/v1/query
|
||||
# or to http://vmselect2:8480/select/42/prometheus/api/v1/query .
|
||||
# - Requests to http://vmauth:8427/api/v1/write are proxied to http://vminsert:8480/insert/42/prometheus/api/v1/write .
|
||||
# The "X-Scope-OrgID: abc" http header is added to these requests.
|
||||
- username: "foobar"
|
||||
url_map:
|
||||
- src_paths:
|
||||
- "/api/v1/query"
|
||||
- "/api/v1/query_range"
|
||||
- "/api/v1/label/[^/]+/values"
|
||||
url_prefix:
|
||||
- "http://vmselect1:8481/select/42/prometheus"
|
||||
- "http://vmselect2:8481/select/42/prometheus"
|
||||
- src_paths: ["/api/v1/write"]
|
||||
url_prefix: "http://vminsert:8480/insert/42/prometheus"
|
||||
headers:
|
||||
- "X-Scope-OrgID: abc"
|
||||
```
|
||||
|
||||
The config may contain `%{ENV_VAR}` placeholders, which are substituted by the corresponding `ENV_VAR` environment variable values.
|
||||
This may be useful for passing secrets to the config.
|
||||
|
||||
## Security
|
||||
|
||||
It is expected that all the backend services protected by `vmauth` are located in an isolated private network, so they can be accessed by external users only via `vmauth`.
|
||||
|
||||
Do not transfer Basic Auth headers in plaintext over untrusted networks. Enable https. This can be done by passing the following `-tls*` command-line flags to `vmauth`:
|
||||
|
||||
```
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
```
|
||||
|
||||
Alternatively, [https termination proxy](https://en.wikipedia.org/wiki/TLS_termination_proxy) may be put in front of `vmauth`.
|
||||
|
||||
It is recommended protecting `/-/reload` endpoint with `-reloadAuthKey` command-line flag, so external users couldn't trigger config reload.
|
||||
|
||||
## Monitoring
|
||||
|
||||
`vmauth` exports various metrics in Prometheus exposition format at `http://vmauth-host:8427/metrics` page. It is recommended setting up regular scraping of this page
|
||||
either via [vmagent](https://docs.victoriametrics.com/vmagent.html) or via Prometheus, so the exported metrics could be analyzed later.
|
||||
|
||||
`vmauth` exports `vmauth_user_requests_total` metric with `username` label. The `username` label value equals to `username` field value set in the `-auth.config` file. It is possible to override or hide the value in the label by specifying `name` field. For example, the following config will result in `vmauth_user_requests_total{username="foobar"}` instead of `vmauth_user_requests_total{username="secret_user"}`:
|
||||
|
||||
```yml
|
||||
users:
|
||||
- username: "secret_user"
|
||||
name: "foobar"
|
||||
# other config options here
|
||||
```
|
||||
|
||||
## How to build from sources
|
||||
|
||||
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmauth` is located in `vmutils-*` archives there.
|
||||
|
||||
### Development build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmauth` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmauth` binary and puts it into the `bin` folder.
|
||||
|
||||
### Production build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmauth-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmauth-prod` binary and puts it into the `bin` folder.
|
||||
|
||||
### Building docker images
|
||||
|
||||
Run `make package-vmauth`. It builds `victoriametrics/vmauth:<PKG_TAG>` docker image locally.
|
||||
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
|
||||
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmauth`.
|
||||
|
||||
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
|
||||
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
|
||||
|
||||
```bash
|
||||
ROOT_IMAGE=scratch make package-vmauth
|
||||
```
|
||||
|
||||
## Profiling
|
||||
|
||||
`vmauth` provides handlers for collecting the following [Go profiles](https://blog.golang.org/profiling-go-programs):
|
||||
|
||||
* Memory profile. It can be collected with the following command:
|
||||
|
||||
```bash
|
||||
curl -s http://<vmauth-host>:8427/debug/pprof/heap > mem.pprof
|
||||
```
|
||||
|
||||
* CPU profile. It can be collected with the following command:
|
||||
|
||||
```bash
|
||||
curl -s http://<vmauth-host>:8427/debug/pprof/profile > cpu.pprof
|
||||
```
|
||||
|
||||
The command for collecting CPU profile waits for 30 seconds before returning.
|
||||
|
||||
The collected profiles may be analyzed with [go tool pprof](https://github.com/google/pprof).
|
||||
|
||||
## Advanced usage
|
||||
|
||||
Pass `-help` command-line arg to `vmauth` in order to see all the configuration options:
|
||||
|
||||
```
|
||||
./vmauth -help
|
||||
|
||||
vmauth authenticates and authorizes incoming requests and proxies them to VictoriaMetrics.
|
||||
|
||||
See the docs at https://docs.victoriametrics.com/vmauth.html .
|
||||
|
||||
-auth.config string
|
||||
Path to auth config. It can point either to local file or to http url. See https://docs.victoriametrics.com/vmauth.html for details on the format of this auth config
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address to listen for http connections (default ":8427")
|
||||
-logInvalidAuthTokens
|
||||
Whether to log requests with invalid auth tokens. Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-maxIdleConnsPerBackend int
|
||||
The maximum number of idle connections vmauth can open per each backend host (default 100)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-reloadAuthKey string
|
||||
Auth key for /-/reload http endpoint. It must be passed as authKey=...
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
|
||||
295
vmbackup.md
Normal file
@@ -0,0 +1,295 @@
|
||||
---
|
||||
sort: 6
|
||||
---
|
||||
|
||||
# vmbackup
|
||||
|
||||
`vmbackup` creates VictoriaMetrics data backups from [instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
|
||||
|
||||
Supported storage systems for backups:
|
||||
|
||||
* [GCS](https://cloud.google.com/storage/). Example: `gs://<bucket>/<path/to/backup>`
|
||||
* [S3](https://aws.amazon.com/s3/). Example: `s3://<bucket>/<path/to/backup>`
|
||||
* Any S3-compatible storage such as [MinIO](https://github.com/minio/minio), [Ceph](https://docs.ceph.com/en/pacific/radosgw/s3/) or [Swift](https://platform.swiftstack.com/docs/admin/middleware/s3_middleware.html). See [these docs](#advanced-usage) for details.
|
||||
* Local filesystem. Example: `fs://</absolute/path/to/backup>`
|
||||
|
||||
`vmbackup` supports incremental and full backups. Incremental backups are created automatically if the destination path already contains data from the previous backup.
|
||||
Full backups can be sped up with `-origin` pointing to an already existing backup on the same remote storage. In this case `vmbackup` makes server-side copy for the shared
|
||||
data between the existing backup and new backup. It saves time and costs on data transfer.
|
||||
|
||||
Backup process can be interrupted at any time. It is automatically resumed from the interruption point when restarting `vmbackup` with the same args.
|
||||
|
||||
Backed up data can be restored with [vmrestore](https://docs.victoriametrics.com/vmrestore.html).
|
||||
|
||||
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
|
||||
|
||||
See also [vmbackupmanager](https://docs.victoriametrics.com/vmbackupmanager.html) tool built on top of `vmbackup`. This tool simplifies
|
||||
creation of hourly, daily, weekly and monthly backups.
|
||||
|
||||
|
||||
## Use cases
|
||||
|
||||
### Regular backups
|
||||
|
||||
Regular backup can be performed with the following command:
|
||||
|
||||
```
|
||||
vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-snapshot> -dst=gs://<bucket>/<path/to/new/backup>
|
||||
```
|
||||
|
||||
* `</path/to/victoria-metrics-data>` - path to VictoriaMetrics data pointed by `-storageDataPath` command-line flag in single-node VictoriaMetrics or in cluster `vmstorage`.
|
||||
There is no need to stop VictoriaMetrics for creating backups since they are performed from immutable [instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
|
||||
* `<local-snapshot>` is the snapshot to back up. See [how to create instant snapshots](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots). `vmbackup` can create the snapshot on itself if `-snapshot.createURL` command-line flag is set to an url for creating snapshots. In this case `-snapshotName` flag isn't needed.
|
||||
* `<bucket>` is an already existing name for [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets).
|
||||
* `<path/to/new/backup>` is the destination path where new backup will be placed.
|
||||
|
||||
|
||||
### Regular backups with server-side copy from existing backup
|
||||
|
||||
If the destination GCS bucket already contains the previous backup at `-origin` path, then new backup can be sped up
|
||||
with the following command:
|
||||
|
||||
```
|
||||
vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-snapshot> -dst=gs://<bucket>/<path/to/new/backup> -origin=gs://<bucket>/<path/to/existing/backup>
|
||||
```
|
||||
|
||||
It saves time and network bandwidth costs by performing server-side copy for the shared data from the `-origin` to `-dst`.
|
||||
|
||||
|
||||
### Incremental backups
|
||||
|
||||
Incremental backups are performed if `-dst` points to an already existing backup. In this case only new data is uploaded to remote storage.
|
||||
It saves time and network bandwidth costs when working with big backups:
|
||||
|
||||
```
|
||||
vmbackup -storageDataPath=</path/to/victoria-metrics-data> -snapshotName=<local-snapshot> -dst=gs://<bucket>/<path/to/existing/backup>
|
||||
```
|
||||
|
||||
|
||||
### Smart backups
|
||||
|
||||
Smart backups mean storing full daily backups into `YYYYMMDD` folders and creating incremental hourly backup into `latest` folder:
|
||||
|
||||
* Run the following command every hour:
|
||||
|
||||
```
|
||||
vmbackup -snapshotName=<latest-snapshot> -dst=gs://<bucket>/latest
|
||||
```
|
||||
|
||||
Where `<latest-snapshot>` is the latest [snapshot](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots).
|
||||
The command will upload only changed data to `gs://<bucket>/latest`.
|
||||
|
||||
* Run the following command once a day:
|
||||
|
||||
```
|
||||
vmbackup -snapshotName=<daily-snapshot> -dst=gs://<bucket>/<YYYYMMDD> -origin=gs://<bucket>/latest
|
||||
```
|
||||
|
||||
Where `<daily-snapshot>` is the snapshot for the last day `<YYYYMMDD>`.
|
||||
|
||||
|
||||
This apporach saves network bandwidth costs on hourly backups (since they are incremental) and allows recovering data from either the last hour (`latest` backup)
|
||||
or from any day (`YYYYMMDD` backups). Note that hourly backup shouldn't run when creating daily backup.
|
||||
|
||||
Do not forget to remove old snapshots and backups when they are no longer needed in order to save storage costs.
|
||||
|
||||
See also [vmbackupmanager tool](https://docs.victoriametrics.com/vmbackupmanager.html) for automating smart backups.
|
||||
|
||||
|
||||
## How does it work?
|
||||
|
||||
The backup algorithm is the following:
|
||||
|
||||
1. Collect information about files in the `-snapshotName`, in the `-dst` and in the `-origin`.
|
||||
2. Determine which files in `-dst` are missing in `-snapshotName`, and delete them. These are usually small files, which are already merged into bigger files in the snapshot.
|
||||
3. Determine which files in `-snapshotName` are missing in `-dst`. These are usually small new files and bigger merged files.
|
||||
4. Determine which files from step 3 exist in the `-origin`, and perform server-side copy of these files from `-origin` to `-dst`.
|
||||
These are usually the biggest and the oldest files, which are shared between backups.
|
||||
5. Upload the remaining files from step 3 from `-snapshotName` to `-dst`.
|
||||
|
||||
The algorithm splits source files into 1 GiB chunks in the backup. Each chunk is stored as a separate file in the backup.
|
||||
Such splitting minimizes the amounts of data to re-transfer after temporary errors.
|
||||
|
||||
`vmbackup` relies on [instant snapshot](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282) properties:
|
||||
|
||||
- All the files in the snapshot are immutable.
|
||||
- Old files are periodically merged into new files.
|
||||
- Smaller files have higher probability to be merged.
|
||||
- Consecutive snapshots share many identical files.
|
||||
|
||||
These properties allow performing fast and cheap incremental backups and server-side copying from `-origin` paths.
|
||||
See [this article](https://medium.com/@valyala/speeding-up-backups-for-big-time-series-databases-533c1a927883) for more details.
|
||||
`vmbackup` can work improperly or slowly when these properties are violated.
|
||||
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
* If the backup is slow, then try setting higher value for `-concurrency` flag. This will increase the number of concurrent workers that upload data to backup storage.
|
||||
* If `vmbackup` eats all the network bandwidth, then set `-maxBytesPerSecond` to the desired value.
|
||||
* If `vmbackup` has been interrupted due to temporary error, then just restart it with the same args. It will resume the backup process.
|
||||
* Backups created from [single-node VictoriaMetrics](https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html) cannot be restored
|
||||
at [cluster VictoriaMetrics](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html) and vice versa.
|
||||
|
||||
|
||||
## Advanced usage
|
||||
|
||||
|
||||
* Obtaining credentials from a file.
|
||||
|
||||
Add flag `-credsFilePath=/etc/credentials` with the following content:
|
||||
|
||||
for s3 (aws, minio or other s3 compatible storages):
|
||||
```bash
|
||||
[default]
|
||||
aws_access_key_id=theaccesskey
|
||||
aws_secret_access_key=thesecretaccesskeyvalue
|
||||
```
|
||||
|
||||
for gce cloud storage:
|
||||
```json
|
||||
{
|
||||
"type": "service_account",
|
||||
"project_id": "project-id",
|
||||
"private_key_id": "key-id",
|
||||
"private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
|
||||
"client_email": "service-account-email",
|
||||
"client_id": "client-id",
|
||||
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
|
||||
"token_uri": "https://accounts.google.com/o/oauth2/token",
|
||||
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
|
||||
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
|
||||
}
|
||||
```
|
||||
|
||||
* Usage with s3 custom url endpoint. It is possible to use `vmbackup` with s3 compatible storages like minio, cloudian, etc.
|
||||
You have to add a custom url endpoint via flag:
|
||||
```
|
||||
# for minio
|
||||
-customS3Endpoint=http://localhost:9000
|
||||
|
||||
# for aws gov region
|
||||
-customS3Endpoint=https://s3-fips.us-gov-west-1.amazonaws.com
|
||||
```
|
||||
|
||||
* Run `vmbackup -help` in order to see all the available options:
|
||||
|
||||
```
|
||||
-concurrency int
|
||||
The number of concurrent workers. Higher concurrency may reduce backup duration (default 10)
|
||||
-configFilePath string
|
||||
Path to file with S3 configs. Configs are loaded from default location if not set.
|
||||
See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
|
||||
-configProfile string
|
||||
Profile name for S3 configs. If no set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both not set, DefaultSharedConfigProfile is used
|
||||
-credsFilePath string
|
||||
Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
|
||||
See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
|
||||
-customS3Endpoint string
|
||||
Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set
|
||||
-dst string
|
||||
Where to put the backup on the remote storage. Example: gs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir
|
||||
-dst can point to the previous backup. In this case incremental backup is performed, i.e. only changed data is uploaded
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address for exporting metrics at /metrics page (default ":8420")
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-maxBytesPerSecond size
|
||||
The maximum upload speed. There is no limit if it is set to 0
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-origin string
|
||||
Optional origin directory on the remote storage with old backup for server-side copying when performing full backup. This speeds up full backups
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-s3ForcePathStyle
|
||||
Prefixing endpoint with bucket name when set false, true by default. (default true)
|
||||
-snapshot.createURL string
|
||||
VictoriaMetrics create snapshot url. When this is given a snapshot will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create . There is no need in setting -snapshotName if -snapshot.createURL is set
|
||||
-snapshot.deleteURL string
|
||||
VictoriaMetrics delete snapshot url. Optional. Will be generated from -snapshot.createURL if not provided. All created snapshots will be automatically deleted. Example: http://victoriametrics:8428/snapshot/delete
|
||||
-snapshotName string
|
||||
Name for the snapshot to backup. See https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#how-to-work-with-snapshots. There is no need in setting -snapshotName if -snapshot.createURL is set
|
||||
-storageDataPath string
|
||||
Path to VictoriaMetrics data. Must match -storageDataPath from VictoriaMetrics or vmstorage (default "victoria-metrics-data")
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
|
||||
|
||||
|
||||
## How to build from sources
|
||||
|
||||
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - see `vmutils-*` archives there.
|
||||
|
||||
|
||||
### Development build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmbackup` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmbackup` binary and puts it into the `bin` folder.
|
||||
|
||||
### Production build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmbackup-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmbackup-prod` binary and puts it into the `bin` folder.
|
||||
|
||||
### Building docker images
|
||||
|
||||
Run `make package-vmbackup`. It builds `victoriametrics/vmbackup:<PKG_TAG>` docker image locally.
|
||||
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
|
||||
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmbackup`.
|
||||
|
||||
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
|
||||
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
|
||||
|
||||
```bash
|
||||
ROOT_IMAGE=scratch make package-vmbackup
|
||||
```
|
||||
150
vmbackupmanager.md
Normal file
@@ -0,0 +1,150 @@
|
||||
---
|
||||
sort: 10
|
||||
---
|
||||
|
||||
## vmbackupmanager
|
||||
|
||||
***vmbackupmanager is a part of [enterprise package](https://victoriametrics.com/products/enterprise/). It is available for download and evaluation at [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)***
|
||||
|
||||
The VictoriaMetrics backup manager automates regular backup procedures. It supports the following backup intervals: **hourly**, **daily**, **weekly** and **monthly**. Multiple backup intervals may be configured simultaneously. I.e. the backup manager creates hourly backups every hour, while it creates daily backups every day, etc. Backup manager must have read access to the storage data, so best practice is to install it on the same machine (or as a sidecar) where the storage node is installed.
|
||||
The backup service makes a backup every hour and puts it to the latest folder and then copies data to the folders which represent the backup intervals (hourly, daily, weekly and monthly)
|
||||
|
||||
The required flags for running the service are as follows:
|
||||
|
||||
* -eula - should be true and means that you have the legal right to run a backup manager. That can either be a signed contract or an email with confirmation to run the service in a trial period
|
||||
* -storageDataPath - path to VictoriaMetrics or vmstorage data path to make backup from
|
||||
* -snapshot.createURL - VictoriaMetrics creates snapshot URL which will automatically be created during backup. Example: http://victoriametrics:8428/snapshot/create
|
||||
* -dst - backup destination at s3, gcs or local filesystem
|
||||
* -credsFilePath - path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set. See [https://cloud.google.com/iam/docs/creating-managing-service-account-keys](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) and [https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html](https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html)
|
||||
|
||||
|
||||
Backup schedule is controlled by the following flags:
|
||||
|
||||
* -disableHourly - disable hourly run. Default false
|
||||
* -disableDaily - disable daily run. Default false
|
||||
* -disableWeekly - disable weekly run. Default false
|
||||
* -disableMonthly - disable monthly run. Default false
|
||||
|
||||
By default, all flags are turned on and Backup Manager backups data every hour for every interval (hourly, daily, weekly and monthly).
|
||||
|
||||
|
||||
The backup manager creates the following directory hierarchy at **-dst**:
|
||||
|
||||
* /latest/ - contains the latest backup
|
||||
* /hourly/ - contains hourly backups. Each backup is named as *YYYY-MM-DD:HH*
|
||||
* /daily/ - contains daily backups. Each backup is named as *YYYY-MM-DD*
|
||||
* /weekly/ - contains weekly backups. Each backup is named as *YYYY-WW*
|
||||
* /monthly/ - contains monthly backups. Each backup is named as *YYYY-MM*
|
||||
|
||||
|
||||
To get the full list of supported flags please run the following command:
|
||||
|
||||
```console
|
||||
./vmbackupmanager --help
|
||||
```
|
||||
|
||||
The service creates a **full** backup each run. This means that the system can be restored fully from any particular backup using vmrestore. Backup manager uploads only the data that has been changed or created since the most recent backup (incremental backup).
|
||||
|
||||
*Please take into account that the first backup upload could take a significant amount of time as it needs to upload all of the data.*
|
||||
|
||||
There are two flags which could help with performance tuning:
|
||||
|
||||
* -maxBytesPerSecond - the maximum upload speed. There is no limit if it is set to 0
|
||||
* -concurrency - The number of concurrent workers. Higher concurrency may improve upload speed (default 10)
|
||||
|
||||
|
||||
## Example of Usage
|
||||
|
||||
GCS and cluster version. You need to have a credentials file in json format with following structure
|
||||
|
||||
credentials.json
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "service_account",
|
||||
"project_id": "<project>",
|
||||
"private_key_id": "",
|
||||
"private_key": "-----BEGIN PRIVATE KEY-----\-----END PRIVATE KEY-----\n",
|
||||
"client_email": “test@<project>.iam.gserviceaccount.com",
|
||||
"client_id": "",
|
||||
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
|
||||
"token_uri": "https://oauth2.googleapis.com/token",
|
||||
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
|
||||
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%40<project>.iam.gserviceaccount.com"
|
||||
}
|
||||
|
||||
```
|
||||
|
||||
Backup manager launched with the following configuration:
|
||||
|
||||
```console
|
||||
export NODE_IP=192.168.0.10
|
||||
export VMSTORAGE_ENDPOINT=http://127.0.0.1:8428
|
||||
./vmbackupmanager -dst=gs://vmstorage-data/$NODE_IP -credsFilePath=credentials.json -storageDataPath=/vmstorage-data -snapshot.createURL=$VMSTORAGE_ENDPOINT/snapshot/create -eula
|
||||
```
|
||||
|
||||
Expected logs in vmbackupmanager:
|
||||
|
||||
```console
|
||||
info lib/backup/actions/backup.go:131 server-side copied 81 out of 81 parts from GCS{bucket: "vmstorage-data", dir: "192.168.0.10//latest/"} to GCS{bucket: "vmstorage-data", dir: "192.168.0.10//weekly/2020-34/"} in 2.549833008s
|
||||
info lib/backup/actions/backup.go:169 backed up 853315 bytes in 2.882 seconds; deleted 0 bytes; server-side copied 853315 bytes; uploaded 0 bytes
|
||||
```
|
||||
|
||||
Expected logs in vmstorage:
|
||||
|
||||
```console
|
||||
info VictoriaMetrics/lib/storage/table.go:146 creating table snapshot of "/vmstorage-data/data"...
|
||||
info VictoriaMetrics/lib/storage/storage.go:311 deleting snapshot "/vmstorage-data/snapshots/20200818201959-162C760149895DDA"...
|
||||
info VictoriaMetrics/lib/storage/storage.go:319 deleted snapshot "/vmstorage-data/snapshots/20200818201959-162C760149895DDA" in 0.169 seconds
|
||||
```
|
||||
|
||||
The result on the GCS bucket
|
||||
|
||||
- The root folder
|
||||
|
||||

|
||||
|
||||
- The latest folder
|
||||
|
||||

|
||||
|
||||
## Backup Retention Policy
|
||||
|
||||
Backup retention policy is controlled by:
|
||||
|
||||
* -keepLastHourly - keep the last N hourly backups. Disabled by default
|
||||
* -keepLastDaily - keep the last N daily backups. Disabled by default
|
||||
* -keepLastWeekly - keep the last N weekly backups. Disabled by default
|
||||
* -keepLastMonthly - keep the last N monthly backups. Disabled by default
|
||||
|
||||
*Note*: 0 value in every keepLast flag results into deletion ALL backups for particular type (hourly, daily, weekly and monthly)
|
||||
|
||||
Let’s assume we have a backup manager collecting daily backups for the past 10 days.
|
||||
|
||||

|
||||
|
||||
|
||||
We enable backup retention policy for backup manager by using following configuration:
|
||||
|
||||
```console
|
||||
export NODE_IP=192.168.0.10
|
||||
export VMSTORAGE_ENDPOINT=http://127.0.0.1:8428
|
||||
./vmbackupmanager -dst=gs://vmstorage-data/$NODE_IP -credsFilePath=credentials.json -storageDataPath=/vmstorage-data -snapshot.createURL=$VMSTORAGE_ENDPOINT/snapshot/create
|
||||
-keepLastDaily=3 -eula
|
||||
```
|
||||
|
||||
Expected logs in backup manager on start:
|
||||
|
||||
```console
|
||||
info lib/logger/flag.go:20 flag "keepLastDaily" = "3"
|
||||
```
|
||||
|
||||
Expected logs in backup manager during retention cycle:
|
||||
|
||||
```console
|
||||
info app/vmbackupmanager/retention.go:106 daily backups to delete [daily/2021-02-13 daily/2021-02-12 daily/2021-02-11 daily/2021-02-10 daily/2021-02-09 daily/2021-02-08 daily/2021-02-07]
|
||||
```
|
||||
|
||||
The result on the GCS bucket. We see only 3 daily backups:
|
||||
|
||||

|
||||
BIN
vmbackupmanager_latest_folder.png
Normal file
|
After Width: | Height: | Size: 29 KiB |
BIN
vmbackupmanager_root_folder.png
Normal file
|
After Width: | Height: | Size: 25 KiB |
BIN
vmbackupmanager_rp_daily_1.png
Normal file
|
After Width: | Height: | Size: 99 KiB |
BIN
vmbackupmanager_rp_daily_2.png
Normal file
|
After Width: | Height: | Size: 64 KiB |
620
vmctl.md
Normal file
@@ -0,0 +1,620 @@
|
||||
---
|
||||
sort: 8
|
||||
---
|
||||
|
||||
# vmctl
|
||||
|
||||
VictoriaMetrics command-line tool
|
||||
|
||||
Features:
|
||||
- [x] Prometheus: migrate data from Prometheus to VictoriaMetrics using snapshot API
|
||||
- [x] Thanos: migrate data from Thanos to VictoriaMetrics
|
||||
- [ ] ~~Prometheus: migrate data from Prometheus to VictoriaMetrics by query~~(discarded)
|
||||
- [x] InfluxDB: migrate data from InfluxDB to VictoriaMetrics
|
||||
- [x] OpenTSDB: migrate data from OpenTSDB to VictoriaMetrics
|
||||
- [ ] Storage Management: data re-balancing between nodes
|
||||
|
||||
vmctl acts as a proxy between data source ([Prometheus](#migrating-data-from-prometheus),
|
||||
[InfluxDB](#migrating-data-from-influxdb-1x), [VictoriaMetrics](##migrating-data-from-victoriametrics), etc.)
|
||||
and destination - VictoriaMetrics single or cluster version. To see the full list of supported modes
|
||||
run the following command:
|
||||
```
|
||||
./vmctl --help
|
||||
NAME:
|
||||
vmctl - VictoriaMetrics command-line tool
|
||||
|
||||
USAGE:
|
||||
vmctl [global options] command [command options] [arguments...]
|
||||
|
||||
COMMANDS:
|
||||
opentsdb Migrate timeseries from OpenTSDB
|
||||
influx Migrate timeseries from InfluxDB
|
||||
prometheus Migrate timeseries from Prometheus
|
||||
vm-native Migrate time series between VictoriaMetrics installations via native binary format
|
||||
```
|
||||
|
||||
Each mode has its own unique set of flags specific (e.g. prefixed with `influx` for influx mode)
|
||||
to the data source and common list of flags for destination (prefixed with `vm` for VictoriaMetrics):
|
||||
```
|
||||
./vmctl influx --help
|
||||
OPTIONS:
|
||||
--influx-addr value InfluxDB server addr (default: "http://localhost:8086")
|
||||
--influx-user value InfluxDB user [$INFLUX_USERNAME]
|
||||
...
|
||||
--vm-addr vmctl VictoriaMetrics address to perform import requests.
|
||||
Should be the same as --httpListenAddr value for single-node version or vminsert component.
|
||||
When importing into the clustered version do not forget to set additionally --vm-account-id flag.
|
||||
Please note, that vmctl performs initial readiness check for the given address by checking `/health` endpoint. (default: "http://localhost:8428")
|
||||
--vm-user value VictoriaMetrics username for basic auth [$VM_USERNAME]
|
||||
--vm-password value VictoriaMetrics password for basic auth [$VM_PASSWORD]
|
||||
```
|
||||
|
||||
When doing a migration user needs to specify flags for source (where and how to fetch data) and for
|
||||
destination (where to migrate data). Every mode has additional details and nuances, please see
|
||||
them below in corresponding sections.
|
||||
|
||||
For the destination flags see the full description by running the following command:
|
||||
```
|
||||
./vmctl influx --help | grep vm-
|
||||
```
|
||||
|
||||
Some flags like [--vm-extra-label](#adding-extra-labels) or [--vm-significant-figures](#significant-figures)
|
||||
has additional sections with description below. Details about tweaking and adjusting settings
|
||||
are explained in [Tuning](#tuning) section.
|
||||
|
||||
Please note, that if you're going to import data into VictoriaMetrics cluster do not
|
||||
forget to specify the `--vm-account-id` flag. See more details for cluster version
|
||||
[here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
|
||||
|
||||
## Articles
|
||||
|
||||
* [How to migrate data from Prometheus](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043)
|
||||
* [How to migrate data from Prometheus. Filtering and modifying time series](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-filtering-and-modifying-time-series-6d40cea4bf21)
|
||||
|
||||
|
||||
## Migrating data from OpenTSDB
|
||||
|
||||
`vmctl` supports the `opentsdb` mode to migrate data from OpenTSDB to VictoriaMetrics time-series database.
|
||||
|
||||
See `./vmctl opentsdb --help` for details and full list of flags.
|
||||
|
||||
*OpenTSDB migration is not possible without a functioning [meta](http://opentsdb.net/docs/build/html/user_guide/metadata.html) table to search for metrics/series.*
|
||||
|
||||
OpenTSDB migration works like so:
|
||||
|
||||
1. Find metrics based on selected filters (or the default filter set ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'])
|
||||
* e.g. `curl -Ss "http://opentsdb:4242/api/suggest?type=metrics&q=sys"`
|
||||
2. Find series associated with each returned metric
|
||||
* e.g. `curl -Ss "http://opentsdb:4242/api/search/lookup?m=system.load5&limit=1000000"`
|
||||
3. Download data for each series in chunks defined in the CLI switches
|
||||
* e.g. `-retention=sum-1m-avg:1h:90d` ==
|
||||
* `curl -Ss "http://opentsdb:4242/api/query?start=1h-ago&end=now&m=sum:1m-avg-none:system.load5\{host=host1\}"`
|
||||
* `curl -Ss "http://opentsdb:4242/api/query?start=2h-ago&end=1h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
|
||||
* `curl -Ss "http://opentsdb:4242/api/query?start=3h-ago&end=2h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
|
||||
* ...
|
||||
* `curl -Ss "http://opentsdb:4242/api/query?start=2160h-ago&end=2159h-ago&m=sum:1m-avg-none:system.load5\{host=host1\}"`
|
||||
|
||||
This means that we must stream data from OpenTSDB to VictoriaMetrics in chunks. This is where concurrency for OpenTSDB comes in. We can query multiple chunks at once, but we shouldn't perform too many chunks at a time to avoid overloading the OpenTSDB cluster.
|
||||
|
||||
```
|
||||
$ bin/vmctl opentsdb --otsdb-addr http://opentsdb:4242/ --otsdb-retentions sum-1m-avg:1h:1d --otsdb-filters system --otsdb-normalize --vm-addr http://victoria/
|
||||
OpenTSDB import mode
|
||||
2021/04/09 11:52:50 Will collect data starting at TS 1617990770
|
||||
2021/04/09 11:52:50 Loading all metrics from OpenTSDB for filters: [system]
|
||||
Found 9 metrics to import. Continue? [Y/n]
|
||||
2021/04/09 11:52:51 Starting work on system.load1
|
||||
23 / 402200 [>____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________] 0.01% 2 p/s
|
||||
```
|
||||
|
||||
### Retention strings
|
||||
|
||||
Starting with a relatively simple retention string (`sum-1m-avg:1h:30d`), let's describe how this is converted into actual queries.
|
||||
|
||||
There are two essential parts of a retention string:
|
||||
1. [aggregation](#aggregation)
|
||||
2. [windows/time ranges](#windows)
|
||||
|
||||
#### Aggregation
|
||||
|
||||
Retention strings essentially define the two levels of aggregation for our collected series.
|
||||
|
||||
`sum-1m-avg` would become:
|
||||
* First order: `sum`
|
||||
* Second order: `1m-avg-none`
|
||||
|
||||
##### First Order Aggregations
|
||||
|
||||
First-order aggregation addresses how to aggregate any un-mentioned tags.
|
||||
|
||||
This is, conceptually, directly opposite to how PromQL deals with tags. In OpenTSDB, if a tag isn't explicitly mentioned, all values assocaited with that tag will be aggregated.
|
||||
|
||||
It is recommended to use `sum` for the first aggregation because it is relatively quick and should not cause any changes to the incoming data (because we collect each individual series).
|
||||
|
||||
##### Second Order Aggregations
|
||||
|
||||
Second-order aggregation (`1m-avg` in our example) defines any windowing that should occur before returning the data
|
||||
|
||||
It is recommended to match the stat collection interval so we again avoid transforming incoming data.
|
||||
|
||||
We do not allow for defining the "null value" portion of the rollup window (e.g. in the aggreagtion, `1m-avg-none`, the user cannot change `none`), as the goal of this tool is to avoid modifying incoming data.
|
||||
|
||||
#### Windows
|
||||
|
||||
There are two important windows we define in a retention string:
|
||||
1. the "chunk" range of each query
|
||||
2. The time range we will be querying on with that "chunk"
|
||||
|
||||
From our example, our windows are `1h:30d`.
|
||||
|
||||
##### Window "chunks"
|
||||
|
||||
The window `1h` means that each individual query to OpenTSDB should only span 1 hour of time (e.g. `start=2h-ago&end=1h-ago`).
|
||||
|
||||
It is important to ensure this window somewhat matches the row size in HBase to help improve query times.
|
||||
|
||||
For example, if the query is hitting a rollup table with a 4 hour row size, we should set a chunk size of a multiple of 4 hours (e.g. `4h`, `8h`, etc.) to avoid requesting data across row boundaries. Landing on row boundaries allows for more consistent request times to HBase.
|
||||
|
||||
The default table created in HBase for OpenTSDB has a 1 hour row size, so if you aren't sure on a correct row size to use, `1h` is a reasonable choice.
|
||||
|
||||
##### Time range
|
||||
|
||||
The time range `30d` simply means we are asking for the last 30 days of data. This time range can be written using `h`, `d`, `w`, or `y`. (We can't use `m` for month because it already means `minute` in time parsing).
|
||||
|
||||
#### Results of retention string
|
||||
|
||||
The resultant queries that will be created, based on our example retention string of `sum-1m-avg:1h:30d` look like this:
|
||||
|
||||
```
|
||||
http://opentsdb:4242/api/query?start=1h-ago&end=now&m=sum:1m-avg-none:<series>
|
||||
http://opentsdb:4242/api/query?start=2h-ago&end=1h-ago&m=sum:1m-avg-none:<series>
|
||||
http://opentsdb:4242/api/query?start=3h-ago&end=2h-ago&m=sum:1m-avg-none:<series>
|
||||
...
|
||||
http://opentsdb:4242/api/query?start=721h-ago&end=720h-ago&m=sum:1m-avg-none:<series>
|
||||
```
|
||||
|
||||
Chunking the data like this means each individual query returns faster, so we can start populating data into VictoriaMetrics quicker.
|
||||
|
||||
### Restarting OpenTSDB migrations
|
||||
|
||||
One important note for OpenTSDB migration: Queries/HBase scans can "get stuck" within OpenTSDB itself. This can cause instability and performance issues within an OpenTSDB cluster, so stopping the migrator to deal with it may be necessary. Because of this, we provide the timstamp we started collecting data from at thebeginning of the run. You can stop and restart the importer using this "hard timestamp" to ensure you collect data from the same time range over multiple runs.
|
||||
|
||||
## Migrating data from InfluxDB (1.x)
|
||||
|
||||
`vmctl` supports the `influx` mode to migrate data from InfluxDB to VictoriaMetrics time-series database.
|
||||
|
||||
See `./vmctl influx --help` for details and full list of flags.
|
||||
|
||||
To use migration tool please specify the InfluxDB address `--influx-addr`, the database `--influx-database` and VictoriaMetrics address `--vm-addr`.
|
||||
Flag `--vm-addr` for single-node VM is usually equal to `--httpListenAddr`, and for cluster version
|
||||
is equal to `--httpListenAddr` flag of vminsert component. Please note, that vmctl performs initial readiness check for the given address
|
||||
by checking `/health` endpoint. For cluster version it is additionally required to specify the `--vm-account-id` flag.
|
||||
See more details for cluster version [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
|
||||
|
||||
As soon as required flags are provided and all endpoints are accessible, `vmctl` will start the InfluxDB scheme exploration.
|
||||
Basically, it just fetches all fields and timeseries from the provided database and builds up registry of all available timeseries.
|
||||
Then `vmctl` sends fetch requests for each timeseries to InfluxDB one by one and pass results to VM importer.
|
||||
VM importer then accumulates received samples in batches and sends import requests to VM.
|
||||
|
||||
The importing process example for local installation of InfluxDB(`http://localhost:8086`)
|
||||
and single-node VictoriaMetrics(`http://localhost:8428`):
|
||||
```
|
||||
./vmctl influx --influx-database benchmark
|
||||
InfluxDB import mode
|
||||
2020/01/18 20:47:11 Exploring scheme for database "benchmark"
|
||||
2020/01/18 20:47:11 fetching fields: command: "show field keys"; database: "benchmark"; retention: "autogen"
|
||||
2020/01/18 20:47:11 found 10 fields
|
||||
2020/01/18 20:47:11 fetching series: command: "show series "; database: "benchmark"; retention: "autogen"
|
||||
Found 40000 timeseries to import. Continue? [Y/n] y
|
||||
40000 / 40000 [-----------------------------------------------------------------------------------------------------------------------------------------------] 100.00% 21 p/s
|
||||
2020/01/18 21:19:00 Import finished!
|
||||
2020/01/18 21:19:00 VictoriaMetrics importer stats:
|
||||
idle duration: 13m51.461434876s;
|
||||
time spent while importing: 17m56.923899847s;
|
||||
total samples: 345600000;
|
||||
samples/s: 320914.04;
|
||||
total bytes: 5.9 GB;
|
||||
bytes/s: 5.4 MB;
|
||||
import requests: 40001;
|
||||
2020/01/18 21:19:00 Total time: 31m48.467044016s
|
||||
```
|
||||
|
||||
### Data mapping
|
||||
|
||||
Vmctl maps InfluxDB data the same way as VictoriaMetrics does by using the following rules:
|
||||
|
||||
* `influx-database` arg is mapped into `db` label value unless `db` tag exists in the InfluxDB line.
|
||||
* Field names are mapped to time series names prefixed with {measurement}{separator} value,
|
||||
where {separator} equals to _ by default.
|
||||
It can be changed with `--influx-measurement-field-separator` command-line flag.
|
||||
* Field values are mapped to time series values.
|
||||
* Tags are mapped to Prometheus labels format as-is.
|
||||
|
||||
For example, the following InfluxDB line:
|
||||
```
|
||||
foo,tag1=value1,tag2=value2 field1=12,field2=40
|
||||
```
|
||||
|
||||
is converted into the following Prometheus format data points:
|
||||
```
|
||||
foo_field1{tag1="value1", tag2="value2"} 12
|
||||
foo_field2{tag1="value1", tag2="value2"} 40
|
||||
```
|
||||
|
||||
### Configuration
|
||||
|
||||
The configuration flags should contain self-explanatory descriptions.
|
||||
|
||||
### Filtering
|
||||
|
||||
The filtering consists of two parts: timeseries and time.
|
||||
The first step of application is to select all available timeseries
|
||||
for given database and retention. User may specify additional filtering
|
||||
condition via `--influx-filter-series` flag. For example:
|
||||
```
|
||||
./vmctl influx --influx-database benchmark \
|
||||
--influx-filter-series "on benchmark from cpu where hostname='host_1703'"
|
||||
InfluxDB import mode
|
||||
2020/01/26 14:23:29 Exploring scheme for database "benchmark"
|
||||
2020/01/26 14:23:29 fetching fields: command: "show field keys"; database: "benchmark"; retention: "autogen"
|
||||
2020/01/26 14:23:29 found 12 fields
|
||||
2020/01/26 14:23:29 fetching series: command: "show series on benchmark from cpu where hostname='host_1703'"; database: "benchmark"; retention: "autogen"
|
||||
Found 10 timeseries to import. Continue? [Y/n]
|
||||
```
|
||||
The timeseries select query would be following:
|
||||
`fetching series: command: "show series on benchmark from cpu where hostname='host_1703'"; database: "benchmark"; retention: "autogen"`
|
||||
|
||||
The second step of filtering is a time filter and it applies when fetching the datapoints from Influx.
|
||||
Time filtering may be configured with two flags:
|
||||
* --influx-filter-time-start
|
||||
* --influx-filter-time-end
|
||||
Here's an example of importing timeseries for one day only:
|
||||
`./vmctl influx --influx-database benchmark --influx-filter-series "where hostname='host_1703'" --influx-filter-time-start "2020-01-01T10:07:00Z" --influx-filter-time-end "2020-01-01T15:07:00Z"`
|
||||
|
||||
Please see more about time filtering [here](https://docs.influxdata.com/influxdb/v1.7/query_language/schema_exploration#filter-meta-queries-by-time).
|
||||
|
||||
## Migrating data from InfluxDB (2.x)
|
||||
|
||||
Migrating data from InfluxDB v2.x is not supported yet ([#32](https://github.com/VictoriaMetrics/vmctl/issues/32)).
|
||||
You may find useful a 3rd party solution for this - https://github.com/jonppe/influx_to_victoriametrics.
|
||||
|
||||
|
||||
## Migrating data from Prometheus
|
||||
|
||||
`vmctl` supports the `prometheus` mode for migrating data from Prometheus to VictoriaMetrics time-series database.
|
||||
Migration is based on reading Prometheus snapshot, which is basically a hard-link to Prometheus data files.
|
||||
|
||||
See `./vmctl prometheus --help` for details and full list of flags. Also see Prometheus related articles [here](#articles).
|
||||
|
||||
To use migration tool please specify the file path to Prometheus snapshot `--prom-snapshot` (see how to make a snapshot [here](https://www.robustperception.io/taking-snapshots-of-prometheus-data)) and VictoriaMetrics address `--vm-addr`.
|
||||
Please note, that `vmctl` *do not make a snapshot from Prometheus*, it uses an already prepared snapshot. More about Prometheus snapshots may be found [here](https://www.robustperception.io/taking-snapshots-of-prometheus-data) and [here](https://medium.com/@romanhavronenko/victoriametrics-how-to-migrate-data-from-prometheus-d44a6728f043).
|
||||
Flag `--vm-addr` for single-node VM is usually equal to `--httpListenAddr`, and for cluster version
|
||||
is equal to `--httpListenAddr` flag of vminsert component. Please note, that vmctl performs initial readiness check for the given address
|
||||
by checking `/health` endpoint. For cluster version it is additionally required to specify the `--vm-account-id` flag.
|
||||
See more details for cluster version [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster).
|
||||
|
||||
As soon as required flags are provided and all endpoints are accessible, `vmctl` will start the Prometheus snapshot exploration.
|
||||
Basically, it just fetches all available blocks in provided snapshot and read the metadata. It also does initial filtering by time
|
||||
if flags `--prom-filter-time-start` or `--prom-filter-time-end` were set. The exploration procedure prints some stats from read blocks.
|
||||
Please note that stats are not taking into account timeseries or samples filtering. This will be done during importing process.
|
||||
|
||||
The importing process takes the snapshot blocks revealed from Explore procedure and processes them one by one
|
||||
accumulating timeseries and samples. Please note, that `vmctl` relies on responses from InfluxDB on this stage,
|
||||
so ensure that Explore queries are executed without errors or limits. Please see this
|
||||
[issue](https://github.com/VictoriaMetrics/vmctl/issues/30) for details.
|
||||
The data processed in chunks and then sent to VM.
|
||||
|
||||
The importing process example for local installation of Prometheus
|
||||
and single-node VictoriaMetrics(`http://localhost:8428`):
|
||||
```
|
||||
./vmctl prometheus --prom-snapshot=/path/to/snapshot \
|
||||
--vm-concurrency=1 \
|
||||
--vm-batch-size=200000 \
|
||||
--prom-concurrency=3
|
||||
Prometheus import mode
|
||||
Prometheus snapshot stats:
|
||||
blocks found: 14;
|
||||
blocks skipped: 0;
|
||||
min time: 1581288163058 (2020-02-09T22:42:43Z);
|
||||
max time: 1582409128139 (2020-02-22T22:05:28Z);
|
||||
samples: 32549106;
|
||||
series: 27289.
|
||||
Found 14 blocks to import. Continue? [Y/n] y
|
||||
14 / 14 [-------------------------------------------------------------------------------------------] 100.00% 0 p/s
|
||||
2020/02/23 15:50:03 Import finished!
|
||||
2020/02/23 15:50:03 VictoriaMetrics importer stats:
|
||||
idle duration: 6.152953029s;
|
||||
time spent while importing: 44.908522491s;
|
||||
total samples: 32549106;
|
||||
samples/s: 724786.84;
|
||||
total bytes: 669.1 MB;
|
||||
bytes/s: 14.9 MB;
|
||||
import requests: 323;
|
||||
import requests retries: 0;
|
||||
2020/02/23 15:50:03 Total time: 51.077451066s
|
||||
```
|
||||
|
||||
### Data mapping
|
||||
|
||||
VictoriaMetrics has very similar data model to Prometheus and supports [RemoteWrite integration](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage).
|
||||
So no data changes will be applied.
|
||||
|
||||
### Configuration
|
||||
|
||||
The configuration flags should contain self-explanatory descriptions.
|
||||
|
||||
### Filtering
|
||||
|
||||
The filtering consists of three parts: by timeseries and time.
|
||||
|
||||
Filtering by time may be configured via flags `--prom-filter-time-start` and `--prom-filter-time-end`
|
||||
in in RFC3339 format. This filter applied twice: to drop blocks out of range and to filter timeseries in blocks with
|
||||
overlapping time range.
|
||||
|
||||
Example of applying time filter:
|
||||
```
|
||||
./vmctl prometheus --prom-snapshot=/path/to/snapshot \
|
||||
--prom-filter-time-start=2020-02-07T00:07:01Z \
|
||||
--prom-filter-time-end=2020-02-11T00:07:01Z
|
||||
Prometheus import mode
|
||||
Prometheus snapshot stats:
|
||||
blocks found: 2;
|
||||
blocks skipped: 12;
|
||||
min time: 1581288163058 (2020-02-09T22:42:43Z);
|
||||
max time: 1581328800000 (2020-02-10T10:00:00Z);
|
||||
samples: 1657698;
|
||||
series: 3930.
|
||||
Found 2 blocks to import. Continue? [Y/n] y
|
||||
```
|
||||
|
||||
Please notice, that total amount of blocks in provided snapshot is 14, but only 2 of them were in provided
|
||||
time range. So other 12 blocks were marked as `skipped`. The amount of samples and series is not taken into account,
|
||||
since this is heavy operation and will be done during import process.
|
||||
|
||||
|
||||
Filtering by timeseries is configured with following flags:
|
||||
* `--prom-filter-label` - the label name, e.g. `__name__` or `instance`;
|
||||
* `--prom-filter-label-value` - the regular expression to filter the label value. By default matches all `.*`
|
||||
|
||||
For example:
|
||||
```
|
||||
./vmctl prometheus --prom-snapshot=/path/to/snapshot \
|
||||
--prom-filter-label="__name__" \
|
||||
--prom-filter-label-value="promhttp.*" \
|
||||
--prom-filter-time-start=2020-02-07T00:07:01Z \
|
||||
--prom-filter-time-end=2020-02-11T00:07:01Z
|
||||
Prometheus import mode
|
||||
Prometheus snapshot stats:
|
||||
blocks found: 2;
|
||||
blocks skipped: 12;
|
||||
min time: 1581288163058 (2020-02-09T22:42:43Z);
|
||||
max time: 1581328800000 (2020-02-10T10:00:00Z);
|
||||
samples: 1657698;
|
||||
series: 3930.
|
||||
Found 2 blocks to import. Continue? [Y/n] y
|
||||
14 / 14 [------------------------------------------------------------------------------------------------------------------------------------------------------] 100.00% ? p/s
|
||||
2020/02/23 15:51:07 Import finished!
|
||||
2020/02/23 15:51:07 VictoriaMetrics importer stats:
|
||||
idle duration: 0s;
|
||||
time spent while importing: 37.415461ms;
|
||||
total samples: 10128;
|
||||
samples/s: 270690.24;
|
||||
total bytes: 195.2 kB;
|
||||
bytes/s: 5.2 MB;
|
||||
import requests: 2;
|
||||
import requests retries: 0;
|
||||
2020/02/23 15:51:07 Total time: 7.153158218s
|
||||
```
|
||||
|
||||
## Migrating data from Thanos
|
||||
|
||||
Thanos uses the same storage engine as Prometheus and the data layout on-disk should be the same. That means
|
||||
`vmctl` in mode `prometheus` may be used for Thanos historical data migration as well.
|
||||
These instructions may vary based on the details of your Thanos configuration.
|
||||
Please read carefully and verify as you go. We assume you're using Thanos Sidecar on your Prometheus pods,
|
||||
and that you have a separate Thanos Store installation.
|
||||
|
||||
### Current data
|
||||
|
||||
1. For now, keep your Thanos Sidecar and Thanos-related Prometheus configuration, but add this to also stream
|
||||
metrics to VictoriaMetrics:
|
||||
```
|
||||
remote_write:
|
||||
- url: http://victoria-metrics:8428/api/v1/write
|
||||
```
|
||||
2. Make sure VM is running, of course. Now check the logs to make sure that Prometheus is sending and VM is receiving.
|
||||
In Prometheus, make sure there are no errors. On the VM side, you should see messages like this:
|
||||
```
|
||||
2020-04-27T18:38:46.474Z info VictoriaMetrics/lib/storage/partition.go:207 creating a partition "2020_04" with smallPartsPath="/victoria-metrics-data/data/small/2020_04", bigPartsPath="/victoria-metrics-data/data/big/2020_04"
|
||||
2020-04-27T18:38:46.506Z info VictoriaMetrics/lib/storage/partition.go:222 partition "2020_04" has been created
|
||||
```
|
||||
3. Now just wait. Within two hours, Prometheus should finish its current data file and hand it off to Thanos Store for long term
|
||||
storage.
|
||||
|
||||
### Historical data
|
||||
|
||||
Let's assume your data is stored on S3 served by minio. You first need to copy that out to a local filesystem,
|
||||
then import it into VM using `vmctl` in `prometheus` mode.
|
||||
1. Copy data from minio.
|
||||
1. Run the `minio/mc` Docker container.
|
||||
1. `mc config host add minio http://minio:9000 accessKey secretKey`, substituting appropriate values for the last 3 items.
|
||||
1. `mc cp -r minio/prometheus thanos-data`
|
||||
1. Import using `vmctl`.
|
||||
1. Follow the [instructions](#how-to-build) to compile `vmctl` on your machine.
|
||||
1. Use [prometheus](#migrating-data-from-prometheus) mode to import data:
|
||||
```
|
||||
vmctl prometheus --prom-snapshot thanos-data --vm-addr http://victoria-metrics:8428
|
||||
```
|
||||
|
||||
## Migrating data from VictoriaMetrics
|
||||
|
||||
### Native protocol
|
||||
|
||||
The [native binary protocol](https://docs.victoriametrics.com/#how-to-export-data-in-native-format)
|
||||
was introduced in [1.42.0 release](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.42.0)
|
||||
and provides the most efficient way to migrate data between VM instances: single to single, cluster to cluster,
|
||||
single to cluster and vice versa. Please note that both instances (source and destination) should be of v1.42.0
|
||||
or higher.
|
||||
|
||||
See `./vmctl vm-native --help` for details and full list of flags.
|
||||
|
||||
In this mode `vmctl` acts as a proxy between two VM instances, where time series filtering is done by "source" (`src`)
|
||||
and processing is done by "destination" (`dst`). Because of that, `vmctl` doesn't actually know how much data will be
|
||||
processed and can't show the progress bar. It will show the current processing speed and total number of processed bytes:
|
||||
|
||||
```
|
||||
./vmctl vm-native --vm-native-src-addr=http://localhost:8528 \
|
||||
--vm-native-dst-addr=http://localhost:8428 \
|
||||
--vm-native-filter-match='{job="vmagent"}' \
|
||||
--vm-native-filter-time-start='2020-01-01T20:07:00Z'
|
||||
VictoriaMetrics Native import mode
|
||||
Initing export pipe from "http://localhost:8528" with filters:
|
||||
filter: match[]={job="vmagent"}
|
||||
Initing import process to "http://localhost:8428":
|
||||
Total: 336.75 KiB ↖ Speed: 454.46 KiB p/s
|
||||
2020/10/13 17:04:59 Total time: 952.143376ms
|
||||
```
|
||||
|
||||
Importing tips:
|
||||
1. Migrating all the metrics from one VM to another may collide with existing application metrics
|
||||
(prefixed with `vm_`) at destination and lead to confusion when using
|
||||
[official Grafana dashboards](https://grafana.com/orgs/victoriametrics/dashboards).
|
||||
To avoid such situation try to filter out VM process metrics via `--vm-native-filter-match` flag.
|
||||
2. Migration is a backfilling process, so it is recommended to read
|
||||
[Backfilling tips](https://github.com/VictoriaMetrics/VictoriaMetrics#backfilling) section.
|
||||
3. `vmctl` doesn't provide relabeling or other types of labels management in this mode.
|
||||
Instead, use [relabeling in VictoriaMetrics](https://github.com/VictoriaMetrics/vmctl/issues/4#issuecomment-683424375).
|
||||
4. When importing in or from cluster version remember to use correct [URL format](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#url-format)
|
||||
and specify `accountID` param.
|
||||
|
||||
## Tuning
|
||||
|
||||
### InfluxDB mode
|
||||
|
||||
The flag `--influx-concurrency` controls how many concurrent requests may be sent to InfluxDB while fetching
|
||||
timeseries. Please set it wisely to avoid InfluxDB overwhelming.
|
||||
|
||||
The flag `--influx-chunk-size` controls the max amount of datapoints to return in single chunk from fetch requests.
|
||||
Please see more details [here](https://docs.influxdata.com/influxdb/v1.7/guides/querying_data/#chunking).
|
||||
The chunk size is used to control InfluxDB memory usage, so it won't OOM on processing large timeseries with
|
||||
billions of datapoints.
|
||||
|
||||
### Prometheus mode
|
||||
|
||||
The flag `--prom-concurrency` controls how many concurrent readers will be reading the blocks in snapshot.
|
||||
Since snapshots are just files on disk it would be hard to overwhelm the system. Please go with value equal
|
||||
to number of free CPU cores.
|
||||
|
||||
### VictoriaMetrics importer
|
||||
|
||||
The flag `--vm-concurrency` controls the number of concurrent workers that process the input from InfluxDB query results.
|
||||
Please note that each import request can load up to a single vCPU core on VictoriaMetrics. So try to set it according
|
||||
to allocated CPU resources of your VictoriMetrics installation.
|
||||
|
||||
The flag `--vm-batch-size` controls max amount of samples collected before sending the import request.
|
||||
For example, if `--influx-chunk-size=500` and `--vm-batch-size=2000` then importer will process not more
|
||||
than 4 chunks before sending the request.
|
||||
|
||||
### Importer stats
|
||||
|
||||
After successful import `vmctl` prints some statistics for details.
|
||||
The important numbers to watch are following:
|
||||
- `idle duration` - shows time that importer spent while waiting for data from InfluxDB/Prometheus
|
||||
to fill up `--vm-batch-size` batch size. Value shows total duration across all workers configured
|
||||
via `--vm-concurrency`. High value may be a sign of too slow InfluxDB/Prometheus fetches or too
|
||||
high `--vm-concurrency` value. Try to improve it by increasing `--<mode>-concurrency` value or
|
||||
decreasing `--vm-concurrency` value.
|
||||
- `import requests` - shows how many import requests were issued to VM server.
|
||||
The import request is issued once the batch size(`--vm-batch-size`) is full and ready to be sent.
|
||||
Please prefer big batch sizes (50k-500k) to improve performance.
|
||||
- `import requests retries` - shows number of unsuccessful import requests. Non-zero value may be
|
||||
a sign of network issues or VM being overloaded. See the logs during import for error messages.
|
||||
|
||||
### Silent mode
|
||||
|
||||
By default `vmctl` waits confirmation from user before starting the import. If this is unwanted
|
||||
behavior and no user interaction required - pass `-s` flag to enable "silence" mode:
|
||||
```
|
||||
-s Whether to run in silent mode. If set to true no confirmation prompts will appear. (default: false)
|
||||
```
|
||||
|
||||
### Significant figures
|
||||
|
||||
`vmctl` allows to limit the number of [significant figures](https://en.wikipedia.org/wiki/Significant_figures)
|
||||
before importing. For example, the average value for response size is `102.342305` bytes and it has 9 significant figures.
|
||||
If you ask a human to pronounce this value then with high probability value will be rounded to first 4 or 5 figures
|
||||
because the rest aren't really that important to mention. In most cases, such a high precision is too much.
|
||||
Moreover, such values may be just a result of [floating point arithmetic](https://en.wikipedia.org/wiki/Floating-point_arithmetic),
|
||||
create a [false precision](https://en.wikipedia.org/wiki/False_precision) and result into bad compression ratio
|
||||
according to [information theory](https://en.wikipedia.org/wiki/Information_theory).
|
||||
|
||||
`vmctl` provides the following flags for improving data compression:
|
||||
|
||||
* `--vm-round-digits` flag for rounding processed values to the given number of decimal digits after the point.
|
||||
For example, `--vm-round-digits=2` would round `1.2345` to `1.23`. By default the rounding is disabled.
|
||||
|
||||
* `--vm-significant-figures` flag for limiting the number of significant figures in processed values. It takes no effect if set
|
||||
to 0 (by default), but set `--vm-significant-figures=5` and `102.342305` will be rounded to `102.34`.
|
||||
|
||||
The most common case for using these flags is to improve data compression for time series storing aggregation
|
||||
results such as `average`, `rate`, etc.
|
||||
|
||||
### Adding extra labels
|
||||
|
||||
`vmctl` allows to add extra labels to all imported series. It can be achived with flag `--vm-extra-label label=value`.
|
||||
If multiple labels needs to be added, set flag for each label, for example, `--vm-extra-label label1=value1 --vm-extra-label label2=value2`.
|
||||
If timeseries already have label, that must be added with `--vm-extra-label` flag, flag has priority and will override label value from timeseries.
|
||||
|
||||
### Rate limiting
|
||||
|
||||
Limiting the rate of data transfer could help to reduce pressure on disk or on destination database.
|
||||
The rate limit may be set in bytes-per-second via `--vm-rate-limit` flag.
|
||||
|
||||
Please note, you can also use [vmagent](https://docs.victoriametrics.com/vmagent.html)
|
||||
as a proxy between `vmctl` and destination with `-remoteWrite.rateLimit` flag enabled.
|
||||
|
||||
|
||||
## How to build
|
||||
|
||||
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - `vmctl` is located in `vmutils-*` archives there.
|
||||
|
||||
|
||||
### Development build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmctl` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmctl` binary and puts it into the `bin` folder.
|
||||
|
||||
### Production build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmctl-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmctl-prod` binary and puts it into the `bin` folder.
|
||||
|
||||
### Building docker images
|
||||
|
||||
Run `make package-vmctl`. It builds `victoriametrics/vmctl:<PKG_TAG>` docker image locally.
|
||||
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
|
||||
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmctl`.
|
||||
|
||||
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
|
||||
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
|
||||
|
||||
```bash
|
||||
ROOT_IMAGE=scratch make package-vmctl
|
||||
```
|
||||
|
||||
### ARM build
|
||||
|
||||
ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
|
||||
|
||||
#### Development ARM build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmctl-arm` or `make vmctl-arm64` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmctl-arm` or `vmctl-arm64` binary respectively and puts it into the `bin` folder.
|
||||
|
||||
#### Production ARM build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmctl-arm-prod` or `make vmctl-arm64-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmctl-arm-prod` or `vmctl-arm64-prod` binary respectively and puts it into the `bin` folder.
|
||||
BIN
vmgateway-access-control.jpg
Normal file
|
After Width: | Height: | Size: 40 KiB |
BIN
vmgateway-overview.jpeg
Normal file
|
After Width: | Height: | Size: 48 KiB |
BIN
vmgateway-rate-limiting.jpg
Normal file
|
After Width: | Height: | Size: 35 KiB |
295
vmgateway.md
Normal file
@@ -0,0 +1,295 @@
|
||||
---
|
||||
sort: 9
|
||||
---
|
||||
|
||||
# vmgateway
|
||||
|
||||
***vmgateway is a part of [enterprise package](https://victoriametrics.com/products/enterprise/). It is available for download and evaluation at [releases page](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)***
|
||||
|
||||
|
||||
<img alt="vmgateway" src="vmgateway-overview.jpeg">
|
||||
|
||||
`vmgateway` is a proxy for the VictoriaMetrics Time Series Database (TSDB). It provides the following features:
|
||||
|
||||
* Rate Limiter
|
||||
* Based on cluster tenant's utilization, it supports multiple time interval limits for both the ingestion and retrieval of metrics
|
||||
* Token Access Control
|
||||
* Supports additional per-label access control for both the Single and Cluster versions of the VictoriaMetrics TSDB
|
||||
* Provides access by tenantID in the Cluster version
|
||||
* Allows for separate write/read/admin access to data
|
||||
|
||||
`vmgateway` is included in our [enterprise packages](https://victoriametrics.com/products/enterprise/).
|
||||
|
||||
|
||||
## Access Control
|
||||
|
||||
<img alt="vmgateway-ac" src="vmgateway-access-control.jpg">
|
||||
|
||||
`vmgateway` supports jwt based authentication. With jwt payload can be configured to give access to specific tenants and labels as well as to read/write.
|
||||
|
||||
jwt token must be in following format:
|
||||
```json
|
||||
{
|
||||
"exp": 1617304574,
|
||||
"vm_access": {
|
||||
"tenant_id": {
|
||||
"account_id": 1,
|
||||
"project_id": 5
|
||||
},
|
||||
"extra_labels": {
|
||||
"team": "dev",
|
||||
"project": "mobile"
|
||||
},
|
||||
"extra_filters": ["{env~=\"prod|dev\",team!=\"test\"}"],
|
||||
"mode": 1
|
||||
}
|
||||
}
|
||||
```
|
||||
Where:
|
||||
- `exp` - required, expire time in unix_timestamp. If the token expires then `vmgateway` rejects the request.
|
||||
- `vm_access` - required, dict with claim info, minimum form: `{"vm_access": {"tenand_id": {}}`
|
||||
- `tenant_id` - optional, for cluster mode, routes requests to the corresponding tenant.
|
||||
- `extra_labels` - optional, key-value pairs for label filters added to the ingested or selected metrics. Multiple filters are added with `and` operation. If defined, `extra_label` from original request removed.
|
||||
- `extra_filters` - optional, [series selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-series-selectors) added to the select query requests. Multiple selectors are added with `or` operation. If defined, `extra_filter` from original request removed.
|
||||
- `mode` - optional, access mode for api - read, write, or full. Supported values: 0 - full (default value), 1 - read, 2 - write.
|
||||
|
||||
## QuickStart
|
||||
|
||||
Start the single version of VictoriaMetrics
|
||||
|
||||
```bash
|
||||
# single
|
||||
# start node
|
||||
./bin/victoria-metrics --selfScrapeInterval=10s
|
||||
```
|
||||
|
||||
Start vmgateway
|
||||
|
||||
```bash
|
||||
./bin/vmgateway -eula -enable.auth -read.url http://localhost:8428 --write.url http://localhost:8428
|
||||
```
|
||||
|
||||
Retrieve data from the database
|
||||
```bash
|
||||
curl 'http://localhost:8431/api/v1/series/count' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ2bV9hY2Nlc3MiOnsidGVuYW50X2lkIjp7fSwicm9sZSI6MX0sImV4cCI6MTkzOTM0NjIxMH0.5WUxEfdcV9hKo4CtQdtuZYOGpGXWwaqM9VuVivMMrVg'
|
||||
```
|
||||
|
||||
A request with an incorrect token or without any token will be rejected:
|
||||
```bash
|
||||
curl 'http://localhost:8431/api/v1/series/count'
|
||||
|
||||
curl 'http://localhost:8431/api/v1/series/count' -H 'Authorization: Bearer incorrect-token'
|
||||
```
|
||||
|
||||
|
||||
## Rate Limiter
|
||||
|
||||
<img alt="vmgateway-rl" src="vmgateway-rate-limiting.jpg">
|
||||
|
||||
Limits incoming requests by given, pre-configured limits. It supports read and write limiting by tenant.
|
||||
|
||||
`vmgateway` needs a datasource for rate limit queries. It can be either single-node or cluster version of `victoria-metrics`.
|
||||
The metrics that you want to rate limit must be scraped from the cluster.
|
||||
|
||||
List of supported limit types:
|
||||
- `queries` - count of api requests made at tenant to read the api, such as `/api/v1/query`, `/api/v1/series` and others.
|
||||
- `active_series` - count of current active series at any given tenant.
|
||||
- `new_series` - count of created series; aka churn rate
|
||||
- `rows_inserted` - count of inserted rows per tenant.
|
||||
|
||||
List of supported time windows:
|
||||
- `minute`
|
||||
- `hour`
|
||||
|
||||
Limits can be specified per tenant or at a global level if you omit `project_id` and `account_id`.
|
||||
|
||||
Example of configuration file:
|
||||
|
||||
```yaml
|
||||
limits:
|
||||
- type: queries
|
||||
value: 1000
|
||||
resolution: minute
|
||||
- type: queries
|
||||
value: 10000
|
||||
resolution: hour
|
||||
- type: queries
|
||||
value: 10
|
||||
resolution: minute
|
||||
project_id: 5
|
||||
account_id: 1
|
||||
```
|
||||
|
||||
## QuickStart
|
||||
|
||||
cluster version of VictoriaMetrics is required for rate limiting.
|
||||
```bash
|
||||
# start datasource for cluster metrics
|
||||
|
||||
cat << EOF > cluster.yaml
|
||||
scrape_configs:
|
||||
- job_name: cluster
|
||||
scrape_interval: 5s
|
||||
static_configs:
|
||||
- targets: ['127.0.0.1:8481','127.0.0.1:8482','127.0.0.1:8480']
|
||||
EOF
|
||||
|
||||
./bin/victoria-metrics --promscrape.config cluster.yaml
|
||||
|
||||
# start cluster
|
||||
|
||||
# start vmstorage, vmselect and vminsert
|
||||
./bin/vmstorage -eula
|
||||
./bin/vmselect -eula -storageNode 127.0.0.1:8401
|
||||
./bin/vminsert -eula -storageNode 127.0.0.1:8400
|
||||
|
||||
# create base rate limitng config:
|
||||
cat << EOF > limit.yaml
|
||||
limits:
|
||||
- type: queries
|
||||
value: 100
|
||||
- type: rows_inserted
|
||||
value: 100000
|
||||
- type: new_series
|
||||
value: 1000
|
||||
- type: active_series
|
||||
value: 100000
|
||||
- type: queries
|
||||
value: 1
|
||||
account_id: 15
|
||||
EOF
|
||||
|
||||
# start gateway with clusterMoe
|
||||
./bin/vmgateway -eula -enable.rateLimit -ratelimit.config limit.yaml -datasource.url http://localhost:8428 -enable.auth -clusterMode -write.url=http://localhost:8480 --read.url=http://localhost:8481
|
||||
|
||||
# ingest simple metric to tenant 1:5
|
||||
curl 'http://localhost:8431/api/v1/import/prometheus' -X POST -d 'foo{bar="baz1"} 123' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2MjAxNjIwMDAwMDAsInZtX2FjY2VzcyI6eyJ0ZW5hbnRfaWQiOnsiYWNjb3VudF9pZCI6MTV9fX0.PB1_KXDKPUp-40pxOGk6lt_jt9Yq80PIMpWVJqSForQ'
|
||||
# read metric from tenant 1:5
|
||||
curl 'http://localhost:8431/api/v1/labels' -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2MjAxNjIwMDAwMDAsInZtX2FjY2VzcyI6eyJ0ZW5hbnRfaWQiOnsiYWNjb3VudF9pZCI6MTV9fX0.PB1_KXDKPUp-40pxOGk6lt_jt9Yq80PIMpWVJqSForQ'
|
||||
|
||||
# check rate limit
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
The shortlist of configuration flags include the following:
|
||||
```console
|
||||
-clusterMode
|
||||
enable this for the cluster version
|
||||
-datasource.appendTypePrefix
|
||||
Whether to add type prefix to -datasource.url based on the query type. Set to true if sending different query types to the vmselect URL.
|
||||
-datasource.basicAuth.password string
|
||||
Optional basic auth password for -datasource.url
|
||||
-datasource.basicAuth.username string
|
||||
Optional basic auth username for -datasource.url
|
||||
-datasource.lookback duration
|
||||
Lookback defines how far into the past to look when evaluating queries. For example, if the datasource.lookback=5m then param "time" with value now()-5m will be added to every query.
|
||||
-datasource.maxIdleConnections int
|
||||
Defines the number of idle (keep-alive connections) to each configured datasource. Consider setting this value equal to the value: groups_total * group.concurrency. Too low a value may result in a high number of sockets in TIME_WAIT state. (default 100)
|
||||
-datasource.queryStep duration
|
||||
queryStep defines how far a value can fallback to when evaluating queries. For example, if datasource.queryStep=15s then param "step" with value "15s" will be added to every query.
|
||||
-datasource.tlsCAFile string
|
||||
Optional path to TLS CA file to use for verifying connections to -datasource.url. By default, system CA is used
|
||||
-datasource.tlsCertFile string
|
||||
Optional path to client-side TLS certificate file to use when connecting to -datasource.url
|
||||
-datasource.tlsInsecureSkipVerify
|
||||
Whether to skip tls verification when connecting to -datasource.url
|
||||
-datasource.tlsKeyFile string
|
||||
Optional path to client-side TLS certificate key to use when connecting to -datasource.url
|
||||
-datasource.tlsServerName string
|
||||
Optional TLS server name to use for connections to -datasource.url. By default, the server name from -datasource.url is used
|
||||
-datasource.url string
|
||||
VictoriaMetrics or vmselect url. Required parameter. E.g. http://127.0.0.1:8428
|
||||
-enable.auth
|
||||
enables auth with jwt token
|
||||
-enable.rateLimit
|
||||
enables rate limiter
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-eula
|
||||
By specifying this flag, you confirm that you have an enterprise license and accept the EULA https://victoriametrics.com/legal/eula/
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default, mmap() is used for 64-bit arches and pread() is used for 32-bit arches as they cannot read data files larger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address to listen for http connections (default ":8431")
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It overrides httpAuth settings
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It overrides httpAuth settings
|
||||
-ratelimit.config string
|
||||
path for configuration file
|
||||
-ratelimit.extraLabels array
|
||||
additional labels, that will be applied to fetchdata from datasource
|
||||
Supports an array of values separated by comma or specified via multiple flags.
|
||||
-ratelimit.refreshInterval duration
|
||||
(default 5s)
|
||||
-read.url string
|
||||
read access url address, example: http://vmselect:8481
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
-write.url string
|
||||
write access url address, example: http://vminsert:8480
|
||||
|
||||
```
|
||||
|
||||
## TroubleShooting
|
||||
|
||||
* Access control:
|
||||
* incorrect `jwt` format, try https://jwt.io/#debugger-io with our tokens
|
||||
* expired token, check `exp` field.
|
||||
* Rate Limiting:
|
||||
* `scrape_interval` at datasource, reduce it to apply limits faster.
|
||||
|
||||
|
||||
## Limitations
|
||||
|
||||
* Access Control:
|
||||
* `jwt` token must be validated by external system, currently `vmgateway` can't validate the signature.
|
||||
* RateLimiting:
|
||||
* limits applied based on queries to `datasource.url`
|
||||
* only cluster version can be rate-limited.
|
||||
191
vmrestore.md
Normal file
@@ -0,0 +1,191 @@
|
||||
---
|
||||
sort: 7
|
||||
---
|
||||
|
||||
# vmrestore
|
||||
|
||||
`vmrestore` restores data from backups created by [vmbackup](https://docs.victoriametrics.com/vmbackup.html).
|
||||
VictoriaMetrics `v1.29.0` and newer versions must be used for working with the restored data.
|
||||
|
||||
Restore process can be interrupted at any time. It is automatically resumed from the interruption point
|
||||
when restarting `vmrestore` with the same args.
|
||||
|
||||
|
||||
## Usage
|
||||
|
||||
VictoriaMetrics must be stopped during the restore process.
|
||||
|
||||
```
|
||||
vmrestore -src=gs://<bucket>/<path/to/backup> -storageDataPath=<local/path/to/restore>
|
||||
|
||||
```
|
||||
|
||||
* `<bucket>` is [GCS bucket](https://cloud.google.com/storage/docs/creating-buckets) name.
|
||||
* `<path/to/backup>` is the path to backup made with [vmbackup](https://docs.victoriametrics.com/vmbackup.html) on GCS bucket.
|
||||
* `<local/path/to/restore>` is the path to folder where data will be restored. This folder must be passed
|
||||
to VictoriaMetrics in `-storageDataPath` command-line flag after the restore process is complete.
|
||||
|
||||
The original `-storageDataPath` directory may contain old files. They will be substituted by the files from backup,
|
||||
i.e. the end result would be similar to [rsync --delete](https://askubuntu.com/questions/476041/how-do-i-make-rsync-delete-files-that-have-been-deleted-from-the-source-folder).
|
||||
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
* If `vmrestore` eats all the network bandwidth, then set `-maxBytesPerSecond` to the desired value.
|
||||
* If `vmrestore` has been interrupted due to temporary error, then just restart it with the same args. It will resume the restore process.
|
||||
|
||||
|
||||
## Advanced usage
|
||||
|
||||
* Obtaining credentials from a file.
|
||||
|
||||
Add flag `-credsFilePath=/etc/credentials` with following content:
|
||||
|
||||
for s3 (aws, minio or other s3 compatible storages):
|
||||
```bash
|
||||
[default]
|
||||
aws_access_key_id=theaccesskey
|
||||
aws_secret_access_key=thesecretaccesskeyvalue
|
||||
```
|
||||
|
||||
for gce cloud storage:
|
||||
```json
|
||||
{
|
||||
"type": "service_account",
|
||||
"project_id": "project-id",
|
||||
"private_key_id": "key-id",
|
||||
"private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
|
||||
"client_email": "service-account-email",
|
||||
"client_id": "client-id",
|
||||
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
|
||||
"token_uri": "https://accounts.google.com/o/oauth2/token",
|
||||
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
|
||||
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
|
||||
}
|
||||
```
|
||||
|
||||
* Usage with s3 custom url endpoint. It is possible to use `vmrestore` with s3 api compatible storages, like minio, cloudian and other.
|
||||
You have to add custom url endpoint with a flag:
|
||||
```
|
||||
# for minio:
|
||||
-customS3Endpoint=http://localhost:9000
|
||||
|
||||
# for aws gov region
|
||||
-customS3Endpoint=https://s3-fips.us-gov-west-1.amazonaws.com
|
||||
```
|
||||
|
||||
* Run `vmrestore -help` in order to see all the available options:
|
||||
|
||||
```
|
||||
-concurrency int
|
||||
The number of concurrent workers. Higher concurrency may reduce restore duration (default 10)
|
||||
-configFilePath string
|
||||
Path to file with S3 configs. Configs are loaded from default location if not set.
|
||||
See https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
|
||||
-configProfile string
|
||||
Profile name for S3 configs. If no set, the value of the environment variable will be loaded (AWS_PROFILE or AWS_DEFAULT_PROFILE), or if both not set, DefaultSharedConfigProfile is used
|
||||
-credsFilePath string
|
||||
Path to file with GCS or S3 credentials. Credentials are loaded from default locations if not set.
|
||||
See https://cloud.google.com/iam/docs/creating-managing-service-account-keys and https://docs.aws.amazon.com/general/latest/gr/aws-security-credentials.html
|
||||
-customS3Endpoint string
|
||||
Custom S3 endpoint for use with S3-compatible storages (e.g. MinIO). S3 is used if not set
|
||||
-enableTCP6
|
||||
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP and UDP is used
|
||||
-envflag.enable
|
||||
Whether to enable reading flags from environment variables additionally to command line. Command line flag values have priority over values from environment vars. Flags are read only from command line if this flag isn't set. See https://docs.victoriametrics.com/#environment-variables for more details
|
||||
-envflag.prefix string
|
||||
Prefix for environment variables if -envflag.enable is set
|
||||
-fs.disableMmap
|
||||
Whether to use pread() instead of mmap() for reading data files. By default mmap() is used for 64-bit arches and pread() is used for 32-bit arches, since they cannot read data files bigger than 2^32 bytes in memory. mmap() is usually faster for reading small data chunks than pread()
|
||||
-http.connTimeout duration
|
||||
Incoming http connections are closed after the configured timeout. This may help to spread the incoming load among a cluster of services behind a load balancer. Please note that the real timeout may be bigger by up to 10% as a protection against the thundering herd problem (default 2m0s)
|
||||
-http.disableResponseCompression
|
||||
Disable compression of HTTP responses to save CPU resources. By default compression is enabled to save network bandwidth
|
||||
-http.idleConnTimeout duration
|
||||
Timeout for incoming idle http connections (default 1m0s)
|
||||
-http.maxGracefulShutdownDuration duration
|
||||
The maximum duration for a graceful shutdown of the HTTP server. A highly loaded server may require increased value for a graceful shutdown (default 7s)
|
||||
-http.pathPrefix string
|
||||
An optional prefix to add to all the paths handled by http server. For example, if '-http.pathPrefix=/foo/bar' is set, then all the http requests will be handled on '/foo/bar/*' paths. This may be useful for proxied requests. See https://www.robustperception.io/using-external-urls-and-proxies-with-prometheus
|
||||
-http.shutdownDelay duration
|
||||
Optional delay before http server shutdown. During this delay, the server returns non-OK responses from /health page, so load balancers can route new requests to other servers
|
||||
-httpAuth.password string
|
||||
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
|
||||
-httpAuth.username string
|
||||
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
|
||||
-httpListenAddr string
|
||||
TCP address for exporting metrics at /metrics page (default ":8421")
|
||||
-loggerDisableTimestamps
|
||||
Whether to disable writing timestamps in logs
|
||||
-loggerErrorsPerSecondLimit int
|
||||
Per-second limit on the number of ERROR messages. If more than the given number of errors are emitted per second, the remaining errors are suppressed. Zero values disable the rate limit
|
||||
-loggerFormat string
|
||||
Format for logs. Possible values: default, json (default "default")
|
||||
-loggerLevel string
|
||||
Minimum level of errors to log. Possible values: INFO, WARN, ERROR, FATAL, PANIC (default "INFO")
|
||||
-loggerOutput string
|
||||
Output for the logs. Supported values: stderr, stdout (default "stderr")
|
||||
-loggerTimezone string
|
||||
Timezone to use for timestamps in logs. Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local (default "UTC")
|
||||
-loggerWarnsPerSecondLimit int
|
||||
Per-second limit on the number of WARN messages. If more than the given number of warns are emitted per second, then the remaining warns are suppressed. Zero values disable the rate limit
|
||||
-maxBytesPerSecond size
|
||||
The maximum download speed. There is no limit if it is set to 0
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedBytes size
|
||||
Allowed size of system memory VictoriaMetrics caches may occupy. This option overrides -memory.allowedPercent if set to a non-zero value. Too low a value may increase the cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache resulting in higher disk IO usage
|
||||
Supports the following optional suffixes for size values: KB, MB, GB, KiB, MiB, GiB (default 0)
|
||||
-memory.allowedPercent float
|
||||
Allowed percent of system memory VictoriaMetrics caches may occupy. See also -memory.allowedBytes. Too low a value may increase cache miss rate usually resulting in higher CPU and disk IO usage. Too high a value may evict too much data from OS page cache which will result in higher disk IO usage (default 60)
|
||||
-metricsAuthKey string
|
||||
Auth key for /metrics. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-pprofAuthKey string
|
||||
Auth key for /debug/pprof. It must be passed via authKey query arg. It overrides httpAuth.* settings
|
||||
-s3ForcePathStyle
|
||||
Prefixing endpoint with bucket name when set false, true by default. (default true)
|
||||
-skipBackupCompleteCheck
|
||||
Whether to skip checking for 'backup complete' file in -src. This may be useful for restoring from old backups, which were created without 'backup complete' file
|
||||
-src string
|
||||
Source path with backup on the remote storage. Example: gs://bucket/path/to/backup/dir, s3://bucket/path/to/backup/dir or fs:///path/to/local/backup/dir
|
||||
-storageDataPath string
|
||||
Destination path where backup must be restored. VictoriaMetrics must be stopped when restoring from backup. -storageDataPath dir can be non-empty. In this case the contents of -storageDataPath dir is synchronized with -src contents, i.e. it works like 'rsync --delete' (default "victoria-metrics-data")
|
||||
-tls
|
||||
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and -tlsKeyFile must be set if -tls is set
|
||||
-tlsCertFile string
|
||||
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs as RSA certs are slower
|
||||
-tlsKeyFile string
|
||||
Path to file with TLS key. Used only if -tls is set
|
||||
-version
|
||||
Show VictoriaMetrics version
|
||||
```
|
||||
|
||||
|
||||
## How to build from sources
|
||||
|
||||
It is recommended using [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) - see `vmutils-*` archives there.
|
||||
|
||||
|
||||
### Development build
|
||||
|
||||
1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.17.
|
||||
2. Run `make vmrestore` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmrestore` binary and puts it into the `bin` folder.
|
||||
|
||||
### Production build
|
||||
|
||||
1. [Install docker](https://docs.docker.com/install/).
|
||||
2. Run `make vmrestore-prod` from the root folder of [the repository](https://github.com/VictoriaMetrics/VictoriaMetrics).
|
||||
It builds `vmrestore-prod` binary and puts it into the `bin` folder.
|
||||
|
||||
### Building docker images
|
||||
|
||||
Run `make package-vmrestore`. It builds `victoriametrics/vmrestore:<PKG_TAG>` docker image locally.
|
||||
`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
|
||||
The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-vmrestore`.
|
||||
|
||||
The base docker image is [alpine](https://hub.docker.com/_/alpine) but it is possible to use any other base image
|
||||
by setting it via `<ROOT_IMAGE>` environment variable. For example, the following command builds the image on top of [scratch](https://hub.docker.com/_/scratch) image:
|
||||
|
||||
```bash
|
||||
ROOT_IMAGE=scratch make package-vmrestore
|
||||
```
|
||||