Compare commits

...

336 Commits

Author SHA1 Message Date
Alexander Marshalov
881dffd6a2 [metricsql] for subqueries fixed behavior of functions that use query time range in their logic (like start, end, and running_*), from now on when using in subqueries they respect time range from query parameters (#5794) 2024-02-22 04:19:32 +01:00
Artem Navoiev
7332431b90 docs: change header from h1 to h2 1.97.2. The markdown requires the proper structure in hierachy of title so h1 can not be a child of h1,h2... only be a separate item, in our structure title is the parent h1
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-21 16:15:41 +01:00
Github Actions
2c98e8712c Automatic update operator docs from VictoriaMetrics/operator@d88157b (#5845) 2024-02-21 16:13:34 +01:00
Github Actions
c35c1b007c Automatic update operator docs from VictoriaMetrics/operator@c393852 (#5844) 2024-02-21 13:28:05 +01:00
Github Actions
0af5f88744 Automatic update operator docs from VictoriaMetrics/operator@4791fd1 (#5843) 2024-02-21 12:52:32 +01:00
hagen1778
7ee131de8b deployment/docker: add comments to components in docker-compose manifests
This should help readers to understand interconnectivity between components.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 18:31:45 +01:00
Dan Dascalescu
17cf031fa1 app/vmselect: simplify wording for too many samples error (#5827) 2024-02-20 16:26:38 +01:00
Roman Khavronenko
bb1279bfc4 vmctl : Provide TLS config options for Open TSDB datasource #5797 (#5832)
Originally implemented here https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5797

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: khushijain21 <khushij393@gmail.com>
2024-02-20 16:22:58 +01:00
Daria Karavaieva
4034d081f4 Vmanomaly Quickstart Fix absolute links (#5831)
* links fix

* typo fix
2024-02-20 16:19:18 +01:00
Daria Karavaieva
b2baf7d472 Vmanomaly QuickStart (#5800)
* first edit

* typo 1

* typo 2

* fixes 3

* fixes 4

* fixes 5

* fixes, cross links

* v1.10 config

* models why, self-monitoring fix

* config next steps

* fixes

* minor fix
2024-02-20 15:20:57 +01:00
hagen1778
dc25c30fdc docs: move recent changes to Tip
These changes were mistakenly put to existing release

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 14:58:15 +01:00
igorbernstein
cc5a274e4d deployment/docker: clean up loading of victoriametrics-datasource (#5793)
Currently the docker-compose examples for loading `victoriametrics-datasource` uses 2 environment variables:
-  `GF_ALLOW_LOADING_UNSIGNED_PLUGINS`
- `GF_DEFAULT_APP_MODE`

I believe both of the env vars are trying to achieve the same thing. `GF_DEFAULT_APP_MODE` disables code signing for all plugins and `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` intends to disable code signing for just `victoriametrics-datasource`.
Keeping the scope narrowed to just `victoriametrics-datasource` would be preferable in this case.

Unfortunately `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` is misspelled. According to [grafana docs](https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#override-configuration-with-environment-variables), the format is supposed to be `GF_<SectionName>_<KeyName>`. In other words the current env var is missing the section name.

This PR proposes to:
1. fix the typo
2. remove the global disablement of code signing

Alternatively, if you prefer to keep codesigning disabled globally, please remove `GF_ALLOW_LOADING_UNSIGNED_PLUGINS` env var as it confuses things
2024-02-20 14:31:15 +01:00
hagen1778
e2dad3a2ac app/vmalert: consistently sort groups by name and filename on /groups page
This should prevent non-deterministic sorting for groups with identical names.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 13:50:57 +01:00
hagen1778
11b03d9fc8 app/vmalert: follow-up after b60dcbe11f
* support case-insensitive search
* reflect search condition in URL, so link can be sharable
* support filtering on /alerts page
* fix collapseAll/expandAll logic to respect only shown entries
* add changelog

b60dcbe11f
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-20 13:07:05 +01:00
Victor Amorim dos Santos
b60dcbe11f vmalert: add filter by group or rule name to UI (#5791)
Co-authored-by: Yury Molodov <yurymolodov@gmail.com>
2024-02-20 12:31:41 +01:00
Yury Molodov
524c0a2e07 vmui: update package-lock.json (#5822)
This should address detected security vulnerabilities
2024-02-20 10:03:33 +01:00
Artem Navoiev
5b652bccad docs: mention slack inviter and slack channel (#5817)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-19 17:18:31 +01:00
Aliaksandr Valialkin
7c2c987ff9 docs/VictoriaLogs/CHANGELOG.md: document cafd6f08b3
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5400
2024-02-18 23:17:59 +02:00
Aliaksandr Valialkin
e525b98fbf docs/VictoriaLogs/CHANGELOG.md: document 333bda8702
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5447
2024-02-18 23:13:15 +02:00
Aliaksandr Valialkin
0514091948 app/vlselect: follow-up for 451d2abf50
- Consistently return the first `limit` log entries if the total size of found log entries doesn't exceed 1Mb.
  See app/vlselect/logsql/sort_writer.go . Previously random log entries could be returned with each request.
- Document the change at docs/VictoriaLogs/CHANGELOG.md
- Document the `limit` query arg at docs/VictoriaLogs/querying/README.md
- Make the change less intrusive.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5674
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5778
2024-02-18 23:05:51 +02:00
Dmytro Kozlov
451d2abf50 Enable the limit query param for the /select/logsql/query (#5778)
* app/vlselect: add limit for logs query

* app/vlselect: CHANGELOG.md

* app/vlselect: stop search process if limit is reached, update logic, remove default limit

* app/vlselect: fix tests

* app/vlselect: fix filter tests

* app/vlselect: fix tests
2024-02-18 22:58:47 +02:00
Aliaksandr Valialkin
c42ddce159 lib/promscrape: add support for enable_compression option in the same way as Prometheus does
Updates https://github.com/prometheus/prometheus/pull/13166
Updates https://github.com/prometheus/prometheus/issues/12319

Do not document enable_compression option at docs/sd_configs.md, since vmagent already supports
more clear disable_compression option - see https://docs.victoriametrics.com/vmagent/#scrape_config-enhancements
2024-02-18 19:40:39 +02:00
Aliaksandr Valialkin
5a092e161c lib/promscrape/discovery/kuma: add support for client_id option
See https://github.com/prometheus/prometheus/pull/13278
2024-02-18 19:19:40 +02:00
Aliaksandr Valialkin
2e30842582 vendor: update github.com/VictoriaMetrics/metricsql from v0.72.1 to v0.73.0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5383
2024-02-18 18:41:41 +02:00
Aliaksandr Valialkin
149d83e596 docs/MetricsQL.md: properly document how MetricsQL selects the lookbehind window
- rate(m) isn't equivalent to rate(m[1i]) when step is smaller than the interval between samples.
- default_rollup(m) isn't equivalent to default_rollup(m[1i]) when step is smaller than the interval between samples.

These changes have been made in v1.85.3 as a part of https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3483 .
See the corresponding commit - 9fa3f1dc57 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5816
2024-02-18 13:45:21 +02:00
Aliaksandr Valialkin
6a6ea89da5 vendor: update github.com/VictoriaMetrics/metrics from v1.31.0 to v1.32.0 2024-02-18 12:42:22 +02:00
Aliaksandr Valialkin
dfbb6e0826 docs/LTS-releases.md: cosmetic fixes 2024-02-17 18:09:36 +02:00
Aliaksandr Valialkin
d03719e72d docs/CHANGELOG.md: document f8207e33a2 2024-02-17 17:52:53 +02:00
Aliaksandr Valialkin
cee901cdf4 docs/CHANGELOG.md: add missing for in the description of the TLS configuration features for vmctl
This is a follow-up for 6a07cb1bdb and f973711e56
2024-02-17 17:44:22 +02:00
Aliaksandr Valialkin
c68bcddd13 docs/Single-server-VictoriaMetrics.md: enumerate all the VictoriaMetrics components 2024-02-17 17:33:56 +02:00
Aliaksandr Valialkin
3ed7d62627 docs/LTS-releases.md: add a dedicated page describing LTS lines of releases for VictoriaMetrics 2024-02-17 17:33:55 +02:00
Alexander Marshalov
f8207e33a2 lib/httputils: fixed error message for getting zero duration (#5795) (#5812) 2024-02-16 15:24:59 +01:00
hagen1778
f973711e56 app/vmctl: follow-up after 0c293a66ec
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 15:22:44 +01:00
Khushi Jain
0c293a66ec app/vmctl : support TLS config options for remote read mode (#5798) 2024-02-16 15:12:43 +01:00
hagen1778
6a07cb1bdb app/vmctl: follow-up after 7cd1b7d047
* cleanup code
* update docs

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 15:08:51 +01:00
Khushi Jain
7cd1b7d047 app/vmctl : support TLS config options for InfluxDB datasource (#5783)
* vmctl: TLS flags for influx DB

* added httputils function

* Add changelog and doc

---------

Co-authored-by: Khushi Jain <khushi.jain@nokia.com>
2024-02-16 14:59:18 +01:00
hagen1778
ecccd2a1cc dashboards: add legend details to network panels in cluster dash
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-16 10:20:38 +01:00
Fred Navruzov
172e196ac9 docs: vmanomaly - updates of v1.10.0 and model type section (#5813)
* - apply v1.10 changes
- chapter on model types (uni/multivariate and rolling)

* - update self-monitoring labels description
- fix typos

* fix duplicated text and link rendering
2024-02-16 11:02:41 +02:00
hagen1778
3170ad3f44 docs: update formatting for usage examples
- Use `sh` format for examples
- Reduce length of progress bars

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-15 14:55:42 +01:00
Aliaksandr Valialkin
6b9bedd0f9 app/vmstorage: expose vm_last_partition_parts metrics, which may help identifying performance issues related to the increased number of parts in the last partition 2024-02-15 14:51:19 +02:00
Aliaksandr Valialkin
eb1505ba14 lib/uint64set: go fmt after c0a9b87f46 2024-02-15 14:50:44 +02:00
Aliaksandr Valialkin
c0a9b87f46 lib/mergeset: optimize Set.AddMulti() a bit for len(items) < 10000
This should improve the search speed for time series matching the given label filters
2024-02-15 14:30:15 +02:00
Aliaksandr Valialkin
cdac04997a lib/uint64set: benchmark AddMulti on small number of items, since this case is the most frequent in lib/storage 2024-02-15 14:28:36 +02:00
Aliaksandr Valialkin
926854b0f3 docs/CHANGELOG.md: document v1.93.12 LTS release 2024-02-14 20:17:44 +02:00
Aliaksandr Valialkin
39cba7e4fa docs/CHANGELOG.md: document v1.97.2 LTS release 2024-02-14 18:50:50 +02:00
Aliaksandr Valialkin
08e6100050 all: update Docker image tag for VictoriaMetrics components from v1.97.1 to v1.98.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.98.0
2024-02-14 17:14:41 +02:00
Aliaksandr Valialkin
baaa88001e lib/promrelabel: store the original labels before returning them them to promutils.PutLabels()
This should reduce memory allocations.

This is a follow-up for b09bd6c42a

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
2024-02-14 16:09:10 +02:00
Aliaksandr Valialkin
e4ad41b5ff docs/CHANGELOG.md: cut v1.98.0 release 2024-02-14 16:00:50 +02:00
Aliaksandr Valialkin
9c9021fd8b vendor: run make vendor-update 2024-02-14 15:44:03 +02:00
Aliaksandr Valialkin
b09bd6c42a lib/promrelabel: factor out applyInternal code into ApplyDebug and Apply functions
This improves readability and maintanability

Also remove memory allocation from SortLabels()
2024-02-14 14:27:08 +02:00
Aliaksandr Valialkin
b564729d75 lib/promscrape: avoid copying labels when -promscrape.dropOriginalLabels command-line flag is set
This should save some CPU

This regression has been introduced in 487f6380d0
when working on https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5389
2024-02-14 03:25:36 +02:00
Aliaksandr Valialkin
4954eee187 docs/CHANGELOG.md: various typo cleanups 2024-02-14 02:45:17 +02:00
Aliaksandr Valialkin
53643b620a app/vmselect/vmui: run make vmui-update after 1c9f13d6c7 2024-02-14 02:35:55 +02:00
Yury Molodov
1c9f13d6c7 vmui: improve the context for autocomplete #5736 #5737 #5739 (#5804)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-14 00:22:51 +00:00
Aliaksandr Valialkin
d6e22f2888 app/vmselect: add sum_eq_over_time, sum_gt_over_time and sum_le_over_time functions to MetricsQL
See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4641
2024-02-13 23:40:07 +02:00
Nikolay
88329d84ca app/vmauth: properly release memory during config reload (#5805)
* app/vmauth: properly release memory during config reload
previously metrics package hold a refrence for channels for users concurrent requests.
it case of churn at `name`  field of users configuration, new metric was created. But previous one wasn't deleted.
It prevented full parsed configuration from being garbace collected.

now all config related metrics are bound to corresponding metrics.Set and unregistered during config reload process.

It also must fix an issue with incorrect values for current concurrent user requests

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4690

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-13 18:49:17 +00:00
Aliaksandr Valialkin
3dddf700f1 docs: add link to https://docs.victoriametrics.com/scrape_config_examples/ to docs about configuring target scraping at vmagent and single-node VictoriaMetrics 2024-02-13 20:47:43 +02:00
Aliaksandr Valialkin
4281be4c19 docs/vmbackupmanager.md: mention -license command-line flag instead of deprecated -eula 2024-02-13 20:41:26 +02:00
Aliaksandr Valialkin
f120477231 docs/vmauth.md: add Config reload chapter, which explains how to reload -auth.config at vmauth
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1194
2024-02-13 20:29:35 +02:00
Aliaksandr Valialkin
582681ce58 .github/workflows: update actions/cache from v3 to v4
See https://github.com/actions/cache?tab=readme-ov-file#v4
2024-02-13 19:36:21 +02:00
Aliaksandr Valialkin
dc7256b304 docs/enterprise.md: remove the mention of deprecated -eula command-line flag 2024-02-13 18:46:56 +02:00
Aliaksandr Valialkin
2f3091460f vendor: update github.com/VictoriaMetrics/metricsql from v0.70.1 to v0.71.0
This adds an ability to propagate label filters across label_set() and alias() functions.

This should help https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1827#issuecomment-1654095358
2024-02-13 06:36:59 +02:00
Aliaksandr Valialkin
e963d6c789 app/vmagent/remotewrite: add -remoteWrite.tlsHandshakeTimeout command-line flag for tuning tls handshake timeout to -remoteWrite.url
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1699
2024-02-13 02:46:33 +02:00
Aliaksandr Valialkin
cafdc7e21a docs/vmauth.md: add missing dot 2024-02-13 01:09:04 +02:00
Aliaksandr Valialkin
062cbb1130 app/vmauth: add support for mTLS-based routing of incoming requests to different backends depending on the subject field in the TLS certificate provided by the user
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1547
2024-02-13 01:03:20 +02:00
Aliaksandr Valialkin
94252e1794 docs/keyConcepts.md: do not duplicate the list of supported data ingestion protocols - just refer to the original list at https://docs.victoriametrics.com/#how-to-import-time-series-data
This should simplify keeping docs in sync
2024-02-12 22:54:49 +02:00
Aliaksandr Valialkin
f0a62de3a9 Makefile: run go mod tidy with -compat=1.22 after 95222b2079 2024-02-12 22:36:23 +02:00
Aliaksandr Valialkin
c5f9d9f0d6 vendor: run make vendor-update 2024-02-12 22:31:30 +02:00
Aliaksandr Valialkin
b92c9a045d docs/vmbackup.md: remove the unneeded -storageDataPath command-line from the example for making server-side copy of the backup 2024-02-12 22:24:34 +02:00
Aliaksandr Valialkin
95222b2079 all: upgrade Go builder from Go1.21.7 to Go1.22.0
See https://go.dev/doc/go1.22
2024-02-12 21:59:51 +02:00
Aliaksandr Valialkin
a49a50701a lib/mergeset: do not panic on too long items passed to Table.AddItems()
Instead, log a sample of these long items once per 5 seconds into error log,
so users could notice and fix the issue with too long labels or too many labels.

Previously this panic could occur in production when ingesting samples with too long labels.
2024-02-12 19:32:18 +02:00
Aliaksandr Valialkin
5d69ba630e lib/mergeset: properly record the firstItem in metaindexRow at blockStreamWriter.WriteBlock
The 3c246cdf00 added an optimization where the previous metaindexRow
could be saved to disk when the current block header couldn't be added indexBlock because the resulting
indexBlock size became too big. This could result in an empty metaindexRow.firstItem for the next metaindexRow.
2024-02-12 18:06:50 +02:00
Aliaksandr Valialkin
2eb967231b lib/storage: do not append headerData to bsw.indexData if its size exceeds maxBlockSize
This is a follow-up optimization after 3c246cdf00
2024-02-12 18:04:37 +02:00
Artem Navoiev
7eb5c187b3 docs vmbackupmanager update flags
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-12 16:36:01 +01:00
Aliaksandr Valialkin
4a27bf41af docs/enterprise.md: remove duplicate Enterprise word in the same sentence 2024-02-12 11:00:35 +02:00
Aliaksandr Valialkin
4258f2e261 docs/Single-server-VictoriaMetrics.md: substitute duplicate cases studies list with the link to the original list 2024-02-12 10:59:31 +02:00
Aliaksandr Valialkin
e0890a84e0 docs: remove misleading either from the description of or label filters 2024-02-12 10:56:04 +02:00
Roman Khavronenko
8850c7431d app/vmalert: support filtering for /api/v1/rule like Prometheus does (#5787)
Follow-up after 62e5e2a4c8

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-09 14:35:31 +01:00
Github Actions
a379b2c016 Automatic update operator docs from VictoriaMetrics/operator@e261c37 (#5788) 2024-02-09 11:17:12 +01:00
Victor Amorim dos Santos
62e5e2a4c8 app/vmalert: support type param for filtering /api/v1/rules response by rule type (#5749)
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2024-02-09 09:02:35 +01:00
Aliaksandr Valialkin
64723e591e docs: update docs after ae8a867924
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1470
2024-02-09 04:18:53 +02:00
Aliaksandr Valialkin
39e0007e14 lib/snapshot: move Time, Validate and NewName into lib/snapshot/snapshotutil package
This allows removing importing unneeded command-line flags into binaries, which import lib/storage,
which, in turn, was importing lib/snapshot in order to use Time, Validate and NewName functions.

This is a follow-up for 83e55456e2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5738
2024-02-09 04:18:45 +02:00
Aliaksandr Valialkin
ae8a867924 all: add support for specifying multiple -httpListenAddr options 2024-02-09 03:15:04 +02:00
Aliaksandr Valialkin
b161e889b5 docs/CHANGELOG.md: typo fixes 2024-02-08 21:18:45 +02:00
Aliaksandr Valialkin
d8c1db7953 lib/httpserver: do not close client connections every 2 minutes by default
Closing client connections every 2 minutes doesn't help load balancing -
this just leads to "jumpy" connections between multiple backend servers,
e.g. the load isn't spread evenly among backend servers, and instead jumps
between the servers every 2 minutes.

It is still possible periodically closing client connections by specifying non-zero -http.connTimeout command-line flag.

This should help with https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1304#issuecomment-1636997037

This is a follow-up for d387da142e
2024-02-08 21:10:25 +02:00
Aliaksandr Valialkin
ee745ab900 docs/{vmagent,vmalert}: mention that /-/reload endpoint may be protected with -reloadAuthKey command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1172
2024-02-08 18:49:06 +02:00
Aliaksandr Valialkin
0511d5c3a0 docs/Cluster-VictoriaMetrics.md: document that /api/v1/query?series_lector[d] returns raw samples
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1148
2024-02-08 18:32:28 +02:00
Aliaksandr Valialkin
b00a9132bb docs: sync with enterprise branch 2024-02-08 17:25:19 +02:00
Aliaksandr Valialkin
da4b30e8e5 docs: update -help output after 83e55456e2 2024-02-08 17:11:45 +02:00
Artem Navoiev
c300a636d6 docs: change ndjson links to https://jsonlines.org/ as original one was hacked (#5782)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-08 16:06:31 +01:00
hagen1778
76a4351012 Merge remote-tracking branch 'origin/master' into master 2024-02-08 16:04:22 +01:00
hagen1778
2a89a9e67b docs: sync flags description for vmbackup and vmbackupmanager
Sync description after changes in 83e55456e2

Remove extra text in flags description for vmbackupmanager as it breaks layout
when rendered at docs.victoriametrics.com

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 16:04:13 +01:00
Aliaksandr Valialkin
b718e555a6 docs/CHANGELOG.md: add a link to docs describing -disableReroutingOnUnavailable command-line flag 2024-02-08 17:03:19 +02:00
hagen1778
e1926f286b docs: follow-up after 83e55456e2
83e55456e2
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 15:55:57 +01:00
Khushi Jain
83e55456e2 app/vmbackup: support client-side TLS configuration for create/delete snapshot API (#5738) 2024-02-08 15:52:00 +01:00
Aliaksandr Valialkin
b135504e46 docs/vmagent.md: add debugging scrape targets chapter 2024-02-08 16:32:19 +02:00
hagen1778
8771e44d23 deployment/docker: update README with ToC
The change also moves commands for starting/stopping env to corresponding
sections.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 15:11:49 +01:00
Aliaksandr Valialkin
a354924b0d app/victoria-metrics: properly send staleness markers on victoriametrics shutdown if -selfScrapeInterval > 0
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/943
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1526
2024-02-08 15:29:19 +02:00
Aliaksandr Valialkin
aea8feee1a docs: update -help output after 61d9df4c36
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:50:49 +02:00
Aliaksandr Valialkin
61d9df4c36 app/vmselect: add ability to reset rollup result cache on startup by passing -search.resetRollupResultCacheOnStartup command-line flag
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/834
2024-02-08 14:40:40 +02:00
Aliaksandr Valialkin
0f5176380b lib/mergeset: add a test for too long item passed to Table.AddItems() 2024-02-08 14:12:56 +02:00
Aliaksandr Valialkin
5817e4c0c1 lib/mergeset: typo fix: indexdb/indexBlock -> indexdb/indexBlocks 2024-02-08 13:53:18 +02:00
Aliaksandr Valialkin
3c246cdf00 lib/{storage,mergeset}: do not create index blocks with sizes exceeding 64Kb in common case
This should reduce memory fragmentation and memory usage for indexdb/indexBlocks and storage/indexBlocks caches
2024-02-08 13:52:24 +02:00
Aliaksandr Valialkin
93ada2eaaf lib/mergeset: verify that the index block for in-memory part doesnt exceed the 3*maxIndexBlockSize 2024-02-08 13:50:14 +02:00
Aliaksandr Valialkin
077f84964a lib/mergeset: do not store commonPrefix in blockHeader if the block contains only a single item
There is no sense in storing commonPrefix for blockHeader containing only a single item,
since this only increases blockHeader size without any benefits.
2024-02-08 13:47:24 +02:00
Aliaksandr Valialkin
f7b68b466c docs/CHANGELOG.md: clarify the bugfix description for e1bf8440eb 2024-02-08 13:01:19 +02:00
Aliaksandr Valialkin
e1bf8440eb lib/mergeset: prevent from possible too big indexBlockSize panic
This panic could occur when samples with too long label values are ingested into VictoriaMetrics.
This could result in too long fistItem and commonPrefix values at blockHeader (up to 64kb each).
This may inflate the maximum index block size by 4 * maxIndexBlockSize.
2024-02-08 12:54:10 +02:00
hagen1778
3380043424 dashboards: follow-up 4369bc1df2
* add more details to changelog
* simplify panels description
* remove capacity planning recommendation, as it proves it incompetent

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-08 09:51:43 +01:00
Hui Wang
4369bc1df2 deployment/dashboards: fix Storage full ETA panels (#5747)
During background downsampling, rate(vm_deduplicated_samples_total{type="merge"}) could be much bigger than 
rate(vm_rows_added_to_storage_total) and it could last quite some time,
 which causes negative values of Storage full ETA and confuses users, see playground.

Instead of trying to get more accurate results during downsampling, I think it's ok to ignore 
vm_deduplicated_samples_total at all, it's more reasonable to see Storage full ETA increase after downsampling.

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-08 09:43:39 +01:00
Aliaksandr Valialkin
0cf56c1ba5 app/vmselect/promql: properly handle precision errors in rollup functions
changes(), increases_over_time() and resets() shouldn't take into account
value changes, which may occur because of precision errors.

The maximum guaranteed precision for raw samples stored in VictoriaMetrics is 12 decimal digits.
So do not count relative changes for values if they are smaller than 1e-12 comparing to the value.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767
2024-02-08 02:30:57 +02:00
Aliaksandr Valialkin
5458a4bcfc Makefile: add VictoriaMetrics Windows build into crossbuild 2024-02-07 22:27:05 +02:00
Aliaksandr Valialkin
19c1066a25 docs/CHANGELOG.md: properly document the change at b74006e2ca
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774
2024-02-07 22:05:12 +02:00
Nihal
b74006e2ca [vmsingle/vminsert]: change http success response code to 200 for -/reload request handler (#5776)
* change vmsingle's response code to 200 for reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* change vmsingle's response code to 200 for the reload request handler. See: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5774

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-02-07 20:00:04 +00:00
Github Actions
7f7c118d26 Automatic update operator docs from VictoriaMetrics/operator@38dfad5 (#5772) 2024-02-07 19:54:52 +00:00
Aliaksandr Valialkin
bf9ea249a3 lib/protoparser/datadogsketches: use math.RoundToEven() for calculating the rank
The original code uses this function - see 48d52eeea6/pkg/quantile/sparse.go (L138)

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775
2024-02-07 21:43:26 +02:00
Aliaksandr Valialkin
093798375e lib/protoparser/datadogsketches: add more permalinks to the original source code
These permalinks should help verifying the correctness of the code

This is a follow-up after 07213f4e0c

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5775
2024-02-07 21:33:20 +02:00
Andrii Chubatiuk
07213f4e0c added ddsketch permalink (#5775)
Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>
2024-02-07 19:01:09 +00:00
Aliaksandr Valialkin
e78e5ccfaa docs/CHANGELOG.md: support empty command-line flag values in short array notation
For example, -fooDuration=',10s,' is now supported - it sets three command-line flag values:

- the first and the last one are set to the default value for `-fooDuration`
- the second one is set to 10s
2024-02-07 20:53:13 +02:00
Aliaksandr Valialkin
05f0b707d1 docs/relabeling.md: add examples on how to rename scraped metrics and how to add/change labels in scraped metrics 2024-02-07 20:28:55 +02:00
Aliaksandr Valialkin
b431ccea5b all: update Go builder from Go1.21.6 to Go1.21.7
See https://github.com/golang/go/issues?q=milestone%3AGo1.21.7+label%3ACherryPickApproved
2024-02-07 04:00:37 +02:00
Aliaksandr Valialkin
d8f5be2290 docs/scrape_config_examples.md: add an example for scraping cadvisor in Kubernetes 2024-02-07 03:23:08 +02:00
Aliaksandr Valialkin
6a7c7ae391 app/{vmselect,vlselect}/vmui: run make vmui-update vmui-logs-update after the recent changes to app/vmui
This is a follow-up for the following commits:

- dcbdbc760e
- a81ccbd749
- 65b8002aeb
2024-02-07 01:48:46 +02:00
Aliaksandr Valialkin
b8b0d22d2d docs/Cluster-VictoriaMetrics.md: document the /datadog/api/beta/sketches endpoint
This is a follow-up for a1d1ccd6f2

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584
2024-02-07 01:35:32 +02:00
Aliaksandr Valialkin
541b644d3d app/{vmagent,vminsert}: follow-up after a1d1ccd6f2
- Document the change at docs/CHANGELOG.md
- Copy changes from docs/Single-server-VictoriaMetrics.md to README.md
- Add missing handler for processing multitenant requests ( https://docs.victoriametrics.com/vmagent/#multitenancy )
- Substitute github.com/stretchr/testify dependency with 3 lines of code in the added tests
- Comment unclear code at lib/protoparser/datadogsketches/parser.go , so @AndrewChubatiuk could update it
  and add permalinks to the original source code there.
- Various code cleanups

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5584
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3091
2024-02-07 01:28:05 +02:00
Daria Karavaieva
4a31bd9661 Vmanomaly Guide - dashboard and query change (#5771)
* dashboard fix

* query fix

* changed screenshots

* minor fixes
2024-02-06 22:40:40 +01:00
hagen1778
eaa2125f2c docs/vmalert: mention step param when comapring rule results to raw queries
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-06 22:32:46 +01:00
Andrii Chubatiuk
a1d1ccd6f2 support datadog /api/beta/sketches API (#5584)
Co-authored-by: Andrew Chubatiuk <andrew.chubatiuk@motional.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:58:11 +00:00
Yury Molodov
dcbdbc760e vmui: improve select component functionality (#5755)
* vmui: fix select closing on click outside (#5728)

* vmui: clear entered text in select after selecting a value (#5727)

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:50:04 +00:00
Yury Molodov
a81ccbd749 vmui: fix handling invalid timezone (#5758)
* vmui: fix handling invalid timezone (#5732)

* vmui: switch browser timezone flag to isValid

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:47:30 +00:00
Aliaksandr Valialkin
9e8e4cca6a lib/storage: move fixupTimestamps() call to Block.Init()
This is a follow-up for 0bf7921721
2024-02-06 22:43:48 +02:00
Zakhar Bessarab
0bf7921721 lib/storage/raw_row: properly initialize TS for tmp blocks (#5762)
Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
2024-02-06 20:42:32 +00:00
Yury Molodov
65b8002aeb vmui: fix graph dragging (#5769)
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-06 20:32:03 +00:00
Aliaksandr Valialkin
61524ad87b vendor: update github.com/VictoriaMetrics/metricsql from v0.70.0 to v0.70.1
This should help with the https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5604
2024-02-06 21:52:35 +02:00
Aliaksandr Valialkin
e159cc30df lib/fs: lazily open the file at ReaderAt on the first access
This should significantly reduce the number of open ReaderAt files
on VictoriaMetrics and VictoriaLogs startup.

The open files can be tracked via vm_fs_readers metric
2024-02-06 20:42:57 +02:00
Aliaksandr Valialkin
28f9fe5f65 docs/scrape_config_examples.md: document that full urls can be used as in the targets list 2024-02-06 19:49:15 +02:00
Aliaksandr Valialkin
28b737dc57 docs/scrape_config_examples.md: add missing to specify phrase 2024-02-06 19:43:23 +02:00
Aliaksandr Valialkin
f4f1caea2a docs/CHANGELOG.md: add link to mTLS feature request 2024-02-06 19:32:40 +02:00
Aliaksandr Valialkin
7bc3af1224 lib/httpserver: add support for mTLS for requests to -httpListenAddr 2024-02-06 17:46:19 +02:00
Aliaksandr Valialkin
c1e50848c5 docs/CHANGELOG.md: move descriptions for recently added features from v1.97.1 to tip 2024-02-06 16:31:18 +02:00
Aliaksandr Valialkin
1bed3a1da7 docs/scrape_config_examples.md: add examples for typical scrape_config usage 2024-02-06 15:58:16 +02:00
Aliaksandr Valialkin
075d3fb4bf docs/Cluster-VictoriaMetrics.md: add a warning that -disableReroutingOnUnavailable command-line flag may result in long pause for data ingestion when some of vmstorage nodes are unavailable for long time 2024-02-06 15:58:15 +02:00
Aliaksandr Valialkin
6bdedc4443 docs/enterprise.md: prioritize enterprise technical support over other enterprise features 2024-02-06 15:58:15 +02:00
hagen1778
3cab5a16a9 docs/vmalert: update limit description
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-06 10:49:56 +01:00
hagen1778
54ac16c883 docs/vmalert: mention limit option in group params
This param was supported for long time but was missing in the docs.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-06 10:47:19 +01:00
Fred Navruzov
f0e51dd01d update changelog with 1.0.0 helm chart ref (#5763) 2024-02-05 20:43:40 +02:00
Aliaksandr Valialkin
9dad983bb8 docs/sd_configs.md: improve readability a bit 2024-02-05 15:32:47 +02:00
Github Actions
bbcea93ff8 Automatic update operator docs from VictoriaMetrics/operator@bfc521d (#5761)
Co-authored-by: Hui Wang <haley@victoriametrics.com>
2024-02-05 21:21:24 +08:00
Aliaksandr Valialkin
209c96fc42 docs/Cluster-VictoriaMetrics.md: document -disableReroutingOnUnavailable command-line flag
This is a follow-up for 88f0d1572e
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5713
2024-02-05 15:18:01 +02:00
Aliaksandr Valialkin
ee6586f443 docs/Cluster-VictoriaMetrics.md: move the Improve re-routing performance during restart to more appropriate place
Previoulsy it was mistakenly inserted between `No downtime strategy` and 'Minimum downtime strategy' chapters.

This is a follow-up for 37997abd14
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5293
2024-02-05 14:43:11 +02:00
Aliaksandr Valialkin
de9a9546c2 lib/cgroup: remove SetGOGC() function
GOGC can be already set via environment variable. There is no need in adding
new approaches for setting the GOGC (such as command-line flag), since they complicate operations.
2024-02-05 12:11:08 +02:00
Aliaksandr Valialkin
fce313ae7f docs/Cluster-VictoriaMetrics.md: mention -metrics.exposeMetadata command-line flag in Monitoring section 2024-02-05 11:47:32 +02:00
Aliaksandr Valialkin
5f836c8729 docs/CHANGELOG.md: add a link to all VictoriaMetrics dashboards for Grafana 2024-02-05 11:42:05 +02:00
Aliaksandr Valialkin
64027074f9 docs/Troubleshooting.md: clarify the section about GOGC tuning
This is a follow-up for 487a94565b
2024-02-05 11:01:04 +02:00
Aliaksandr Valialkin
1684766152 docs/CHANGELOG.md: add missing links to the corresponding dashboards 2024-02-05 10:55:30 +02:00
Aliaksandr Valialkin
6136c19fbc docs: mention -metrics.exposeMetadata command-line flag in Monitoring sections
This is a follow-up for 326a77c697
2024-02-05 10:49:16 +02:00
Artem Navoiev
ba36472b6b docs: fix aliasis for url-example
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-02-04 14:31:26 +01:00
Fred Navruzov
3ac49e5c38 docs: vmanomaly - fix 1.9.2 references in pull cmd and docs (#5756)
* fix 1.9.2 references in pull cmd and docs
* better readability of upgrade note
2024-02-02 19:06:07 +02:00
hagen1778
487a94565b dashboards/all: add new panel CPU spent on GC
It should help identifying cases when too much CPU is spent on garbage collection,
 and advice users on how this can be addressed.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 16:21:21 +01:00
hagen1778
29a9b31584 dashboards: add Targets scraped/s
A new stat panel shows the number of targets scraped by the vmagent per-second.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:48:26 +01:00
hagen1778
db11b94e30 dashboards: update to grafana/grafana:10.3.1
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:41:08 +01:00
hagen1778
7b3183bf71 deployment/docker: bump grafana version to grafana/grafana:10.3.1
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-02-02 15:40:44 +01:00
Aliaksandr Valialkin
a40fcc8aa6 lib/prompbmarshal: code cleanup after 8aaa828ba3 2024-02-01 21:40:16 +02:00
Aliaksandr Valialkin
0e3c532bf7 app/vmselect/netstorage: prevent from disk write IO when closing temporary files
Remove temporary file before closing it in order to signal the OS that it shouldn't
store the file contents from page cache to disk when the file is closed.

Gracefully handle the case when the file cannot be removed before being closed -
in this case remove the file after closing it. This allows working on Windows.

Also remove superflouos opening of temporary file for reading - re-use already opened file handle for writing.

This is a follow-up for 9b1e002287
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4020
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/70
2024-02-01 19:12:44 +02:00
Aliaksandr Valialkin
deed8ddfb8 docs/CHANGELOG.md: document v1.93.11 LTS release 2024-02-01 18:21:28 +02:00
Aliaksandr Valialkin
87bf1900e4 docs/CHANGELOG.md: document v1.87.14 LTS release 2024-02-01 17:08:56 +02:00
Aliaksandr Valialkin
507744ebb4 all: update VictoriaMetrics Docker image from v1.97.0 to v1.97.1
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.1
2024-02-01 16:15:01 +02:00
Aliaksandr Valialkin
5b065441c8 vendor: run make vendor-update 2024-02-01 15:24:46 +02:00
Aliaksandr Valialkin
31c53adbde docs: mark v1.97.x as long-term support release 2024-02-01 15:16:39 +02:00
Aliaksandr Valialkin
bdfa4aee0d docs/CHANGELOG.md: cut v1.97.1 2024-02-01 15:08:40 +02:00
Aliaksandr Valialkin
8f9eddb1e4 docs: sync -help output after recent changes 2024-02-01 15:06:25 +02:00
Aliaksandr Valialkin
88dc6cff70 app/vmselect: add missing whitespace into the description for -vmui.defaultTimezone command-line flag
This is a follow-up for eb6def0695
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5611
2024-02-01 14:49:51 +02:00
Aliaksandr Valialkin
0fc1f98d28 docs/Single-server-VictoriaMetrics.md: clarify Security chapter a bit 2024-02-01 14:44:27 +02:00
Aliaksandr Valialkin
ff6f7142ec docs/Single-server-VictoriaMetrics.md: run make docs-sync after 49d5e7fef5
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5262
2024-02-01 14:41:53 +02:00
Dima Lazerka
49d5e7fef5 Improve docs on security http headers (#5262)
* Improve docs on security http headers

* Apply suggestions from code review

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-02-01 12:40:11 +00:00
noodles2hg
cafd6f08b3 lib/logstorage: proper exit during block search (#5400) 2024-02-01 12:11:05 +00:00
Jiajing LU
333bda8702 count inmemoryParts that have not been taken for merge (#5447) 2024-02-01 12:06:28 +00:00
dependabot[bot]
9d17fc7004 build(deps): bump codecov/codecov-action from 3 to 4 (#5745)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3 to 4.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-01 13:57:58 +02:00
Aliaksandr Valialkin
8aaa828ba3 lib/prompbmarshal: return back custom protobuf marshaler for lib/prompbmarshal.WriteRequest
The easyproto-based marshaler is 2x slower than the previous custom marshaler,
so let's stick with it. This improves the performance for sending data to remote storage at vmagent
and reduces CPU usage to pre-v1.97.0 levels.
2024-02-01 06:33:06 +02:00
Aliaksandr Valialkin
55b5c13839 lib/encoding: follow-up for 49e3665d6d
Improve performance for typical cases of varint marshaling / unmarshaling further.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5721
2024-02-01 05:37:40 +02:00
Fuchun Zhang
49e3665d6d make encoding.MarshalVarInt64s faster (#5721)
* make encoding.MarshalVarInt64s faster

* add fast path for MarshalVarInt64s

* make UnmarshalVarUint64s faster

* remove comment
2024-02-01 05:34:37 +02:00
Aliaksandr Valialkin
c91614b626 lib/encoding: added benchmarks for marshaling / unmarshaling of varints
This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5721
2024-02-01 05:11:01 +02:00
helen
c8a96ac241 clean unused code (#5735)
Signed-off-by: helen <haitao.zhang@daocloud.io>
2024-01-31 17:50:36 +00:00
Aliaksandr Valialkin
b7fd7ee0b6 lib/promauth: follow-up for fca3b14b7b
- Simplify the code for handling BasicAuthConfig at lib/promauth/config.go
- Move the description of the change into correct place at docs/CHANGELOG.md
- Put tests for username in front of tests for password at lib/promauth/config_test.go

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5720
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511
2024-01-31 19:45:16 +02:00
Nihal
fca3b14b7b Support for username_file in scrape config (basic_auth) similar to Prometheus for having config compatibility (#5720)
* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5511

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

* adding support for username_file in basic_auth of scrape config

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>

---------

Signed-off-by: Syed Nihal <syed.nihal@nokia.com>
2024-01-31 17:41:16 +00:00
Aliaksandr Valialkin
db4623efc2 app/vmselect/netstorage: properly handle the case when an empty brsPool points to the end of brs.brs
This case is possible after a new brsPool is allocated. The fix is to verify whether len(brsPool) >= len(brs.brs)
before trying to append a new item to brsPool and sharing its contents with brs.brs.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5733
2024-01-31 10:27:50 +02:00
hagen1778
02492bc1a4 dashboards/single: fix typo in query for version annotation
The typo falsely produced many version change events.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-31 09:13:46 +01:00
Aliaksandr Valialkin
ec0ca8e7eb app/vmselect/promql: really keep metric names when keep_metric_names modifier is applied to binary operator
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5556
2024-01-31 02:32:55 +02:00
Aliaksandr Valialkin
9922a486a6 docs/vmauth.md: typo fix after 68be182075 2024-01-31 00:13:50 +02:00
Aliaksandr Valialkin
9ce75ee11b all: update VictoriaMetrics Docker image from v1.96.0 to v1.97.0
See https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.97.0
2024-01-30 23:37:28 +02:00
Aliaksandr Valialkin
fcc8b14f86 deployment/docker: upgrade base Docker image from Alpine 3.19.0 to 3.19.1
See https://www.alpinelinux.org/posts/Alpine-3.19.1-released.html
2024-01-30 22:47:18 +02:00
Aliaksandr Valialkin
26488726a8 docs/CHANGELOG.md: cut v1.97.0 2024-01-30 22:45:04 +02:00
Dan Dascalescu
a090de492c docs: t...over_time functions return fractional seconds (#5715)
* docs: t...over_time functions return fractional seconds

* Apply suggestions from code review

---------

Co-authored-by: Aliaksandr Valialkin <valyala@gmail.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-30 20:18:53 +00:00
Roman Khavronenko
6939c53e48 app/vmselect: set proper timestamp for cached instant responses (#5723)
* app/vmselect: set proper timestamp for cached instant responses

The change updates `getSumInstantValues` to prefer timestamp
from the most recent results. Before, timestamp from cached series
was used.

The old behavior had negative impact on recording rules as they
were getting responses with shifted timestamps in past.
Subsequent recording or alerting rules fetching results of these
recording rules could get no result due to staleness interval.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5659
Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-30 20:03:34 +00:00
Aliaksandr Valialkin
c12bdd6c28 app/vmselect/vmui: run make vmui-update after 81b5db04f6 2024-01-30 21:13:01 +02:00
Yury Molodov
81b5db04f6 vmui: add the ability to expand all tracing entries (#5677) (#5726) 2024-01-30 19:10:10 +00:00
Github Actions
300d701df0 Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@40e4e15 (#5729) 2024-01-30 19:07:31 +00:00
Aliaksandr Valialkin
f768d5d797 docs/CHANGELOG.md: document the enhancement, which reduces initial memory usage when vmagent scrapes targets with large responses
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5567
2024-01-30 20:51:13 +02:00
Aliaksandr Valialkin
17f8ed8948 docs/CHANGELOG.md: refer to the related pull request for the bugfix for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945 2024-01-30 20:21:44 +02:00
Aliaksandr Valialkin
ea2752ce62 docs/CHANGELOG.md: document the bugfix addressed by the commit bc7cf4950b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
2024-01-30 20:16:22 +02:00
Aliaksandr Valialkin
32e60fe09d vendor: run make vendor-update 2024-01-30 18:47:01 +02:00
Aliaksandr Valialkin
adf585f7ed app/vmselect/vmui: run make vmui-update after 6e8995cfb92fb5a87fc6ad78609bf9ea5e0e712f 2024-01-30 18:45:57 +02:00
Aliaksandr Valialkin
bc7cf4950b lib/promscrape: use the standard net/http.Client instead of fasthttp.Client for scraping targets in non-streaming mode
While fasthttp.Client uses less CPU and RAM when scraping targets with small responses (up to 10K metrics),
it doesn't work well when scraping targets with big responses such as kube-state-metrics.
In this case it could use big amounts of additional memory comparing to net/http.Client,
since fasthttp.Client reads the full response in memory and then tries re-using the large buffer
for further scrapes.

Additionally, fasthttp.Client-based scraping had various issues with proxying, redirects
and scrape timeouts like the following ones:

- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1945
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5425
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2794
- https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1017

This should help reducing memory usage for the case when target returns big response
and this response is scraped by fasthttp.Client at first before switching to stream parsing mode
for subsequent scrapes. Now the switch to stream parsing mode is performed on the first scrape
after reading the response body in memory and noticing that its size exceeds the value passed
to -promscrape.minResponseSizeForStreamParse command-line flag.
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5567

Overrides https://github.com/VictoriaMetrics/VictoriaMetrics/pull/4931
2024-01-30 18:39:10 +02:00
Artem Navoiev
a20c289228 docs: add alias for keyconcepts
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-30 17:05:58 +01:00
Aliaksandr Valialkin
c2373a8109 lib/promscrape: fix BenchmarkScrapeWorkScrapeInternal, which has been broken by the commit 65bc460323 2024-01-30 16:06:06 +02:00
Yury Molodov
7007c6a760 vmui: fix Enter key in query field (#5667) (#5717) 2024-01-30 14:36:19 +01:00
Aliaksandr Valialkin
583b6fe1e7 app/vmagent/remotewrite: limit the concurrency for marshaling time series before sending them to remote storage
There is no sense in running more than GOMAXPROCS concurrent marshalers,
since they are CPU-bound. More concurrent marshalers do not increase the marshaling bandwidth,
but they may result in more RAM usage.
2024-01-30 12:18:19 +02:00
Aliaksandr Valialkin
431aa16c8d lib/storage: keep (date, metricID) entries only for the last two dates
Entries for the previous dates is usually not used, so there is little sense in keeping them in memory.

This should reduce the size of storage/date_metricID cache, which can be monitored
via vm_cache_entries{type="storage/date_metricID"} metric.
2024-01-29 18:43:59 +01:00
Aliaksandr Valialkin
e7844f2efd docs/keyConcepts.md: clarify the information about which data is returned by instant and range queries
Do not use `raw samples` term there, since it adds more confusion than clarity:
the `raw samples` refers to real samples stored in the database, while neither range nor instant queries
do not return raw samples - they both return *calculated* samples at *the given* timestamps.

This is a follow-up for b5978ed8f9

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5710
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5708
2024-01-29 18:19:46 +01:00
Fred Navruzov
b2434ec340 - fix link/version of helm chart in update request (#5716) 2024-01-29 18:55:07 +02:00
Aliaksandr Valialkin
5d66ee88bd lib/storage: do not check the limit for -search.maxUniqueTimeseries when performing /api/v1/labels and /api/v1/label/.../values requests
This limit has little sense for these APIs, since:

- Thses APIs frequently result in scanning of all the time series on the given time range.
  For example, if extra_filters={datacenter="some_dc"} .

- Users expect these APIs shouldn't hit the -search.maxUniqueTimeseries limit,
  which is intended for limiting resource usage at /api/v1/query and /api/v1/query_range requests.

Also limit the concurrency for /api/v1/labels, /api/v1/label/.../values
and /api/v1/series requests in order to limit the maximum memory usage and CPU usage for these API.
This limit shouldn't affect typical use cases for these APIs:

- Grafana dashboard load when dashboard labels should be loaded
- Auto-suggestion list load when editing the query in Grafana or vmui

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5055
2024-01-29 16:45:12 +01:00
Artem Navoiev
b9b18b5fd8 docs: add backward compaitble redicrt for url examples page
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-29 16:01:32 +01:00
Fred Navruzov
b4aef0c141 - update versions to 1.9.2 (#5714)
- update guide asset urls to flat
2024-01-29 15:47:27 +02:00
hagen1778
b5978ed8f9 docs: specify results of Instant and Range queries
Mention explicitly what are value and timestamp field in returned
results from Instant and Range queries.

Updates
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5710
https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5708

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 14:00:14 +01:00
Roman Khavronenko
24eb1ad0c8 vmalert: set ActiveAt to evaluation timestamp in newAlert fn (#5657)
The change fixes flaky test `TestAlertingRule_Exec` which has dependency on the actual timestamps,
which resulted into inaccurate test states:
https://github.com/VictoriaMetrics/VictoriaMetrics/actions/runs/7608452967/job/20717699688

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 12:02:02 +01:00
hagen1778
98b805544e lib/streamaggr: fix incorrect err message for min interval value
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 09:53:05 +01:00
hagen1778
c23e8bee89 dashboards: specify where to see details about dropped labels
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-29 07:37:51 +01:00
Fred Navruzov
9b555a0034 update guide and changelog to 1.9.1 (#5706) 2024-01-28 09:43:28 +02:00
hagen1778
6c6c2c185f docs: follow-up after 491287ed15
* port un-synced changed from docs/readme to readme
* consistently use `sh` instead of `console` highlight, as it looks like
a more appropriate syntax highlight
* consistently use `sh` instead of `bash`, as it is shorter
* consistently use `yaml` instead of `yml`

See syntax codes here https://gohugo.io/content-management/syntax-highlighting/

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-27 19:29:11 +01:00
hagen1778
c20d68e28d docs: follow-up after 491287ed15
491287ed15
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-27 19:11:38 +01:00
Artem Navoiev
491287ed15 docs: remove witdh from images, remove <p>, remove <div> (#5705)
* docs: remove witdh from images, remove <p>, remove <div>

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* docs: remove <div> clarify language in code blocks

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-27 10:08:07 -08:00
Daria Karavaieva
4a9f8f4cb0 version 1.9.1 update, dashboard viz flag (#5704) 2024-01-27 14:16:02 +01:00
Aliaksandr Valialkin
0ed291102d lib/decimal: follow-up for e6bad5174f
- Add a benchmark for CalbirateAndScale.
- Reduce the decimal multipliers table size from 256Kb to 192bytes.
- Use more clear naming for variables.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5672
2024-01-27 00:08:57 +01:00
Fuchun Zhang
64780f4f02 Optimize the performance of data merge: decimal.CalibrateScale() (#5672)
* Optimize the performance of data merge: decimal.CalibrateScale() from 49633 ns/op to 9146 ns/op

* Optimize the performance of data merge: decimal.CalibrateScale()
2024-01-27 00:08:56 +01:00
Aliaksandr Valialkin
1a6c3370bf vendor: run make vendor-update 2024-01-26 22:56:37 +01:00
Aliaksandr Valialkin
b9dcaaa7f8 app/vmui: run make vmui-update after a7b11eff7c 2024-01-26 22:53:46 +01:00
Hui Wang
6ee1bfeb3c add inserting comma inside value instruction to flag description (#5666) 2024-01-26 22:46:49 +01:00
Roman Khavronenko
aaa526e8ff lib/streamaggr: skip unfinished aggregation state on shutdown by default (#5689)
Sending unfinished aggregate states tend to produce unexpected anomalies with lower values than expected.
The old behavior can be restored by specifying `flush_on_shutdown: true` setting in streaming aggregation config

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:45:23 +01:00
Roman Khavronenko
df59ac7f0e app/vmalert: fix data race during hot-config reload (#5698)
* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix caches the cancel context function into local variable first. And only after
performs the group update. With cached cancel function we can safely call it without
worrying that we cancel the evaluation for already updated group.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Revert "app/vmalert: fix data race during hot-config reload"

This reverts commit a4bb7e8932.

* app/vmalert: fix data race during hot-config reload

During hot-reload, the logic evokes the group update and rules evaluation
interruption simultaneously. Falsely assuming that interruption happens before
the update. However, it could happen that group will be updated first and only
after the rules evaluation will be cancelled. Which will result in permanent
interruption for all rules within the group.

The fix cancels the evaulation context before applying the update, making sure
that the context will be cancelled for old group always.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* wip

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-26 22:42:21 +01:00
Yury Molodov
a7b11eff7c vmui: fix Enter key in query field (#5667) (#5681) 2024-01-26 22:38:32 +01:00
Aliaksandr Valialkin
3bce55be0c docs: update -help output after bb7a419cc3 2024-01-26 22:28:40 +01:00
Aliaksandr Valialkin
bb7a419cc3 lib/{mergeset,storage}: make background merge more responsive and scalable
- Maintain a separate worker pool per each part type (in-memory, file, big and small).
  Previously a shared pool was used for merging all the part types.
  A single merge worker could merge parts with mixed types at once. For example,
  it could merge simultaneously an in-memory part plus a big file part.
  Such a merge could take hours for big file part. During the duration of this merge
  the in-memory part was pinned in memory and couldn't be persisted to disk
  under the configured -inmemoryDataFlushInterval .

  Another common issue, which could happen when parts with mixed types are merged,
  is uncontrolled growth of in-memory parts or small parts when all the merge workers
  were busy with merging big files. Such growth could lead to significant performance
  degradataion for queries, since every query needs to check ever growing list of parts.
  This could also slow down the registration of new time series, since VictoriaMetrics
  searches for the internal series_id in the indexdb for every new time series.

  The third issue is graceful shutdown duration, which could be very long when a background
  merge is running on in-memory parts plus big file parts. This merge couldn't be interrupted,
  since it merges in-memory parts.

  A separate pool of merge workers per every part type elegantly resolves both issues:
  - In-memory parts are merged to file-based parts in a timely manner, since the maximum
    size of in-memory parts is limited.
  - Long-running merges for big parts do not block merges for in-memory parts and small parts.
  - Graceful shutdown duration is now limited by the time needed for flushing in-memory parts to files.
    Merging for file parts is instantly canceled on graceful shutdown now.

- Deprecate -smallMergeConcurrency command-line flag, since the new background merge algorithm
  should automatically self-tune according to the number of available CPU cores.

- Deprecate -finalMergeDelay command-line flag, since it wasn't working correctly.
  It is better to run forced merge when needed - https://docs.victoriametrics.com/#forced-merge

- Tune the number of shards for pending rows and items before the data goes to in-memory parts
  and becomes visible for search. This improves the maximum data ingestion rate and the maximum rate
  for registration of new time series. This should reduce the duration of data ingestion slowdown
  in VictoriaMetrics cluster on e.g. re-routing events, when some of vmstorage nodes become temporarily
  unavailable.

- Prevent from possible "sync: WaitGroup misuse" panic on graceful shutdown.

This is a follow-up for fa566c68a6 .
Thanks @misutoth to for the inspiration at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3790
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3551
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3337
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3425
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3647
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3641
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/648
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/291
2024-01-26 22:27:47 +01:00
Artem Navoiev
97937d58c4 docs: remove <p> for imanges (#5702)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 13:06:48 -08:00
Artem Navoiev
3e0a117ddf remove all <div> as far they obsolete and can break markdown (#5701)
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 12:52:21 -08:00
Aliaksandr Valialkin
e84c877503 lib/mergeset: remove inmemoryBlock pooling, since it wasn't effecitve
This should reduce memory usage a bit when new time series are ingested at high rate (aka high churn rate)
2024-01-26 21:34:57 +01:00
Artem Navoiev
7de19c3748 docs: delete docs/provision_datasources.png as we support webp
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 21:25:39 +01:00
Github Actions
5a8daa725e Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@ef5cfe6 (#5700) 2024-01-26 12:22:17 -08:00
Aliaksandr Valialkin
2655c02d5e lib/logstorage: make sure that WaitGroup.Add isnt called after stopCh is closed and WaitGroup.Wait is called
This protects from rare panic, which may occur during graceful shutdown of VictoriaLogs
2024-01-26 21:17:02 +01:00
Aliaksandr Valialkin
8e03bc6b53 app/vmselect/promql: do not spend CPU time on verifying whether the rollup cache needs to be reset for the given metric rows when it has been already instructed to reset 2024-01-26 21:13:38 +01:00
Aliaksandr Valialkin
4ac7e3a355 docs/Makefile: mention that the Makefile rules must be run from VictoriaMetrics repository root 2024-01-26 21:01:40 +01:00
Aliaksandr Valialkin
32c064a401 app/vmauth: return 503 service unavailable status code when the backend returns response with unsupported status code, but the request cannot be re-tried.
While at it, properly close response body. This should prevent from possible http keep-alive connection leak to backends because of unclosed response bodies.

This is a follow-up for 3c0aa14b5b
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5688
2024-01-26 20:43:11 +01:00
Artem Navoiev
60fc2da6c1 docs: fix key concepts image and links
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 20:30:29 +01:00
Artem Navoiev
25165656bb docs: change [image] to img as far we support it in release guide
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 19:01:20 +01:00
Artem Navoiev
41e99765cc docs: remoev vmanomaly as far we have dedicated section with alredy exists redirects
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 18:38:01 +01:00
Artem Navoiev
bc033a2b30 docs: vmanomaly fix images
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 17:59:37 +01:00
Daria Karavaieva
105cb44884 Vmanomaly Guide dashboard provisioning (#5679)
* dashboard provisioning

* delete dashboard filter, new query

* dashboard screens, guide fixes
2024-01-26 17:12:58 +01:00
Artem Navoiev
9ded04e643 docs: remove raw and endraw tags as they are not needed for the new v… (#5696)
* docs: remove raw and endraw tags as they are not needed for the new version of site

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* revert formating in vmaler

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-26 07:30:45 -08:00
Github Actions
fae801edd3 Automatic update operator docs from VictoriaMetrics/operator@0628def (#5694) 2024-01-26 10:21:52 +01:00
Github Actions
2582b1e15a Automatic update Grafana datasource docs from VictoriaMetrics/grafana-datasource@c644bec (#5691) 2024-01-26 11:44:02 +04:00
Roman Khavronenko
b11f4ef5ea app/vmalert: autogenerate ALERTS_FOR_STATE time series for alerting rules with for: 0 (#5680)
* app/vmalert: autogenerate `ALERTS_FOR_STATE` time series for alerting rules with `for: 0`

 Previously, `ALERTS_FOR_STATE` was generated only for alerts with `for > 0`.
 This behavior differs from Prometheus behavior - it generates ALERTS_FOR_STATE
 time series for alerting rules with `for: 0` as well. Such time series can
 be useful for tracking the moment when alerting rule became active.

 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5648
 https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3056

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* app/vmalert: support ALERTS_FOR_STATE in `replay` mode

Signed-off-by: hagen1778 <roman@victoriametrics.com>

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 15:42:57 +01:00
Github Actions
a95246d885 Automatic update operator docs from VictoriaMetrics/operator@e75a096 (#5690) 2024-01-25 15:33:01 +01:00
hagen1778
d71218d6ce docs: simplify instructions for spinning up docker env
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 15:27:57 +01:00
Github Actions
e29fe0933b Automatic update operator docs from VictoriaMetrics/operator@f6b9c08 (#5676) 2024-01-25 15:10:23 +01:00
Alexander Marshalov
3c0aa14b5b vmauth: fix vmauth_user_request_backend_errors_total metric calc logic for use case when only one backend is available - if we get an error from the retry_status_codes list, but cannot execute retry, we increment vmauth_user_request_backend_errors_total as well (#5688) 2024-01-25 14:04:20 +01:00
hagen1778
56310ffb47 docs: fix the issue link
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-25 13:35:41 +01:00
Alexander Marshalov
495fa9800a Follow up after https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5685 (#5686) 2024-01-25 10:03:46 +01:00
Alexander Marshalov
d5682858c0 Make a makefile work on a MacOS and Linux (#5685) 2024-01-25 09:58:01 +01:00
Aliaksandr Valialkin
c3a585cfe5 lib/storage: rename *AssistedMerges to *AssistedMergesCount in order to make these field names less misleading
These fields are counters, not gauges, so adding Count suffix to them makes easier to understand this while reading the code
2024-01-25 10:19:32 +02:00
Alexander Marshalov
806c07ddd5 vmsingle/vmselect returns http status 429 (TooManyRequests) instead of 503 (ServiceUnavailable) when max concurrent requests limit is reached. (#5682) 2024-01-24 17:55:06 +01:00
Aliaksandr Valialkin
18df07e824 lib/mergeset: start assisted merge for file parts only if the number of file parts is bigger than maxFileParts
The maxFileParts usage has been accidentally removed in fa566c68a6

While at it, add Count suffix to *AssistedMerges counter names in order to make them less misleading.
Previously their names were falsely suggesting that these are gauges, which show the number of concurrently
executed assisted merges.
2024-01-24 15:08:42 +02:00
Aliaksandr Valialkin
ac5b740750 lib/promscrape/discovery/kubernetes: typo fix in the comment for ContainerStateTerminated struct
This is a follow-up for ef12598ad4
2024-01-24 15:06:46 +02:00
Aliaksandr Valialkin
ef12598ad4 lib/promscrape/discovery/kubernetes: do not generate targets for already terminated pods and containers
Already terminated pods and containers cannot be scraped and will never resurrect,
so there is zero sense in creating scrape targets for them.
2024-01-24 14:57:53 +02:00
Aliaksandr Valialkin
4d961c70f7 app/{vmselect,vmstorage}: return compression of the data passed from vmstorage to vmselect
This reverts cd4f641d32 , since it has been appeared that the disabled compression
for vmstorage->vmselect data increase network bandwidth usage by more than 10x on typical production workloads,
while it decreases CPU usage at vmstorage by up to 10% and improves query latency by up to 10%.

The 10x increase in network usage is too high price for 10% improvements on query latency and vmstorage CPU usage.
This may result in network bandwidth bottlenecks, which can reduce the overall performance and stability
of VictoriaMetrics cluster. That's why return back the vmstorage->vmselect data compression by default.

The vmstorage->vmselect compression can be disabled by passing -rpc.disableCompression command-line flag to vmstorage.
The vmselect->vmselect compression in multi-level cluster setup can be disabled by passing -clusternative.disableCompression command-line flag.
2024-01-24 13:39:28 +02:00
Aliaksandr Valialkin
f888a019fe lib/streamaggr: expand %{ENV} placeholders in stream aggregation configs 2024-01-24 12:31:27 +02:00
Aliaksandr Valialkin
fa566c68a6 lib/mergeset: really limit the number of in-memory parts to 15
It has been appeared that the registration of new time series slows down linearly
with the number of indexdb parts, since VictoriaMetrics needs to check every indexdb part
when it searches for TSID by newly ingested metric name.

The number of in-memory parts grows when new time series are registered
at high rate. The number of in-memory parts grows faster on systems with big number
of CPU cores, because the mergeset maintains per-CPU buffers with newly added entries
for the indexdb, and every such entry is transformed eventually into a separate in-memory part.

The solution has been suggested in https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212
by @misutoth - to limit the number of in-memory parts with buffered channel.
This solution is implemented in this commit. Additionally, this commit merges per-CPU parts
into a single part before adding it to the list of in-memory parts. This reduces CPU load
when searching for TSID by newly ingested metric name.

The https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5212 recommends setting the limit on the number
of in-memory parts to 100, but my internal testing shows that much lower limit 15 works with the same efficiency
on a system with 16 CPU cores while reducing memory usage for `indexdb/dataBlocks` cache by up to 50%.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5190
2024-01-24 03:38:12 +02:00
Aliaksandr Valialkin
5543c04061 docs/Cluster-VictoriaMetrics.md: document that vmstorage doesnt compress data it sends to vmselect by default
This is a follow-up for cd4f641d32
2024-01-23 23:22:18 +02:00
Aliaksandr Valialkin
8fb8b71295 lib/encoding: remove uneeded re-slicing of byte slice before passing it to binary.BigEndian.Uint* 2024-01-23 22:50:29 +02:00
Aliaksandr Valialkin
1c58c00618 app/vmselect/netstorage: limit the initial size for brsPoolCap with 32Kb
This should reduce the number of expensive memory allocations with sizes bigger than 32Kb
2024-01-23 22:29:39 +02:00
Aliaksandr Valialkin
43ecd5d258 app/vmselect/netstorage: pre-allocate memory for metricNamesBuf
This should reduce the number of metricNamesBuf re-allocations in append()
2024-01-23 21:34:16 +02:00
Aliaksandr Valialkin
ae643ef1f1 lib/{storage,mergeset}: reduce the maxium compression level for the stored data
This reduces CPU usage a bit, while doesn't increase resulting file sizes according to synthetic tests.
2024-01-23 17:46:50 +02:00
Github Actions
05c9a4d7ce Automatic update operator docs from VictoriaMetrics/operator@1470569 (#5668) 2024-01-23 16:22:16 +01:00
Aliaksandr Valialkin
6c214397ed lib/storage: compress metricIDs, which match the given filters, before storing them in tagFiltersToMetricIDsCache
This allows reducing the indexdb/tagFiltersToMetricIDs cache size by 8 on average.
The cache size can be checked via vm_cache_size_bytes{type="indexdb/tagFiltersToMetricIDs"} metric exposed at /metrics page.
2024-01-23 16:09:55 +02:00
Aliaksandr Valialkin
4d78954158 lib/storage: do not sort metricIDs passed to Storage.prefetchMetricNames, since the caller is responsible for the sorting 2024-01-23 16:08:38 +02:00
Aliaksandr Valialkin
6d84b1beef lib/filestream: do not measure read / write duration from / to in-memory buffers
Measuring read / write duration from / to in-memory buffers has little sense,
since it will be always fast. It is better to measure read / write duration from / to
real files at vm_filestream_write_duration_seconds_total and vm_filestream_read_duration_seconds_total metrics.
This also reduces overhead on time.Now() and Histogram.UpdateDuration() calls
per each filestream.Reader.Read() and filestream.Writer.Write() call when the data is read / written from / to in-memory buffers.

This is a follow-up for 2f63dec2e3
2024-01-23 14:52:22 +02:00
Aliaksandr Valialkin
41456d9569 app/vmselect/netstorage: limit the maximum brsPool size to 32Kb at ProcessSearchQuery()
This avoids slow path in Go runtime for allocating objects bigger than 32Kb -
see 704401ffa0/src/runtime/malloc.go (L11)

This also reduces memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 5dd37ad836 .

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 14:04:49 +02:00
Aliaksandr Valialkin
1f1768d7af app/vmselect/netstorage: limit the size of metricNamesBuf to 32Kb in order to avoid slow path at Go runtime for allocating a byte slice of bigger size
See 704401ffa0/src/runtime/malloc.go (L11)

This also reduces the average memory usage a bit for vmselect and single-node VictoriaMetrics
after the commit 508c608062

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 13:46:37 +02:00
Aliaksandr Valialkin
fac7c30f4e docs/vmagent.md: clarify how -promscrape.seriesLimitPerTarget command-line flag, series_limit config option and __series_limit__ label interact with each other
This is a follow-up for https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5663
See also 89e3c70ccd
2024-01-23 13:14:50 +02:00
Roman Khavronenko
89e3c70ccd lib/promscrape: respect 0 value for series_limit param (#5663)
* lib/promscrape: respect `0` value for `series_limit` param

Respect `0` value for `series_limit` param in `scrape_config`
even if global limit was set via `-promscrape.seriesLimitPerTarget`.
Previously, `0` value will be ignored in favor of `-promscrape.seriesLimitPerTarget`.

This behavior aligns with possibility to override `series_limit` value via
relabeling with `__series_limit__` label.

Signed-off-by: hagen1778 <roman@victoriametrics.com>

* Update docs/CHANGELOG.md

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 13:09:14 +02:00
Aliaksandr Valialkin
1c5163ae51 lib/mergeset: make sure that the first and the last items are in the original range after prepareBlock()
Previously the checks were to strict by requiring to leave the same first and last items by prepareBlock()

Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5655
2024-01-23 12:58:32 +02:00
Fred Navruzov
2adb38a9c4 - fix 404 errors after page remaning (#5664)
- slight text fixes
2024-01-23 01:56:42 -08:00
Aliaksandr Valialkin
15a15e5b99 app/vmselect/vmui: run make vmui-update in order to sync recent changes in app/vmui 2024-01-23 04:31:44 +02:00
Aliaksandr Valialkin
114822d585 app/{vmstorage,vmselect}: disable vmstorage->vmselect RPC compression by default in order to improve query performance 2024-01-23 04:24:57 +02:00
Zakhar Bessarab
bf4742526d lib/storage: print tenant ID in log when discarding or truncating labels (#5658)
Previously, it was not possible to determine which tenant sends metrics with excessive amount of labels of label values.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:24:56 +02:00
Yury Molodov
38231d5994 vmui: query report (#5497)
* vmui: add query analyzer page

* vmui: fix tabs for query analyzer

* vmui: add help to export query

* vmui: add time params to query analyzer

* docs/vmui: add query analyzer

* vmui: fix validation JSON form

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:23:26 +02:00
Yury Molodov
eb6def0695 vmui: add flag for default timezone setting (#5611)
* vmui: add flag for default timezone setting #5375

* vmui: validate timezone before client return

* Update app/vmselect/vmui.go

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:11:19 +02:00
Yury Molodov
633e6b48ad vmui: fix cache autocomplete (#5591)
* vmui: fix the logic of closing the popper #5470

* vmui: fix the logic of caching autocomplete results #5472

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-23 04:06:14 +02:00
Aliaksandr Valialkin
980338861f lib/mergeset: skip comparison for every item in the block during merge if the last item in the block is smaller than the first item in the next block
Thanks to @ahfuzhang for the suggestion at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5651
2024-01-23 03:15:52 +02:00
Aliaksandr Valialkin
bc7d19c8ca app/vmselect/promql: remove superflouos memory allocations at aggrPrepareSeries()
While at it, also remove unneeded map lookup
2024-01-23 02:28:31 +02:00
Aliaksandr Valialkin
9240bc36a3 app/vmselect/promql/aggr_incremental.go: eliminate unnecessary memory allocation in incrementalAggrFuncContext.updateTimeseries 2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin
e0399ec29a app/vmselect/netstorage: remove tswPool, since it isnt efficient 2024-01-23 02:28:30 +02:00
Aliaksandr Valialkin
72a838a2a1 app/vmselect/netstorage: avoid metricName->blockRef lookup when processing multiple blocks for the same time series
This saves a few CPU cycles for common case
2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
5dd37ad836 app/vmselect/netstorage: use []blockRef from blockRefPool in order to reduce memory allocations 2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
7345567c29 app/vmselect/netstorage: substitute pointer to blockRefs by brssPool index at the metricName->blockRefs map
This should reduce the pressure on Go GC, since it will see lower number of pointers.

This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:29 +02:00
Aliaksandr Valialkin
678234e9f0 app/vmselect/netstorage: reduce the number of allocations for blockRefs objects in ProcessSearchQuery()
This should reduce pressure on Go GC at vmselect

The change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:28 +02:00
Aliaksandr Valialkin
508c608062 app/vmselect/netstorage: reduce the number of memory allocations in ProcessSearchQuery() by storing all the metric names in a single byte slice
This reduces the number of memory allocations at the cost of possible memory usage increase,
since now different metric name strings may hold references to the previous byte slice.
This is good tradeoff, since ProcessSearchQuery is called in vmselect, and vmselect isn't usually limited by memory.

This change has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5527
2024-01-23 02:28:28 +02:00
Daria Karavaieva
ffaf48b99e add 1.8.0 notes to changelog (#5616)
* add 1.8.0 notes to changelog

* added release date

* MAD internal link

* monitoring health deprecation
2024-01-22 23:51:12 +01:00
Jaskeerat Singh Randhawa
b606521745 custom-resources: fix link text for alertmanager (#5660) 2024-01-22 18:06:40 +01:00
Aliaksandr Valialkin
3449d563bd all: add up to 10% random jitter to the interval between periodic tasks performed by various components
This should smooth CPU and RAM usage spikes related to these periodic tasks,
by reducing the probability that multiple concurrent periodic tasks are performed at the same time.
2024-01-22 18:40:32 +02:00
Aliaksandr Valialkin
9b4294e53e lib/storage: reduce the contention on dateMetricIDCache mutex when new time series are registered at high rate
The dateMetricIDCache puts recently registered (date, metricID) entries into mutable cache protected by the mutex.
The dateMetricIDCache.Has() checks for the entry in the mutable cache when it isn't found in the immutable cache.
Access to the mutable cache is protected by the mutex. This means this access is slow on systems with many CPU cores.
The mutabe cache was merged into immutable cache every 10 seconds in order to avoid slow access to mutable cache.
This means that ingestion of new time series to VictoriaMetrics could result in significant slowdown for up to 10 seconds
because of bottleneck at the mutex.

Fix this by merging the mutable cache into immutable cache after len(cacheItems) / 2
cache hits under the mutex, e.g. when the entry is found in the mutable cache.
This should automatically adjust intervals between merges depending on the addition rate
for new time series (aka churn rate):

- The interval will be much smaller than 10 seconds under high churn rate.
  This should reduce the mutex contention for mutable cache.
- The interval will be bigger than 10 seconds under low churn rate.
  This should reduce the uneeded work on merging of mutable cache into immutable cache.
2024-01-22 18:40:32 +02:00
hagen1778
8b8d0e3677 deployment/docker: fix typo in commands example
Follow up after 38b2a5bc44

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 16:56:27 +01:00
hagen1778
b25ef138ce dashboards: reflect dashboard rename in copy script
This is a follow-up for ff33e60a3d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 16:51:24 +01:00
hagen1778
0e5e502b3c deployment/docker: follow-up 38b2a5bc44
* Simplify folder structure
* mention datasource in README

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 16:05:44 +01:00
Dmytro Kozlov
38b2a5bc44 deployment/docker: add grafana datasource to the docker-compose files (#5363)
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3920
https://github.com/VictoriaMetrics/grafana-datasource/issues/113
2024-01-22 15:45:31 +01:00
hagen1778
1075fcfc8c app/vmctl/backoff: fix flaky test
The change removes artificial delay before returning error, which sometimes
caused less retry events than expected.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 12:21:14 +01:00
hagen1778
da556cc329 docs: fix Grafana link example for vmalert
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-22 09:35:18 +01:00
dependabot[bot]
df197723ae build(deps): bump github/codeql-action from 2 to 3 (#5462)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2 to 3.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v2...v3)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-22 01:49:17 +02:00
Aliaksandr Valialkin
d3ee3e0ef5 Revert "lib/promscrape: do not store last scrape response when stale markers … (#5577)"
This reverts commit cfec258803.

Reason for revert: the original code already doesn't store the last scrape response when stale markers are disabled.
The scrapeWork.areIdenticalSeries() function always returns true is stale markers are disabled.
This prevents from storing the last response at scrapeWork.processScrapedData().

It looks like the reverted commit could also return back the issue https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3660

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5577
2024-01-22 00:43:48 +02:00
Aliaksandr Valialkin
9c0863babc docs: use persistent links to Grafana dashboards
These links do not depend on the dashboard name, so they do not break after the renaming of the dashboard.

This is a follow-up for ff33e60a3d
2024-01-22 00:17:17 +02:00
Aliaksandr Valialkin
1c7f990fad app/vmselect: handle negative time range start in a generic manner inside NewSearchQuery()
This is a follow-up for cf03e11d89

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5553
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5630
2024-01-21 23:45:31 +02:00
Artem Navoiev
3f7ed7e6b2 docs vmanomaly fix anchor
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-21 22:21:37 +01:00
Hui Wang
4e3242b02d lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice (#5557)
* lib/promscrape/discovery/kubernetes: fix watcher start order for roles endpoints and endpointslice

Previously the groupWatcher could be mistakenly stopped when requests for pod or services resources take too long.

* remove mislead comment

* docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

* wip

* lib/promscrape/kubernetes: prevent from stopping groupWatcher when there are in-flight apiWatcher.mustStart() calls

groupWatcher is stopped if it has zero registered apiWatchers during 14 seconds.
But such a groupWatcher can be still in use if apiWatcher for `role: endpoints` or `role: endpointslice`
is being registered and the discovery of the associated `pod` and/or `service` objects takes longer
than 14 seconds - see the beginning of groupWatcher.startWatchersForRole() function for details.

Track the number of in-flight calls to apiWatcher.mustStart() and prevent from stopping the associated groupWatcher
if the number of in-flight calls is non-zero.

P.S. postponing the discovery of `pod` and/or `service` objects associated with `endpoints` or `endpointslice` roles
isn't the best solution, since it slows down initial discovery of `endpoints` and `endpointslice` targets.

* typo fix

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 23:13:15 +02:00
Aliaksandr Valialkin
1f105dde98 all: allow dynamically reading *AuthKey flag values from files and urls
Examples:

1) -metricsAuthKey=file:///abs/path/to/file - reads flag value from the given absolute filepath
2) -metricsAuthKey=file://./relative/path/to/file - reads flag value from the given relative filepath
3) -metricsAuthKey=http://some-host/some/path?query_arg=abc - reads flag value from the given url

The flag value is automatically updated when the file contents changes.
2024-01-21 22:03:38 +02:00
Fred Navruzov
7e68722686 - fix 404 links after renaming (#5653)
- improve wording on diagram
- swap enterprise/about chapters for page clarity
2024-01-21 21:24:29 +02:00
Fred Navruzov
0038102b98 docs: vmanomaly - add component interaction diagram (#5652)
* add interaction diagram for vmanomaly components
* small docs fixes
* resolve suggestions
2024-01-21 19:33:59 +02:00
Aliaksandr Valialkin
0b2ea1a7c7 all: call atomic.Load* in front of atomic.CompareAndSwap* at places where the atomic.CompareAndSwap* returns false most of the time
This allows avoiding slow inter-CPU synchornization induced by atomic.CompareAndSwap*
2024-01-21 14:04:54 +02:00
Aliaksandr Valialkin
3d83f3347d docs/sd_configs.md: mention -promscrape.kubernetes.attachNodeMetadataAll flag in the description for attach_metadata section
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640
2024-01-21 13:26:03 +02:00
Aliaksandr Valialkin
4eb9926125 lib/promscrape: code cleanup: send stale markers immediately after generating automatic metrics
This cleanup has been extracted from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5557/files#diff-6b205cf6637d7b65a5c45d9417d08822d4efad94227268cb196f61aa2a0fc0f7
2024-01-21 05:18:22 +02:00
Aliaksandr Valialkin
12f2c5679b all: consistently clear prompbmarshal.Label by assigning an empty struct instead of zeroing Name and Value individually 2024-01-21 05:11:05 +02:00
Aliaksandr Valialkin
90768aa418 lib/storage/partition.go: remove misleading comment, which falsely states that inmemoryParts isn't visible to search
Thanks to @satjd for raising attention to this comment at https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5410
2024-01-21 04:49:35 +02:00
Nikolay
b3598ba2c1 app/vmauth: adds metric_labels and backend_errors counter (#5585)
* app/vmauth: adds metric_labels and backend_errors counter
it must improve observability for user requests with new metric - per user backend errors counter.
it's needed to calculate requests fail rate to the configured backends.
metric_labels configuration allows to perform additional aggregations on top of multiple users from configuration section.
It could be multiple clients or clients with separate read/write tokens
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5565

* wip

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 04:40:52 +02:00
Yury Molodov
3ea1294ad2 vmui: add autofocus to input for desktop version #5479 (#5592) 2024-01-21 03:24:16 +02:00
Aliaksandr Valialkin
7fba73ce11 lib/promscrape/discovery/kubernetes: add -promscrape.kubernetes.attachNodeMetadataAll command-line flag
This flag allows setting attach_metadata.node=true for all the kubernetes_sd_configs defined at -promscrape.config

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4640

Thanks to wasim-nihal for the initial implementation at https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5593
2024-01-21 03:13:56 +02:00
Hui Wang
fad212c39c app/vmselect/promql: properly handle possible negative results caused… (#5608)
* app/vmselect/promql: properly handle possible negative results caused by float operations precision error in rollup functions like rate() or increase()

* fix test
2024-01-21 02:53:29 +02:00
Nikolay
c9f39fd51f app/vmselect/netstorage (#5649)
* app/vmselect/netstorage

correctly handle errGlobal set

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5649

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 02:47:29 +02:00
Nikolay
8ab0ce3ded app/vmselect: abort streaming connections for vmselect (#5650)
* app/vmselect: abort streaming connections for vmselect
due to streaming nature of export APIs, curl and simmilr tools cannot
detect errors that happened after http.Header with status 200 was
written to it.

This PR tracks if body write was already started and closes connection.

It allows client to detect not expected chunk sequence and return error
to the caller.

Mostly it affects vmselect at cluster version

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645

* wip

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5645
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5650

---------

Co-authored-by: Aliaksandr Valialkin <valyala@victoriametrics.com>
2024-01-21 02:12:51 +02:00
Aliaksandr Valialkin
74448a7e57 lib/promscrape/discovery/hetzner: follow-up after 03a97dc678
- docs/sd_configs.md: moved hetzner_sd_configs docs to the correct place according to alphabetical order of SD names,
  document missing __meta_hetzner_role label.
- lib/promscrape/config.go: added missing MustStop() call for Hetzner SD,
  and moved the code to the correct place according to alphabetical order of SD names.
- lib/promscrape/discovery/hetzner: properly handle pagination for hloud API responses,
  populate missing __meta_hetzner_role label like Prometheus does.
- Properly populate __meta_hetzner_public_ipv6_network label like Prometheus does.
- Remove unused SDConfig.Token.
- Remove "omitempty" annotation from SDConfig.Role field, since this field is mandatory.

Updates https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5550
Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3154
2024-01-20 17:01:53 +02:00
Artem Navoiev
873483a782 vmanomly use proper title for overiview doc
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 19:59:17 +01:00
Hui Wang
cfec258803 lib/promscrape: do not store last scrape response when stale markers … (#5577)
* lib/promscrape: do not store last scrape response when stale markers are disabled

* update changelog
2024-01-20 00:53:41 +08:00
Artem Navoiev
6a2a8cd426 docs add docs titles
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 17:02:20 +01:00
Artem Navoiev
dfa43da1a2 vmanonamly docs fix sorting for jekill as far it doesnt support of sorting the folders
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:55:44 +01:00
Artem Navoiev
1af5faa4af vmanomaly remove unused pages from menu
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:50:12 +01:00
Artem Navoiev
5e17636994 vmanomly specify the right menu parent for overview page
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:43:18 +01:00
Artem Navoiev
c425ec3088 docs vmanmoly fix sorting
Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 16:15:40 +01:00
Artem Navoiev
ec85d32e21 Move vmanomaly page to its own section (#5646)
* docs: move vmanomaly overview page to its section

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* add alias for backward compatibility

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* fix links

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

* change title

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>

---------

Signed-off-by: Artem Navoiev <tenmozes@gmail.com>
2024-01-19 07:00:41 -08:00
Roman Khavronenko
7e374c227f app/vmui: send step param for instant queries (#5639)
The change reverts https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3896
 due to reasons explained in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3896#issuecomment-1896704401

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-19 08:48:16 +01:00
Fred Navruzov
69ae1d30bf docs: vmanomaly slight improvements (#5637)
* - better messaging
- update links to dockerhub in guides
- added anomaly_score to FAQ
- improve model section (sort + use cases)
- slight refactor of a guide

* rename guide & change refs

* change wording in installation options

* - update remaining text reference
- add cross-link to component sections in guide

* add docs/.jekyll-metadata to .gitignore
2024-01-18 02:37:36 -08:00
hagen1778
0a5ffb3bc1 docs: remove slug from Grafana dashboard URLs
Each Grafana dashboard has unique ID which can be used to fetch the dashboard
from grafana.com: https://grafana.com/grafana/dashboards/11176
The same dashboard can be accessed via URL with slug: https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/
But using slug implies that any change to dashboard name will break the link.
So it is better to just use ID, so the dashboard URL will never break.

This is follow-up for ff33e60a3d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2024-01-18 11:19:53 +01:00
858 changed files with 37351 additions and 29240 deletions

View File

@@ -60,8 +60,8 @@ body:
For VictoriaMetrics health-state issues please provide full-length screenshots
of Grafana dashboards if possible:
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229-victoriametrics-single-node/)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176-victoriametrics-cluster/)
* [Grafana dashboard for single-node VictoriaMetrics](https://grafana.com/grafana/dashboards/10229/)
* [Grafana dashboard for VictoriaMetrics cluster](https://grafana.com/grafana/dashboards/11176/)
See how to setup monitoring here:
* [monitoring for single-node VictoriaMetrics](https://docs.victoriametrics.com/#monitoring)

View File

@@ -25,7 +25,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build

View File

@@ -36,11 +36,11 @@ jobs:
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3
with:
category: "javascript"

View File

@@ -63,7 +63,7 @@ jobs:
if: ${{ matrix.language == 'go' }}
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -75,7 +75,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v3
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -86,7 +86,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
uses: github/codeql-action/autobuild@v3
# Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
@@ -100,4 +100,4 @@ jobs:
# make release
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v3

View File

@@ -41,7 +41,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -71,7 +71,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -102,7 +102,7 @@ jobs:
cache: false
- name: Cache Go artifacts
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: |
~/.cache/go-build
@@ -115,6 +115,6 @@ jobs:
run: make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
with:
file: ./coverage.txt

1
.gitignore vendored
View File

@@ -22,3 +22,4 @@ Gemfile.lock
/_site
_site
*.tmp
/docs/.jekyll-metadata

View File

@@ -1,6 +1,6 @@
PKG_PREFIX := github.com/VictoriaMetrics/VictoriaMetrics
MAKE_CONCURRENCY ?= $(shell cat /proc/cpuinfo | grep -c processor)
MAKE_CONCURRENCY ?= $(shell getconf _NPROCESSORS_ONLN)
MAKE_PARALLEL := $(MAKE) -j $(MAKE_CONCURRENCY)
DATEINFO_TAG ?= $(shell date -u +'%Y%m%d-%H%M%S')
BUILDINFO_TAG ?= $(shell echo $$(git describe --long --all | tr '/' '-')$$( \
@@ -178,7 +178,8 @@ victoria-metrics-crossbuild: \
victoria-metrics-darwin-amd64 \
victoria-metrics-darwin-arm64 \
victoria-metrics-freebsd-amd64 \
victoria-metrics-openbsd-amd64
victoria-metrics-openbsd-amd64 \
victoria-metrics-windows-amd64
vmutils-crossbuild: \
vmutils-linux-386 \
@@ -465,7 +466,7 @@ benchmark-pure:
vendor-update:
go get -u -d ./lib/...
go get -u -d ./app/...
go mod tidy -compat=1.20
go mod tidy -compat=1.22
go mod vendor
app-local:

464
README.md

File diff suppressed because it is too large Load Diff

View File

@@ -2,13 +2,17 @@
## Supported Versions
The following versions of VictoriaMetrics receive regular security fixes:
| Version | Supported |
|---------|--------------------|
| [latest release](https://docs.victoriametrics.com/CHANGELOG.html) | :white_check_mark: |
| v1.93.x LTS release | :white_check_mark: |
| v1.87.x LTS release | :white_check_mark: |
| v1.97.x [LTS line](https://docs.victoriametrics.com/LTS-releases.html) | :white_check_mark: |
| v1.93.x [LTS line](https://docs.victoriametrics.com/LTS-releases.html) | :white_check_mark: |
| other releases | :x: |
See [this page](https://victoriametrics.com/security/) for more details.
## Reporting a Vulnerability
Please report any security issues to security@victoriametrics.com

View File

@@ -11,7 +11,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlselect"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vlstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
@@ -22,11 +21,10 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":9428", "TCP address to listen for http connections. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. See also -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the given -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
gogc = flag.Int("gogc", 100, "GOGC to use. See https://tip.golang.org/doc/gc-guide")
)
func main() {
@@ -34,18 +32,21 @@ func main() {
flag.CommandLine.SetOutput(os.Stdout)
flag.Usage = usage
envflag.Parse()
cgroup.SetGOGC(*gogc)
buildinfo.Init()
logger.Init()
logger.Infof("starting VictoriaLogs at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":9428"}
}
logger.Infof("starting VictoriaLogs at %q...", listenAddrs)
startTime := time.Now()
vlstorage.Init()
vlselect.Init()
vlinsert.Init()
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started VictoriaLogs in %.3f seconds; see https://docs.victoriametrics.com/VictoriaLogs/", time.Since(startTime).Seconds())
pushmetrics.Init()
@@ -53,9 +54,9 @@ func main() {
logger.Infof("received signal %s", sig)
pushmetrics.Stop()
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
startTime = time.Now()
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())

View File

@@ -26,8 +26,8 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8428", "TCP address to listen for http connections. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP addresses to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
minScrapeInterval = flag.Duration("dedup.minScrapeInterval", 0, "Leave only the last sample in every time series per each discrete interval "+
@@ -66,7 +66,11 @@ func main() {
return
}
logger.Infof("starting VictoriaMetrics at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8428"}
}
logger.Infof("starting VictoriaMetrics at %q...", listenAddrs)
startTime := time.Now()
storage.SetDedupInterval(*minScrapeInterval)
storage.SetDataFlushInterval(*inmemoryDataFlushInterval)
@@ -76,7 +80,7 @@ func main() {
startSelfScraper()
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started VictoriaMetrics in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
@@ -86,9 +90,9 @@ func main() {
stopSelfScraper()
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
startTime = time.Now()
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
@@ -124,6 +128,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
{"api/v1/status/tsdb", "tsdb status page"},
{"api/v1/status/top_queries", "top queries"},
{"api/v1/status/active_queries", "active queries"},
{"-/reload", "reload configuration"},
})
return true
}

View File

@@ -7,6 +7,7 @@ import (
"fmt"
"io"
"log"
"math/rand"
"net"
"net/http"
"os"
@@ -180,7 +181,7 @@ func setUp() {
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
vmselect.Init()
vminsert.Init()
go httpserver.Serve(*httpListenAddr, false, requestHandler)
go httpserver.Serve(*httpListenAddrs, useProxyProtocol, requestHandler)
readyStorageCheckFunc := func() bool {
resp, err := http.Get(testHealthHTTPPath)
if err != nil {
@@ -226,7 +227,7 @@ func waitFor(timeout time.Duration, f func() bool) error {
}
func tearDown() {
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(*httpListenAddrs); err != nil {
log.Printf("cannot stop the webservice: %s", err)
}
vminsert.Stop()
@@ -413,6 +414,7 @@ func httpReadMetrics(t *testing.T, address, query string) []Metric {
}
return rows
}
func httpReadStruct(t *testing.T, address, query string, dst interface{}) {
t.Helper()
s := newSuite(t)
@@ -497,3 +499,73 @@ func (s *suite) greaterThan(a, b int) {
s.t.FailNow()
}
}
func TestImportJSONLines(t *testing.T) {
f := func(labelsCount, labelLen int) {
t.Helper()
reqURL := fmt.Sprintf("http://localhost%s/api/v1/import", testHTTPListenAddr)
line := generateJSONLine(labelsCount, labelLen)
req, err := http.NewRequest("POST", reqURL, bytes.NewBufferString(line))
if err != nil {
t.Fatalf("cannot create request: %s", err)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
t.Fatalf("cannot perform request for labelsCount=%d, labelLen=%d: %s", labelsCount, labelLen, err)
}
if resp.StatusCode != 204 {
t.Fatalf("unexpected statusCode for labelsCount=%d, labelLen=%d; got %d; want 204", labelsCount, labelLen, resp.StatusCode)
}
}
// labels with various lengths
for i := 0; i < 500; i++ {
f(10, i*5)
}
// Too many labels
f(1000, 100)
// Too long labels
f(1, 100_000)
f(10, 100_000)
f(10, 10_000)
}
func generateJSONLine(labelsCount, labelLen int) string {
m := make(map[string]string, labelsCount)
m["__name__"] = generateSizedRandomString(labelLen)
for j := 1; j < labelsCount; j++ {
labelName := generateSizedRandomString(labelLen)
labelValue := generateSizedRandomString(labelLen)
m[labelName] = labelValue
}
type jsonLine struct {
Metric map[string]string `json:"metric"`
Values []float64 `json:"values"`
Timestamps []int64 `json:"timestamps"`
}
line := &jsonLine{
Metric: m,
Values: []float64{1.34},
Timestamps: []int64{time.Now().UnixNano() / 1e6},
}
data, err := json.Marshal(&line)
if err != nil {
panic(fmt.Errorf("cannot marshal JSON: %w", err))
}
data = append(data, '\n')
return string(data)
}
const alphabetSample = `qwertyuiopasdfghjklzxcvbnm`
func generateSizedRandomString(size int) string {
dst := make([]byte, size)
for i := range dst {
dst[i] = alphabetSample[rand.Intn(len(alphabetSample))]
}
return string(dst)
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/appmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompb"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
@@ -49,16 +50,8 @@ func selfScraper(scrapeInterval time.Duration) {
var mrs []storage.MetricRow
var labels []prompb.Label
t := time.NewTicker(scrapeInterval)
var currentTimestamp int64
for {
select {
case <-selfScraperStopCh:
t.Stop()
logger.Infof("stopped self-scraping `/metrics` page")
return
case currentTime := <-t.C:
currentTimestamp = currentTime.UnixNano() / 1e6
}
f := func(currentTime time.Time, sendStaleMarkers bool) {
currentTimestamp := currentTime.UnixNano() / 1e6
bb.Reset()
appmetrics.WritePrometheusMetrics(&bb)
s := bytesutil.ToUnsafeString(bb.B)
@@ -83,12 +76,27 @@ func selfScraper(scrapeInterval time.Duration) {
mr := &mrs[len(mrs)-1]
mr.MetricNameRaw = storage.MarshalMetricNameRaw(mr.MetricNameRaw[:0], labels)
mr.Timestamp = currentTimestamp
mr.Value = r.Value
if sendStaleMarkers {
mr.Value = decimal.StaleNaN
} else {
mr.Value = r.Value
}
}
if err := vmstorage.AddRows(mrs); err != nil {
logger.Errorf("cannot store self-scraped metrics: %s", err)
}
}
for {
select {
case <-selfScraperStopCh:
f(time.Now(), true)
t.Stop()
logger.Infof("stopped self-scraping `/metrics` page")
return
case currentTime := <-t.C:
f(currentTime, false)
}
}
}
func addLabel(dst []prompb.Label, key, value string) []prompb.Label {

View File

@@ -7,6 +7,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logstorage"
)
@@ -17,13 +18,18 @@ var (
)
// ProcessQueryRequest handles /select/logsql/query request
func ProcessQueryRequest(w http.ResponseWriter, r *http.Request, stopCh <-chan struct{}) {
func ProcessQueryRequest(w http.ResponseWriter, r *http.Request, stopCh <-chan struct{}, cancel func()) {
// Extract tenantID
tenantID, err := logstorage.GetTenantIDFromRequest(r)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
limit, err := httputils.GetInt(r, "limit")
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return
}
qStr := r.FormValue("query")
q, err := logstorage.ParseQuery(qStr)
@@ -34,7 +40,7 @@ func ProcessQueryRequest(w http.ResponseWriter, r *http.Request, stopCh <-chan s
w.Header().Set("Content-Type", "application/stream+json; charset=utf-8")
sw := getSortWriter()
sw.Init(w, maxSortBufferSize.IntN())
sw.Init(w, maxSortBufferSize.IntN(), limit)
tenantIDs := []logstorage.TenantID{tenantID}
vlstorage.RunQuery(tenantIDs, q, stopCh, func(columns []logstorage.BlockColumn) {
if len(columns) == 0 {
@@ -46,7 +52,11 @@ func ProcessQueryRequest(w http.ResponseWriter, r *http.Request, stopCh <-chan s
for rowIdx := 0; rowIdx < rowsCount; rowIdx++ {
WriteJSONRow(bb, columns, rowIdx)
}
sw.MustWrite(bb.B)
if !sw.TryWrite(bb.B) {
cancel()
}
blockResultPool.Put(bb)
})
sw.FinalFlush()

View File

@@ -36,8 +36,12 @@ var sortWriterPool sync.Pool
// If the buf isn't empty at FinalFlush() call, then the buffered data
// is sorted by _time field.
type sortWriter struct {
mu sync.Mutex
w io.Writer
mu sync.Mutex
w io.Writer
maxLines int
linesWritten int
maxBufLen int
buf []byte
bufFlushed bool
@@ -47,58 +51,121 @@ type sortWriter struct {
func (sw *sortWriter) reset() {
sw.w = nil
sw.maxLines = 0
sw.linesWritten = 0
sw.maxBufLen = 0
sw.buf = sw.buf[:0]
sw.bufFlushed = false
sw.hasErr = false
}
func (sw *sortWriter) Init(w io.Writer, maxBufLen int) {
// Init initializes sw.
//
// If maxLines is set to positive value, then sw accepts up to maxLines
// and then rejects all the other lines by returning false from TryWrite.
func (sw *sortWriter) Init(w io.Writer, maxBufLen, maxLines int) {
sw.reset()
sw.w = w
sw.maxBufLen = maxBufLen
sw.maxLines = maxLines
}
func (sw *sortWriter) MustWrite(p []byte) {
// TryWrite writes p to sw.
//
// True is returned on successful write, false otherwise.
//
// Unsuccessful write may occur on underlying write error or when maxLines lines are already written to sw.
func (sw *sortWriter) TryWrite(p []byte) bool {
sw.mu.Lock()
defer sw.mu.Unlock()
if sw.hasErr {
return
return false
}
if sw.bufFlushed {
if _, err := sw.w.Write(p); err != nil {
if !sw.writeToUnderlyingWriterLocked(p) {
sw.hasErr = true
return false
}
return
return true
}
if len(sw.buf)+len(p) < sw.maxBufLen {
sw.buf = append(sw.buf, p...)
return
return true
}
sw.bufFlushed = true
if len(sw.buf) > 0 {
if _, err := sw.w.Write(sw.buf); err != nil {
sw.hasErr = true
return
if !sw.writeToUnderlyingWriterLocked(sw.buf) {
sw.hasErr = true
return false
}
sw.buf = sw.buf[:0]
if !sw.writeToUnderlyingWriterLocked(p) {
sw.hasErr = true
return false
}
return true
}
func (sw *sortWriter) writeToUnderlyingWriterLocked(p []byte) bool {
if len(p) == 0 {
return true
}
if sw.maxLines > 0 {
if sw.linesWritten >= sw.maxLines {
return false
}
sw.buf = sw.buf[:0]
var linesLeft int
p, linesLeft = trimLines(p, sw.maxLines-sw.linesWritten)
println("DEBUG: end trimLines", string(p), linesLeft)
sw.linesWritten += linesLeft
}
if _, err := sw.w.Write(p); err != nil {
sw.hasErr = true
return false
}
return true
}
func trimLines(p []byte, maxLines int) ([]byte, int) {
println("DEBUG: start trimLines", string(p), maxLines)
if maxLines <= 0 {
return nil, 0
}
n := bytes.Count(p, newline)
if n < maxLines {
return p, n
}
for n >= maxLines {
idx := bytes.LastIndexByte(p, '\n')
p = p[:idx]
n--
}
return p[:len(p)+1], maxLines
}
var newline = []byte("\n")
func (sw *sortWriter) FinalFlush() {
if sw.hasErr || sw.bufFlushed {
return
}
rs := getRowsSorter()
rs.parseRows(sw.buf)
rs.sort()
WriteJSONRows(sw.w, rs.rows)
rows := rs.rows
if sw.maxLines > 0 && len(rows) > sw.maxLines {
rows = rows[:sw.maxLines]
}
WriteJSONRows(sw.w, rows)
putRowsSorter(rs)
}

View File

@@ -7,15 +7,16 @@ import (
)
func TestSortWriter(t *testing.T) {
f := func(maxBufLen int, data string, expectedResult string) {
f := func(maxBufLen, maxLines int, data string, expectedResult string) {
t.Helper()
var bb bytes.Buffer
sw := getSortWriter()
sw.Init(&bb, maxBufLen)
sw.Init(&bb, maxBufLen, maxLines)
for _, s := range strings.Split(data, "\n") {
sw.MustWrite([]byte(s + "\n"))
if !sw.TryWrite([]byte(s + "\n")) {
break
}
}
sw.FinalFlush()
putSortWriter(sw)
@@ -26,14 +27,20 @@ func TestSortWriter(t *testing.T) {
}
}
f(100, "", "")
f(100, "{}", "{}\n")
f(100, 0, "", "")
f(100, 0, "{}", "{}\n")
data := `{"_time":"def","_msg":"xxx"}
{"_time":"abc","_msg":"foo"}`
resultExpected := `{"_time":"abc","_msg":"foo"}
{"_time":"def","_msg":"xxx"}
`
f(100, data, resultExpected)
f(10, data, data+"\n")
f(100, 0, data, resultExpected)
f(10, 0, data, data+"\n")
// Test with the maxLines
f(100, 1, data, `{"_time":"abc","_msg":"foo"}`+"\n")
f(10, 1, data, `{"_time":"def","_msg":"xxx"}`+"\n")
f(10, 2, data, data+"\n")
f(100, 2, data, resultExpected)
}

View File

@@ -1,6 +1,7 @@
package vlselect
import (
"context"
"embed"
"flag"
"fmt"
@@ -101,7 +102,8 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
// Limit the number of concurrent queries, which can consume big amounts of CPU.
startTime := time.Now()
stopCh := r.Context().Done()
ctx := r.Context()
stopCh := ctx.Done()
select {
case concurrencyLimitCh <- struct{}{}:
defer func() { <-concurrencyLimitCh }()
@@ -139,11 +141,15 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
}
ctxWithCancel, cancel := context.WithCancel(ctx)
defer cancel()
stopCh = ctxWithCancel.Done()
switch {
case path == "/logsql/query":
logsqlQueryRequests.Inc()
httpserver.EnableCORS(w, r)
logsql.ProcessQueryRequest(w, r, stopCh)
logsql.ProcessQueryRequest(w, r, stopCh, cancel)
return true
default:
return false

View File

@@ -1,13 +1,13 @@
{
"files": {
"main.css": "./static/css/main.d1313636.css",
"main.js": "./static/js/main.1919fefe.js",
"main.css": "./static/css/main.1e22ee10.css",
"main.js": "./static/js/main.92cf3903.js",
"static/js/522.da77e7b3.chunk.js": "./static/js/522.da77e7b3.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.8644fd7c964802dd34a9.md",
"static/media/MetricsQL.md": "./static/media/MetricsQL.48b7b7105a48d7775f01.md",
"index.html": "./index.html"
},
"entrypoints": [
"static/css/main.d1313636.css",
"static/js/main.1919fefe.js"
"static/css/main.1e22ee10.css",
"static/js/main.92cf3903.js"
]
}

View File

@@ -1 +1 @@
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.1919fefe.js"></script><link href="./static/css/main.d1313636.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.92cf3903.js"></script><link href="./static/css/main.1e22ee10.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -810,7 +810,7 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
#### timestamp
`timestamp(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last raw sample
`timestamp(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -819,7 +819,7 @@ This function is supported by PromQL. See also [timestamp_with_name](#timestamp_
#### timestamp_with_name
`timestamp_with_name(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last raw sample
`timestamp_with_name(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are preserved in the resulting rollups.
@@ -828,7 +828,7 @@ See also [timestamp](#timestamp).
#### tfirst_over_time
`tfirst_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the first raw sample
`tfirst_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the first raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -837,7 +837,7 @@ See also [first_over_time](#first_over_time).
#### tlast_change_over_time
`tlast_change_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last change
`tlast_change_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last change
per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering) on the given lookbehind window `d`.
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -852,7 +852,7 @@ See also [tlast_change_over_time](#tlast_change_over_time).
#### tmax_over_time
`tmax_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the raw sample
`tmax_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the raw sample
with the maximum value on the given lookbehind window `d`. It is calculated independently per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
@@ -862,7 +862,7 @@ See also [max_over_time](#max_over_time).
#### tmin_over_time
`tmin_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the raw sample
`tmin_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the raw sample
with the minimum value on the given lookbehind window `d`. It is calculated independently per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).

View File

@@ -0,0 +1,95 @@
package datadogsketches
import (
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogsketches"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogsketches/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogutils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/tenantmetrics"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vmagent_rows_inserted_total{type="datadogsketches"}`)
rowsTenantInserted = tenantmetrics.NewCounterMap(`vmagent_tenant_inserted_rows_total{type="datadogsketches"}`)
rowsPerInsert = metrics.NewHistogram(`vmagent_rows_per_insert{type="datadogsketches"}`)
)
// InsertHandlerForHTTP processes remote write for DataDog POST /api/beta/sketches request.
func InsertHandlerForHTTP(at *auth.Token, req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
ce := req.Header.Get("Content-Encoding")
return stream.Parse(req.Body, ce, func(sketches []*datadogsketches.Sketch) error {
return insertRows(at, sketches, extraLabels)
})
}
func insertRows(at *auth.Token, sketches []*datadogsketches.Sketch, extraLabels []prompbmarshal.Label) error {
ctx := common.GetPushCtx()
defer common.PutPushCtx(ctx)
rowsTotal := 0
tssDst := ctx.WriteRequest.Timeseries[:0]
labels := ctx.Labels[:0]
samples := ctx.Samples[:0]
for _, sketch := range sketches {
ms := sketch.ToSummary()
for _, m := range ms {
labelsLen := len(labels)
labels = append(labels, prompbmarshal.Label{
Name: "__name__",
Value: m.Name,
})
for _, label := range m.Labels {
labels = append(labels, prompbmarshal.Label{
Name: label.Name,
Value: label.Value,
})
}
for _, tag := range sketch.Tags {
name, value := datadogutils.SplitTag(tag)
if name == "host" {
name = "exported_host"
}
labels = append(labels, prompbmarshal.Label{
Name: name,
Value: value,
})
}
labels = append(labels, extraLabels...)
samplesLen := len(samples)
for _, p := range m.Points {
samples = append(samples, prompbmarshal.Sample{
Timestamp: p.Timestamp,
Value: p.Value,
})
}
rowsTotal += len(m.Points)
tssDst = append(tssDst, prompbmarshal.TimeSeries{
Labels: labels[labelsLen:],
Samples: samples[samplesLen:],
})
}
}
ctx.WriteRequest.Timeseries = tssDst
ctx.Labels = labels
ctx.Samples = samples
if !remotewrite.TryPush(at, &ctx.WriteRequest) {
return remotewrite.ErrQueueFullHTTPRetry
}
rowsInserted.Add(rowsTotal)
if at != nil {
rowsTenantInserted.Get(at).Add(rowsTotal)
}
rowsPerInsert.Update(float64(rowsTotal))
return nil
}

View File

@@ -12,6 +12,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/datadogsketches"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/datadogv1"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/datadogv2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/graphite"
@@ -45,10 +46,10 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8429", "TCP address to listen for http connections. "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. "+
"Set this flag to empty value in order to disable listening on any port. This mode may be useful for running multiple vmagent instances on the same server. "+
"Note that /targets and /metrics pages aren't available if -httpListenAddr=''. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
influxListenAddr = flag.String("influxListenAddr", "", "TCP and UDP address to listen for InfluxDB line protocol data. Usually :8089 must be set. Doesn't work if empty. "+
@@ -69,7 +70,8 @@ var (
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol = flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
configAuthKey = flagutil.NewPassword("configAuthKey", "Authorization key for accessing /config page. It must be passed via authKey query arg")
reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
dryRun = flag.Bool("dryRun", false, "Whether to check config files without running vmagent. The following files are checked: "+
"-promscrape.config, -remoteWrite.relabelConfig, -remoteWrite.urlRelabelConfig, -remoteWrite.streamAggr.config . "+
"Unknown config entries aren't allowed in -promscrape.config by default. This can be changed by passing -promscrape.config.strictParse=false command-line flag")
@@ -118,7 +120,11 @@ func main() {
return
}
logger.Infof("starting vmagent at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8429"}
}
logger.Infof("starting vmagent at %q...", listenAddrs)
startTime := time.Now()
remotewrite.Init()
common.StartUnmarshalWorkers()
@@ -141,9 +147,7 @@ func main() {
promscrape.Init(remotewrite.PushDropSamplesOnFailure)
if len(*httpListenAddr) > 0 {
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
}
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started vmagent in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
@@ -152,13 +156,11 @@ func main() {
pushmetrics.Stop()
startTime = time.Now()
if len(*httpListenAddr) > 0 {
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
promscrape.Stop()
@@ -367,6 +369,15 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(202)
fmt.Fprintf(w, `{"status":"ok"}`)
return true
case "/datadog/api/beta/sketches":
datadogsketchesWriteRequests.Inc()
if err := datadogsketches.InsertHandlerForHTTP(nil, r); err != nil {
datadogsketchesWriteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(202)
return true
case "/datadog/api/v1/validate":
datadogValidateRequests.Inc()
// See https://docs.datadoghq.com/api/latest/authentication/#validate-api-key
@@ -421,7 +432,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/prometheus/config", "/config":
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
if !httpserver.CheckAuthFlag(w, r, configAuthKey.Get(), "configAuthKey") {
return true
}
promscrapeConfigRequests.Inc()
@@ -430,7 +441,7 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
case "/prometheus/api/v1/status/config", "/api/v1/status/config":
// See https://prometheus.io/docs/prometheus/latest/querying/api/#config
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
if !httpserver.CheckAuthFlag(w, r, configAuthKey.Get(), "configAuthKey") {
return true
}
promscrapeStatusConfigRequests.Inc()
@@ -440,6 +451,9 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
fmt.Fprintf(w, `{"status":"success","data":{"yaml":%q}}`, bb.B)
return true
case "/prometheus/-/reload", "/-/reload":
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey.Get(), "reloadAuthKey") {
return true
}
promscrapeConfigReloadRequests.Inc()
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusOK)
@@ -599,6 +613,15 @@ func processMultitenantRequest(w http.ResponseWriter, r *http.Request, path stri
w.WriteHeader(202)
fmt.Fprintf(w, `{"status":"ok"}`)
return true
case "datadog/api/beta/sketches":
datadogsketchesWriteRequests.Inc()
if err := datadogsketches.InsertHandlerForHTTP(at, r); err != nil {
datadogsketchesWriteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(202)
return true
case "datadog/api/v1/validate":
datadogValidateRequests.Inc()
// See https://docs.datadoghq.com/api/latest/authentication/#validate-api-key
@@ -655,6 +678,9 @@ var (
datadogv2WriteRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/v2/series", protocol="datadog"}`)
datadogv2WriteErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/datadog/api/v2/series", protocol="datadog"}`)
datadogsketchesWriteRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/beta/sketches", protocol="datadog"}`)
datadogsketchesWriteErrors = metrics.NewCounter(`vmagent_http_request_errors_total{path="/datadog/api/beta/sketches", protocol="datadog"}`)
datadogValidateRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/v1/validate", protocol="datadog"}`)
datadogCheckRunRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/api/v1/check_run", protocol="datadog"}`)
datadogIntakeRequests = metrics.NewCounter(`vmagent_http_requests_total{path="/datadog/intake", protocol="datadog"}`)

View File

@@ -18,6 +18,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timerpool"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/metrics"
)
@@ -34,6 +35,7 @@ var (
proxyURL = flagutil.NewArrayString("remoteWrite.proxyURL", "Optional proxy URL for writing data to the corresponding -remoteWrite.url. "+
"Supported proxies: http, https, socks5. Example: -remoteWrite.proxyURL=socks5://proxy:1234")
tlsHandshakeTimeout = flagutil.NewArrayDuration("remoteWrite.tlsHandshakeTimeout", 20*time.Second, "The timeout for estabilishing tls connections to the corresponding -remoteWrite.url")
tlsInsecureSkipVerify = flagutil.NewArrayBool("remoteWrite.tlsInsecureSkipVerify", "Whether to skip tls verification when connecting to the corresponding -remoteWrite.url")
tlsCertFile = flagutil.NewArrayString("remoteWrite.tlsCertFile", "Optional path to client-side TLS certificate file to use when connecting "+
"to the corresponding -remoteWrite.url")
@@ -121,7 +123,7 @@ func newHTTPClient(argIdx int, remoteWriteURL, sanitizedURL string, fq *persiste
tr := &http.Transport{
DialContext: statDial,
TLSClientConfig: tlsCfg,
TLSHandshakeTimeout: 10 * time.Second,
TLSHandshakeTimeout: tlsHandshakeTimeout.GetOptionalArg(argIdx),
MaxConnsPerHost: 2 * concurrency,
MaxIdleConnsPerHost: 2 * concurrency,
IdleConnTimeout: time.Minute,
@@ -395,7 +397,8 @@ func (c *client) newRequest(url string, body []byte) (*http.Request, error) {
// Otherwise it tries sending the block to remote storage indefinitely.
func (c *client) sendBlockHTTP(block []byte) bool {
c.rl.register(len(block), c.stopCh)
retryDuration := time.Second
maxRetryDuration := timeutil.AddJitterToDuration(time.Minute)
retryDuration := timeutil.AddJitterToDuration(time.Second)
retriesCount := 0
again:
@@ -405,8 +408,8 @@ again:
if err != nil {
c.errorsCount.Inc()
retryDuration *= 2
if retryDuration > time.Minute {
retryDuration = time.Minute
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
logger.Warnf("couldn't send a block with size %d bytes to %q: %s; re-sending the block in %.3f seconds",
len(block), c.sanitizedURL, err, retryDuration.Seconds())
@@ -452,8 +455,8 @@ again:
// Unexpected status code returned
retriesCount++
retryDuration *= 2
if retryDuration > time.Minute {
retryDuration = time.Minute
if retryDuration > maxRetryDuration {
retryDuration = maxRetryDuration
}
body, err := io.ReadAll(resp.Body)
_ = resp.Body.Close()

View File

@@ -7,6 +7,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/decimal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
@@ -15,6 +16,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
"github.com/VictoriaMetrics/metrics"
"github.com/golang/snappy"
)
@@ -69,7 +71,8 @@ func (ps *pendingSeries) periodicFlusher() {
if flushSeconds <= 0 {
flushSeconds = 1
}
ticker := time.NewTicker(*flushInterval)
d := timeutil.AddJitterToDuration(*flushInterval)
ticker := time.NewTicker(d)
defer ticker.Stop()
for {
select {
@@ -107,11 +110,12 @@ type writeRequest struct {
wr prompbmarshal.WriteRequest
tss []prompbmarshal.TimeSeries
tss []prompbmarshal.TimeSeries
labels []prompbmarshal.Label
samples []prompbmarshal.Sample
buf []byte
// buf holds labels data
buf []byte
}
func (wr *writeRequest) reset() {
@@ -222,33 +226,45 @@ func (wr *writeRequest) copyTimeSeries(dst, src *prompbmarshal.TimeSeries) {
wr.buf = buf
}
// marshalConcurrency limits the maximum number of concurrent workers, which marshal and compress WriteRequest.
var marshalConcurrencyCh = make(chan struct{}, cgroup.AvailableCPUs())
func tryPushWriteRequest(wr *prompbmarshal.WriteRequest, tryPushBlock func(block []byte) bool, isVMRemoteWrite bool) bool {
if len(wr.Timeseries) == 0 {
// Nothing to push
return true
}
marshalConcurrencyCh <- struct{}{}
bb := writeRequestBufPool.Get()
bb.B = wr.MarshalProtobuf(bb.B[:0])
if len(bb.B) <= maxUnpackedBlockSize.IntN() {
zb := snappyBufPool.Get()
zb := compressBufPool.Get()
if isVMRemoteWrite {
zb.B = zstd.CompressLevel(zb.B[:0], bb.B, *vmProtoCompressLevel)
} else {
zb.B = snappy.Encode(zb.B[:cap(zb.B)], bb.B)
}
writeRequestBufPool.Put(bb)
<-marshalConcurrencyCh
if len(zb.B) <= persistentqueue.MaxBlockSize {
if !tryPushBlock(zb.B) {
return false
zbLen := len(zb.B)
ok := tryPushBlock(zb.B)
compressBufPool.Put(zb)
if ok {
blockSizeRows.Update(float64(len(wr.Timeseries)))
blockSizeBytes.Update(float64(zbLen))
}
blockSizeRows.Update(float64(len(wr.Timeseries)))
blockSizeBytes.Update(float64(len(zb.B)))
snappyBufPool.Put(zb)
return true
return ok
}
snappyBufPool.Put(zb)
compressBufPool.Put(zb)
} else {
writeRequestBufPool.Put(bb)
<-marshalConcurrencyCh
}
// Too big block. Recursively split it into smaller parts if possible.
@@ -294,5 +310,7 @@ var (
blockSizeRows = metrics.NewHistogram(`vmagent_remotewrite_block_size_rows`)
)
var writeRequestBufPool bytesutil.ByteBufferPool
var snappyBufPool bytesutil.ByteBufferPool
var (
writeRequestBufPool bytesutil.ByteBufferPool
compressBufPool bytesutil.ByteBufferPool
)

View File

@@ -25,6 +25,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -184,7 +185,8 @@ func processFlags() {
func setUp() {
vmstorage.Init(promql.ResetRollupResultCacheIfNeeded)
go httpserver.Serve(httpListenAddr, false, func(w http.ResponseWriter, r *http.Request) bool {
var ab flagutil.ArrayBool
go httpserver.Serve([]string{httpListenAddr}, &ab, func(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/prometheus/api/v1/query":
if err := prometheus.QueryHandler(nil, time.Now(), w, r); err != nil {
@@ -225,7 +227,7 @@ checkCheck:
}
func tearDown() {
if err := httpserver.Stop(httpListenAddr); err != nil {
if err := httpserver.Stop([]string{httpListenAddr}); err != nil {
logger.Errorf("cannot stop the webservice: %s", err)
}
vmstorage.Stop()

View File

@@ -68,6 +68,7 @@ publish-vmalert:
test-vmalert:
go test -v -race -cover ./app/vmalert -loggerLevel=ERROR
go test -v -race -cover ./app/vmalert/rule
go test -v -race -cover ./app/vmalert/templates
go test -v -race -cover ./app/vmalert/datasource
go test -v -race -cover ./app/vmalert/notifier

View File

@@ -22,6 +22,7 @@ groups:
{{ . | first | value }}
{{ end }}
description: "It is {{ $value }} connections for {{$labels.instance}}"
link: http://localhost:3000/d/wNf0q_kZk?viewPanel=51&from={{($activeAt.Add (parseDurationTime "1h")).UnixMilli}}&to={{($activeAt.Add (parseDurationTime "-1h")).UnixMilli}}
- alert: ExampleAlertAlwaysFiring
update_entries_limit: -1
expr: sum by(job)

View File

@@ -10,6 +10,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
)
@@ -93,7 +94,7 @@ func Init(extraParams url.Values) (QuerierBuilder, error) {
logger.Warnf("flag `-datasource.lookback` will be deprecated soon. Please use `-rule.evalDelay` command-line flag instead. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5155 for details.")
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -59,8 +59,8 @@ absolute path to all .tpl files in root.
configCheckInterval = flag.Duration("configCheckInterval", 0, "Interval for checking for changes in '-rule' or '-notifier.config' files. "+
"By default, the checking is disabled. Send SIGHUP signal in order to force config check for changes.")
httpListenAddr = flag.String("httpListenAddr", ":8880", "Address to listen for http connections. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "Address to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
evaluationInterval = flag.Duration("evaluationInterval", time.Minute, "How often to evaluate the rules")
@@ -178,15 +178,19 @@ func main() {
go configReload(ctx, manager, groupsCfg, sighupCh)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8880"}
}
rh := &requestHandler{m: manager}
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, rh.handler)
go httpserver.Serve(listenAddrs, useProxyProtocol, rh.handler)
pushmetrics.Init()
sig := procutil.WaitForSigterm()
logger.Infof("service received signal %s", sig)
pushmetrics.Stop()
if err := httpserver.Stop(*httpListenAddr); err != nil {
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
cancel()
@@ -248,7 +252,13 @@ func newManager(ctx context.Context) (*manager, error) {
func getExternalURL(customURL string) (*url.URL, error) {
if customURL == "" {
// use local hostname as external URL
return getHostnameAsExternalURL(*httpListenAddr, httpserver.IsTLS())
listenAddr := ":8880"
if len(*httpListenAddrs) > 0 {
listenAddr = (*httpListenAddrs)[0]
}
isTLS := httpserver.IsTLS(0)
return getHostnameAsExternalURL(listenAddr, isTLS)
}
u, err := url.Parse(customURL)
if err != nil {
@@ -260,13 +270,13 @@ func getExternalURL(customURL string) (*url.URL, error) {
return u, nil
}
func getHostnameAsExternalURL(httpListenAddr string, isSecure bool) (*url.URL, error) {
func getHostnameAsExternalURL(addr string, isSecure bool) (*url.URL, error) {
hname, err := os.Hostname()
if err != nil {
return nil, fmt.Errorf("failed to get hostname: %w", err)
}
port := ""
if ipport := strings.Split(httpListenAddr, ":"); len(ipport) > 1 {
if ipport := strings.Split(addr, ":"); len(ipport) > 1 {
port = ":" + ipport[1]
}
schema := "http://"

View File

@@ -156,11 +156,14 @@ func (m *manager) update(ctx context.Context, groupsCfg []config.Group, restore
var wg sync.WaitGroup
for _, item := range toUpdate {
wg.Add(1)
// cancel evaluation so the Update will be applied as fast as possible.
// it is important to call InterruptEval before the update, because cancel fn
// can be re-assigned during the update.
item.old.InterruptEval()
go func(old *rule.Group, new *rule.Group) {
old.UpdateWith(new)
wg.Done()
}(item.old, item.new)
item.old.InterruptEval()
}
wg.Wait()
}

View File

@@ -11,6 +11,7 @@ import (
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promauth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/promrelabel"
)
@@ -127,7 +128,7 @@ func NewAlertManager(alertManagerURL string, fn AlertURLGenerator, authCfg proma
if authCfg.TLSConfig != nil {
tls = authCfg.TLSConfig
}
tr, err := utils.Transport(alertManagerURL, tls.CertFile, tls.KeyFile, tls.CAFile, tls.ServerName, tls.InsecureSkipVerify)
tr, err := httputils.Transport(alertManagerURL, tls.CertFile, tls.KeyFile, tls.CAFile, tls.ServerName, tls.InsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/datasource"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
)
var (
@@ -60,7 +61,7 @@ func Init() (datasource.QuerierBuilder, error) {
if *addr == "" {
return nil, nil
}
tr, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
tr, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -11,7 +11,7 @@ import (
"github.com/golang/snappy"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
)
@@ -30,7 +30,7 @@ func NewDebugClient() (*DebugClient, error) {
return nil, nil
}
t, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
t, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -8,6 +8,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
)
var (
@@ -64,7 +65,7 @@ func Init(ctx context.Context) (*Client, error) {
return nil, nil
}
t, err := utils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
t, err := httputils.Transport(*addr, *tlsCertFile, *tlsKeyFile, *tlsCAFile, *tlsServerName, *tlsInsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %w", err)
}

View File

@@ -324,16 +324,6 @@ func (ar *AlertingRule) execRange(ctx context.Context, start, end time.Time) ([]
return nil, fmt.Errorf("failed to create alert: %w", err)
}
// if alert is instant, For: 0
if ar.For == 0 {
a.State = notifier.StateFiring
for i := range s.Values {
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
}
continue
}
// if alert with For > 0
prevT := time.Time{}
for i := range s.Values {
at := time.Unix(s.Timestamps[i], 0)
@@ -354,6 +344,10 @@ func (ar *AlertingRule) execRange(ctx context.Context, start, end time.Time) ([]
a.Start = at
}
prevT = at
if ar.For == 0 {
// rules with `for: 0` are always firing when they have Value
a.State = notifier.StateFiring
}
result = append(result, ar.alertToTimeSeries(a, s.Timestamps[i])...)
// save alert's state on last iteration, so it can be used on the next execRange call
@@ -446,14 +440,13 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
a.KeepFiringSince = time.Time{}
continue
}
a, err := ar.newAlert(m, ls, start, qFn)
a, err := ar.newAlert(m, ls, ts, qFn)
if err != nil {
curState.Err = fmt.Errorf("failed to create alert: %w", err)
return nil, curState.Err
}
a.ID = h
a.State = notifier.StatePending
a.ActiveAt = ts
ar.alerts[h] = a
ar.logDebugf(ts, a, "created in state PENDING")
}
@@ -479,7 +472,7 @@ func (ar *AlertingRule) exec(ctx context.Context, ts time.Time, limit int) ([]pr
}
// alerts with ar.KeepFiringFor>0 may remain FIRING
// even if their expression isn't true anymore
if ts.Sub(a.KeepFiringSince) > ar.KeepFiringFor {
if ts.Sub(a.KeepFiringSince) >= ar.KeepFiringFor {
a.State = notifier.StateInactive
a.ResolvedAt = ts
ar.logDebugf(ts, a, "FIRING => INACTIVE: is absent in current evaluation round")
@@ -559,9 +552,9 @@ func (ar *AlertingRule) newAlert(m datasource.Metric, ls *labelSet, start time.T
}
const (
// alertMetricName is the metric name for synthetic alert timeseries.
// alertMetricName is the metric name for time series reflecting the alert state.
alertMetricName = "ALERTS"
// alertForStateMetricName is the metric name for 'for' state of alert.
// alertForStateMetricName is the metric name for time series reflecting the moment of time when alert became active.
alertForStateMetricName = "ALERTS_FOR_STATE"
// alertNameLabel is the label name indicating the name of an alert.
@@ -576,12 +569,10 @@ const (
// alertToTimeSeries converts the given alert with the given timestamp to time series
func (ar *AlertingRule) alertToTimeSeries(a *notifier.Alert, timestamp int64) []prompbmarshal.TimeSeries {
var tss []prompbmarshal.TimeSeries
tss = append(tss, alertToTimeSeries(a, timestamp))
if ar.For > 0 {
tss = append(tss, alertForToTimeSeries(a, timestamp))
return []prompbmarshal.TimeSeries{
alertToTimeSeries(a, timestamp),
alertForToTimeSeries(a, timestamp),
}
return tss
}
func alertToTimeSeries(a *notifier.Alert, timestamp int64) prompbmarshal.TimeSeries {

View File

@@ -28,20 +28,26 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
}{
{
newTestAlertingRule("instant", 0),
&notifier.Alert{State: notifier.StateFiring},
&notifier.Alert{State: notifier.StateFiring, ActiveAt: timestamp.Add(time.Second)},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
}),
},
},
{
newTestAlertingRule("instant extra labels", 0),
&notifier.Alert{State: notifier.StateFiring, Labels: map[string]string{
"job": "foo",
"instance": "bar",
}},
&notifier.Alert{State: notifier.StateFiring, ActiveAt: timestamp.Add(time.Second),
Labels: map[string]string{
"job": "foo",
"instance": "bar",
}},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
@@ -49,19 +55,33 @@ func TestAlertingRule_ToTimeSeries(t *testing.T) {
"job": "foo",
"instance": "bar",
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
"job": "foo",
"instance": "bar",
}),
},
},
{
newTestAlertingRule("instant labels override", 0),
&notifier.Alert{State: notifier.StateFiring, Labels: map[string]string{
alertStateLabel: "foo",
"__name__": "bar",
}},
&notifier.Alert{State: notifier.StateFiring, ActiveAt: timestamp.Add(time.Second),
Labels: map[string]string{
alertStateLabel: "foo",
"__name__": "bar",
}},
[]prompbmarshal.TimeSeries{
newTimeSeries([]float64{1}, []int64{timestamp.UnixNano()}, map[string]string{
"__name__": alertMetricName,
alertStateLabel: notifier.StateFiring.String(),
}),
newTimeSeries([]float64{float64(timestamp.Add(time.Second).Unix())},
[]int64{timestamp.UnixNano()},
map[string]string{
"__name__": alertForStateMetricName,
alertStateLabel: "foo",
}),
},
},
{
@@ -308,14 +328,17 @@ func TestAlertingRule_Exec(t *testing.T) {
fq := &datasource.FakeQuerier{}
tc.rule.q = fq
tc.rule.GroupID = fakeGroup.ID()
ts := time.Now()
for i, step := range tc.steps {
fq.Reset()
fq.Add(step...)
if _, err := tc.rule.exec(context.TODO(), time.Now(), 0); err != nil {
if _, err := tc.rule.exec(context.TODO(), ts, 0); err != nil {
t.Fatalf("unexpected err: %s", err)
}
// artificial delay between applying steps
time.Sleep(defaultStep)
// shift the execution timestamp before the next iteration
ts = ts.Add(defaultStep)
if _, ok := tc.expAlerts[i]; !ok {
continue
}
@@ -367,7 +390,7 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{Values: []float64{1}, Timestamps: []int64{1}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0)},
},
nil,
},
@@ -378,8 +401,9 @@ func TestAlertingRule_ExecRange(t *testing.T) {
},
[]*notifier.Alert{
{
Labels: map[string]string{"name": "foo"},
State: notifier.StateFiring,
Labels: map[string]string{"name": "foo"},
State: notifier.StateFiring,
ActiveAt: time.Unix(1, 0),
},
},
nil,
@@ -390,9 +414,9 @@ func TestAlertingRule_ExecRange(t *testing.T) {
{Values: []float64{1, 1, 1}, Timestamps: []int64{1e3, 2e3, 3e3}},
},
[]*notifier.Alert{
{State: notifier.StateFiring},
{State: notifier.StateFiring},
{State: notifier.StateFiring},
{State: notifier.StateFiring, ActiveAt: time.Unix(1e3, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(2e3, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(3e3, 0)},
},
nil,
},
@@ -460,6 +484,20 @@ func TestAlertingRule_ExecRange(t *testing.T) {
For: time.Second,
}},
},
{
newTestAlertingRuleWithEvalInterval("firing=>inactive=>inactive=>firing=>firing", 0, time.Second),
[]datasource.Metric{
{Values: []float64{1, 1, 1, 1}, Timestamps: []int64{1, 4, 5, 6}},
},
[]*notifier.Alert{
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0)},
// It is expected for ActiveAT to remain the same while rule continues to fire in each iteration
{State: notifier.StateFiring, ActiveAt: time.Unix(4, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(4, 0)},
{State: notifier.StateFiring, ActiveAt: time.Unix(4, 0)},
},
nil,
},
{
newTestAlertingRule("for=>pending=>firing=>pending=>firing=>pending", time.Second),
[]datasource.Metric{
@@ -534,21 +572,25 @@ func TestAlertingRule_ExecRange(t *testing.T) {
},
},
[]*notifier.Alert{
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"source": "vm",
}},
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0),
Labels: map[string]string{
"source": "vm",
}},
{State: notifier.StateFiring, ActiveAt: time.Unix(100, 0),
Labels: map[string]string{
"source": "vm",
}},
//
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
{State: notifier.StateFiring, Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
{State: notifier.StateFiring, ActiveAt: time.Unix(1, 0),
Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
{State: notifier.StateFiring, ActiveAt: time.Unix(5, 0),
Labels: map[string]string{
"foo": "bar",
"source": "vm",
}},
},
nil,
},
@@ -1095,6 +1137,12 @@ func newTestAlertingRule(name string, waitFor time.Duration) *AlertingRule {
return &rule
}
func newTestAlertingRuleWithEvalInterval(name string, waitFor, evalInterval time.Duration) *AlertingRule {
rule := newTestAlertingRule(name, waitFor)
rule.EvalInterval = evalInterval
return rule
}
func newTestAlertingRuleWithKeepFiring(name string, waitFor, keepFiringFor time.Duration) *AlertingRule {
rule := newTestAlertingRule(name, waitFor)
rule.KeepFiringFor = keepFiringFor

View File

@@ -1,5 +1,12 @@
function expandAll() {
$('.collapse').addClass('show');
$('.group-heading').each(function () {
let style = $(this).attr("style")
// display only elements that are currently visible
if (style === "display: none;") {
return
}
$(this).next().addClass('show')
});
}
function collapseAll() {
@@ -15,6 +22,100 @@ function toggleByID(id) {
}
}
function debounce(func, delay) {
let timer;
return function (...args) {
clearTimeout(timer);
timer = setTimeout(() => {
func.apply(this, args);
}, delay);
};
}
$('#search').on("keyup", debounce(search, 500));
// search shows or hides groups&rules that satisfy the search phrase.
// case-insensitive, respects GET param `search`.
function search() {
$(".rule").show();
let groupHeader = $(".group-heading")
let searchPhrase = $("#search").val().toLowerCase()
if (searchPhrase.length === 0) {
groupHeader.show()
setParamURL('search', '')
return
}
$(".rule-table").removeClass('show');
groupHeader.hide()
searchPhrase = searchPhrase.toLowerCase()
filterRuleByName(searchPhrase);
filterRuleByLabels(searchPhrase);
filterGroupsByName(searchPhrase);
setParamURL('search', searchPhrase)
}
function setParamURL(key, value) {
let url = new URL(location.href)
url.searchParams.set(key, value);
window.history.replaceState(null, null, `?${url.searchParams.toString()}`);
}
function getParamURL(key) {
let url = new URL(location.href)
return url.searchParams.get(key)
}
function filterGroupsByName(searchPhrase) {
$(".group-heading").each(function () {
const groupName = $(this).attr('data-group-name').toLowerCase();
const hasValue = groupName.indexOf(searchPhrase) >= 0
if (!hasValue) {
return
}
const target = $(this).attr("data-bs-target");
$(`div[id="${target}"] .rule`).show();
$(this).show();
});
}
function filterRuleByName(searchPhrase) {
$(".rule").each(function () {
const ruleName = $(this).attr("data-rule-name").toLowerCase();
const hasValue = ruleName.indexOf(searchPhrase) >= 0
if (!hasValue) {
$(this).hide();
return
}
const target = $(this).attr('data-bs-target')
$(`#rules-${target}`).addClass('show');
$(`div[data-bs-target='rules-${target}']`).show();
$(this).show();
});
}
function filterRuleByLabels(searchPhrase) {
$(".rule").each(function () {
const matches = $(".label", this).filter(function () {
const label = $(this).text().toLowerCase();
return label.indexOf(searchPhrase) >= 0;
}).length;
if (matches > 0) {
const target = $(this).attr('data-bs-target')
$(`#rules-${target}`).addClass('show');
$(`div[data-bs-target='rules-${target}']`).show();
$(this).show();
}
});
}
$(document).ready(function () {
$(".group-heading a").click(function (e) {
e.stopPropagation(); // prevent collapse logic on link click
@@ -32,6 +133,13 @@ $(document).ready(function () {
});
});
// update search element with value from URL, if any
let searchPhrase = getParamURL('search')
$("#search").val(searchPhrase)
// apply filtering by search phrase
search()
let hash = window.location.hash.substr(1);
toggleByID(hash);
});

View File

@@ -12,11 +12,15 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/notifier"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/rule"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmalert/tpl"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
)
var reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
var (
apiLinks = [][2]string{
// api links are relative since they can be used by external clients,
@@ -84,7 +88,10 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
WriteRuleDetails(w, r, rule)
return true
case "/vmalert/groups":
WriteListGroups(w, r, rh.groups())
var data []apiGroup
rf := extractRulesFilter(r)
data = rh.groups(rf)
WriteListGroups(w, r, data)
return true
case "/vmalert/notifiers":
WriteListTargets(w, r, notifier.GetTargets())
@@ -95,12 +102,20 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
case "/rules":
// Grafana makes an extra request to `/rules`
// handler in addition to `/api/v1/rules` calls in alerts UI,
WriteListGroups(w, r, rh.groups())
var data []apiGroup
rf := extractRulesFilter(r)
data = rh.groups(rf)
WriteListGroups(w, r, data)
return true
case "/vmalert/api/v1/rules", "/api/v1/rules":
// path used by Grafana for ng alerting
data, err := rh.listGroups()
var data []byte
var err error
rf := extractRulesFilter(r)
data, err = rh.listGroups(rf)
if err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
@@ -108,6 +123,7 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
w.Header().Set("Content-Type", "application/json")
w.Write(data)
return true
case "/vmalert/api/v1/alerts", "/api/v1/alerts":
// path used by Grafana for ng alerting
data, err := rh.listAlerts()
@@ -151,6 +167,9 @@ func (rh *requestHandler) handler(w http.ResponseWriter, r *http.Request) bool {
w.Write(data)
return true
case "/-/reload":
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey.Get(), "reloadAuthKey") {
return true
}
logger.Infof("api config reload was called, sending sighup")
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusOK)
@@ -201,26 +220,94 @@ type listGroupsResponse struct {
} `json:"data"`
}
func (rh *requestHandler) groups() []apiGroup {
// see https://prometheus.io/docs/prometheus/latest/querying/api/#rules
type rulesFilter struct {
files []string
groupNames []string
ruleNames []string
ruleType string
excludeAlerts bool
}
func extractRulesFilter(r *http.Request) rulesFilter {
rf := rulesFilter{}
var ruleType string
ruleTypeParam := r.URL.Query().Get("type")
// for some reason, `type` in filter doesn't match `type` in response,
// so we use this matching here
if ruleTypeParam == "alert" {
ruleType = ruleTypeAlerting
} else if ruleTypeParam == "record" {
ruleType = ruleTypeRecording
}
rf.ruleType = ruleType
rf.excludeAlerts = httputils.GetBool(r, "exclude_alerts")
rf.ruleNames = append([]string{}, r.Form["rule_name[]"]...)
rf.groupNames = append([]string{}, r.Form["rule_group[]"]...)
rf.files = append([]string{}, r.Form["file[]"]...)
return rf
}
func (rh *requestHandler) groups(rf rulesFilter) []apiGroup {
rh.m.groupsMu.RLock()
defer rh.m.groupsMu.RUnlock()
groups := make([]apiGroup, 0)
for _, g := range rh.m.groups {
groups = append(groups, groupToAPI(g))
isInList := func(list []string, needle string) bool {
if len(list) < 1 {
return true
}
for _, i := range list {
if i == needle {
return true
}
}
return false
}
// sort list of alerts for deterministic output
sort.Slice(groups, func(i, j int) bool {
return groups[i].Name < groups[j].Name
})
groups := make([]apiGroup, 0)
for _, group := range rh.m.groups {
if !isInList(rf.groupNames, group.Name) {
continue
}
if !isInList(rf.files, group.File) {
continue
}
g := groupToAPI(group)
// the returned list should always be non-nil
// https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4221
filteredRules := make([]apiRule, 0)
for _, r := range g.Rules {
if rf.ruleType != "" && rf.ruleType != r.Type {
continue
}
if !isInList(rf.ruleNames, r.Name) {
continue
}
if rf.excludeAlerts {
r.Alerts = nil
}
filteredRules = append(filteredRules, r)
}
g.Rules = filteredRules
groups = append(groups, g)
}
// sort list of groups for deterministic output
sort.Slice(groups, func(i, j int) bool {
a, b := groups[i], groups[j]
if a.Name != b.Name {
return a.Name < b.Name
}
return a.File < b.File
})
return groups
}
func (rh *requestHandler) listGroups() ([]byte, error) {
func (rh *requestHandler) listGroups(rf rulesFilter) ([]byte, error) {
lr := listGroupsResponse{Status: "success"}
lr.Data.Groups = rh.groups()
lr.Data.Groups = rh.groups(rf)
b, err := json.Marshal(lr)
if err != nil {
return nil, &httpserver.ErrorWithStatusCode{

View File

@@ -70,15 +70,29 @@ btn-primary
}
}
%}
<a class="btn {%= buttonActive(filter, "") %}" role="button" onclick="window.location = window.location.pathname">All</a>
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
<a class="btn {%= buttonActive(filter, "unhealthy") %}" role="button" onclick="location.href='?filter=unhealthy'" title="Show only rules with errors">Unhealthy</a>
<a class="btn {%= buttonActive(filter, "noMatch") %}" role="button" onclick="location.href='?filter=noMatch'" title="Show only rules matching no time series during last evaluation">NoMatch</a>
<div class="btn-toolbar mb-3" role="toolbar">
<div>
<a class="btn {%= buttonActive(filter, "") %}" role="button" onclick="window.location = window.location.pathname">All</a>
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
<a class="btn {%= buttonActive(filter, "unhealthy") %}" role="button" onclick="location.href='?filter=unhealthy'" title="Show only rules with errors">Unhealthy</a>
<a class="btn {%= buttonActive(filter, "noMatch") %}" role="button" onclick="location.href='?filter=noMatch'" title="Show only rules matching no time series during last evaluation">NoMatch</a>
</div>
<div class="col-md-4 col-lg-5">
<div class="px-3 input-group">
<div class="input-group-prepend">
<span class="input-group-text">
<svg fill="#000000" height="25px" width="20px" version="1.1" id="Capa_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 490.4 490.4" xml:space="preserve"><g id="SVGRepo_bgCarrier" stroke-width="0"></g><g id="SVGRepo_tracerCarrier" stroke-linecap="round" stroke-linejoin="round"></g><g id="SVGRepo_iconCarrier"> <g> <path d="M484.1,454.796l-110.5-110.6c29.8-36.3,47.6-82.8,47.6-133.4c0-116.3-94.3-210.6-210.6-210.6S0,94.496,0,210.796 s94.3,210.6,210.6,210.6c50.8,0,97.4-18,133.8-48l110.5,110.5c12.9,11.8,25,4.2,29.2,0C492.5,475.596,492.5,463.096,484.1,454.796z M41.1,210.796c0-93.6,75.9-169.5,169.5-169.5s169.6,75.9,169.6,169.5s-75.9,169.5-169.5,169.5S41.1,304.396,41.1,210.796z"></path> </g> </g></svg>
</span>
</div>
<input id="search" placeholder="Filter by group, rule or labels" type="text" class="form-control"/>
</div>
</div>
</div>
{% if len(groups) > 0 %}
{% for _, g := range groups %}
<div
class="group-heading{% if rNotOk[g.ID] > 0 %} alert-danger{%endif%}" data-bs-target="rules-{%s g.ID %}">
class="group-heading{% if rNotOk[g.ID] > 0 %} alert-danger{%endif%}" data-bs-target="rules-{%s g.ID %}" data-group-name="{%s g.Name %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %} (every {%f.0 g.Interval %}s) #</a>
{% if rNotOk[g.ID] > 0 %}<span class="badge bg-danger" title="Number of rules with status Error">{%d rNotOk[g.ID] %}</span> {% endif %}
@@ -100,7 +114,7 @@ btn-primary
</div>
{% endif %}
</div>
<div class="collapse" id="rules-{%s g.ID %}">
<div class="collapse rule-table" id="rules-{%s g.ID %}">
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
@@ -111,7 +125,7 @@ btn-primary
</thead>
<tbody>
{% for _, r := range g.Rules %}
<tr{% if r.LastError != "" %} class="alert-danger"{% endif %}>
<tr class="rule{% if r.LastError != "" %} alert-danger{% endif %}" data-rule-name="{%s r.Name %}" data-bs-target="{%s g.ID %}">
<td>
<div class="row">
<div class="col-12 mb-2">
@@ -134,7 +148,7 @@ btn-primary
<div class="col-12 mb-2">
{% if len(r.Labels) > 0 %} <b>Labels:</b>{% endif %}
{% for k, v := range r.Labels %}
<span class="ms-1 badge bg-primary">{%s k %}={%s v %}</span>
<span class="ms-1 badge bg-primary label">{%s k %}={%s v %}</span>
{% endfor %}
</div>
{% if r.LastError != "" %}
@@ -170,11 +184,25 @@ btn-primary
{%code prefix := utils.Prefix(r.URL.Path) %}
{%= tpl.Header(r, navItems, "Alerts", getLastConfigError()) %}
{% if len(groupAlerts) > 0 %}
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
<div class="btn-toolbar mb-3" role="toolbar">
<div>
<a class="btn btn-primary" role="button" onclick="collapseAll()">Collapse All</a>
<a class="btn btn-primary" role="button" onclick="expandAll()">Expand All</a>
</div>
<div class="col-md-4 col-lg-5">
<div class="px-3 input-group">
<div class="input-group-prepend">
<span class="input-group-text">
<svg fill="#000000" height="25px" width="20px" version="1.1" id="Capa_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 490.4 490.4" xml:space="preserve"><g id="SVGRepo_bgCarrier" stroke-width="0"></g><g id="SVGRepo_tracerCarrier" stroke-linecap="round" stroke-linejoin="round"></g><g id="SVGRepo_iconCarrier"> <g> <path d="M484.1,454.796l-110.5-110.6c29.8-36.3,47.6-82.8,47.6-133.4c0-116.3-94.3-210.6-210.6-210.6S0,94.496,0,210.796 s94.3,210.6,210.6,210.6c50.8,0,97.4-18,133.8-48l110.5,110.5c12.9,11.8,25,4.2,29.2,0C492.5,475.596,492.5,463.096,484.1,454.796z M41.1,210.796c0-93.6,75.9-169.5,169.5-169.5s169.6,75.9,169.6,169.5s-75.9,169.5-169.5,169.5S41.1,304.396,41.1,210.796z"></path> </g> </g></svg>
</span>
</div>
<input id="search" placeholder="Filter by group, rule or labels" type="text" class="form-control"/>
</div>
</div>
</div>
{% for _, ga := range groupAlerts %}
{%code g := ga.Group %}
<div class="group-heading alert-danger" data-bs-target="rules-{%s g.ID %}">
<div class="group-heading alert-danger" data-bs-target="rules-{%s g.ID %}" data-group-name="{%s g.Name %}">
<span class="anchor" id="group-{%s g.ID %}"></span>
<a href="#group-{%s g.ID %}">{%s g.Name %}{% if g.Type != "prometheus" %} ({%s g.Type %}){% endif %}</a>
<span class="badge bg-danger" title="Number of active alerts">{%d len(ga.Alerts) %}</span>
@@ -192,7 +220,7 @@ btn-primary
}
sort.Strings(keys)
%}
<div class="collapse" id="rules-{%s g.ID %}">
<div class="collapse rule-table" id="rules-{%s g.ID %}">
{% for _, ruleID := range keys %}
{%code
defaultAR := alertsByRule[ruleID][0]
@@ -203,45 +231,46 @@ btn-primary
sort.Strings(labelKeys)
%}
<br>
<b>alert:</b> {%s defaultAR.Name %} ({%d len(alertsByRule[ruleID]) %})
| <span><a target="_blank" href="{%s defaultAR.SourceLink %}">Source</a></span>
<br>
<b>expr:</b><code><pre>{%s defaultAR.Expression %}</pre></code>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Labels</th>
<th scope="col">State</th>
<th scope="col">Active at</th>
<th scope="col">Value</th>
<th scope="col">Link</th>
</tr>
</thead>
<tbody>
{% for _, ar := range alertsByRule[ruleID] %}
<tr>
<td>
{% for _, k := range labelKeys %}
<span class="ms-1 badge bg-primary">{%s k %}={%s ar.Labels[k] %}</span>
{% endfor %}
</td>
<td>{%= badgeState(ar.State) %}</td>
<td>
{%s ar.ActiveAt.Format("2006-01-02T15:04:05Z07:00") %}
{% if ar.Restored %}{%= badgeRestored() %}{% endif %}
{% if ar.Stabilizing %}{%= badgeStabilizing() %}{% endif %}
</td>
<td>{%s ar.Value %}</td>
<td>
<a href="{%s prefix+ar.WebLink() %}">Details</a>
</td>
</tr>
{% endfor %}
</tbody>
</table>
<div class="rule" data-rule-name="{%s defaultAR.Name %}" data-bs-target="{%s g.ID %}">
<b>alert:</b> {%s defaultAR.Name %} ({%d len(alertsByRule[ruleID]) %})
| <span><a target="_blank" href="{%s defaultAR.SourceLink %}">Source</a></span>
<br>
<b>expr:</b><code><pre>{%s defaultAR.Expression %}</pre></code>
<table class="table table-striped table-hover table-sm">
<thead>
<tr>
<th scope="col">Labels</th>
<th scope="col">State</th>
<th scope="col">Active at</th>
<th scope="col">Value</th>
<th scope="col">Link</th>
</tr>
</thead>
<tbody>
{% for _, ar := range alertsByRule[ruleID] %}
<tr>
<td>
{% for _, k := range labelKeys %}
<span class="ms-1 badge bg-primary label">{%s k %}={%s ar.Labels[k] %}</span>
{% endfor %}
</td>
<td>{%= badgeState(ar.State) %}</td>
<td>
{%s ar.ActiveAt.Format("2006-01-02T15:04:05Z07:00") %}
{% if ar.Restored %}{%= badgeRestored() %}{% endif %}
{% if ar.Stabilizing %}{%= badgeStabilizing() %}{% endif %}
</td>
<td>{%s ar.Value %}</td>
<td>
<a href="{%s prefix+ar.WebLink() %}">Details</a>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% endfor %}
</div>
<br>
{% endfor %}
{% else %}

File diff suppressed because it is too large Load Diff

View File

@@ -23,6 +23,7 @@ func TestHandler(t *testing.T) {
})
g := &rule.Group{
Name: "group",
File: "rules.yaml",
Concurrency: 1,
}
ar := rule.NewAlertingRule(fq, g, config.Rule{ID: 0, Alert: "alert"})
@@ -165,6 +166,74 @@ func TestHandler(t *testing.T) {
t.Fatalf("expected %+v to have state updates field not empty", gotRuleWithUpdates.StateUpdates)
}
})
t.Run("/api/v1/rules&filters", func(t *testing.T) {
check := func(url string, expGroups, expRules int) {
t.Helper()
lr := listGroupsResponse{}
getResp(ts.URL+url, &lr, 200)
if length := len(lr.Data.Groups); length != expGroups {
t.Errorf("expected %d groups got %d", expGroups, length)
}
if len(lr.Data.Groups) < 1 {
return
}
var rulesN int
for _, gr := range lr.Data.Groups {
rulesN += len(gr.Rules)
}
if rulesN != expRules {
t.Errorf("expected %d rules got %d", expRules, rulesN)
}
}
check("/api/v1/rules?type=alert", 1, 1)
check("/api/v1/rules?type=record", 1, 1)
check("/vmalert/api/v1/rules?type=alert", 1, 1)
check("/vmalert/api/v1/rules?type=record", 1, 1)
// no filtering expected due to bad params
check("/api/v1/rules?type=badParam", 1, 2)
check("/api/v1/rules?foo=bar", 1, 2)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=bar", 0, 0)
check("/api/v1/rules?rule_group[]=foo&rule_group[]=group&rule_group[]=bar", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=foo", 0, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml", 1, 2)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=foo", 1, 0)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert", 1, 1)
check("/api/v1/rules?rule_group[]=group&file[]=rules.yaml&rule_name[]=alert&rule_name[]=record", 1, 2)
})
t.Run("/api/v1/rules&exclude_alerts=true", func(t *testing.T) {
// check if response returns active alerts by default
lr := listGroupsResponse{}
getResp(ts.URL+"/api/v1/rules?rule_group[]=group&file[]=rules.yaml", &lr, 200)
activeAlerts := 0
for _, gr := range lr.Data.Groups {
for _, r := range gr.Rules {
activeAlerts += len(r.Alerts)
}
}
if activeAlerts == 0 {
t.Fatalf("expected at least 1 active alert in response; got 0")
}
// disable returning alerts via param
lr = listGroupsResponse{}
getResp(ts.URL+"/api/v1/rules?rule_group[]=group&file[]=rules.yaml&exclude_alerts=true", &lr, 200)
activeAlerts = 0
for _, gr := range lr.Data.Groups {
for _, r := range gr.Rules {
activeAlerts += len(r.Alerts)
}
}
if activeAlerts != 0 {
t.Fatalf("expected to get 0 active alert in response; got %d", activeAlerts)
}
})
}
func TestEmptyResponse(t *testing.T) {

View File

@@ -193,10 +193,15 @@ func ruleToAPI(r interface{}) apiRule {
return apiRule{}
}
const (
ruleTypeRecording = "recording"
ruleTypeAlerting = "alerting"
)
func recordingToAPI(rr *rule.RecordingRule) apiRule {
lastState := rule.GetLastEntry(rr)
r := apiRule{
Type: "recording",
Type: ruleTypeRecording,
DatasourceType: rr.Type.String(),
Name: rr.Name,
Query: rr.Expr,
@@ -224,7 +229,7 @@ func recordingToAPI(rr *rule.RecordingRule) apiRule {
func alertingToAPI(ar *rule.AlertingRule) apiRule {
lastState := rule.GetLastEntry(ar)
r := apiRule{
Type: "alerting",
Type: ruleTypeAlerting,
DatasourceType: ar.Type.String(),
Name: ar.Name,
Query: ar.Expr,

View File

@@ -9,6 +9,7 @@ import (
"net/url"
"os"
"regexp"
"sort"
"strconv"
"strings"
"sync"
@@ -16,12 +17,13 @@ import (
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/cespare/xxhash/v2"
"gopkg.in/yaml.v2"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envtemplate"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs/fscore"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/procutil"
)
@@ -41,6 +43,9 @@ var (
type AuthConfig struct {
Users []UserInfo `yaml:"users,omitempty"`
UnauthorizedUser *UserInfo `yaml:"unauthorized_user,omitempty"`
// ms holds all the metrics for the given AuthConfig
ms *metrics.Set
}
// UserInfo is user information read from authConfigPath
@@ -60,12 +65,15 @@ type UserInfo struct {
TLSInsecureSkipVerify *bool `yaml:"tls_insecure_skip_verify,omitempty"`
TLSCAFile string `yaml:"tls_ca_file,omitempty"`
MetricLabels map[string]string `yaml:"metric_labels,omitempty"`
concurrencyLimitCh chan struct{}
concurrencyLimitReached *metrics.Counter
httpTransport *http.Transport
requests *metrics.Counter
backendErrors *metrics.Counter
requestsDuration *metrics.Summary
}
@@ -266,8 +274,10 @@ func (up *URLPrefix) getLeastLoadedBackendURL() *backendURL {
if bu.isBroken() {
continue
}
if atomic.CompareAndSwapInt32(&bu.concurrentRequests, 0, 1) {
if atomic.LoadInt32(&bu.concurrentRequests) == 0 {
// Fast path - return the backend with zero concurrently executed requests.
// Do not use atomic.CompareAndSwapInt32(), since it is much slower on systems with many CPU cores.
atomic.AddInt32(&bu.concurrentRequests, 1)
return bu
}
}
@@ -462,17 +472,19 @@ func authConfigReloader(sighupCh <-chan os.Signal) {
// authConfigData needs to be updated each time authConfig is updated.
var authConfigData atomic.Pointer[[]byte]
var authConfig atomic.Pointer[AuthConfig]
var authUsers atomic.Pointer[map[string]*UserInfo]
var authConfigWG sync.WaitGroup
var stopCh chan struct{}
var (
authConfig atomic.Pointer[AuthConfig]
authUsers atomic.Pointer[map[string]*UserInfo]
authConfigWG sync.WaitGroup
stopCh chan struct{}
)
// loadAuthConfig loads and applies the config from *authConfigPath.
// It returns bool value to identify if new config was applied.
// The config can be not applied if there is a parsing error
// or if there are no changes to the current authConfig.
func loadAuthConfig() (bool, error) {
data, err := fs.ReadFileOrHTTP(*authConfigPath)
data, err := fscore.ReadFileOrHTTP(*authConfigPath)
if err != nil {
return false, fmt.Errorf("failed to read -auth.config=%q: %w", *authConfigPath, err)
}
@@ -494,6 +506,11 @@ func loadAuthConfig() (bool, error) {
}
logger.Infof("loaded information about %d users from -auth.config=%q", len(m), *authConfigPath)
prevAc := authConfig.Load()
if prevAc != nil {
metrics.UnregisterSet(prevAc.ms)
}
metrics.RegisterSet(ac.ms)
authConfig.Store(ac)
authConfigData.Store(&data)
authUsers.Store(&m)
@@ -506,10 +523,13 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
if err != nil {
return nil, fmt.Errorf("cannot expand environment vars: %w", err)
}
var ac AuthConfig
if err = yaml.UnmarshalStrict(data, &ac); err != nil {
ac := &AuthConfig{
ms: metrics.NewSet(),
}
if err = yaml.UnmarshalStrict(data, ac); err != nil {
return nil, fmt.Errorf("cannot unmarshal AuthConfig data: %w", err)
}
ui := ac.UnauthorizedUser
if ui != nil {
if ui.Username != "" {
@@ -527,23 +547,30 @@ func parseAuthConfig(data []byte) (*AuthConfig, error) {
if err := ui.initURLs(); err != nil {
return nil, err
}
ui.requests = metrics.GetOrCreateCounter(`vmauth_unauthorized_user_requests_total`)
ui.requestsDuration = metrics.GetOrCreateSummary(`vmauth_unauthorized_user_request_duration_seconds`)
metricLabels, err := ui.getMetricLabels()
if err != nil {
return nil, fmt.Errorf("cannot parse metric_labels for unauthorized_user: %w", err)
}
ui.requests = ac.ms.NewCounter(`vmauth_unauthorized_user_requests_total` + metricLabels)
ui.backendErrors = ac.ms.NewCounter(`vmauth_unauthorized_user_request_backend_errors_total` + metricLabels)
ui.requestsDuration = ac.ms.NewSummary(`vmauth_unauthorized_user_request_duration_seconds` + metricLabels)
ui.concurrencyLimitCh = make(chan struct{}, ui.getMaxConcurrentRequests())
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(`vmauth_unauthorized_user_concurrent_requests_limit_reached_total`)
_ = metrics.GetOrCreateGauge(`vmauth_unauthorized_user_concurrent_requests_capacity`, func() float64 {
ui.concurrencyLimitReached = ac.ms.NewCounter(`vmauth_unauthorized_user_concurrent_requests_limit_reached_total` + metricLabels)
_ = ac.ms.NewGauge(`vmauth_unauthorized_user_concurrent_requests_capacity`+metricLabels, func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(`vmauth_unauthorized_user_concurrent_requests_current`, func() float64 {
_ = ac.ms.NewGauge(`vmauth_unauthorized_user_concurrent_requests_current`+metricLabels, func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
tr, err := getTransport(ui.TLSInsecureSkipVerify, ui.TLSCAFile)
if err != nil {
return nil, fmt.Errorf("cannot initialize HTTP transport: %w", err)
}
ui.httpTransport = tr
}
return &ac, nil
return ac, nil
}
func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
@@ -554,43 +581,47 @@ func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
byAuthToken := make(map[string]*UserInfo, len(uis))
for i := range uis {
ui := &uis[i]
if ui.BearerToken == "" && ui.Username == "" {
return nil, fmt.Errorf("either bearer_token or username must be set")
if ui.Username != "" && ui.Password == "" {
// Do not allow setting username without password if there are other auth configs exist.
// This should prevent from typical mis-configuration when access by username without password
// remains open if other authorization schemes are defined.
if ui.BearerToken != "" {
return nil, fmt.Errorf("bearer_token=%q and username=%q cannot be set simultaneously", ui.BearerToken, ui.Username)
}
}
if ui.BearerToken != "" && ui.Username != "" {
return nil, fmt.Errorf("bearer_token=%q and username=%q cannot be set simultaneously", ui.BearerToken, ui.Username)
ats := getAuthTokens(ui.BearerToken, ui.Username, ui.Password)
if len(ats) == 0 {
return nil, fmt.Errorf("one of bearer_token, username or mtls must be set")
}
at1, at2 := getAuthTokens(ui.BearerToken, ui.Username, ui.Password)
if byAuthToken[at1] != nil {
return nil, fmt.Errorf("duplicate auth token found for bearer_token=%q, username=%q: %q", ui.BearerToken, ui.Username, at1)
}
if byAuthToken[at2] != nil {
return nil, fmt.Errorf("duplicate auth token found for bearer_token=%q, username=%q: %q", ui.BearerToken, ui.Username, at2)
for _, at := range ats {
if uiOld := byAuthToken[at]; uiOld != nil {
return nil, fmt.Errorf("duplicate auth token=%q found for username=%q, name=%q; the previous one is set for username=%q, name=%q",
at, ui.Username, ui.Name, uiOld.Username, uiOld.Name)
}
}
if err := ui.initURLs(); err != nil {
return nil, err
}
name := ui.name()
if ui.BearerToken != "" {
if ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
ui.requestsDuration = metrics.GetOrCreateSummary(fmt.Sprintf(`vmauth_user_request_duration_seconds{username=%q}`, name))
if ui.BearerToken != "" && ui.Password != "" {
return nil, fmt.Errorf("password shouldn't be set for bearer_token %q", ui.BearerToken)
}
if ui.Username != "" {
ui.requests = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_requests_total{username=%q}`, name))
ui.requestsDuration = metrics.GetOrCreateSummary(fmt.Sprintf(`vmauth_user_request_duration_seconds{username=%q}`, name))
metricLabels, err := ui.getMetricLabels()
if err != nil {
return nil, fmt.Errorf("cannot parse metric_labels: %w", err)
}
ui.requests = ac.ms.GetOrCreateCounter(`vmauth_user_requests_total` + metricLabels)
ui.backendErrors = ac.ms.GetOrCreateCounter(`vmauth_user_request_backend_errors_total` + metricLabels)
ui.requestsDuration = ac.ms.GetOrCreateSummary(`vmauth_user_request_duration_seconds` + metricLabels)
mcr := ui.getMaxConcurrentRequests()
ui.concurrencyLimitCh = make(chan struct{}, mcr)
ui.concurrencyLimitReached = metrics.GetOrCreateCounter(fmt.Sprintf(`vmauth_user_concurrent_requests_limit_reached_total{username=%q}`, name))
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_capacity{username=%q}`, name), func() float64 {
ui.concurrencyLimitReached = ac.ms.GetOrCreateCounter(`vmauth_user_concurrent_requests_limit_reached_total` + metricLabels)
_ = ac.ms.GetOrCreateGauge(`vmauth_user_concurrent_requests_capacity`+metricLabels, func() float64 {
return float64(cap(ui.concurrencyLimitCh))
})
_ = metrics.GetOrCreateGauge(fmt.Sprintf(`vmauth_user_concurrent_requests_current{username=%q}`, name), func() float64 {
_ = ac.ms.GetOrCreateGauge(`vmauth_user_concurrent_requests_current`+metricLabels, func() float64 {
return float64(len(ui.concurrencyLimitCh))
})
@@ -600,12 +631,36 @@ func parseAuthConfigUsers(ac *AuthConfig) (map[string]*UserInfo, error) {
}
ui.httpTransport = tr
byAuthToken[at1] = ui
byAuthToken[at2] = ui
for _, at := range ats {
byAuthToken[at] = ui
}
}
return byAuthToken, nil
}
var labelNameRegexp = regexp.MustCompile("^[a-zA-Z_:.][a-zA-Z0-9_:.]*$")
func (ui *UserInfo) getMetricLabels() (string, error) {
name := ui.name()
if len(name) == 0 && len(ui.MetricLabels) == 0 {
// fast path
return "", nil
}
labels := make([]string, 0, len(ui.MetricLabels)+1)
if len(name) > 0 {
labels = append(labels, fmt.Sprintf(`username=%q`, name))
}
for k, v := range ui.MetricLabels {
if !labelNameRegexp.MatchString(k) {
return "", fmt.Errorf("incorrect label name=%q, it must match regex=%q for user=%q", k, labelNameRegexp, name)
}
labels = append(labels, fmt.Sprintf(`%s=%q`, k, v))
}
sort.Strings(labels)
labelsStr := "{" + strings.Join(labels, ",") + "}"
return labelsStr, nil
}
func (ui *UserInfo) initURLs() error {
retryStatusCodes := defaultRetryStatusCodes.Values()
loadBalancingPolicy := *defaultLoadBalancingPolicy
@@ -676,29 +731,51 @@ func (ui *UserInfo) name() string {
return ui.Username
}
if ui.BearerToken != "" {
return "bearer_token"
h := xxhash.Sum64([]byte(ui.BearerToken))
return fmt.Sprintf("bearer_token:hash:%016X", h)
}
return ""
}
func getAuthTokens(bearerToken, username, password string) (string, string) {
func getAuthTokens(bearerToken, username, password string) []string {
var ats []string
if bearerToken != "" {
// Accept the bearerToken as Basic Auth username with empty password
at1 := getAuthToken(bearerToken, "", "")
at2 := getAuthToken("", bearerToken, "")
return at1, at2
at1 := getHTTPAuthBearerToken(bearerToken)
at2 := getHTTPAuthBasicToken(bearerToken, "")
ats = append(ats, at1, at2)
} else if username != "" {
at := getHTTPAuthBasicToken(username, password)
ats = append(ats, at)
}
at := getAuthToken("", username, password)
return at, at
return ats
}
func getAuthToken(bearerToken, username, password string) string {
if bearerToken != "" {
return "Bearer " + bearerToken
}
func getHTTPAuthBearerToken(bearerToken string) string {
return "http_auth:Bearer " + bearerToken
}
func getHTTPAuthBasicToken(username, password string) string {
token := username + ":" + password
token64 := base64.StdEncoding.EncodeToString([]byte(token))
return "Basic " + token64
return "http_auth:Basic " + token64
}
func getAuthTokensFromRequest(r *http.Request) []string {
var ats []string
ah := r.Header.Get("Authorization")
if ah == "" {
return ats
}
if strings.HasPrefix(ah, "Token ") {
// Handle InfluxDB's proprietary token authentication scheme as a bearer token authentication
// See https://docs.influxdata.com/influxdb/v2.0/api/
ah = strings.Replace(ah, "Token", "Bearer", 1)
}
at := "http_auth:" + ah
ats = append(ats, at)
return ats
}
func (up *URLPrefix) sanitize() error {

View File

@@ -229,6 +229,14 @@ users:
url_prefix: http://foobar
headers:
aaa: bbb
`)
// Invalid metric label name
f(`
users:
- username: foo
url_prefix: http://foo.bar
metric_labels:
not-prometheus-compatible: value
`)
}
@@ -259,7 +267,7 @@ users:
max_concurrent_requests: 5
tls_insecure_skip_verify: true
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
getHTTPAuthBasicToken("foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURL("http://aaa:343/bbb"),
@@ -282,7 +290,7 @@ users:
load_balancing_policy: first_available
drop_src_path_prefix_parts: 1
`, map[string]*UserInfo{
getAuthToken("", "foo", "bar"): {
getHTTPAuthBasicToken("foo", "bar"): {
Username: "foo",
Password: "bar",
URLPrefix: mustParseURLs([]string{
@@ -304,11 +312,11 @@ users:
- username: bar
url_prefix: https://bar/x///
`, map[string]*UserInfo{
getAuthToken("", "foo", ""): {
getHTTPAuthBasicToken("foo", ""): {
Username: "foo",
URLPrefix: mustParseURL("http://foo"),
},
getAuthToken("", "bar", ""): {
getHTTPAuthBasicToken("bar", ""): {
Username: "bar",
URLPrefix: mustParseURL("https://bar/x"),
},
@@ -328,7 +336,7 @@ users:
- "foo: bar"
- "xxx: y"
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
getHTTPAuthBearerToken("foo"): {
BearerToken: "foo",
URLMaps: []URLMap{
{
@@ -357,7 +365,7 @@ users:
},
},
},
getAuthToken("", "foo", ""): {
getHTTPAuthBasicToken("foo", ""): {
BearerToken: "foo",
URLMaps: []URLMap{
{
@@ -387,7 +395,7 @@ users:
},
},
})
// Multiple users with the same name
// Multiple users with the same name - this should work, since these users have different passwords
f(`
users:
- username: foo-same
@@ -397,17 +405,18 @@ users:
password: bar
url_prefix: https://bar/x///
`, map[string]*UserInfo{
getAuthToken("", "foo-same", "baz"): {
getHTTPAuthBasicToken("foo-same", "baz"): {
Username: "foo-same",
Password: "baz",
URLPrefix: mustParseURL("http://foo"),
},
getAuthToken("", "foo-same", "bar"): {
getHTTPAuthBasicToken("foo-same", "bar"): {
Username: "foo-same",
Password: "bar",
URLPrefix: mustParseURL("https://bar/x"),
},
})
// with default url
f(`
users:
@@ -424,7 +433,7 @@ users:
- http://default1/select/0/prometheus
- http://default2/select/0/prometheus
`, map[string]*UserInfo{
getAuthToken("foo", "", ""): {
getHTTPAuthBearerToken("foo"): {
BearerToken: "foo",
URLMaps: []URLMap{
{
@@ -456,7 +465,7 @@ users:
"http://default2/select/0/prometheus",
}),
},
getAuthToken("", "foo", ""): {
getHTTPAuthBasicToken("foo", ""): {
BearerToken: "foo",
URLMaps: []URLMap{
{
@@ -489,7 +498,41 @@ users:
}),
},
})
// With metric_labels
f(`
users:
- username: foo-same
password: baz
url_prefix: http://foo
metric_labels:
dc: eu
team: dev
- username: foo-same
password: bar
url_prefix: https://bar/x///
metric_labels:
backend_env: test
team: accounting
`, map[string]*UserInfo{
getHTTPAuthBasicToken("foo-same", "baz"): {
Username: "foo-same",
Password: "baz",
URLPrefix: mustParseURL("http://foo"),
MetricLabels: map[string]string{
"dc": "eu",
"team": "dev",
},
},
getHTTPAuthBasicToken("foo-same", "bar"): {
Username: "foo-same",
Password: "bar",
URLPrefix: mustParseURL("https://bar/x"),
MetricLabels: map[string]string{
"backend_env": "test",
"team": "accounting",
},
},
})
}
func TestParseAuthConfigPassesTLSVerificationConfig(t *testing.T) {
@@ -516,7 +559,7 @@ unauthorized_user:
t.Fatalf("unexpected error: %s", err)
}
ui := m[getAuthToken("", "foo", "bar")]
ui := m[getHTTPAuthBasicToken("foo", "bar")]
if !isSetBool(ui.TLSInsecureSkipVerify, true) || !ui.httpTransport.TLSClientConfig.InsecureSkipVerify {
t.Fatalf("unexpected TLSInsecureSkipVerify value for user foo")
}
@@ -526,6 +569,86 @@ unauthorized_user:
}
}
func TestUserInfoGetMetricLabels(t *testing.T) {
t.Run("empty-labels", func(t *testing.T) {
ui := &UserInfo{
Username: "user1",
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{username="user1"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("non-empty-username", func(t *testing.T) {
ui := &UserInfo{
Username: "user1",
MetricLabels: map[string]string{
"env": "prod",
"datacenter": "dc1",
},
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{datacenter="dc1",env="prod",username="user1"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("non-empty-name", func(t *testing.T) {
ui := &UserInfo{
Name: "user1",
BearerToken: "abc",
MetricLabels: map[string]string{
"env": "prod",
"datacenter": "dc1",
},
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{datacenter="dc1",env="prod",username="user1"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("non-empty-bearer-token", func(t *testing.T) {
ui := &UserInfo{
BearerToken: "abc",
MetricLabels: map[string]string{
"env": "prod",
"datacenter": "dc1",
},
}
labels, err := ui.getMetricLabels()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
labelsExpected := `{datacenter="dc1",env="prod",username="bearer_token:hash:44BC2CF5AD770999"}`
if labels != labelsExpected {
t.Fatalf("unexpected labels; got %s; want %s", labels, labelsExpected)
}
})
t.Run("invalid-label", func(t *testing.T) {
ui := &UserInfo{
Username: "foo",
MetricLabels: map[string]string{
",{": "aaaa",
},
}
_, err := ui.getMetricLabels()
if err == nil {
t.Fatalf("expecting non-nil error")
}
})
}
func isSetBool(boolP *bool, expectedValue bool) bool {
if boolP == nil {
return false

View File

@@ -10,6 +10,11 @@ users:
- bearer_token: "XXXX"
url_prefix: "http://localhost:8428"
# Adds labels to the exported metrics for given user section
# label name must be prometheus compatible and match regex: `^[a-zA-Z_:.][a-zA-Z0-9_:.]*$`
metric_labels:
backend_dc: eu
access_team: dev
# Requests with the 'Authorization: Bearer YYY' header are proxied to http://localhost:8428 ,
# The `X-Scope-OrgID: foobar` http header is added to every proxied request.
# The `X-Server-Hostname:` http header is removed from the proxied response.

View File

@@ -24,7 +24,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/envflag"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs/fscore"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/netutil"
@@ -33,8 +33,8 @@ import (
)
var (
httpListenAddr = flag.String("httpListenAddr", ":8427", "TCP address to listen for http connections. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flag.Bool("httpListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted at -httpListenAddr . "+
httpListenAddrs = flagutil.NewArrayString("httpListenAddr", "TCP address to listen for incoming http requests. See also -tls and -httpListenAddr.useProxyProtocol")
useProxyProtocol = flagutil.NewArrayBool("httpListenAddr.useProxyProtocol", "Whether to use proxy protocol for connections accepted at the corresponding -httpListenAddr . "+
"See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt . "+
"With enabled proxy protocol http server cannot serve regular /metrics endpoint. Use -pushmetrics.url for metrics pushing")
maxIdleConnsPerBackend = flag.Int("maxIdleConnsPerBackend", 100, "The maximum number of idle connections vmauth can open per each backend host. "+
@@ -45,7 +45,7 @@ var (
maxConcurrentPerUserRequests = flag.Int("maxConcurrentPerUserRequests", 300, "The maximum number of concurrent requests vmauth can process per each configured user. "+
"Other requests are rejected with '429 Too Many Requests' http status code. See also -maxConcurrentRequests command-line option and max_concurrent_requests option "+
"in per-user config")
reloadAuthKey = flag.String("reloadAuthKey", "", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
logInvalidAuthTokens = flag.Bool("logInvalidAuthTokens", false, "Whether to log requests with invalid auth tokens. "+
`Such requests are always counted at vmauth_http_request_errors_total{reason="invalid_auth_token"} metric, which is exposed at /metrics page`)
failTimeout = flag.Duration("failTimeout", 3*time.Second, "Sets a delay period for load balancing to skip a malfunctioning backend")
@@ -65,10 +65,14 @@ func main() {
buildinfo.Init()
logger.Init()
logger.Infof("starting vmauth at %q...", *httpListenAddr)
listenAddrs := *httpListenAddrs
if len(listenAddrs) == 0 {
listenAddrs = []string{":8427"}
}
logger.Infof("starting vmauth at %q...", listenAddrs)
startTime := time.Now()
initAuthConfig()
go httpserver.Serve(*httpListenAddr, *useProxyProtocol, requestHandler)
go httpserver.Serve(listenAddrs, useProxyProtocol, requestHandler)
logger.Infof("started vmauth in %.3f seconds", time.Since(startTime).Seconds())
pushmetrics.Init()
@@ -77,8 +81,8 @@ func main() {
pushmetrics.Stop()
startTime = time.Now()
logger.Infof("gracefully shutting down webservice at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Infof("gracefully shutting down webservice at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop the webservice: %s", err)
}
logger.Infof("successfully shut down the webservice in %.3f seconds", time.Since(startTime).Seconds())
@@ -89,7 +93,7 @@ func main() {
func requestHandler(w http.ResponseWriter, r *http.Request) bool {
switch r.URL.Path {
case "/-/reload":
if !httpserver.CheckAuthFlag(w, r, *reloadAuthKey, "reloadAuthKey") {
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey.Get(), "reloadAuthKey") {
return true
}
configReloadRequests.Inc()
@@ -97,8 +101,9 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(http.StatusOK)
return true
}
authToken := r.Header.Get("Authorization")
if authToken == "" {
ats := getAuthTokensFromRequest(r)
if len(ats) == 0 {
// Process requests for unauthorized users
ui := authConfig.Load().UnauthorizedUser
if ui != nil {
@@ -110,18 +115,12 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
http.Error(w, "missing `Authorization` request header", http.StatusUnauthorized)
return true
}
if strings.HasPrefix(authToken, "Token ") {
// Handle InfluxDB's proprietary token authentication scheme as a bearer token authentication
// See https://docs.influxdata.com/influxdb/v2.0/api/
authToken = strings.Replace(authToken, "Token", "Bearer", 1)
}
ac := *authUsers.Load()
ui := ac[authToken]
ui := getUserInfoByAuthTokens(ats)
if ui == nil {
invalidAuthTokenRequests.Inc()
if *logInvalidAuthTokens {
err := fmt.Errorf("cannot find the provided auth token %q in config", authToken)
err := fmt.Errorf("cannot authorize request with auth tokens %q", ats)
err = &httpserver.ErrorWithStatusCode{
Err: err,
StatusCode: http.StatusUnauthorized,
@@ -137,6 +136,17 @@ func requestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
func getUserInfoByAuthTokens(ats []string) *UserInfo {
ac := *authUsers.Load()
for _, at := range ats {
ui := ac[at]
if ui != nil {
return ui
}
}
return nil
}
func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
startTime := time.Now()
defer ui.requestsDuration.UpdateDuration(startTime)
@@ -150,12 +160,20 @@ func processUserRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
if err := ui.beginConcurrencyLimit(); err != nil {
handleConcurrencyLimitError(w, r, err)
<-concurrencyLimitCh
// Requests failed because of concurrency limit must be counted as errors,
// since this usually means the backend cannot keep up with the current load.
ui.backendErrors.Inc()
return
}
default:
concurrentRequestsLimitReached.Inc()
err := fmt.Errorf("cannot serve more than -maxConcurrentRequests=%d concurrent requests", cap(concurrencyLimitCh))
handleConcurrencyLimitError(w, r, err)
// Requests failed because of concurrency limit must be counted as errors,
// since this usually means the backend cannot keep up with the current load.
ui.backendErrors.Inc()
return
}
processRequest(w, r, ui)
@@ -201,7 +219,7 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
} else { // Update path for regular routes.
targetURL = mergeURLs(targetURL, u, up.dropSrcPathPrefixParts)
}
ok := tryProcessingRequest(w, r, targetURL, hc, up.retryStatusCodes, ui.httpTransport)
ok := tryProcessingRequest(w, r, targetURL, hc, up.retryStatusCodes, ui)
bu.put()
if ok {
return
@@ -213,15 +231,16 @@ func processRequest(w http.ResponseWriter, r *http.Request, ui *UserInfo) {
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
}
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, hc HeadersConf, retryStatusCodes []int, transport *http.Transport) bool {
func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url.URL, hc HeadersConf, retryStatusCodes []int, ui *UserInfo) bool {
// This code has been copied from net/http/httputil/reverseproxy.go
req := sanitizeRequestHeaders(r)
req.URL = targetURL
req.Host = targetURL.Host
updateHeadersByConfig(req.Header, hc.RequestHeaders)
res, err := transport.RoundTrip(req)
res, err := ui.httpTransport.RoundTrip(req)
rtb, rtbOK := req.Body.(*readTrackingBody)
if err != nil {
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
@@ -229,15 +248,20 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
logger.Warnf("remoteAddr: %s; requestURI: %s; error when proxying response body from %s: %s", remoteAddr, requestURI, targetURL, err)
if errors.Is(err, context.DeadlineExceeded) {
// Timed out request must be counted as errors, since this usually means that the backend is slow.
ui.backendErrors.Inc()
}
return true
}
if !rtbOK || !rtb.canRetry() {
// Request body cannot be re-sent to another backend. Return the error to the client then.
err = &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("cannot proxy the request to %q: %w", targetURL, err),
Err: fmt.Errorf("cannot proxy the request to %s: %w", targetURL, err),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
return true
}
// Retry the request if its body wasn't read yet. This usually means that the backend isn't reachable.
@@ -247,7 +271,20 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
logger.Warnf("remoteAddr: %s; requestURI: %s; retrying the request to %s because of response error: %s", remoteAddr, req.URL, targetURL, err)
return false
}
if (rtbOK && rtb.canRetry()) && hasInt(retryStatusCodes, res.StatusCode) {
if hasInt(retryStatusCodes, res.StatusCode) {
_ = res.Body.Close()
if !rtbOK || !rtb.canRetry() {
// If we get an error from the retry_status_codes list, but cannot execute retry,
// we consider such a request an error as well.
err := &httpserver.ErrorWithStatusCode{
Err: fmt.Errorf("got response status code=%d from %s, but cannot retry the request on another backend, because the request has been already consumed",
res.StatusCode, targetURL),
StatusCode: http.StatusServiceUnavailable,
}
httpserver.Errorf(w, r, "%s", err)
ui.backendErrors.Inc()
return true
}
// Retry requests at other backends if it matches retryStatusCodes.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4893
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
@@ -266,6 +303,7 @@ func tryProcessingRequest(w http.ResponseWriter, r *http.Request, targetURL *url
copyBuf.B = bytesutil.ResizeNoCopyNoOverallocate(copyBuf.B, 16*1024)
_, err = io.CopyBuffer(w, res.Body, copyBuf.B)
copyBufPool.Put(copyBuf)
_ = res.Body.Close()
if err != nil && !netutil.IsTrivialNetworkError(err) {
remoteAddr := httpserver.GetQuotedRemoteAddr(r)
requestURI := httpserver.GetRequestURI(r)
@@ -393,8 +431,10 @@ func getTransport(insecureSkipVerifyP *bool, caFile string) (*http.Transport, er
return tr, nil
}
var transportMap = make(map[string]*http.Transport)
var transportMapLock sync.Mutex
var (
transportMap = make(map[string]*http.Transport)
transportMapLock sync.Mutex
)
func appendTransportKey(dst []byte, insecureSkipVerify bool, caFile string) []byte {
dst = encoding.MarshalBool(dst, insecureSkipVerify)
@@ -422,7 +462,7 @@ func newTransport(insecureSkipVerify bool, caFile string) (*http.Transport, erro
tlsCfg.ClientSessionCache = tls.NewLRUClientSessionCache(0)
tlsCfg.InsecureSkipVerify = insecureSkipVerify
if caFile != "" {
data, err := fs.ReadFileOrHTTP(caFile)
data, err := fscore.ReadFileOrHTTP(caFile)
if err != nil {
return nil, fmt.Errorf("cannot read tls_ca_file: %w", err)
}

View File

@@ -20,6 +20,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/pushmetrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/snapshot"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/snapshot/snapshotutil"
)
var (
@@ -93,7 +94,8 @@ func main() {
}
}
go httpserver.Serve(*httpListenAddr, false, nil)
listenAddrs := []string{*httpListenAddr}
go httpserver.Serve(listenAddrs, nil, nil)
pushmetrics.Init()
err := makeBackup()
@@ -104,8 +106,8 @@ func main() {
pushmetrics.Stop()
startTime := time.Now()
logger.Infof("gracefully shutting down http server for metrics at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Infof("gracefully shutting down http server for metrics at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop http server for metrics: %s", err)
}
logger.Infof("successfully shut down http server for metrics in %.3f seconds", time.Since(startTime).Seconds())
@@ -168,7 +170,7 @@ See the docs at https://docs.victoriametrics.com/vmbackup.html .
}
func newSrcFS() (*fslocal.FS, error) {
if err := snapshot.Validate(*snapshotName); err != nil {
if err := snapshotutil.Validate(*snapshotName); err != nil {
return nil, fmt.Errorf("invalid -snapshotName=%q: %w", *snapshotName, err)
}
snapshotPath := filepath.Join(*storageDataPath, "snapshots", *snapshotName)

View File

@@ -15,7 +15,6 @@ func TestRetry_Do(t *testing.T) {
backoffFactor float64
backoffMinDuration time.Duration
retryableFunc retryableFunc
ctx context.Context
cancelTimeout time.Duration
want uint64
wantErr bool
@@ -25,7 +24,6 @@ func TestRetry_Do(t *testing.T) {
retryableFunc: func() error {
return ErrBadRequest
},
ctx: context.Background(),
want: 0,
wantErr: true,
},
@@ -35,7 +33,6 @@ func TestRetry_Do(t *testing.T) {
time.Sleep(time.Millisecond * 100)
return nil
},
ctx: context.Background(),
want: 0,
wantErr: true,
},
@@ -58,7 +55,6 @@ func TestRetry_Do(t *testing.T) {
}
return nil
},
ctx: context.Background(),
want: 1,
wantErr: false,
},
@@ -75,7 +71,6 @@ func TestRetry_Do(t *testing.T) {
}
return nil
},
ctx: context.Background(),
want: 5,
wantErr: true,
},
@@ -85,14 +80,8 @@ func TestRetry_Do(t *testing.T) {
backoffFactor: 1.7,
backoffMinDuration: time.Millisecond * 10,
retryableFunc: func() error {
t := time.NewTicker(time.Millisecond * 5)
defer t.Stop()
for range t.C {
return fmt.Errorf("got some error")
}
return nil
return fmt.Errorf("got some error")
},
ctx: context.Background(),
cancelTimeout: time.Millisecond * 40,
want: 3,
wantErr: true,
@@ -101,12 +90,13 @@ func TestRetry_Do(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
r := &Backoff{retries: tt.backoffRetries, factor: tt.backoffFactor, minDuration: tt.backoffMinDuration}
ctx := context.Background()
if tt.cancelTimeout != 0 {
newCtx, cancelFn := context.WithTimeout(tt.ctx, tt.cancelTimeout)
tt.ctx = newCtx
newCtx, cancelFn := context.WithTimeout(context.Background(), tt.cancelTimeout)
ctx = newCtx
defer cancelFn()
}
got, err := r.Retry(tt.ctx, tt.retryableFunc)
got, err := r.Retry(ctx, tt.retryableFunc)
if (err != nil) != tt.wantErr {
t.Errorf("Retry() error = %v, wantErr %v", err, tt.wantErr)
return

View File

@@ -123,15 +123,20 @@ var (
)
const (
otsdbAddr = "otsdb-addr"
otsdbConcurrency = "otsdb-concurrency"
otsdbQueryLimit = "otsdb-query-limit"
otsdbOffsetDays = "otsdb-offset-days"
otsdbHardTSStart = "otsdb-hard-ts-start"
otsdbRetentions = "otsdb-retentions"
otsdbFilters = "otsdb-filters"
otsdbNormalize = "otsdb-normalize"
otsdbMsecsTime = "otsdb-msecstime"
otsdbAddr = "otsdb-addr"
otsdbConcurrency = "otsdb-concurrency"
otsdbQueryLimit = "otsdb-query-limit"
otsdbOffsetDays = "otsdb-offset-days"
otsdbHardTSStart = "otsdb-hard-ts-start"
otsdbRetentions = "otsdb-retentions"
otsdbFilters = "otsdb-filters"
otsdbNormalize = "otsdb-normalize"
otsdbMsecsTime = "otsdb-msecstime"
otsdbCertFile = "otsdb-cert-file"
otsdbKeyFile = "otsdb-key-file"
otsdbCAFile = "otsdb-CA-file"
otsdbServerName = "otsdb-server-name"
otsdbInsecureSkipVerify = "otsdb-insecure-skip-verify"
)
var (
@@ -191,6 +196,27 @@ var (
Value: false,
Usage: "Whether to normalize all data received to lower case before forwarding to VictoriaMetrics",
},
&cli.StringFlag{
Name: otsdbCertFile,
Usage: "Optional path to client-side TLS certificate file to use when connecting to -otsdb-addr",
},
&cli.StringFlag{
Name: otsdbKeyFile,
Usage: "Optional path to client-side TLS key to use when connecting to -otsdb-addr",
},
&cli.StringFlag{
Name: otsdbCAFile,
Usage: "Optional path to TLS CA file to use for verifying connections to -otsdb-addr. By default, system CA is used",
},
&cli.StringFlag{
Name: otsdbServerName,
Usage: "Optional TLS server name to use for connections to -otsdb-addr. By default, the server name from otsdbAddr is used",
},
&cli.BoolFlag{
Name: otsdbInsecureSkipVerify,
Usage: "Whether to skip tls verification when connecting to -otsdb-addr",
Value: false,
},
}
)
@@ -208,6 +234,11 @@ const (
influxMeasurementFieldSeparator = "influx-measurement-field-separator"
influxSkipDatabaseLabel = "influx-skip-database-label"
influxPrometheusMode = "influx-prometheus-mode"
influxCertFile = "influx-cert-file"
influxKeyFile = "influx-key-file"
influxCAFile = "influx-CA-file"
influxServerName = "influx-server-name"
influxInsecureSkipVerify = "influx-insecure-skip-verify"
)
var (
@@ -272,7 +303,28 @@ var (
},
&cli.BoolFlag{
Name: influxPrometheusMode,
Usage: "Wether to restore the original timeseries name previously written from Prometheus to InfluxDB v1 via remote_write.",
Usage: "Whether to restore the original timeseries name previously written from Prometheus to InfluxDB v1 via remote_write.",
Value: false,
},
&cli.StringFlag{
Name: influxCertFile,
Usage: "Optional path to client-side TLS certificate file to use when connecting to -influx-addr",
},
&cli.StringFlag{
Name: influxKeyFile,
Usage: "Optional path to client-side TLS key to use when connecting to -influx-addr",
},
&cli.StringFlag{
Name: influxCAFile,
Usage: "Optional path to TLS CA file to use for verifying connections to -influx-addr. By default, system CA is used",
},
&cli.StringFlag{
Name: influxServerName,
Usage: "Optional TLS server name to use for connections to -influx-addr. By default, the server name from -influx-addr is used",
},
&cli.BoolFlag{
Name: influxInsecureSkipVerify,
Usage: "Whether to skip tls verification when connecting to -influx-addr",
Value: false,
},
}
@@ -496,6 +548,10 @@ const (
remoteReadPassword = "remote-read-password"
remoteReadHTTPTimeout = "remote-read-http-timeout"
remoteReadHeaders = "remote-read-headers"
remoteReadCertFile = "remote-read-cert-file"
remoteReadKeyFile = "remote-read-key-file"
remoteReadCAFile = "remote-read-CA-file"
remoteReadServerName = "remote-read-server-name"
remoteReadInsecureSkipVerify = "remote-read-insecure-skip-verify"
remoteReadDisablePathAppend = "remote-read-disable-path-append"
)
@@ -574,6 +630,22 @@ var (
"For example, --remote-read-headers='My-Auth:foobar' would send 'My-Auth: foobar' HTTP header with every request to the corresponding remote source storage. \n" +
"Multiple headers must be delimited by '^^': --remote-read-headers='header1:value1^^header2:value2'",
},
&cli.StringFlag{
Name: remoteReadCertFile,
Usage: "Optional path to client-side TLS certificate file to use when connecting to -remote-read-src-addr",
},
&cli.StringFlag{
Name: remoteReadKeyFile,
Usage: "Optional path to client-side TLS key to use when connecting to -remote-read-src-addr",
},
&cli.StringFlag{
Name: remoteReadCAFile,
Usage: "Optional path to TLS CA file to use for verifying connections to -remote-read-src-addr. By default, system CA is used",
},
&cli.StringFlag{
Name: remoteReadServerName,
Usage: "Optional TLS server name to use for connections to remoteReadSrcAddr. By default, the server name from -remote-read-src-addr is used",
},
&cli.BoolFlag{
Name: remoteReadInsecureSkipVerify,
Usage: "Whether to skip TLS certificate verification when connecting to the remote read address",

View File

@@ -1,6 +1,7 @@
package influx
import (
"crypto/tls"
"fmt"
"io"
"log"
@@ -33,7 +34,8 @@ type Config struct {
Retention string
ChunkSize int
Filter Filter
Filter Filter
TLSConfig *tls.Config
}
// Filter contains configuration for filtering
@@ -86,10 +88,10 @@ type LabelPair struct {
// configured with passed Config
func NewClient(cfg Config) (*Client, error) {
c := influx.HTTPConfig{
Addr: cfg.Addr,
Username: cfg.Username,
Password: cfg.Password,
InsecureSkipVerify: true,
Addr: cfg.Addr,
Username: cfg.Username,
Password: cfg.Password,
TLSConfig: cfg.TLSConfig,
}
hc, err := influx.NewHTTPClient(c)
if err != nil {

View File

@@ -25,6 +25,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/prometheus"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/buildinfo"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/native/stream"
)
@@ -49,8 +50,20 @@ func main() {
Action: func(c *cli.Context) error {
fmt.Println("OpenTSDB import mode")
// create Transport with given TLS config
certFile := c.String(otsdbCertFile)
keyFile := c.String(otsdbKeyFile)
caFile := c.String(otsdbCAFile)
serverName := c.String(otsdbServerName)
insecureSkipVerify := c.Bool(otsdbInsecureSkipVerify)
addr := c.String(otsdbAddr)
tr, err := httputils.Transport(addr, certFile, caFile, keyFile, serverName, insecureSkipVerify)
if err != nil {
return fmt.Errorf("failed to create Transport: %s", err)
}
oCfg := opentsdb.Config{
Addr: c.String(otsdbAddr),
Addr: addr,
Limit: c.Int(otsdbQueryLimit),
Offset: c.Int64(otsdbOffsetDays),
HardTS: c.Int64(otsdbHardTSStart),
@@ -58,6 +71,7 @@ func main() {
Filters: c.StringSlice(otsdbFilters),
Normalize: c.Bool(otsdbNormalize),
MsecsTime: c.Bool(otsdbMsecsTime),
Transport: tr,
}
otsdbClient, err := opentsdb.NewClient(oCfg)
if err != nil {
@@ -84,6 +98,18 @@ func main() {
Action: func(c *cli.Context) error {
fmt.Println("InfluxDB import mode")
// create TLS config
certFile := c.String(influxCertFile)
keyFile := c.String(influxKeyFile)
caFile := c.String(influxCAFile)
serverName := c.String(influxServerName)
insecureSkipVerify := c.Bool(influxInsecureSkipVerify)
tc, err := httputils.TLSConfig(certFile, caFile, keyFile, serverName, insecureSkipVerify)
if err != nil {
return fmt.Errorf("failed to create TLS Config: %s", err)
}
iCfg := influx.Config{
Addr: c.String(influxAddr),
Username: c.String(influxUser),
@@ -96,7 +122,9 @@ func main() {
TimeEnd: c.String(influxFilterTimeEnd),
},
ChunkSize: c.Int(influxChunkSize),
TLSConfig: tc,
}
influxClient, err := influx.NewClient(iCfg)
if err != nil {
return fmt.Errorf("failed to create influx client: %s", err)
@@ -134,6 +162,10 @@ func main() {
Headers: c.String(remoteReadHeaders),
LabelName: c.String(remoteReadFilterLabel),
LabelValue: c.String(remoteReadFilterLabelValue),
CertFile: c.String(remoteReadCertFile),
KeyFile: c.String(remoteReadKeyFile),
CAFile: c.String(remoteReadCAFile),
ServerName: c.String(remoteReadServerName),
InsecureSkipVerify: c.Bool(remoteReadInsecureSkipVerify),
DisablePathAppend: c.Bool(remoteReadDisablePathAppend),
})

View File

@@ -47,6 +47,8 @@ type Client struct {
Normalize bool
HardTS int64
MsecsTime bool
c *http.Client
}
// Config contains fields required
@@ -60,6 +62,7 @@ type Config struct {
Filters []string
Normalize bool
MsecsTime bool
Transport *http.Transport
}
// TimeRange contains data about time ranges to query
@@ -107,7 +110,8 @@ type Metric struct {
// FindMetrics discovers all metrics that OpenTSDB knows about (given a filter)
// e.g. /api/suggest?type=metrics&q=system&max=100000
func (c Client) FindMetrics(q string) ([]string, error) {
resp, err := http.Get(q)
resp, err := c.c.Get(q)
if err != nil {
return nil, fmt.Errorf("failed to send GET request to %q: %s", q, err)
}
@@ -131,7 +135,7 @@ func (c Client) FindMetrics(q string) ([]string, error) {
// e.g. /api/search/lookup?m=system.load5&limit=1000000
func (c Client) FindSeries(metric string) ([]Meta, error) {
q := fmt.Sprintf("%s/api/search/lookup?m=%s&limit=%d", c.Addr, metric, c.Limit)
resp, err := http.Get(q)
resp, err := c.c.Get(q)
if err != nil {
return nil, fmt.Errorf("failed to set GET request to %q: %s", q, err)
}
@@ -184,7 +188,7 @@ func (c Client) GetData(series Meta, rt RetentionMeta, start int64, end int64, m
series.Metric, tagStr)
q := fmt.Sprintf("%s/api/query?%s", c.Addr, queryStr)
resp, err := http.Get(q)
resp, err := c.c.Get(q)
if err != nil {
return Metric{}, fmt.Errorf("failed to send GET request to %q: %s", q, err)
}
@@ -325,6 +329,7 @@ func NewClient(cfg Config) (*Client, error) {
Normalize: cfg.Normalize,
HardTS: cfg.HardTS,
MsecsTime: cfg.MsecsTime,
c: &http.Client{Transport: cfg.Transport},
}
return client, nil
}

View File

@@ -11,9 +11,9 @@ import (
"strings"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
"github.com/gogo/protobuf/proto"
"github.com/golang/snappy"
"github.com/prometheus/prometheus/prompb"
@@ -64,6 +64,13 @@ type Config struct {
// LabelName, LabelValue stands for label=~value pair used for read requests.
// Is optional.
LabelName, LabelValue string
// Optional cert file, key file, CA file and server name for client side TLS configuration
CertFile string
KeyFile string
CAFile string
ServerName string
// TLSSkipVerify defines whether to skip TLS certificate verification when connecting to the remote read address.
InsecureSkipVerify bool
}
@@ -103,10 +110,15 @@ func NewClient(cfg Config) (*Client, error) {
}
}
tr, err := httputils.Transport(cfg.Addr, cfg.CertFile, cfg.KeyFile, cfg.CAFile, cfg.ServerName, cfg.InsecureSkipVerify)
if err != nil {
return nil, fmt.Errorf("failed to create transport: %s", err)
}
c := &Client{
c: &http.Client{
Timeout: cfg.Timeout,
Transport: utils.Transport(cfg.Addr, cfg.InsecureSkipVerify),
Transport: tr,
},
addr: strings.TrimSuffix(cfg.Addr, "/"),
disablePathAppend: cfg.DisablePathAppend,

View File

@@ -1,4 +1,4 @@
package utils
package main
import (
"fmt"
@@ -13,8 +13,7 @@ const (
maxTimeMsecs = int64(1<<63-1) / 1e6
)
// GetTime returns time from the given string.
func GetTime(s string) (time.Time, error) {
func parseTime(s string) (time.Time, error) {
secs, err := promutils.ParseTime(s)
if err != nil {
return time.Time{}, fmt.Errorf("cannot parse %s: %w", s, err)

View File

@@ -1,4 +1,4 @@
package utils
package main
import (
"testing"
@@ -165,7 +165,7 @@ func TestGetTime(t *testing.T) {
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := GetTime(tt.s)
got, err := parseTime(tt.s)
if (err != nil) != tt.wantErr {
t.Errorf("ParseTime() error = %v, wantErr %v", err, tt.wantErr)
return

View File

@@ -1,25 +0,0 @@
package utils
import (
"crypto/tls"
"net/http"
"strings"
)
// Transport creates http.Transport object based on provided URL.
// Returns Transport with TLS configuration if URL contains `https` prefix
func Transport(URL string, insecureSkipVerify bool) *http.Transport {
t := http.DefaultTransport.(*http.Transport).Clone()
if !strings.HasPrefix(URL, "https") {
return t
}
t.TLSClientConfig = TLSConfig(insecureSkipVerify)
return t
}
// TLSConfig creates tls.Config object from provided arguments
func TLSConfig(insecureSkipVerify bool) *tls.Config {
return &tls.Config{
InsecureSkipVerify: insecureSkipVerify,
}
}

View File

@@ -16,7 +16,6 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/limiter"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/native"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/stepper"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/utils"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmctl/vm"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutils"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
@@ -54,14 +53,14 @@ func (p *vmNativeProcessor) run(ctx context.Context) error {
startTime: time.Now(),
}
start, err := utils.GetTime(p.filter.TimeStart)
start, err := parseTime(p.filter.TimeStart)
if err != nil {
return fmt.Errorf("failed to parse %s, provided: %s, error: %w", vmNativeFilterTimeStart, p.filter.TimeStart, err)
}
end := time.Now().In(start.Location())
if p.filter.TimeEnd != "" {
end, err = utils.GetTime(p.filter.TimeEnd)
end, err = parseTime(p.filter.TimeEnd)
if err != nil {
return fmt.Errorf("failed to parse %s, provided: %s, error: %w", vmNativeFilterTimeEnd, p.filter.TimeEnd, err)
}

View File

@@ -0,0 +1,85 @@
package datadogsketches
import (
"net/http"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/relabel"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/prompbmarshal"
parserCommon "github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/common"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogsketches"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogsketches/stream"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/datadogutils"
"github.com/VictoriaMetrics/metrics"
)
var (
rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="datadogsketches"}`)
rowsPerInsert = metrics.NewHistogram(`vm_rows_per_insert{type="datadogsketches"}`)
)
// InsertHandlerForHTTP processes remote write for DataDog POST /api/beta/sketches request.
func InsertHandlerForHTTP(req *http.Request) error {
extraLabels, err := parserCommon.GetExtraLabels(req)
if err != nil {
return err
}
ce := req.Header.Get("Content-Encoding")
return stream.Parse(req.Body, ce, func(sketches []*datadogsketches.Sketch) error {
return insertRows(sketches, extraLabels)
})
}
func insertRows(sketches []*datadogsketches.Sketch, extraLabels []prompbmarshal.Label) error {
ctx := common.GetInsertCtx()
defer common.PutInsertCtx(ctx)
rowsLen := 0
for _, sketch := range sketches {
rowsLen += sketch.RowsCount()
}
ctx.Reset(rowsLen)
rowsTotal := 0
hasRelabeling := relabel.HasRelabeling()
for _, sketch := range sketches {
ms := sketch.ToSummary()
for _, m := range ms {
ctx.Labels = ctx.Labels[:0]
ctx.AddLabel("", m.Name)
for _, label := range m.Labels {
ctx.AddLabel(label.Name, label.Value)
}
for _, tag := range sketch.Tags {
name, value := datadogutils.SplitTag(tag)
if name == "host" {
name = "exported_host"
}
ctx.AddLabel(name, value)
}
for j := range extraLabels {
label := &extraLabels[j]
ctx.AddLabel(label.Name, label.Value)
}
if hasRelabeling {
ctx.ApplyRelabeling()
}
if len(ctx.Labels) == 0 {
// Skip metric without labels.
continue
}
ctx.SortLabelsIfNeeded()
var metricNameRaw []byte
var err error
for _, p := range m.Points {
metricNameRaw, err = ctx.WriteDataPointExt(metricNameRaw, ctx.Labels, p.Timestamp, p.Value)
if err != nil {
return err
}
}
rowsTotal += len(m.Points)
}
}
rowsInserted.Add(rowsTotal)
rowsPerInsert.Update(float64(rowsTotal))
return ctx.FlushBufs()
}

View File

@@ -13,6 +13,7 @@ import (
vminsertCommon "github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/common"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/csvimport"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/datadogsketches"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/datadogv1"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/datadogv2"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/graphite"
@@ -29,6 +30,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vminsert/vmimport"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/auth"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/bytesutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/influxutils"
graphiteserver "github.com/VictoriaMetrics/VictoriaMetrics/lib/ingestserver/graphite"
@@ -62,7 +64,8 @@ var (
"See also -opentsdbHTTPListenAddr.useProxyProtocol")
opentsdbHTTPUseProxyProtocol = flag.Bool("opentsdbHTTPListenAddr.useProxyProtocol", false, "Whether to use proxy protocol for connections accepted "+
"at -opentsdbHTTPListenAddr . See https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt")
configAuthKey = flag.String("configAuthKey", "", "Authorization key for accessing /config page. It must be passed via authKey query arg")
configAuthKey = flagutil.NewPassword("configAuthKey", "Authorization key for accessing /config page. It must be passed via authKey query arg")
reloadAuthKey = flagutil.NewPassword("reloadAuthKey", "Auth key for /-/reload http endpoint. It must be passed as authKey=...")
maxLabelsPerTimeseries = flag.Int("maxLabelsPerTimeseries", 30, "The maximum number of labels accepted per time series. Superfluous labels are dropped. In this case the vm_metrics_with_dropped_labels_total metric at /metrics page is incremented")
maxLabelValueLen = flag.Int("maxLabelValueLen", 16*1024, "The maximum length of label values in the accepted time series. Longer label values are truncated. In this case the vm_too_long_label_values_total metric at /metrics page is incremented")
)
@@ -269,6 +272,15 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
w.WriteHeader(202)
fmt.Fprintf(w, `{"status":"ok"}`)
return true
case "/datadog/api/beta/sketches":
datadogsketchesWriteRequests.Inc()
if err := datadogsketches.InsertHandlerForHTTP(r); err != nil {
datadogsketchesWriteErrors.Inc()
httpserver.Errorf(w, r, "%s", err)
return true
}
w.WriteHeader(202)
return true
case "/datadog/api/v1/validate":
datadogValidateRequests.Inc()
// See https://docs.datadoghq.com/api/latest/authentication/#validate-api-key
@@ -315,7 +327,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/prometheus/config", "/config":
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
if !httpserver.CheckAuthFlag(w, r, configAuthKey.Get(), "configAuthKey") {
return true
}
promscrapeConfigRequests.Inc()
@@ -324,7 +336,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
case "/prometheus/api/v1/status/config", "/api/v1/status/config":
// See https://prometheus.io/docs/prometheus/latest/querying/api/#config
if !httpserver.CheckAuthFlag(w, r, *configAuthKey, "configAuthKey") {
if !httpserver.CheckAuthFlag(w, r, configAuthKey.Get(), "configAuthKey") {
return true
}
promscrapeStatusConfigRequests.Inc()
@@ -334,9 +346,12 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
fmt.Fprintf(w, `{"status":"success","data":{"yaml":%q}}`, bb.B)
return true
case "/prometheus/-/reload", "/-/reload":
if !httpserver.CheckAuthFlag(w, r, reloadAuthKey.Get(), "reloadAuthKey") {
return true
}
promscrapeConfigReloadRequests.Inc()
procutil.SelfSIGHUP()
w.WriteHeader(http.StatusNoContent)
w.WriteHeader(http.StatusOK)
return true
case "/ready":
if rdy := atomic.LoadInt32(&promscrape.PendingScrapeConfigs); rdy > 0 {
@@ -389,6 +404,9 @@ var (
datadogv2WriteRequests = metrics.NewCounter(`vm_http_requests_total{path="/datadog/api/v2/series", protocol="datadog"}`)
datadogv2WriteErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/datadog/api/v2/series", protocol="datadog"}`)
datadogsketchesWriteRequests = metrics.NewCounter(`vm_http_requests_total{path="/datadog/api/beta/sketches", protocol="datadog"}`)
datadogsketchesWriteErrors = metrics.NewCounter(`vm_http_request_errors_total{path="/datadog/api/beta/sketches", protocol="datadog"}`)
datadogValidateRequests = metrics.NewCounter(`vm_http_requests_total{path="/datadog/api/v1/validate", protocol="datadog"}`)
datadogCheckRunRequests = metrics.NewCounter(`vm_http_requests_total{path="/datadog/api/v1/check_run", protocol="datadog"}`)
datadogIntakeRequests = metrics.NewCounter(`vm_http_requests_total{path="/datadog/intake", protocol="datadog"}`)

View File

@@ -37,7 +37,8 @@ func main() {
buildinfo.Init()
logger.Init()
go httpserver.Serve(*httpListenAddr, false, nil)
listenAddrs := []string{*httpListenAddr}
go httpserver.Serve(listenAddrs, nil, nil)
srcFS, err := newSrcFS()
if err != nil {
@@ -62,8 +63,8 @@ func main() {
dstFS.MustStop()
startTime := time.Now()
logger.Infof("gracefully shutting down http server for metrics at %q", *httpListenAddr)
if err := httpserver.Stop(*httpListenAddr); err != nil {
logger.Infof("gracefully shutting down http server for metrics at %q", listenAddrs)
if err := httpserver.Stop(listenAddrs); err != nil {
logger.Fatalf("cannot stop http server for metrics: %s", err)
}
logger.Infof("successfully shut down http server for metrics in %.3f seconds", time.Since(startTime).Seconds())

View File

@@ -18,6 +18,7 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutils"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/cgroup"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httpserver"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/httputils"
@@ -29,13 +30,13 @@ import (
)
var (
deleteAuthKey = flag.String("deleteAuthKey", "", "authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries")
deleteAuthKey = flagutil.NewPassword("deleteAuthKey", "authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series and /tags/delSeries")
maxConcurrentRequests = flag.Int("search.maxConcurrentRequests", getDefaultMaxConcurrentRequests(), "The maximum number of concurrent search requests. "+
"It shouldn't be high, since a single request can saturate all the CPU cores, while many concurrently executed requests may require high amounts of memory. "+
"See also -search.maxQueueDuration and -search.maxMemoryPerQuery")
maxQueueDuration = flag.Duration("search.maxQueueDuration", 10*time.Second, "The maximum time the request waits for execution when -search.maxConcurrentRequests "+
"limit is reached; see also -search.maxQueryDuration")
resetCacheAuthKey = flag.String("search.resetCacheAuthKey", "", "Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call")
resetCacheAuthKey = flagutil.NewPassword("search.resetCacheAuthKey", "Optional authKey for resetting rollup cache via /internal/resetRollupResultCache call")
logSlowQueryDuration = flag.Duration("search.logSlowQueryDuration", 5*time.Second, "Log queries with execution time exceeding this value. Zero disables slow query logging. "+
"See also -search.logQueryMemoryUsage")
vmalertProxyURL = flag.String("vmalert.proxyURL", "", "Optional URL for proxying requests to vmalert. For example, if -vmalert.proxyURL=http://vmalert:8880 , then alerting API requests such as /api/v1/rules from Grafana will be proxied to http://vmalert:8880/api/v1/rules")
@@ -148,8 +149,9 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
"are executed. Possible solutions: to reduce query load; to add more compute resources to the server; "+
"to increase -search.maxQueueDuration=%s; to increase -search.maxQueryDuration; to increase -search.maxConcurrentRequests",
d.Seconds(), *maxConcurrentRequests, maxQueueDuration),
StatusCode: http.StatusServiceUnavailable,
StatusCode: http.StatusTooManyRequests,
}
w.Header().Add("Retry-After", "10")
httpserver.Errorf(w, r, "%s", err)
return true
}
@@ -170,7 +172,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
if path == "/internal/resetRollupResultCache" {
if !httpserver.CheckAuthFlag(w, r, *resetCacheAuthKey, "resetCacheAuthKey") {
if !httpserver.CheckAuthFlag(w, r, resetCacheAuthKey.Get(), "resetCacheAuthKey") {
return true
}
promql.ResetRollupResultCache()
@@ -367,7 +369,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/tags/delSeries":
if !httpserver.CheckAuthFlag(w, r, *deleteAuthKey, "deleteAuthKey") {
if !httpserver.CheckAuthFlag(w, r, deleteAuthKey.Get(), "deleteAuthKey") {
return true
}
graphiteTagsDelSeriesRequests.Inc()
@@ -386,7 +388,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
}
return true
case "/api/v1/admin/tsdb/delete_series":
if !httpserver.CheckAuthFlag(w, r, *deleteAuthKey, "deleteAuthKey") {
if !httpserver.CheckAuthFlag(w, r, deleteAuthKey.Get(), "deleteAuthKey") {
return true
}
deleteRequests.Inc()
@@ -425,6 +427,14 @@ func handleStaticAndSimpleRequests(w http.ResponseWriter, r *http.Request, path
}
return true
}
if path == "/vmui/timezone" {
httpserver.EnableCORS(w, r)
if err := handleVMUITimezone(w); err != nil {
httpserver.Errorf(w, r, "%s", err)
return true
}
return true
}
if strings.HasPrefix(path, "/vmui/") {
if strings.HasPrefix(path, "/vmui/static/") {
// Allow clients caching static contents for long period of time, since it shouldn't change over time.

View File

@@ -5,10 +5,12 @@ import (
"errors"
"flag"
"fmt"
"reflect"
"sort"
"sync"
"sync/atomic"
"time"
"unsafe"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/searchutils"
"github.com/VictoriaMetrics/VictoriaMetrics/app/vmstorage"
@@ -88,30 +90,6 @@ type timeseriesWork struct {
rowsProcessed int
}
func (tsw *timeseriesWork) reset() {
tsw.mustStop = nil
tsw.rss = nil
tsw.pts = nil
tsw.f = nil
tsw.err = nil
tsw.rowsProcessed = 0
}
func getTimeseriesWork() *timeseriesWork {
v := tswPool.Get()
if v == nil {
v = &timeseriesWork{}
}
return v.(*timeseriesWork)
}
func putTimeseriesWork(tsw *timeseriesWork) {
tsw.reset()
tswPool.Put(tsw)
}
var tswPool sync.Pool
func (tsw *timeseriesWork) do(r *Result, workerID uint) error {
if atomic.LoadUint32(tsw.mustStop) != 0 {
return nil
@@ -270,22 +248,20 @@ func (rss *Results) runParallel(qt *querytracer.Tracer, f func(rs *Result, worke
maxWorkers := MaxWorkers()
if maxWorkers == 1 || tswsLen == 1 {
// It is faster to process time series in the current goroutine.
tsw := getTimeseriesWork()
var tsw timeseriesWork
tmpResult := getTmpResult()
rowsProcessedTotal := 0
var err error
for i := range rss.packedTimeseries {
initTimeseriesWork(tsw, &rss.packedTimeseries[i])
initTimeseriesWork(&tsw, &rss.packedTimeseries[i])
err = tsw.do(&tmpResult.rs, 0)
rowsReadPerSeries.Update(float64(tsw.rowsProcessed))
rowsProcessedTotal += tsw.rowsProcessed
if err != nil {
break
}
tsw.reset()
}
putTmpResult(tmpResult)
putTimeseriesWork(tsw)
return rowsProcessedTotal, err
}
@@ -295,11 +271,9 @@ func (rss *Results) runParallel(qt *querytracer.Tracer, f func(rs *Result, worke
// which reduces the scalability on systems with many CPU cores.
// Prepare the work for workers.
tsws := make([]*timeseriesWork, len(rss.packedTimeseries))
tsws := make([]timeseriesWork, len(rss.packedTimeseries))
for i := range rss.packedTimeseries {
tsw := getTimeseriesWork()
initTimeseriesWork(tsw, &rss.packedTimeseries[i])
tsws[i] = tsw
initTimeseriesWork(&tsws[i], &rss.packedTimeseries[i])
}
// Prepare worker channels.
@@ -314,9 +288,9 @@ func (rss *Results) runParallel(qt *querytracer.Tracer, f func(rs *Result, worke
}
// Spread work among workers.
for i, tsw := range tsws {
for i := range tsws {
idx := i % len(workChs)
workChs[idx] <- tsw
workChs[idx] <- &tsws[i]
}
// Mark worker channels as closed.
for _, workCh := range workChs {
@@ -339,14 +313,14 @@ func (rss *Results) runParallel(qt *querytracer.Tracer, f func(rs *Result, worke
// Collect results.
var firstErr error
rowsProcessedTotal := 0
for _, tsw := range tsws {
for i := range tsws {
tsw := &tsws[i]
if tsw.err != nil && firstErr == nil {
// Return just the first error, since other errors are likely duplicate the first error.
firstErr = tsw.err
}
rowsReadPerSeries.Update(float64(tsw.rowsProcessed))
rowsProcessedTotal += tsw.rowsProcessed
putTimeseriesWork(tsw)
}
return rowsProcessedTotal, firstErr
}
@@ -1044,7 +1018,7 @@ func ExportBlocks(qt *querytracer.Tracer, sq *storage.SearchQuery, deadline sear
for xw := range workCh {
if err := f(&xw.mn, &xw.b, tr, workerID); err != nil {
errGlobalLock.Lock()
if errGlobal != nil {
if errGlobal == nil {
errGlobal = err
atomic.StoreUint32(&mustStop, 1)
}
@@ -1169,15 +1143,44 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
maxSeriesCount := sr.Init(qt, vmstorage.Storage, tfss, tr, sq.MaxMetrics, deadline.Deadline())
indexSearchDuration.UpdateDuration(startTime)
type blockRefs struct {
brsPrealloc [4]blockRef
brs []blockRef
brs []blockRef
}
m := make(map[string]*blockRefs, maxSeriesCount)
orderedMetricNames := make([]string, 0, maxSeriesCount)
blocksRead := 0
samples := 0
tbf := getTmpBlocksFile()
var buf []byte
var metricNamePrev []byte
// metricNamesBuf is used for holding all the loaded unique metric names at m and orderedMetricNames.
// It should reduce pressure on Go GC by reducing the number of string allocations
// when constructing metricName string from byte slice.
metricNamesBufCap := maxSeriesCount * 100
if metricNamesBufCap > maxFastAllocBlockSize {
metricNamesBufCap = maxFastAllocBlockSize
}
metricNamesBuf := make([]byte, 0, metricNamesBufCap)
// brssPool is used for holding all the blockRefs objects across all the loaded time series.
// It should reduce pressure on Go GC by reducing the number of blockRefs allocations.
brssPool := make([]blockRefs, 0, maxSeriesCount)
// brsPool is used for holding the most of blockRefs.brs slices across all the loaded time series.
// It should reduce pressure on Go GC by reducing the number of allocations for blockRefs.brs slices.
brsPoolCap := uintptr(maxSeriesCount)
if brsPoolCap > maxFastAllocBlockSize/unsafe.Sizeof(blockRef{}) {
brsPoolCap = maxFastAllocBlockSize / unsafe.Sizeof(blockRef{})
}
brsPool := make([]blockRef, 0, brsPoolCap)
// m maps from metricName to the index of blockRefs inside brssPool
m := make(map[string]int, maxSeriesCount)
// orderedMetricNames contains the list of loaded unique metric names
// in the load order. This order is important for triggering sequential data reading.
orderedMetricNames := make([]string, 0, maxSeriesCount)
var brsIdx int
for sr.NextMetricBlock() {
blocksRead++
if deadline.Exceeded() {
@@ -1190,8 +1193,10 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
if *maxSamplesPerQuery > 0 && samples > *maxSamplesPerQuery {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
return nil, fmt.Errorf("cannot select more than -search.maxSamplesPerQuery=%d samples; possible solutions: to increase the -search.maxSamplesPerQuery; to reduce time range for the query; to use more specific label filters in order to select lower number of series", *maxSamplesPerQuery)
return nil, fmt.Errorf("cannot select more than -search.maxSamplesPerQuery=%d samples; possible solutions: increase the -search.maxSamplesPerQuery; "+
"reduce time range for the query; use more specific label filters in order to select fewer series", *maxSamplesPerQuery)
}
buf = br.Marshal(buf[:0])
addr, err := tbf.WriteBlockRefData(buf)
if err != nil {
@@ -1199,24 +1204,59 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
putStorageSearch(sr)
return nil, fmt.Errorf("cannot write %d bytes to temporary file: %w", len(buf), err)
}
// Do not intern mb.MetricName, since it leads to increased memory usage.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3692
metricName := sr.MetricBlockRef.MetricName
brs := m[string(metricName)]
if brs == nil {
brs = &blockRefs{}
brs.brs = brs.brsPrealloc[:0]
if metricNamePrev == nil || string(metricName) != string(metricNamePrev) {
idx, ok := m[string(metricName)]
if !ok {
if cap(brssPool) > len(brssPool) {
brssPool = brssPool[:len(brssPool)+1]
} else {
brssPool = append(brssPool, blockRefs{})
}
idx = len(brssPool) - 1
}
brsIdx = idx
metricNamePrev = append(metricNamePrev[:0], metricName...)
}
brs := &brssPool[brsIdx]
partRef := br.PartRef()
if uintptr(cap(brsPool)) >= maxFastAllocBlockSize/unsafe.Sizeof(blockRef{}) && len(brsPool) == cap(brsPool) {
// Allocate a new brsPool in order to avoid slow allocation of an object
// bigger than maxFastAllocBlockSize bytes at append() below.
brsPool = make([]blockRef, 0, maxFastAllocBlockSize/unsafe.Sizeof(blockRef{}))
}
if canAppendToBlockRefPool(brsPool, brs.brs) {
// It is safe appending blockRef to brsPool, since there are no other items added there yet.
brsPool = append(brsPool, blockRef{
partRef: partRef,
addr: addr,
})
brs.brs = brsPool[len(brsPool)-len(brs.brs)-1 : len(brsPool) : len(brsPool)]
} else {
// It is unsafe appending blockRef to brsPool, since there are other items added there.
// So just append it to brs.brs.
brs.brs = append(brs.brs, blockRef{
partRef: partRef,
addr: addr,
})
}
brs.brs = append(brs.brs, blockRef{
partRef: br.PartRef(),
addr: addr,
})
if len(brs.brs) == 1 {
metricNameStr := string(metricName)
if cap(metricNamesBuf) >= maxFastAllocBlockSize && len(metricNamesBuf)+len(metricName) > cap(metricNamesBuf) {
// Allocate a new metricNamesBuf in order to avoid slow allocation of byte slice
// bigger than maxFastAllocBlockSize bytes at append() below.
metricNamesBuf = make([]byte, 0, maxFastAllocBlockSize)
}
metricNamesBufLen := len(metricNamesBuf)
metricNamesBuf = append(metricNamesBuf, metricName...)
metricNameStr := bytesutil.ToUnsafeString(metricNamesBuf[metricNamesBufLen:])
orderedMetricNames = append(orderedMetricNames, metricNameStr)
m[metricNameStr] = brs
m[metricNameStr] = brsIdx
}
}
if err := sr.Error(); err != nil {
putTmpBlocksFile(tbf)
putStorageSearch(sr)
@@ -1239,7 +1279,7 @@ func ProcessSearchQuery(qt *querytracer.Tracer, sq *storage.SearchQuery, deadlin
for i, metricName := range orderedMetricNames {
pts[i] = packedTimeseries{
metricName: metricName,
brs: m[metricName].brs,
brs: brssPool[m[metricName]].brs,
}
}
rss.packedTimeseries = pts
@@ -1255,6 +1295,25 @@ type blockRef struct {
addr tmpBlockAddr
}
// canAppendToBlockRefPool returns true if a points to the pool and the last item in a is the last item in the pool.
//
// In this case it is safe appending an item to the pool and then updating the a, so it refers to the extended slice.
//
// True is also returned if a is nil, since in this case it is safe appending an item to the pool and pointing a
// to the last item in the pool.
func canAppendToBlockRefPool(pool, a []blockRef) bool {
if a == nil {
return true
}
if len(a) > len(pool) {
// a doesn't belong to pool
return false
}
shPool := (*reflect.SliceHeader)(unsafe.Pointer(&pool))
shA := (*reflect.SliceHeader)(unsafe.Pointer(&a))
return shPool.Data+uintptr(shPool.Len)*unsafe.Sizeof(blockRef{}) == shA.Data+uintptr(shA.Len)*unsafe.Sizeof(blockRef{})
}
func setupTfss(qt *querytracer.Tracer, tr storage.TimeRange, tagFilterss [][]storage.TagFilter, maxMetrics int, deadline searchutils.Deadline) ([]*storage.TagFilters, error) {
tfss := make([]*storage.TagFilters, 0, len(tagFilterss))
for _, tagFilters := range tagFilterss {
@@ -1300,3 +1359,8 @@ func applyGraphiteRegexpFilter(filter string, ss []string) ([]string, error) {
}
return dst, nil
}
// Go uses fast allocations for block sizes up to 32Kb.
//
// See https://github.com/golang/go/blob/704401ffa06c60e059c9e6e4048045b4ff42530a/src/runtime/malloc.go#L11
const maxFastAllocBlockSize = 32 * 1024

View File

@@ -136,21 +136,25 @@ func (tbf *tmpBlocksFile) Finalize() error {
return fmt.Errorf("cannot write the remaining %d bytes to %q: %w", len(tbf.buf), fname, err)
}
tbf.buf = tbf.buf[:0]
r := fs.MustOpenReaderAt(fname)
r := fs.NewReaderAt(tbf.f)
// Hint the OS that the file is read almost sequentiallly.
// This should reduce the number of disk seeks, which is important
// for HDDs.
r.MustFadviseSequentialRead(true)
// Collect local stats in order to improve performance on systems with big number of CPU cores.
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3966
r.SetUseLocalStats()
tbf.r = r
tbf.f = nil
return nil
}
func (tbf *tmpBlocksFile) MustReadBlockRefAt(partRef storage.PartRef, addr tmpBlockAddr) storage.BlockRef {
var buf []byte
if tbf.f == nil {
if tbf.r == nil {
buf = tbf.buf[addr.offset : addr.offset+uint64(addr.size)]
} else {
bb := tmpBufPool.Get()
@@ -169,21 +173,45 @@ func (tbf *tmpBlocksFile) MustReadBlockRefAt(partRef storage.PartRef, addr tmpBl
var tmpBufPool bytesutil.ByteBufferPool
func (tbf *tmpBlocksFile) MustClose() {
if tbf.f == nil {
if tbf.f != nil {
// tbf.f could be non-nil if Finalize wasn't called.
// In this case tbf.r must be nil.
if tbf.r != nil {
logger.Panicf("BUG: tbf.r must be nil when tbf.f!=nil")
}
// Try removing the file before closing it in order to prevent from flushing the in-memory data
// from page cache to the disk and save disk write IO. This may fail on non-posix systems such as Windows.
// Gracefully handle this case by attempting to remove the file after closing it.
fname := tbf.f.Name()
errRemove := os.Remove(fname)
if err := tbf.f.Close(); err != nil {
logger.Panicf("FATAL: cannot close %q: %s", fname, err)
}
if errRemove != nil {
if err := os.Remove(fname); err != nil {
logger.Panicf("FATAL: cannot remove %q: %s", fname, err)
}
}
tbf.f = nil
return
}
if tbf.r != nil {
// tbf.r could be nil if Finalize wasn't called.
tbf.r.MustClose()
}
fname := tbf.f.Name()
if err := tbf.f.Close(); err != nil {
logger.Panicf("FATAL: cannot close %q: %s", fname, err)
if tbf.r == nil {
// Nothing to do
return
}
// We cannot remove unclosed at non-posix filesystems, like windows
if err := os.Remove(fname); err != nil {
logger.Panicf("FATAL: cannot remove %q: %s", fname, err)
// Try removing the file before closing it in order to prevent from flushing the in-memory data
// from page cache to the disk and save disk write IO. This may fail on non-posix systems such as Windows.
// Gracefully handle this case by attempting to remove the file after closing it.
fname := tbf.r.Path()
errRemove := os.Remove(fname)
tbf.r.MustClose()
if errRemove != nil {
if err := os.Remove(fname); err != nil {
logger.Panicf("FATAL: cannot remove %q: %s", fname, err)
}
}
tbf.f = nil
tbf.r = nil
}

View File

@@ -82,7 +82,7 @@ textarea { margin: 1em }
<h3>Tutorial for WITH expressions in <a href="https://docs.victoriametrics.com/MetricsQL.html">MetricsQL</a></h3>
<p>
Let's look at the following real query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
Let's look at the following real query from <a href="https://grafana.com/grafana/dashboards/1860">Node Exporter Full</a> dashboard:
</p>
<pre>
@@ -146,7 +146,7 @@ my_resource_utilization(
</p>
<p>
Let's take another nice query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
Let's take another nice query from <a href="https://grafana.com/grafana/dashboards/1860">Node Exporter Full</a> dashboard:
</p>
<pre>

View File

@@ -195,7 +195,7 @@ func streamwithExprsTutorial(qw422016 *qt422016.Writer) {
<h3>Tutorial for WITH expressions in <a href="https://docs.victoriametrics.com/MetricsQL.html">MetricsQL</a></h3>
<p>
Let's look at the following real query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
Let's look at the following real query from <a href="https://grafana.com/grafana/dashboards/1860">Node Exporter Full</a> dashboard:
</p>
<pre>
@@ -259,7 +259,7 @@ my_resource_utilization(
</p>
<p>
Let's take another nice query from <a href="https://grafana.com/grafana/dashboards/1860-node-exporter-full/">Node Exporter Full</a> dashboard:
Let's take another nice query from <a href="https://grafana.com/grafana/dashboards/1860">Node Exporter Full</a> dashboard:
</p>
<pre>

View File

@@ -321,7 +321,9 @@ func exportHandler(qt *querytracer.Tracer, w http.ResponseWriter, cp *commonPara
firstLineSent := uint32(0)
writeLineFunc = func(xb *exportBlock, workerID uint) error {
bb := sw.getBuffer(workerID)
if atomic.CompareAndSwapUint32(&firstLineOnce, 0, 1) {
// Use atomic.LoadUint32() in front of atomic.CompareAndSwapUint32() in order to avoid slow inter-CPU synchronization
// in fast path after the first line has been already sent.
if atomic.LoadUint32(&firstLineOnce) == 0 && atomic.CompareAndSwapUint32(&firstLineOnce, 0, 1) {
// Send the first line to sw.bw
WriteExportPromAPILine(bb, xb)
_, err := sw.bw.Write(bb.B)
@@ -497,7 +499,10 @@ func LabelValuesHandler(qt *querytracer.Tracer, startTime time.Time, labelName s
if err != nil {
return err
}
sq := storage.NewSearchQuery(cp.start, cp.end, cp.filterss, *maxUniqueTimeseries)
// Do not limit the number of unique time series, which could be scanned
// during the search for matching label values, since users expect this API
// must always work.
sq := storage.NewSearchQuery(cp.start, cp.end, cp.filterss, -1)
labelValues, err := netstorage.LabelValues(qt, labelName, sq, limit, cp.deadline)
if err != nil {
return fmt.Errorf("cannot obtain values for label %q: %w", labelName, err)
@@ -594,7 +599,10 @@ func LabelsHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseW
if err != nil {
return err
}
sq := storage.NewSearchQuery(cp.start, cp.end, cp.filterss, *maxUniqueTimeseries)
// Do not limit the number of unique time series, which could be scanned
// during the search for matching label values, since users expect this API
// must always work.
sq := storage.NewSearchQuery(cp.start, cp.end, cp.filterss, -1)
labels, err := netstorage.LabelNames(qt, sq, limit, cp.deadline)
if err != nil {
return fmt.Errorf("cannot obtain labels: %w", err)
@@ -718,9 +726,6 @@ func QueryHandler(qt *querytracer.Tracer, startTime time.Time, w http.ResponseWr
start -= offset
end := start
start = end - window
if start < 0 {
start = 0
}
// Do not include data point with a timestamp matching the lower boundary of the window as Prometheus does.
start++
if end < start {

View File

@@ -111,41 +111,48 @@ func aggrFuncExt(afe func(tss []*timeseries, modifier *metricsql.ModifierExpr) [
modifier *metricsql.ModifierExpr, maxSeries int, keepOriginal bool) ([]*timeseries, error) {
m := aggrPrepareSeries(argOrig, modifier, maxSeries, keepOriginal)
rvs := make([]*timeseries, 0, len(m))
for _, tss := range m {
rv := afe(tss, modifier)
for _, tssl := range m {
rv := afe(tssl.tss, modifier)
rvs = append(rvs, rv...)
}
return rvs, nil
}
func aggrPrepareSeries(argOrig []*timeseries, modifier *metricsql.ModifierExpr, maxSeries int, keepOriginal bool) map[string][]*timeseries {
func aggrPrepareSeries(argOrig []*timeseries, modifier *metricsql.ModifierExpr, maxSeries int, keepOriginal bool) map[string]*tssList {
// Remove empty time series, e.g. series with all NaN samples,
// since such series are ignored by aggregate functions.
argOrig = removeEmptySeries(argOrig)
arg := copyTimeseriesMetricNames(argOrig, keepOriginal)
// Perform grouping.
m := make(map[string][]*timeseries)
m := make(map[string]*tssList)
bb := bbPool.Get()
for i, ts := range arg {
removeGroupTags(&ts.MetricName, modifier)
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
k := string(bb.B)
k := bb.B
if keepOriginal {
ts = argOrig[i]
}
tss := m[k]
if tss == nil && maxSeries > 0 && len(m) >= maxSeries {
// We already reached time series limit after grouping. Skip other time series.
continue
tssl := m[string(k)]
if tssl == nil {
if maxSeries > 0 && len(m) >= maxSeries {
// We already reached time series limit after grouping. Skip other time series.
continue
}
tssl = &tssList{}
m[string(k)] = tssl
}
tss = append(tss, ts)
m[k] = tss
tssl.tss = append(tssl.tss, ts)
}
bbPool.Put(bb)
return m
}
type tssList struct {
tss []*timeseries
}
func aggrFuncAny(afa *aggrFuncArg) ([]*timeseries, error) {
tss, err := getAggrTimeseries(afa.args)
if err != nil {
@@ -626,8 +633,8 @@ func aggrFuncCountValues(afa *aggrFuncArg) ([]*timeseries, error) {
m := aggrPrepareSeries(args[1], &afa.ae.Modifier, afa.ae.Limit, false)
rvs := make([]*timeseries, 0, len(m))
for _, tss := range m {
rv, err := afe(tss, modifier)
for _, tssl := range m {
rv, err := afe(tssl.tss, modifier)
if err != nil {
return nil, err
}

View File

@@ -111,8 +111,8 @@ func (iafc *incrementalAggrFuncContext) updateTimeseries(tsOrig *timeseries, wor
removeGroupTags(&ts.MetricName, &iafc.ae.Modifier)
bb := bbPool.Get()
bb.B = marshalMetricNameSorted(bb.B[:0], &ts.MetricName)
k := string(bb.B)
iac := m[k]
k := bb.B
iac := m[string(k)]
if iac == nil {
if iafc.ae.Limit > 0 && len(m) >= iafc.ae.Limit {
// Skip this time series, since the limit on the number of output time series has been already reached.
@@ -131,7 +131,7 @@ func (iafc *incrementalAggrFuncContext) updateTimeseries(tsOrig *timeseries, wor
ts: tsAggr,
values: make([]float64, len(ts.Values)),
}
m[k] = iac
m[string(k)] = iac
}
bbPool.Put(bb)
iafc.callbacks.updateAggrFunc(iac, ts.Values)

View File

@@ -257,7 +257,7 @@ func groupJoin(singleTimeseriesSide string, be *metricsql.BinaryOpExpr, rvsLeft,
var tsCopy timeseries
tsCopy.CopyFromShallowTimestamps(tsLeft)
tsCopy.MetricName.SetTags(joinTags, joinPrefix, skipTags, &tsRight.MetricName)
bb.B = marshalMetricTagsSorted(bb.B[:0], &tsCopy.MetricName)
bb.B = marshalMetricNameSorted(bb.B[:0], &tsCopy.MetricName)
pair, ok := m[string(bb.B)]
if !ok {
m[string(bb.B)] = &tsPair{
@@ -522,7 +522,9 @@ func createTimeseriesMapByTagSet(be *metricsql.BinaryOpExpr, left, right []*time
mn := storage.GetMetricName()
for _, ts := range arg {
mn.CopyFrom(&ts.MetricName)
mn.ResetMetricGroup()
if !be.KeepMetricNames {
mn.ResetMetricGroup()
}
switch groupOp {
case "on":
mn.RemoveTagsOn(groupTags)
@@ -531,7 +533,7 @@ func createTimeseriesMapByTagSet(be *metricsql.BinaryOpExpr, left, right []*time
default:
logger.Panicf("BUG: unexpected binary op modifier %q", groupOp)
}
bb.B = marshalMetricTagsSorted(bb.B[:0], mn)
bb.B = marshalMetricNameSorted(bb.B[:0], mn)
k := string(bb.B)
m[k] = append(m[k], ts)
}

View File

@@ -29,7 +29,8 @@ import (
)
var (
disableCache = flag.Bool("search.disableCache", false, "Whether to disable response caching. This may be useful during data backfilling")
disableCache = flag.Bool("search.disableCache", false, "Whether to disable response caching. This may be useful when ingesting historical data. "+
"See https://docs.victoriametrics.com/#backfilling . See also -search.resetRollupResultCacheOnStartup")
maxPointsSubqueryPerTimeseries = flag.Int("search.maxPointsSubqueryPerTimeseries", 100e3, "The maximum number of points per series, which can be generated by subquery. "+
"See https://valyala.medium.com/prometheus-subqueries-in-victoriametrics-9b1492b720b3")
maxMemoryPerQuery = flagutil.NewBytes("search.maxMemoryPerQuery", 0, "The maximum amounts of memory a single query may consume. "+
@@ -110,6 +111,11 @@ type EvalConfig struct {
End int64
Step int64
// RealStart contains the original start of the interval when executed in subqueries (because Start can be changed for subqueries)
RealStart int64
// RealEnd contains the original end of the interval when executed in subqueries (because End can be changed for subqueries)
RealEnd int64
// MaxSeries is the maximum number of time series, which can be scanned by the query.
// Zero means 'no limit'
MaxSeries int
@@ -151,7 +157,17 @@ type EvalConfig struct {
func copyEvalConfig(src *EvalConfig) *EvalConfig {
var ec EvalConfig
ec.Start = src.Start
if ec.RealStart > 0 {
ec.RealStart = src.RealStart
} else {
ec.RealStart = src.Start
}
ec.End = src.End
if ec.RealEnd > 0 {
ec.RealEnd = src.RealEnd
} else {
ec.RealEnd = src.End
}
ec.Step = src.Step
ec.MaxSeries = src.MaxSeries
ec.MaxPointsPerSeries = src.MaxPointsPerSeries
@@ -1265,7 +1281,7 @@ func evalInstantRollup(qt *querytracer.Tracer, ec *EvalConfig, funcName string,
return evalAt(qtChild, timestamp, window)
}
// Calculate the result
tss, ok := getMaxInstantValues(qtChild, tssCached, tssStart, tssEnd)
tss, ok := getMaxInstantValues(qtChild, tssCached, tssStart, tssEnd, timestamp)
if !ok {
qtChild.Printf("cannot apply instant rollup optimization, since tssEnd contains bigger values than tssCached")
deleteCachedSeries(qtChild)
@@ -1327,7 +1343,7 @@ func evalInstantRollup(qt *querytracer.Tracer, ec *EvalConfig, funcName string,
return evalAt(qtChild, timestamp, window)
}
// Calculate the result
tss, ok := getMinInstantValues(qtChild, tssCached, tssStart, tssEnd)
tss, ok := getMinInstantValues(qtChild, tssCached, tssStart, tssEnd, timestamp)
if !ok {
qtChild.Printf("cannot apply instant rollup optimization, since tssEnd contains smaller values than tssCached")
deleteCachedSeries(qtChild)
@@ -1392,7 +1408,7 @@ func evalInstantRollup(qt *querytracer.Tracer, ec *EvalConfig, funcName string,
return evalAt(qtChild, timestamp, window)
}
// Calculate the result
tss := getSumInstantValues(qtChild, tssCached, tssStart, tssEnd)
tss := getSumInstantValues(qtChild, tssCached, tssStart, tssEnd, timestamp)
return tss, nil
default:
qt.Printf("instant rollup optimization isn't implemented for %s()", funcName)
@@ -1419,8 +1435,8 @@ func hasDuplicateSeries(tss []*timeseries) bool {
return false
}
func getMinInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries) ([]*timeseries, bool) {
qt = qt.NewChild("calculate the minimum for instant values across series; cached=%d, start=%d, end=%d", len(tssCached), len(tssStart), len(tssEnd))
func getMinInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries, timestamp int64) ([]*timeseries, bool) {
qt = qt.NewChild("calculate the minimum for instant values across series; cached=%d, start=%d, end=%d, timestamp=%d", len(tssCached), len(tssStart), len(tssEnd), timestamp)
defer qt.Done()
getMin := func(a, b float64) float64 {
@@ -1429,13 +1445,13 @@ func getMinInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*
}
return b
}
tss, ok := getMinMaxInstantValues(tssCached, tssStart, tssEnd, getMin)
tss, ok := getMinMaxInstantValues(tssCached, tssStart, tssEnd, timestamp, getMin)
qt.Printf("resulting series=%d; ok=%v", len(tss), ok)
return tss, ok
}
func getMaxInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries) ([]*timeseries, bool) {
qt = qt.NewChild("calculate the maximum for instant values across series; cached=%d, start=%d, end=%d", len(tssCached), len(tssStart), len(tssEnd))
func getMaxInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries, timestamp int64) ([]*timeseries, bool) {
qt = qt.NewChild("calculate the maximum for instant values across series; cached=%d, start=%d, end=%d, timestamp=%d", len(tssCached), len(tssStart), len(tssEnd), timestamp)
defer qt.Done()
getMax := func(a, b float64) float64 {
@@ -1444,12 +1460,12 @@ func getMaxInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*
}
return b
}
tss, ok := getMinMaxInstantValues(tssCached, tssStart, tssEnd, getMax)
tss, ok := getMinMaxInstantValues(tssCached, tssStart, tssEnd, timestamp, getMax)
qt.Printf("resulting series=%d", len(tss))
return tss, ok
}
func getMinMaxInstantValues(tssCached, tssStart, tssEnd []*timeseries, f func(a, b float64) float64) ([]*timeseries, bool) {
func getMinMaxInstantValues(tssCached, tssStart, tssEnd []*timeseries, timestamp int64, f func(a, b float64) float64) ([]*timeseries, bool) {
assertInstantValues(tssCached)
assertInstantValues(tssStart)
assertInstantValues(tssEnd)
@@ -1500,12 +1516,16 @@ func getMinMaxInstantValues(tssCached, tssStart, tssEnd []*timeseries, f func(a,
for _, ts := range m {
rvs = append(rvs, ts)
}
setInstantTimestamp(rvs, timestamp)
return rvs, true
}
// getSumInstantValues calculates tssCached + tssStart - tssEnd
func getSumInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries) []*timeseries {
qt = qt.NewChild("calculate the sum for instant values across series; cached=%d, start=%d, end=%d", len(tssCached), len(tssStart), len(tssEnd))
// getSumInstantValues aggregates tssCached, tssStart, tssEnd time series
// into a new time series with value = tssCached + tssStart - tssEnd
func getSumInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*timeseries, timestamp int64) []*timeseries {
qt = qt.NewChild("calculate the sum for instant values across series; cached=%d, start=%d, end=%d, timestamp=%d", len(tssCached), len(tssStart), len(tssEnd), timestamp)
defer qt.Done()
assertInstantValues(tssCached)
@@ -1550,10 +1570,19 @@ func getSumInstantValues(qt *querytracer.Tracer, tssCached, tssStart, tssEnd []*
for _, ts := range m {
rvs = append(rvs, ts)
}
setInstantTimestamp(rvs, timestamp)
qt.Printf("resulting series=%d", len(rvs))
return rvs
}
func setInstantTimestamp(tss []*timeseries, timestamp int64) {
for _, ts := range tss {
ts.Timestamps[0] = timestamp
}
}
func assertInstantValues(tss []*timeseries) {
for _, ts := range tss {
if len(ts.Values) != 1 {
@@ -1681,9 +1710,6 @@ func evalRollupFuncNoCache(qt *querytracer.Tracer, ec *EvalConfig, funcName stri
} else {
minTimestamp -= ec.Step
}
if minTimestamp < 0 {
minTimestamp = 0
}
sq := storage.NewSearchQuery(minTimestamp, ec.End, tfss, ec.MaxSeries)
rss, err := netstorage.ProcessSearchQuery(qt, sq, ec.Deadline)
if err != nil {

View File

@@ -1,6 +1,7 @@
package promql
import (
"reflect"
"testing"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/protoparser/prometheus"
@@ -95,3 +96,77 @@ func TestQueryStats_addSeriesFetched(t *testing.T) {
t.Fatalf("expected to get 4; got %d instead", qs.SeriesFetched)
}
}
func TestGetSumInstantValues(t *testing.T) {
f := func(cached, start, end []*timeseries, timestamp int64, expectedResult []*timeseries) {
t.Helper()
result := getSumInstantValues(nil, cached, start, end, timestamp)
if !reflect.DeepEqual(result, expectedResult) {
t.Errorf("unexpected result; got\n%v\nwant\n%v", result, expectedResult)
}
}
ts := func(name string, timestamp int64, value float64) *timeseries {
return &timeseries{
MetricName: storage.MetricName{
MetricGroup: []byte(name),
},
Timestamps: []int64{timestamp},
Values: []float64{value},
}
}
// start - end + cached = 1
f(
nil,
[]*timeseries{ts("foo", 42, 1)},
nil,
100,
[]*timeseries{ts("foo", 100, 1)},
)
// start - end + cached = 0
f(
nil,
[]*timeseries{ts("foo", 100, 1)},
[]*timeseries{ts("foo", 10, 1)},
100,
[]*timeseries{ts("foo", 100, 0)},
)
// start - end + cached = 2
f(
[]*timeseries{ts("foo", 10, 1)},
[]*timeseries{ts("foo", 100, 1)},
nil,
100,
[]*timeseries{ts("foo", 100, 2)},
)
// start - end + cached = 1
f(
[]*timeseries{ts("foo", 50, 1)},
[]*timeseries{ts("foo", 100, 1)},
[]*timeseries{ts("foo", 10, 1)},
100,
[]*timeseries{ts("foo", 100, 1)},
)
// start - end + cached = 0
f(
[]*timeseries{ts("foo", 50, 1)},
nil,
[]*timeseries{ts("foo", 10, 1)},
100,
[]*timeseries{ts("foo", 100, 0)},
)
// start - end + cached = 1
f(
[]*timeseries{ts("foo", 50, 1)},
nil,
nil,
100,
[]*timeseries{ts("foo", 100, 1)},
)
}

View File

@@ -3706,7 +3706,7 @@ func TestExecSuccess(t *testing.T) {
q := `(
(label_set(time(), "t1", "v1", "__name__", "q1") or label_set(10, "t2", "v2", "__name__", "q2"))
+
(label_set(100, "t1", "v1", "__name__", "q3") or label_set(time(), "t2", "v3"))
(label_set(100, "t1", "v1", "__name__", "q1") or label_set(time(), "t2", "v3"))
) keep_metric_names`
r := netstorage.Result{
MetricName: metricNameExpected,
@@ -6108,6 +6108,39 @@ func TestExecSuccess(t *testing.T) {
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`sum_gt_over_time`, func(t *testing.T) {
t.Parallel()
q := `round(sum_gt_over_time(rand(0)[200s:10s], 0.7), 0.1)`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{5.9, 5.2, 8.5, 5.1, 4.9, 4.5},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`sum_le_over_time`, func(t *testing.T) {
t.Parallel()
q := `round(sum_le_over_time(rand(0)[200s:10s], 0.7), 0.1)`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{4.2, 4.9, 3.2, 5.8, 4.1, 5.3},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`sum_eq_over_time`, func(t *testing.T) {
t.Parallel()
q := `round(sum_eq_over_time(rand(0)[200s:10s], 0.7), 0.1)`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{0, 0, 0, 0, 0, 0},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
f(q, resultExpected)
})
t.Run(`increases_over_time`, func(t *testing.T) {
t.Parallel()
q := `increases_over_time(rand(0)[200s:10s])`
@@ -8050,10 +8083,10 @@ func TestExecSuccess(t *testing.T) {
})
t.Run(`rollup_rate()`, func(t *testing.T) {
t.Parallel()
q := `rollup_rate((2000-time())[600s])`
q := `rollup_rate((2200-time())[600s])`
r1 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{5, 4, 3, 2, 1, 0},
Values: []float64{6, 5, 4, 3, 2, 1},
Timestamps: timestampsExpected,
}
r1.MetricName.Tags = []storage.Tag{{
@@ -8062,7 +8095,7 @@ func TestExecSuccess(t *testing.T) {
}}
r2 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{6, 5, 4, 3, 2, 1},
Values: []float64{7, 6, 5, 4, 3, 2},
Timestamps: timestampsExpected,
}
r2.MetricName.Tags = []storage.Tag{{
@@ -8071,7 +8104,7 @@ func TestExecSuccess(t *testing.T) {
}}
r3 := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{4, 3, 2, 1, 0, -1},
Values: []float64{5, 4, 3, 2, 1, 0},
Timestamps: timestampsExpected,
}
r3.MetricName.Tags = []storage.Tag{{
@@ -8083,10 +8116,10 @@ func TestExecSuccess(t *testing.T) {
})
t.Run(`rollup_rate(q, "max")`, func(t *testing.T) {
t.Parallel()
q := `rollup_rate((2000-time())[600s], "max")`
q := `rollup_rate((2200-time())[600s], "max")`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{6, 5, 4, 3, 2, 1},
Values: []float64{7, 6, 5, 4, 3, 2},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}
@@ -8094,10 +8127,10 @@ func TestExecSuccess(t *testing.T) {
})
t.Run(`rollup_rate(q, "avg")`, func(t *testing.T) {
t.Parallel()
q := `rollup_rate((2000-time())[600s], "avg")`
q := `rollup_rate((2200-time())[600s], "avg")`
r := netstorage.Result{
MetricName: metricNameExpected,
Values: []float64{5, 4, 3, 2, 1, 0},
Values: []float64{6, 5, 4, 3, 2, 1},
Timestamps: timestampsExpected,
}
resultExpected := []netstorage.Result{r}

View File

@@ -79,12 +79,15 @@ var rollupFuncs = map[string]newRollupFunc{
"rollup_rate": newRollupFuncOneOrTwoArgs(rollupFake), // + rollupFuncsRemoveCounterResets
"rollup_scrape_interval": newRollupFuncOneOrTwoArgs(rollupFake),
"scrape_interval": newRollupFuncOneArg(rollupScrapeInterval),
"share_eq_over_time": newRollupShareEQ,
"share_gt_over_time": newRollupShareGT,
"share_le_over_time": newRollupShareLE,
"share_eq_over_time": newRollupShareEQ,
"stale_samples_over_time": newRollupFuncOneArg(rollupStaleSamples),
"stddev_over_time": newRollupFuncOneArg(rollupStddev),
"stdvar_over_time": newRollupFuncOneArg(rollupStdvar),
"sum_eq_over_time": newRollupSumEQ,
"sum_gt_over_time": newRollupSumGT,
"sum_le_over_time": newRollupSumLE,
"sum_over_time": newRollupFuncOneArg(rollupSum),
"sum2_over_time": newRollupFuncOneArg(rollupSum2),
"tfirst_over_time": newRollupFuncOneArg(rollupTfirst),
@@ -859,6 +862,11 @@ func removeCounterResets(values []float64) {
}
prevValue = v
values[i] = v + correction
// Check again, there could be precision error in float operations,
// see https://github.com/VictoriaMetrics/VictoriaMetrics/issues/5571
if i > 0 && values[i] < values[i-1] {
values[i] = values[i-1]
}
}
}
@@ -1084,59 +1092,89 @@ func newRollupDurationOverTime(args []interface{}) (rollupFunc, error) {
}
func newRollupShareLE(args []interface{}) (rollupFunc, error) {
return newRollupShareFilter(args, countFilterLE)
return newRollupAvgFilter(args, countFilterLE)
}
func countFilterLE(values []float64, le float64) int {
func countFilterLE(values []float64, le float64) float64 {
n := 0
for _, v := range values {
if v <= le {
n++
}
}
return n
return float64(n)
}
func newRollupShareGT(args []interface{}) (rollupFunc, error) {
return newRollupShareFilter(args, countFilterGT)
return newRollupAvgFilter(args, countFilterGT)
}
func countFilterGT(values []float64, gt float64) int {
func countFilterGT(values []float64, gt float64) float64 {
n := 0
for _, v := range values {
if v > gt {
n++
}
}
return n
return float64(n)
}
func newRollupShareEQ(args []interface{}) (rollupFunc, error) {
return newRollupShareFilter(args, countFilterEQ)
return newRollupAvgFilter(args, countFilterEQ)
}
func countFilterEQ(values []float64, eq float64) int {
func sumFilterEQ(values []float64, eq float64) float64 {
var sum float64
for _, v := range values {
if v == eq {
sum += v
}
}
return sum
}
func sumFilterLE(values []float64, le float64) float64 {
var sum float64
for _, v := range values {
if v <= le {
sum += v
}
}
return sum
}
func sumFilterGT(values []float64, gt float64) float64 {
var sum float64
for _, v := range values {
if v > gt {
sum += v
}
}
return sum
}
func countFilterEQ(values []float64, eq float64) float64 {
n := 0
for _, v := range values {
if v == eq {
n++
}
}
return n
return float64(n)
}
func countFilterNE(values []float64, ne float64) int {
func countFilterNE(values []float64, ne float64) float64 {
n := 0
for _, v := range values {
if v != ne {
n++
}
}
return n
return float64(n)
}
func newRollupShareFilter(args []interface{}, countFilter func(values []float64, limit float64) int) (rollupFunc, error) {
rf, err := newRollupCountFilter(args, countFilter)
func newRollupAvgFilter(args []interface{}, f func(values []float64, limit float64) float64) (rollupFunc, error) {
rf, err := newRollupFilter(args, f)
if err != nil {
return nil, err
}
@@ -1146,23 +1184,35 @@ func newRollupShareFilter(args []interface{}, countFilter func(values []float64,
}, nil
}
func newRollupCountEQ(args []interface{}) (rollupFunc, error) {
return newRollupFilter(args, countFilterEQ)
}
func newRollupCountLE(args []interface{}) (rollupFunc, error) {
return newRollupCountFilter(args, countFilterLE)
return newRollupFilter(args, countFilterLE)
}
func newRollupCountGT(args []interface{}) (rollupFunc, error) {
return newRollupCountFilter(args, countFilterGT)
}
func newRollupCountEQ(args []interface{}) (rollupFunc, error) {
return newRollupCountFilter(args, countFilterEQ)
return newRollupFilter(args, countFilterGT)
}
func newRollupCountNE(args []interface{}) (rollupFunc, error) {
return newRollupCountFilter(args, countFilterNE)
return newRollupFilter(args, countFilterNE)
}
func newRollupCountFilter(args []interface{}, countFilter func(values []float64, limit float64) int) (rollupFunc, error) {
func newRollupSumEQ(args []interface{}) (rollupFunc, error) {
return newRollupFilter(args, sumFilterEQ)
}
func newRollupSumLE(args []interface{}) (rollupFunc, error) {
return newRollupFilter(args, sumFilterLE)
}
func newRollupSumGT(args []interface{}) (rollupFunc, error) {
return newRollupFilter(args, sumFilterGT)
}
func newRollupFilter(args []interface{}, f func(values []float64, limit float64) float64) (rollupFunc, error) {
if err := expectRollupArgsNum(args, 2); err != nil {
return nil, err
}
@@ -1178,7 +1228,7 @@ func newRollupCountFilter(args []interface{}, countFilter func(values []float64,
return nan
}
limit := limits[rfa.idx]
return float64(countFilter(values, limit))
return f(values, limit)
}
return rf, nil
}
@@ -1911,6 +1961,10 @@ func rollupChangesPrometheus(rfa *rollupFuncArg) float64 {
n := 0
for _, v := range values[1:] {
if v != prevValue {
if math.Abs(v-prevValue) < 1e-12*math.Abs(v) {
// This may be precision error. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767#issuecomment-1650932203
continue
}
n++
prevValue = v
}
@@ -1934,6 +1988,10 @@ func rollupChanges(rfa *rollupFuncArg) float64 {
}
for _, v := range values {
if v != prevValue {
if math.Abs(v-prevValue) < 1e-12*math.Abs(v) {
// This may be precision error. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767#issuecomment-1650932203
continue
}
n++
prevValue = v
}
@@ -1962,6 +2020,10 @@ func rollupIncreases(rfa *rollupFuncArg) float64 {
n := 0
for _, v := range values {
if v > prevValue {
if math.Abs(v-prevValue) < 1e-12*math.Abs(v) {
// This may be precision error. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767#issuecomment-1650932203
continue
}
n++
}
prevValue = v
@@ -1993,6 +2055,10 @@ func rollupResets(rfa *rollupFuncArg) float64 {
n := 0
for _, v := range values {
if v < prevValue {
if math.Abs(v-prevValue) < 1e-12*math.Abs(v) {
// This may be precision error. See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/767#issuecomment-1650932203
continue
}
n++
}
prevValue = v

View File

@@ -30,6 +30,8 @@ var (
"due to time synchronization issues between VictoriaMetrics and data sources. See also -search.disableAutoCacheReset")
disableAutoCacheReset = flag.Bool("search.disableAutoCacheReset", false, "Whether to disable automatic response cache reset if a sample with timestamp "+
"outside -search.cacheTimestampOffset is inserted into VictoriaMetrics")
resetRollupResultCacheOnStartup = flag.Bool("search.resetRollupResultCacheOnStartup", false, "Whether to reset rollup result cache on startup. "+
"See https://docs.victoriametrics.com/#rollup-result-cache . See also -search.disableCache")
)
// ResetRollupResultCacheIfNeeded resets rollup result cache if mrs contains timestamps outside `now - search.cacheTimestampOffset`.
@@ -43,6 +45,11 @@ func ResetRollupResultCacheIfNeeded(mrs []storage.MetricRow) {
rollupResultResetMetricRowSample.Store(&storage.MetricRow{})
go checkRollupResultCacheReset()
})
if atomic.LoadUint32(&needRollupResultCacheReset) != 0 {
// The cache has been already instructed to reset.
return
}
minTimestamp := int64(fasttime.UnixTimestamp()*1000) - cacheTimestampOffset.Milliseconds() + checkRollupResultCacheResetInterval.Milliseconds()
needCacheReset := false
for i := range mrs {
@@ -112,7 +119,12 @@ func InitRollupResultCache(cachePath string) {
cacheSize := getRollupResultCacheSize()
var c *workingsetcache.Cache
if len(rollupResultCachePath) > 0 {
logger.Infof("loading rollupResult cache from %q...", rollupResultCachePath)
if *resetRollupResultCacheOnStartup {
logger.Infof("removing rollupResult cache at %q becasue -search.resetRollupResultCacheOnStartup command-line flag is set", rollupResultCachePath)
fs.MustRemoveAll(rollupResultCachePath)
} else {
logger.Infof("loading rollupResult cache from %q...", rollupResultCachePath)
}
c = workingsetcache.Load(rollupResultCachePath, cacheSize)
mustLoadRollupResultCacheKeyPrefix(rollupResultCachePath)
} else {

View File

@@ -125,7 +125,7 @@ func TestRemoveCounterResets(t *testing.T) {
// removeCounterResets doesn't expect negative values, so it doesn't work properly with them.
values = []float64{-100, -200, -300, -400}
removeCounterResets(values)
valuesExpected = []float64{-100, -300, -600, -1000}
valuesExpected = []float64{-100, -100, -100, -100}
timestampsExpected := []int64{0, 1, 2, 3}
testRowsEqual(t, values, timestampsExpected, valuesExpected, timestampsExpected)
@@ -136,6 +136,17 @@ func TestRemoveCounterResets(t *testing.T) {
valuesExpected = []float64{100, 100, 125, 125, 145, 195}
timestampsExpected = []int64{0, 1, 2, 3, 4, 5}
testRowsEqual(t, values, timestampsExpected, valuesExpected, timestampsExpected)
// verify results always increase monotonically with possible float operations precision error
values = []float64{34.094223, 2.7518, 2.140669, 0.044878, 1.887095, 2.546569, 2.490149, 0.045, 0.035684, 0.062454, 0.058296}
removeCounterResets(values)
var prev float64
for i, v := range values {
if v < prev {
t.Fatalf("error: unexpected value keep getting bigger %d; cur %v; pre %v\n", i, v, prev)
}
prev = v
}
}
func TestDeltaValues(t *testing.T) {
@@ -396,6 +407,75 @@ func TestRollupCountNEOverTime(t *testing.T) {
f(12, 11)
}
func TestRollupSumLEOverTime(t *testing.T) {
f := func(le, vExpected float64) {
t.Helper()
les := []*timeseries{{
Values: []float64{le},
Timestamps: []int64{123},
}}
var me metricsql.MetricExpr
args := []interface{}{&metricsql.RollupExpr{Expr: &me}, les}
testRollupFunc(t, "sum_le_over_time", args, vExpected)
}
f(-123, 0)
f(0, 0)
f(10, 0)
f(12, 12)
f(30, 33)
f(50, 289)
f(100, 442)
f(123, 565)
f(1000, 565)
}
func TestRollupSumGTOverTime(t *testing.T) {
f := func(le, vExpected float64) {
t.Helper()
les := []*timeseries{{
Values: []float64{le},
Timestamps: []int64{123},
}}
var me metricsql.MetricExpr
args := []interface{}{&metricsql.RollupExpr{Expr: &me}, les}
testRollupFunc(t, "sum_gt_over_time", args, vExpected)
}
f(-123, 565)
f(0, 565)
f(10, 565)
f(12, 553)
f(30, 532)
f(50, 276)
f(100, 123)
f(123, 0)
f(1000, 0)
}
func TestRollupSumEQOverTime(t *testing.T) {
f := func(le, vExpected float64) {
t.Helper()
les := []*timeseries{{
Values: []float64{le},
Timestamps: []int64{123},
}}
var me metricsql.MetricExpr
args := []interface{}{&metricsql.RollupExpr{Expr: &me}, les}
testRollupFunc(t, "sum_eq_over_time", args, vExpected)
}
f(-123, 0)
f(0, 0)
f(10, 0)
f(12, 12)
f(30, 0)
f(50, 0)
f(100, 0)
f(123, 123)
f(1000, 0)
}
func TestRollupQuantileOverTime(t *testing.T) {
f := func(phi, vExpected float64) {
t.Helper()

View File

@@ -1248,8 +1248,13 @@ func newTransformFuncRunning(rf func(a, b float64, idx int) float64) transformFu
prevValue := values[0]
values = values[1:]
for i, v := range values {
if !math.IsNaN(v) {
prevValue = rf(prevValue, v, i+1)
if tfa.ec.RealEnd > 0 && ts.Timestamps[i] > tfa.ec.RealEnd {
break
}
if ts.Timestamps[i] >= tfa.ec.RealStart {
if !math.IsNaN(v) {
prevValue = rf(prevValue, v, i+1)
}
}
values[i] = prevValue
}
@@ -1265,7 +1270,7 @@ func newTransformFuncRange(rf func(a, b float64, idx int) float64) transformFunc
if err != nil {
return nil, err
}
setLastValues(rvs)
setLastValues(tfa, rvs)
return rvs, nil
}
}
@@ -1539,7 +1544,7 @@ func transformRangeQuantile(tfa *transformFuncArg) ([]*timeseries, error) {
}
a.A = values
putFloat64s(a)
setLastValues(rvs)
setLastValues(tfa, rvs)
return rvs, nil
}
@@ -1554,9 +1559,14 @@ func transformRangeFirst(tfa *transformFuncArg) ([]*timeseries, error) {
if len(values) == 0 {
continue
}
vFirstSet := false
vFirst := values[0]
for i, v := range values {
if math.IsNaN(v) {
if !vFirstSet && ts.Timestamps[i] >= tfa.ec.RealStart {
vFirst = v
vFirstSet = true
}
if !vFirstSet && math.IsNaN(v) {
continue
}
values[i] = vFirst
@@ -1571,17 +1581,28 @@ func transformRangeLast(tfa *transformFuncArg) ([]*timeseries, error) {
return nil, err
}
rvs := args[0]
setLastValues(rvs)
setLastValues(tfa, rvs)
return rvs, nil
}
func setLastValues(tss []*timeseries) {
func setLastValues(tfa *transformFuncArg, tss []*timeseries) {
for _, ts := range tss {
values := skipTrailingNaNs(ts.Values)
if len(values) == 0 {
continue
}
vLast := values[len(values)-1]
if tfa.ec.RealEnd > 0 {
for i := len(values) - 1; i >= 0; i-- {
if math.IsNaN(values[i]) {
continue
}
if ts.Timestamps[i] > tfa.ec.RealEnd {
continue
}
vLast = values[i]
}
}
for i, v := range values {
if math.IsNaN(v) {
continue
@@ -2713,11 +2734,17 @@ func transformStep(tfa *transformFuncArg) float64 {
}
func transformStart(tfa *transformFuncArg) float64 {
return float64(tfa.ec.Start) / 1e3
if tfa.ec.RealStart == 0 {
return float64(tfa.ec.Start) / 1e3
}
return float64(tfa.ec.RealStart) / 1e3
}
func transformEnd(tfa *transformFuncArg) float64 {
return float64(tfa.ec.End) / 1e3
if tfa.ec.RealEnd == 0 {
return float64(tfa.ec.End) / 1e3
}
return float64(tfa.ec.RealEnd) / 1e3
}
// copyTimeseries returns a copy of tss.

View File

@@ -7,6 +7,7 @@ import (
"net/http"
"os"
"path/filepath"
"time"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fs"
)
@@ -14,6 +15,8 @@ import (
var (
vmuiCustomDashboardsPath = flag.String("vmui.customDashboardsPath", "", "Optional path to vmui dashboards. "+
"See https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/vmui/packages/vmui/public/dashboards")
vmuiDefaultTimezone = flag.String("vmui.defaultTimezone", "", "The default timezone to be used in vmui. "+
"Timezone must be a valid IANA Time Zone. For example: America/New_York, Europe/Berlin, Etc/GMT+3 or Local")
)
// dashboardSettings represents dashboard settings file struct.
@@ -65,6 +68,16 @@ func handleVMUICustomDashboards(w http.ResponseWriter) error {
return nil
}
func handleVMUITimezone(w http.ResponseWriter) error {
tz, err := time.LoadLocation(*vmuiDefaultTimezone)
if err != nil {
return fmt.Errorf("cannot load timezone %q: %w", *vmuiDefaultTimezone, err)
}
response := fmt.Sprintf(`{"timezone": %q}`, tz)
writeSuccessResponse(w, []byte(response))
return nil
}
func writeSuccessResponse(w http.ResponseWriter, data []byte) {
w.WriteHeader(http.StatusOK)
w.Header().Set("Content-Type", "application/json")

View File

@@ -1,13 +1,13 @@
{
"files": {
"main.css": "./static/css/main.fb353c1e.css",
"main.js": "./static/js/main.5bcddddc.js",
"main.css": "./static/css/main.be4fee7a.css",
"main.js": "./static/js/main.fd9d9e16.js",
"static/js/522.da77e7b3.chunk.js": "./static/js/522.da77e7b3.chunk.js",
"static/media/MetricsQL.md": "./static/media/MetricsQL.b64c4dbf91f4fa581621.md",
"static/media/MetricsQL.md": "./static/media/MetricsQL.8a01ddf56e4e6bc1ccf1.md",
"index.html": "./index.html"
},
"entrypoints": [
"static/css/main.fb353c1e.css",
"static/js/main.5bcddddc.js"
"static/css/main.be4fee7a.css",
"static/js/main.fd9d9e16.js"
]
}

View File

@@ -1 +1 @@
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.5bcddddc.js"></script><link href="./static/css/main.fb353c1e.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5"/><meta name="theme-color" content="#000000"/><meta name="description" content="UI for VictoriaMetrics"/><link rel="apple-touch-icon" href="./apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="./favicon-32x32.png"><link rel="manifest" href="./manifest.json"/><title>VM UI</title><script src="./dashboards/index.js" type="module"></script><meta name="twitter:card" content="summary_large_image"><meta name="twitter:image" content="./preview.jpg"><meta name="twitter:title" content="UI for VictoriaMetrics"><meta name="twitter:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta name="twitter:site" content="@VictoriaMetrics"><meta property="og:title" content="Metric explorer for VictoriaMetrics"><meta property="og:description" content="Explore and troubleshoot your VictoriaMetrics data"><meta property="og:image" content="./preview.jpg"><meta property="og:type" content="website"><script defer="defer" src="./static/js/main.fd9d9e16.js"></script><link href="./static/css/main.be4fee7a.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div></body></html>

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -70,7 +70,7 @@ The list of MetricsQL features on top of PromQL:
It is equivalent to `rate(node_network_receive_bytes_total[$__interval])` when used in Grafana.
* Numeric values can contain `_` delimiters for better readability. For example, `1_234_567_890` can be used in queries instead of `1234567890`.
* [Series selectors](https://docs.victoriametrics.com/keyConcepts.html#filtering) accept multiple `or` filters. For example, `{env="prod",job="a" or env="dev",job="b"}`
selects series with either `{env="prod",job="a"}` or `{env="dev",job="b"}` labels.
selects series with `{env="prod",job="a"}` or `{env="dev",job="b"}` labels.
See [these docs](https://docs.victoriametrics.com/keyConcepts.html#filtering-by-multiple-or-filters) for details.
* Support for `group_left(*)` and `group_right(*)` for copying all the labels from time series on the `one` side
of [many-to-one operations](https://prometheus.io/docs/prometheus/latest/querying/operators/#many-to-one-and-one-to-many-vector-matches).
@@ -243,7 +243,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_over_time](#count_over_time).
See also [count_over_time](#count_over_time) and [share_eq_over_time](#share_eq_over_time).
#### count_gt_over_time
@@ -253,7 +253,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_over_time](#count_over_time).
See also [count_over_time](#count_over_time) and [share_gt_over_time](#share_gt_over_time).
#### count_le_over_time
@@ -263,7 +263,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_over_time](#count_over_time).
See also [count_over_time](#count_over_time) and [share_le_over_time](#share_le_over_time).
#### count_ne_over_time
@@ -743,7 +743,7 @@ This function is useful for calculating SLI and SLO. Example: `share_gt_over_tim
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [share_le_over_time](#share_le_over_time).
See also [share_le_over_time](#share_le_over_time) and [count_gt_over_time](#count_gt_over_time).
#### share_le_over_time
@@ -756,7 +756,7 @@ the share of time series values for the last 24 hours when memory usage was belo
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [share_gt_over_time](#share_gt_over_time).
See also [share_gt_over_time](#share_gt_over_time) and [count_le_over_time](#count_le_over_time).
#### share_eq_over_time
@@ -766,6 +766,8 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_eq_over_time](#count_eq_over_time).
#### stale_samples_over_time
`stale_samples_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which calculates the number
@@ -792,6 +794,33 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
This function is supported by PromQL. See also [stddev_over_time](#stddev_over_time).
#### sum_eq_over_time
`sum_eq_over_time(series_selector[d], eq)` is a [rollup function](#rollup-function), which calculates the sum of raw sample values equal to `eq`
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [sum_over_time](#sum_over_time) and [count_eq_over_time](#count_eq_over_time).
#### sum_gt_over_time
`sum_gt_over_time(series_selector[d], gt)` is a [rollup function](#rollup-function), which calculates the sum of raw sample values bigger than `gt`
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [sum_over_time](#sum_over_time) and [count_gt_over_time](#count_gt_over_time).
#### sum_le_over_time
`sum_le_over_time(series_selector[d], le)` is a [rollup function](#rollup-function), which calculates the sum of raw sample values smaller or equal to `le`
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [sum_over_time](#sum_over_time) and [count_le_over_time](#count_le_over_time).
#### sum_over_time
`sum_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which calculates the sum of raw sample values
@@ -810,7 +839,7 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
#### timestamp
`timestamp(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last raw sample
`timestamp(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -819,7 +848,7 @@ This function is supported by PromQL. See also [timestamp_with_name](#timestamp_
#### timestamp_with_name
`timestamp_with_name(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last raw sample
`timestamp_with_name(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are preserved in the resulting rollups.
@@ -828,7 +857,7 @@ See also [timestamp](#timestamp).
#### tfirst_over_time
`tfirst_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the first raw sample
`tfirst_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the first raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -837,7 +866,7 @@ See also [first_over_time](#first_over_time).
#### tlast_change_over_time
`tlast_change_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last change
`tlast_change_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last change
per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering) on the given lookbehind window `d`.
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -852,7 +881,7 @@ See also [tlast_change_over_time](#tlast_change_over_time).
#### tmax_over_time
`tmax_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the raw sample
`tmax_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the raw sample
with the maximum value on the given lookbehind window `d`. It is calculated independently per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
@@ -862,7 +891,7 @@ See also [max_over_time](#max_over_time).
#### tmin_over_time
`tmin_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the raw sample
`tmin_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the raw sample
with the minimum value on the given lookbehind window `d`. It is calculated independently per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
@@ -1029,7 +1058,7 @@ for every point of every time series returned by `q`.
Metric names are stripped from the resulting series. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
This function is supported by PromQL. This function is supported by PromQL. See also [acosh](#acosh).
This function is supported by PromQL. See also [acosh](#acosh).
#### day_of_month
@@ -1049,6 +1078,15 @@ Metric names are stripped from the resulting series. Add [keep_metric_names](#ke
This function is supported by PromQL.
#### day_of_year
`day_of_year(q)` is a [transform function](#transform-functions), which returns the day of year for every point of every time series returned by `q`.
It is expected that `q` returns unix timestamps. The returned values are in the range `[1...365]` for non-leap years, and `[1 to 366]` in leap years.
Metric names are stripped from the resulting series. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
This function is supported by PromQL.
#### days_in_month
`days_in_month(q)` is a [transform function](#transform-functions), which returns the number of days in the month identified

View File

@@ -10,6 +10,8 @@ import (
"sync"
"time"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/fasttime"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/flagutil"
@@ -20,14 +22,14 @@ import (
"github.com/VictoriaMetrics/VictoriaMetrics/lib/querytracer"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/syncwg"
"github.com/VictoriaMetrics/metrics"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/timeutil"
)
var (
retentionPeriod = flagutil.NewDuration("retentionPeriod", "1", "Data with timestamps outside the retentionPeriod is automatically deleted. The minimum retentionPeriod is 24h or 1d. See also -retentionFilter")
snapshotAuthKey = flag.String("snapshotAuthKey", "", "authKey, which must be passed in query string to /snapshot* pages")
forceMergeAuthKey = flag.String("forceMergeAuthKey", "", "authKey, which must be passed in query string to /internal/force_merge pages")
forceFlushAuthKey = flag.String("forceFlushAuthKey", "", "authKey, which must be passed in query string to /internal/force_flush pages")
snapshotAuthKey = flagutil.NewPassword("snapshotAuthKey", "authKey, which must be passed in query string to /snapshot* pages")
forceMergeAuthKey = flagutil.NewPassword("forceMergeAuthKey", "authKey, which must be passed in query string to /internal/force_merge pages")
forceFlushAuthKey = flagutil.NewPassword("forceFlushAuthKey", "authKey, which must be passed in query string to /internal/force_flush pages")
snapshotsMaxAge = flagutil.NewDuration("snapshotsMaxAge", "0", "Automatically delete snapshots older than -snapshotsMaxAge if it is set to non-zero duration. Make sure that backup process has enough time to finish the backup before the corresponding snapshot is automatically deleted")
snapshotCreateTimeout = flag.Duration("snapshotCreateTimeout", 0, "The timeout for creating new snapshot. If set, make sure that timeout is lower than backup period")
@@ -36,13 +38,10 @@ var (
// DataPath is a path to storage data.
DataPath = flag.String("storageDataPath", "victoria-metrics-data", "Path to storage data")
finalMergeDelay = flag.Duration("finalMergeDelay", 0, "The delay before starting final merge for per-month partition after no new data is ingested into it. "+
"Final merge may require additional disk IO and CPU resources. Final merge may increase query speed and reduce disk space usage in some cases. "+
"Zero value disables final merge")
_ = flag.Int("bigMergeConcurrency", 0, "Deprecated: this flag does nothing. Please use -smallMergeConcurrency "+
"for controlling the concurrency of background merges. See https://docs.victoriametrics.com/#storage")
smallMergeConcurrency = flag.Int("smallMergeConcurrency", 0, "The maximum number of workers for background merges. See https://docs.victoriametrics.com/#storage . "+
"It isn't recommended tuning this flag in general case, since this may lead to uncontrolled increase in the number of parts and increased CPU usage during queries")
_ = flag.Duration("finalMergeDelay", 0, "Deprecated: this flag does nothing")
_ = flag.Int("bigMergeConcurrency", 0, "Deprecated: this flag does nothing")
_ = flag.Int("smallMergeConcurrency", 0, "Deprecated: this flag does nothing")
retentionTimezoneOffset = flag.Duration("retentionTimezoneOffset", 0, "The offset for performing indexdb rotation. "+
"If set to 0, then the indexdb rotation is performed at 4am UTC time per each -retentionPeriod. "+
"If set to 2h, then the indexdb rotation is performed at 4am EET time (the timezone with +2h offset)")
@@ -94,8 +93,6 @@ func Init(resetCacheIfNeeded func(mrs []storage.MetricRow)) {
resetResponseCacheIfNeeded = resetCacheIfNeeded
storage.SetLogNewSeries(*logNewSeries)
storage.SetFinalMergeDelay(*finalMergeDelay)
storage.SetMergeWorkersCount(*smallMergeConcurrency)
storage.SetRetentionTimezoneOffset(*retentionTimezoneOffset)
storage.SetFreeDiskSpaceLimit(minFreeDiskSpaceBytes.N)
storage.SetTSIDCacheSize(cacheSizeStorageTSID.IntN())
@@ -259,7 +256,7 @@ func Stop() {
func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
path := r.URL.Path
if path == "/internal/force_merge" {
if !httpserver.CheckAuthFlag(w, r, *forceMergeAuthKey, "forceMergeAuthKey") {
if !httpserver.CheckAuthFlag(w, r, forceMergeAuthKey.Get(), "forceMergeAuthKey") {
return true
}
// Run force merge in background
@@ -277,7 +274,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
return true
}
if path == "/internal/force_flush" {
if !httpserver.CheckAuthFlag(w, r, *forceFlushAuthKey, "forceFlushAuthKey") {
if !httpserver.CheckAuthFlag(w, r, forceFlushAuthKey.Get(), "forceFlushAuthKey") {
return true
}
logger.Infof("flushing storage to make pending data available for reading")
@@ -293,7 +290,7 @@ func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
if !strings.HasPrefix(path, "/snapshot") {
return false
}
if !httpserver.CheckAuthFlag(w, r, *snapshotAuthKey, "snapshotAuthKey") {
if !httpserver.CheckAuthFlag(w, r, snapshotAuthKey.Get(), "snapshotAuthKey") {
return true
}
path = path[len("/snapshot"):]
@@ -400,7 +397,8 @@ func initStaleSnapshotsRemover(strg *storage.Storage) {
staleSnapshotsRemoverWG.Add(1)
go func() {
defer staleSnapshotsRemoverWG.Done()
t := time.NewTicker(11 * time.Second)
d := timeutil.AddJitterToDuration(time.Second * 11)
t := time.NewTicker(d)
defer t.Stop()
for {
select {
@@ -493,11 +491,8 @@ func writeStorageMetrics(w io.Writer, strg *storage.Storage) {
metrics.WriteCounterUint64(w, `vm_composite_filter_success_conversions_total`, idbm.CompositeFilterSuccessConversions)
metrics.WriteCounterUint64(w, `vm_composite_filter_missing_conversions_total`, idbm.CompositeFilterMissingConversions)
metrics.WriteCounterUint64(w, `vm_assisted_merges_total{type="storage/inmemory"}`, tm.InmemoryAssistedMerges)
metrics.WriteCounterUint64(w, `vm_assisted_merges_total{type="storage/small"}`, tm.SmallAssistedMerges)
metrics.WriteCounterUint64(w, `vm_assisted_merges_total{type="indexdb/inmemory"}`, idbm.InmemoryAssistedMerges)
metrics.WriteCounterUint64(w, `vm_assisted_merges_total{type="indexdb/file"}`, idbm.FileAssistedMerges)
// vm_assisted_merges_total name is used for backwards compatibility.
metrics.WriteCounterUint64(w, `vm_assisted_merges_total{type="indexdb/inmemory"}`, idbm.InmemoryPartsLimitReachedCount)
metrics.WriteCounterUint64(w, `vm_indexdb_items_added_total`, idbm.ItemsAdded)
metrics.WriteCounterUint64(w, `vm_indexdb_items_added_size_bytes_total`, idbm.ItemsAddedSizeBytes)
@@ -511,6 +506,10 @@ func writeStorageMetrics(w io.Writer, strg *storage.Storage) {
metrics.WriteGaugeUint64(w, `vm_parts{type="indexdb/inmemory"}`, idbm.InmemoryPartsCount)
metrics.WriteGaugeUint64(w, `vm_parts{type="indexdb/file"}`, idbm.FilePartsCount)
metrics.WriteGaugeUint64(w, `vm_last_partition_parts{type="storage/inmemory"}`, tm.LastPartition.InmemoryPartsCount)
metrics.WriteGaugeUint64(w, `vm_last_partition_parts{type="storage/small"}`, tm.LastPartition.SmallPartsCount)
metrics.WriteGaugeUint64(w, `vm_last_partition_parts{type="storage/big"}`, tm.LastPartition.BigPartsCount)
metrics.WriteGaugeUint64(w, `vm_blocks{type="storage/inmemory"}`, tm.InmemoryBlocksCount)
metrics.WriteGaugeUint64(w, `vm_blocks{type="storage/small"}`, tm.SmallBlocksCount)
metrics.WriteGaugeUint64(w, `vm_blocks{type="storage/big"}`, tm.BigBlocksCount)

View File

@@ -1,4 +1,4 @@
FROM golang:1.21.6 as build-web-stage
FROM golang:1.22.0 as build-web-stage
COPY build /build
WORKDIR /build

View File

@@ -7,6 +7,7 @@ Web UI for VictoriaMetrics
* [Updating vmui embedded into VictoriaMetrics](#updating-vmui-embedded-into-victoriametrics)
* [Predefined dashboards](#predefined-dashboards)
* [App mode config options](#app-mode-config-options)
* [Timezone configuration](#timezone-configuration)
----
@@ -246,3 +247,39 @@ vmui can be used to paste into other applications
```html
<div id="root" data-params='{"serverURL":"http://localhost:8428","useTenantID":true,"headerStyles":{"background":"#FFFFFF","color":"#538DE8"},"palette":{"primary":"#538DE8","secondary":"#F76F8E","error":"#FD151B","warning":"#FFB30F","success":"#7BE622","info":"#0F5BFF"}}'></div>
```
----
## Timezone configuration
vmui's timezone setting offers flexibility in displaying time data. It can be set through a configuration flag and is adjustable within the vmui interface. This feature caters to various user preferences and time zones.
### Default Timezone Setting
#### Via Configuration Flag
- Set the default timezone using the `--vmui.defaultTimezone` flag.
- Accepts a valid IANA Time Zone string (e.g., `America/New_York`, `Europe/Berlin`, `Etc/GMT+3`).
- If the flag is unset or invalid, vmui defaults to the browser's local timezone.
#### User Interface Adjustments
- Users can change the timezone in the vmui interface.
- Any changed setting in the interface overrides the flag's default, persisting for the user.
- The timezone specified in the `--vmui.defaultTimezone` flag is included in the vmui's timezone selection dropdown, aiding user choice.
### Key Points
- **Fallback to Browser's Local Timezone**: If the flag is not set or an invalid timezone is specified, vmui uses the local timezone of the user's browser.
- **User Preference Priority**: User-selected timezones in vmui take precedence over the default set by the flag.
- **Cluster Consistency**: Ensure uniform timezone settings across cluster nodes, but individual user interface selections will always override these defaults.
### Examples
Setting a default timezone, with user options to change:
```
./victoria-metrics --vmui.defaultTimezone="America/New_York"
```
In this scenario, if a user in Berlin accesses vmui without changing settings, it will default to their browser's local timezone (CET). If they select a different timezone in vmui, this choice will override the `"America/New_York"` setting for that user.

File diff suppressed because it is too large Load Diff

View File

@@ -14,6 +14,7 @@ import PreviewIcons from "./components/Main/Icons/PreviewIcons";
import WithTemplate from "./pages/WithTemplate";
import Relabel from "./pages/Relabel";
import ActiveQueries from "./pages/ActiveQueries";
import QueryAnalyzer from "./pages/QueryAnalyzer";
const App: FC = () => {
const [loadedTheme, setLoadedTheme] = useState(false);
@@ -49,6 +50,10 @@ const App: FC = () => {
path={router.trace}
element={<TracePage/>}
/>
<Route
path={router.queryAnalyzer}
element={<QueryAnalyzer/>}
/>
<Route
path={router.dashboards}
element={<DashboardsLayout/>}

View File

@@ -4,4 +4,4 @@ export const getQueryRangeUrl = (server: string, query: string, period: TimePara
`${server}/api/v1/query_range?query=${encodeURIComponent(query)}&start=${period.start}&end=${period.end}&step=${period.step}${nocache ? "&nocache=1" : ""}${queryTracing ? "&trace=1" : ""}`;
export const getQueryUrl = (server: string, query: string, period: TimeParams, nocache: boolean, queryTracing: boolean): string =>
`${server}/api/v1/query?query=${encodeURIComponent(query)}&time=${period.end}${nocache ? "&nocache=1" : ""}${queryTracing ? "&trace=1" : ""}`;
`${server}/api/v1/query?query=${encodeURIComponent(query)}&time=${period.end}&step=${period.step}${nocache ? "&nocache=1" : ""}${queryTracing ? "&trace=1" : ""}`;

View File

@@ -70,7 +70,7 @@ The list of MetricsQL features on top of PromQL:
It is equivalent to `rate(node_network_receive_bytes_total[$__interval])` when used in Grafana.
* Numeric values can contain `_` delimiters for better readability. For example, `1_234_567_890` can be used in queries instead of `1234567890`.
* [Series selectors](https://docs.victoriametrics.com/keyConcepts.html#filtering) accept multiple `or` filters. For example, `{env="prod",job="a" or env="dev",job="b"}`
selects series with either `{env="prod",job="a"}` or `{env="dev",job="b"}` labels.
selects series with `{env="prod",job="a"}` or `{env="dev",job="b"}` labels.
See [these docs](https://docs.victoriametrics.com/keyConcepts.html#filtering-by-multiple-or-filters) for details.
* Support for `group_left(*)` and `group_right(*)` for copying all the labels from time series on the `one` side
of [many-to-one operations](https://prometheus.io/docs/prometheus/latest/querying/operators/#many-to-one-and-one-to-many-vector-matches).
@@ -243,7 +243,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_over_time](#count_over_time).
See also [count_over_time](#count_over_time) and [share_eq_over_time](#share_eq_over_time).
#### count_gt_over_time
@@ -253,7 +253,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_over_time](#count_over_time).
See also [count_over_time](#count_over_time) and [share_gt_over_time](#share_gt_over_time).
#### count_le_over_time
@@ -263,7 +263,7 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_over_time](#count_over_time).
See also [count_over_time](#count_over_time) and [share_le_over_time](#share_le_over_time).
#### count_ne_over_time
@@ -743,7 +743,7 @@ This function is useful for calculating SLI and SLO. Example: `share_gt_over_tim
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [share_le_over_time](#share_le_over_time).
See also [share_le_over_time](#share_le_over_time) and [count_gt_over_time](#count_gt_over_time).
#### share_le_over_time
@@ -756,7 +756,7 @@ the share of time series values for the last 24 hours when memory usage was belo
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [share_gt_over_time](#share_gt_over_time).
See also [share_gt_over_time](#share_gt_over_time) and [count_le_over_time](#count_le_over_time).
#### share_eq_over_time
@@ -766,6 +766,8 @@ from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.ht
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [count_eq_over_time](#count_eq_over_time).
#### stale_samples_over_time
`stale_samples_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which calculates the number
@@ -792,6 +794,33 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
This function is supported by PromQL. See also [stddev_over_time](#stddev_over_time).
#### sum_eq_over_time
`sum_eq_over_time(series_selector[d], eq)` is a [rollup function](#rollup-function), which calculates the sum of raw sample values equal to `eq`
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [sum_over_time](#sum_over_time) and [count_eq_over_time](#count_eq_over_time).
#### sum_gt_over_time
`sum_gt_over_time(series_selector[d], gt)` is a [rollup function](#rollup-function), which calculates the sum of raw sample values bigger than `gt`
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [sum_over_time](#sum_over_time) and [count_gt_over_time](#count_gt_over_time).
#### sum_le_over_time
`sum_le_over_time(series_selector[d], le)` is a [rollup function](#rollup-function), which calculates the sum of raw sample values smaller or equal to `le`
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
See also [sum_over_time](#sum_over_time) and [count_le_over_time](#count_le_over_time).
#### sum_over_time
`sum_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which calculates the sum of raw sample values
@@ -810,7 +839,7 @@ Metric names are stripped from the resulting rollups. Add [keep_metric_names](#k
#### timestamp
`timestamp(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last raw sample
`timestamp(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -819,7 +848,7 @@ This function is supported by PromQL. See also [timestamp_with_name](#timestamp_
#### timestamp_with_name
`timestamp_with_name(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last raw sample
`timestamp_with_name(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are preserved in the resulting rollups.
@@ -828,7 +857,7 @@ See also [timestamp](#timestamp).
#### tfirst_over_time
`tfirst_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the first raw sample
`tfirst_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the first raw sample
on the given lookbehind window `d` per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -837,7 +866,7 @@ See also [first_over_time](#first_over_time).
#### tlast_change_over_time
`tlast_change_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the last change
`tlast_change_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the last change
per each time series returned from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering) on the given lookbehind window `d`.
Metric names are stripped from the resulting rollups. Add [keep_metric_names](#keep_metric_names) modifier in order to keep metric names.
@@ -852,7 +881,7 @@ See also [tlast_change_over_time](#tlast_change_over_time).
#### tmax_over_time
`tmax_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the raw sample
`tmax_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the raw sample
with the maximum value on the given lookbehind window `d`. It is calculated independently per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).
@@ -862,7 +891,7 @@ See also [max_over_time](#max_over_time).
#### tmin_over_time
`tmin_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds for the raw sample
`tmin_over_time(series_selector[d])` is a [rollup function](#rollup-functions), which returns the timestamp in seconds with millisecond precision for the raw sample
with the minimum value on the given lookbehind window `d`. It is calculated independently per each time series returned
from the given [series_selector](https://docs.victoriametrics.com/keyConcepts.html#filtering).

View File

@@ -29,7 +29,7 @@ const GlobalSettings: FC = () => {
const appModeEnable = getAppModeEnable();
const { serverUrl: stateServerUrl, theme } = useAppState();
const { timezone: stateTimezone } = useTimeState();
const { timezone: stateTimezone, defaultTimezone } = useTimeState();
const { seriesLimits } = useCustomPanelState();
const dispatch = useAppDispatch();
@@ -78,6 +78,10 @@ const GlobalSettings: FC = () => {
setServerUrl(stateServerUrl);
}, [stateServerUrl]);
useEffect(() => {
setTimezone(stateTimezone);
}, [stateTimezone]);
const controls = [
{
show: !appModeEnable && !isLogsApp,
@@ -100,6 +104,7 @@ const GlobalSettings: FC = () => {
show: true,
component: <Timezones
timezoneState={timezone}
defaultTimezone={defaultTimezone}
onChange={setTimezone}
/>
},

View File

@@ -51,6 +51,12 @@ const ServerConfigurator: FC<ServerConfiguratorProps> = ({
}
}, [enabledStorage]);
useEffect(() => {
if (enabledStorage) {
saveToStorage("SERVER_URL", serverUrl);
}
}, [serverUrl]);
return (
<div>
<div className="vm-server-configurator__title">

View File

@@ -1,22 +1,30 @@
import React, { FC, useMemo, useRef, useState } from "preact/compat";
import { getTimezoneList, getUTCByTimezone } from "../../../../utils/time";
import { getBrowserTimezone, getTimezoneList, getUTCByTimezone } from "../../../../utils/time";
import { ArrowDropDownIcon } from "../../../Main/Icons";
import classNames from "classnames";
import Popper from "../../../Main/Popper/Popper";
import Accordion from "../../../Main/Accordion/Accordion";
import dayjs from "dayjs";
import TextField from "../../../Main/TextField/TextField";
import { Timezone } from "../../../../types";
import "./style.scss";
import useDeviceDetect from "../../../../hooks/useDeviceDetect";
import useBoolean from "../../../../hooks/useBoolean";
import WarningTimezone from "./WarningTimezone";
interface TimezonesProps {
timezoneState: string
onChange: (val: string) => void
timezoneState: string;
defaultTimezone?: string;
onChange: (val: string) => void;
}
const Timezones: FC<TimezonesProps> = ({ timezoneState, onChange }) => {
interface PinnedTimezone extends Timezone {
title: string;
isInvalid?: boolean;
}
const browserTimezone = getBrowserTimezone();
const Timezones: FC<TimezonesProps> = ({ timezoneState, defaultTimezone, onChange }) => {
const { isMobile } = useDeviceDetect();
const timezones = getTimezoneList();
@@ -29,6 +37,25 @@ const Timezones: FC<TimezonesProps> = ({ timezoneState, onChange }) => {
setFalse: handleCloseList,
} = useBoolean(false);
const pinnedTimezones = useMemo(() => [
{
title: `Default time (${defaultTimezone})`,
region: defaultTimezone,
utc: defaultTimezone ? getUTCByTimezone(defaultTimezone) : "UTC"
},
{
title: browserTimezone.title,
region: browserTimezone.region,
utc: getUTCByTimezone(browserTimezone.region),
isInvalid: !browserTimezone.isValid
},
{
title: "UTC (Coordinated Universal Time)",
region: "UTC",
utc: "UTC"
},
].filter(t => t.region) as PinnedTimezone[], [defaultTimezone]);
const searchTimezones = useMemo(() => {
if (!search) return timezones;
try {
@@ -40,11 +67,6 @@ const Timezones: FC<TimezonesProps> = ({ timezoneState, onChange }) => {
const timezonesGroups = useMemo(() => Object.keys(searchTimezones), [searchTimezones]);
const localTimezone = useMemo(() => ({
region: dayjs.tz.guess(),
utc: getUTCByTimezone(dayjs.tz.guess())
}), []);
const activeTimezone = useMemo(() => ({
region: timezoneState,
utc: getUTCByTimezone(timezoneState)
@@ -108,13 +130,16 @@ const Timezones: FC<TimezonesProps> = ({ timezoneState, onChange }) => {
onChange={handleChangeSearch}
/>
</div>
<div
className="vm-timezones-item vm-timezones-list-group-options__item"
onClick={createHandlerSetTimezone(localTimezone)}
>
<div className="vm-timezones-item__title">Browser Time ({localTimezone.region})</div>
<div className="vm-timezones-item__utc">{localTimezone.utc}</div>
</div>
{pinnedTimezones.map((t, i) => t && (
<div
key={`${i}_${t.region}`}
className="vm-timezones-item vm-timezones-list-group-options__item"
onClick={createHandlerSetTimezone(t)}
>
<div className="vm-timezones-item__title">{t.title}{t.isInvalid && <WarningTimezone/>}</div>
<div className="vm-timezones-item__utc">{t.utc}</div>
</div>
))}
</div>
{timezonesGroups.map(t => (
<div

Some files were not shown because too many files have changed in this diff Show More